Engineering Computations Module 2: Take off with stats

posted on 05.12.2017, 23:18 by Lorena A. Barba, Natalia C. Clementi
Engineering Computations
—Original material written as Jupyter Notebooks for an undergraduate engineering course, Fall 2017

Module 2: Take off with stats

The first module of this course, "Get data off the ground," assumed no coding experience and created a foundation with Python programming constructs and data structures. You learned to play with strings, lists and NumPy arrays, using indexing, slicing, for- and if-statements, and functions. The second course module explores practical statistical analysis with Python.

Lesson 1: Cheers! Stats with beers

Exploratory analysis using a data set of canned craft beers in the US. Introduces the pandas library and its data types: DataFrame and Series. Use pandas to read a data file, extract selected columns, and remove null values. Descriptive statistics: measures of central tendency and variability. Distribution plots: histograms with Matplotlib. Comparing with a normal distribution.
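As a rough illustration of the workflow this lesson describes, the sketch below reads a small CSV, extracts one column, drops null values, and computes a few descriptive statistics. The inline data, the column names (abv, ibu, style), and the variable names are invented for this example and are not taken from the course's data file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a beer data file; names and values are made up.
csv_text = """name,abv,ibu,style
Beer A,0.050,40,IPA
Beer B,0.065,,IPA
Beer C,0.045,20,Lager
Beer D,,35,Stout
Beer E,0.070,60,IPA
"""

# read_csv returns a DataFrame; empty fields become NaN (null) values.
beers = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column returns a Series.
abv = beers["abv"]

# Remove null values before computing statistics.
abv_clean = abv.dropna()

# Measures of central tendency.
mean_abv = abv_clean.mean()
median_abv = abv_clean.median()

# Measures of variability (pandas uses the sample formula, ddof=1, by default).
std_abv = abv_clean.std()
var_abv = abv_clean.var()

print(f"n = {len(abv_clean)}, mean = {mean_abv:.4f}, median = {median_abv:.4f}")
print(f"std = {std_abv:.4f}, variance = {var_abv:.6f}")
```

With a real file, the io.StringIO wrapper would simply be replaced by the file's path.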

Lesson 2: Seeing stats in a new light

Continuing with the data set of canned craft beers, this lesson focuses on visualizing statistics. For quantitative data: histograms and box plots; for categorical data: bar plots. Visualizing multiple variables with scatter plots and bubble charts.
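The plot types named in this lesson can be sketched with Matplotlib as follows. This is a minimal example on invented data: the column names (abv, ibu, ounces, style) and all values are assumptions for illustration, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; names and values are made up for this sketch.
beers = pd.DataFrame({
    "abv": [0.050, 0.065, 0.045, 0.060, 0.070, 0.055],
    "ibu": [40, 65, 20, 35, 60, 30],
    "ounces": [12, 16, 12, 12, 16, 12],
    "style": ["IPA", "IPA", "Lager", "Stout", "IPA", "Lager"],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Quantitative data: histogram and box plot.
axes[0, 0].hist(beers["abv"], bins=5)
axes[0, 0].set_title("Histogram of abv")
axes[0, 1].boxplot(beers["ibu"])
axes[0, 1].set_title("Box plot of ibu")

# Categorical data: bar plot of how often each category occurs.
style_counts = beers["style"].value_counts()
axes[1, 0].bar(style_counts.index, style_counts.values)
axes[1, 0].set_title("Beers per style")

# Two variables: scatter plot. Scaling the marker size by a third
# variable turns it into a bubble chart.
axes[1, 1].scatter(beers["abv"], beers["ibu"], s=beers["ounces"] * 10, alpha=0.5)
axes[1, 1].set_xlabel("abv")
axes[1, 1].set_ylabel("ibu")
axes[1, 1].set_title("Bubble chart (size ~ ounces)")

fig.tight_layout()
```

In a Jupyter notebook the figure renders inline; in a script, fig.savefig can write it to a file.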

Lesson 3: Lead in lipstick

A full worked example using what you learned in lessons 1 and 2: using data from studies by the US Food and Drug Administration on the lead content in lipstick, we fact-check alarming news headlines. Based on Prof. Kristin Sainani's lecture, "Exploring real data: lead in lipstick," of her Stanford Online course "Statistics in Medicine."

Lesson 4: Life expectancy and wealth

Deeper dive into pandas, using data for life expectancy and per-capita income over time, across the world. Inspired by the work of Hans Rosling. Pandas methods: head(), info(), value_counts(), groupby(), describe(), groupby.first(), groupby.get_group(), idxmin(). Categorical data type. Bubble plots, spaghetti plots, and interactive widgets.
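The pandas methods listed above can be tried on a toy table like the one below. The countries, continents, and numbers are placeholders invented for this sketch, not values from the lesson's data set.

```python
import pandas as pd

# Hypothetical miniature data set; all names and values are made up.
data = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "year": [1950, 2000, 1950, 2000, 1950, 2000],
    "life_exp": [45.0, 70.0, 60.0, 78.0, 38.0, 55.0],
    "income": [1200.0, 9000.0, 5000.0, 30000.0, 800.0, 2500.0],
})

# Quick look at the first rows and the column types.
print(data.head())
data.info()

# Categorical data type: store a repeated label as a category.
continent_cat = data["continent"].astype("category")

# value_counts(): number of rows per category.
counts = continent_cat.value_counts()

# groupby(): split the rows by country.
by_country = data.groupby("country")

# groupby.first(): first row of each group.
first_rows = by_country.first()

# groupby.get_group(): all rows belonging to one group.
country_b = by_country.get_group("B")

# describe(): summary statistics of a numeric column.
summary = data["life_exp"].describe()

# idxmin(): index label of the minimum, used here to fetch that row.
lowest = data.loc[data["life_exp"].idxmin()]

print(counts)
print(first_rows)
print(summary)
print(lowest)
```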


Note—If you have suggestions for changes or improvements to this material, please open an issue on the GitHub repository.


NSF Award OAC #1730170