Engineering Computations Module 2: Take off with stats
Online resource, posted on 05.12.2017, 23:18 by Lorena A. Barba, Natalia C. Clementi
Original material written as Jupyter Notebooks for an undergraduate engineering course, Fall 2017.
Module 2: Take off with stats
The first module of this course, "Get data off the ground," assumed no coding experience and created a foundation with Python programming constructs and data structures. You learned to play with strings, lists and NumPy arrays, using indexing, slicing, for- and if-statements, and functions. The second course module explores practical statistical analysis with Python.
Lesson 1: Cheers! Stats with beers
Exploratory analysis using a data set of canned craft beers in the US. Introduces the pandas library and its data types: DataFrame and Series. Use pandas to read a data file, extract selected columns, and remove null values. Descriptive statistics: measures of central tendency and variability. Distribution plots: histograms with Matplotlib. Comparing with a normal distribution.
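A minimal sketch of the Lesson 1 workflow, assuming a small hypothetical CSV: the rows, values, and the column names `abv` and `ibu` are made up for illustration and are not taken from the course data set. It reads the text with `pandas.read_csv`, extracts columns as Series, drops null values with `dropna()`, computes measures of central tendency and variability, and compares density-normalized histogram heights (via `numpy.histogram`) with a normal curve of the same mean and standard deviation. The plotting step itself is left out.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text standing in for the beers data file; the values and
# the column names "abv" and "ibu" are illustrative assumptions.
csv_text = """name,abv,ibu
A,0.050,
B,0.066,65
C,,
D,0.090,85
E,0.075,115
F,0.077,
G,0.045,20
"""

# Read the data into a DataFrame (empty fields become NaN).
beers = pd.read_csv(io.StringIO(csv_text))

# Extract single columns as Series and remove the null values.
abv = beers["abv"].dropna()
ibu = beers["ibu"].dropna()

# Measures of central tendency.
abv_mean = abv.mean()
abv_median = abv.median()

# Measures of variability: pandas uses the sample variance (ddof=1) by default.
abv_var = abv.var()
abv_std = abv.std()

# Histogram heights normalized to a probability density, as a plotted
# histogram with density=True would show.
density, edges = np.histogram(abv, bins=4, density=True)

# Normal probability density with the same mean and standard deviation,
# evaluated on a grid spanning the data, for comparison with the histogram.
x = np.linspace(abv.min(), abv.max(), 50)
normal_pdf = np.exp(-((x - abv_mean) ** 2) / (2 * abv_std**2)) / (
    abv_std * np.sqrt(2 * np.pi)
)
```

Passing `density=True` makes the bar areas sum to 1, which puts the histogram on the same vertical scale as the normal probability density.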
Lesson 2: Seeing stats in a new light
Continuing with the data set of canned craft beers, this lesson focuses on visualizing statistics. For quantitative data: histograms and box plots; for categorical data: bar plots. Visualizing relationships among multiple variables with scatter plots and bubble charts.
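A short sketch of the numbers that a box plot and a bar plot display, assuming hypothetical bitterness (IBU) values and beer style labels that are invented for illustration. The box spans the first to third quartile with a line at the median. The whisker rule used here (extend to the most extreme data points within 1.5 × IQR of the box) is Matplotlib's default for `boxplot`. For categorical data, `value_counts()` gives the bar heights. The drawing calls are omitted.

```python
import pandas as pd

# Hypothetical bitterness (IBU) values; not from the course data set.
ibu = pd.Series([20, 35, 40, 45, 50, 55, 60, 65, 70, 140])

# Quartiles: the box spans Q1 to Q3, with a line at the median.
q1, median, q3 = ibu.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach the most extreme data points within 1.5 * IQR of the box;
# points beyond those fences are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = ibu[(ibu >= lower_fence) & (ibu <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = ibu[(ibu < lower_fence) | (ibu > upper_fence)]

# Categorical data: a bar plot shows the count of each category.
styles = pd.Series(["IPA", "Pale Ale", "IPA", "Stout", "IPA", "Pale Ale"])
style_counts = styles.value_counts()
```

`value_counts()` sorts categories from most to least frequent, so the bars come out already ordered by height.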
Lesson 3: Lead in lipstick
A full worked example that applies what you learned in lessons 1 and 2: using data from studies by the US Food and Drug Administration on the lead content in lipstick, we fact-check alarming news headlines. Based on Prof. Kristin Sainani's lecture, "Exploring real data: lead in lipstick," of her Stanford Online course "Statistics in Medicine."
Lesson 4: Life expectancy and wealth
A deeper dive into pandas, using data for life expectancy and per-capita income over time, across the world. Inspired by the work of Hans Rosling. Pandas methods: head(), info(), value_counts(), groupby(), describe(), groupby.first(), groupby.get_group(), idxmin(). Categorical data type. Bubble plots, spaghetti plots, and interactive widgets.
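A minimal sketch of several of the pandas methods listed for Lesson 4, applied to a tiny hypothetical table. The country labels, column names (`country`, `region`, `year`, `life_exp`, `income`), and all values are invented for illustration and do not come from the course data. It converts a repeated-label column to the Categorical type, counts labels with `value_counts()`, groups by country, and uses `first()`, `get_group()`, `describe()`, and `idxmin()`.

```python
import pandas as pd

# Hypothetical long-format table (one row per country per year); the country
# labels, column names, and values are made up for illustration.
data = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "region": ["Asia", "Asia", "Europe", "Europe", "Africa", "Africa"],
    "year": [1950, 2000, 1950, 2000, 1950, 2000],
    "life_exp": [45.0, 70.0, 60.0, 78.0, 38.0, 55.0],
    "income": [1000, 5000, 3000, 20000, 500, 1500],
})

# Store the repeated region labels with the Categorical data type.
data["region"] = data["region"].astype("category")
region_counts = data["region"].value_counts()

# Group rows by country.
by_country = data.groupby("country")

# First row of each group; rows are in year order, so this picks 1950.
first_rows = by_country.first()

# All rows belonging to one group.
country_b = by_country.get_group("B")

# Summary statistics of life expectancy, per country.
life_summary = by_country["life_exp"].describe()

# idxmin() returns the index label of the minimum, used here to fetch that row.
lowest = data.loc[data["life_exp"].idxmin()]
```

`first()` takes the first non-null value of each column within each group, so its result depends on the row order; sorting by year beforehand makes "first" mean "earliest".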
Note: If you have suggestions for changes or improvements to this material, please open an issue on the GitHub repository.