Guiding Students through Histograms
An Instructor's Guide to Histograms
Freddi-Jo Bruschke, California State University Fullerton
Alejandra Ortiz, Colby College
What Should Students Get out of this Module?
After completing this module, a student should be able to:
- Construct a histogram with meaningful bin sizes from a given set of data
- Plot histograms by hand and with a spreadsheet
- Compare descriptive statistics of two histograms (e.g., whether the mean of histogram A is larger or smaller than the mean of histogram B)
- Distinguish a unimodal from a bimodal histogram, and a left-skewed from a right-skewed histogram.
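The first and third goals can be illustrated with a short sketch. The module's activities use Excel or MATLAB; this Python version is only an illustration, and the function names (`histogram_counts`, `print_text_histogram`) and the sample values are hypothetical. It sorts data into equal-width bins, draws a tally-style text histogram like one made by hand, and compares the mean with the median. A mean larger than the median is a common rule of thumb for right skew, not a guarantee.

```python
import math
import statistics

def histogram_counts(data, bin_width, start=None):
    """Count values in equal-width bins [edge, edge + bin_width).

    The largest value is placed in the last bin so it is not dropped.
    """
    if start is None:
        # Start the first bin at a round multiple of the bin width.
        start = math.floor(min(data) / bin_width) * bin_width
    n_bins = max(1, math.ceil((max(data) - start) / bin_width))
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - start) // bin_width), n_bins - 1)
        counts[i] += 1
    edges = [start + k * bin_width for k in range(n_bins + 1)]
    return edges, counts

def print_text_histogram(edges, counts):
    """Draw one row of '#' marks per bin, like a tally made by hand."""
    for lo, hi, c in zip(edges, edges[1:], counts):
        print(f"{lo:>4} to {hi:<4}| {'#' * c}")

# Hypothetical measurements, made up for illustration only.
sample = [12, 15, 17, 18, 21, 22, 22, 23, 24, 26, 27, 31, 38]

edges, counts = histogram_counts(sample, bin_width=5)
print_text_histogram(edges, counts)

mean = statistics.mean(sample)
median = statistics.median(sample)
print(f"mean = {mean:.2f}, median = {median}")
# Rule of thumb only: the mean is pulled toward the longer tail.
if mean > median:
    print("Mean > median: often a sign of right (positive) skew.")
elif mean < median:
    print("Mean < median: often a sign of left (negative) skew.")
```

With a bin width of 5 the bars have heights 1, 3, 5, 2, 1, 1, and the long tail toward larger values pulls the mean above the median.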
Why Are These Math Skills Challenging to Incorporate into Courses?
- Many students do not understand why we plot data and have never needed to visualize a large set of numbers.
- In math classes, students are often asked to generate points from a function in order to see what its graph looks like; plotting points from data (which rarely follow a perfect function!) may be newer to them than we realize.
- Histograms of real data sets are rarely constructed by hand, so software is usually essential, and using it may require an introduction to the basic software environment before histograms can even be discussed.
- The steps for making a histogram vary with the software used, and a bar graph is often the only option that is easy to create.
- Histograms may be a completely new concept. Even students who have seen them before have rarely had them explained.
- Relating basic descriptive statistics (e.g., mean vs. median) to the shape of a histogram requires understanding what those statistics actually mean.
- When extracting information from histograms, students typically need to look for patterns on the y-axis and then read the corresponding bin values from the x-axis, the opposite of the order they are usually asked to follow when interpreting graphs.
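The "y-axis first, then x-axis" reading order described in the last point can be made concrete. In this minimal Python sketch, the helper names and bar heights are hypothetical. `tallest_bins` scans the bar heights for the maximum and then reports the matching bin range, and `count_peaks` is a rough heuristic for telling a unimodal histogram from a bimodal one. With noisy real data it can report small spurious peaks, so treat its answer as a hint rather than a test.

```python
def tallest_bins(edges, counts):
    """Find the tallest bar(s) on the y-axis, then report the matching
    bin range(s) read from the x-axis."""
    tallest = max(counts)
    return [(edges[i], edges[i + 1]) for i, c in enumerate(counts) if c == tallest]

def count_peaks(counts):
    """Rough heuristic: count bars taller than both neighbours.

    Runs of equal heights are merged first so a flat-topped peak counts once.
    """
    heights = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    peaks = 0
    for i, h in enumerate(heights):
        left = heights[i - 1] if i > 0 else -1
        right = heights[i + 1] if i + 1 < len(heights) else -1
        if h > left and h > right:
            peaks += 1
    return peaks

# Hypothetical bin edges and bar heights, made up for illustration.
edges = [10, 15, 20, 25, 30, 35, 40]
one_hump = [1, 3, 5, 2, 1, 1]
two_humps = [2, 6, 3, 1, 4, 7]

print(tallest_bins(edges, one_hump))
print(count_peaks(one_hump), count_peaks(two_humps))
```

The first set of bars has its single tallest bar in the 20 to 25 bin and one peak; the second has two separated peaks, the pattern students should recognize as bimodal.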
What Don't We Include on This Page?
Although histograms are related to probability distributions, we do not include an in-depth discussion of probability or of the wide range of distribution types. We briefly touch on unimodal vs. bimodal distributions and on skew and symmetry, including how a skewed distribution differs from a symmetric (e.g., normal) one. We have provided links to more in-depth discussions of different probability distributions. Instructors who want to discuss probability further could show how a histogram of counts can be converted into relative frequencies of occurrence. This leads naturally into probability density functions and works well with the advanced section, including cumulative distributions.
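The conversion from counts to frequency of occurrence can be sketched in a few lines of Python. The helper names and the four-bin histogram are hypothetical. Relative frequency divides each count by the total number of observations; dividing again by the bin width gives a density, whose bar areas sum to 1, which is the property that connects a histogram to a probability density function.

```python
def relative_frequencies(counts):
    """Convert bin counts into the fraction of all observations in each bin."""
    total = sum(counts)
    return [c / total for c in counts]

def densities(edges, counts):
    """Divide each relative frequency by its bin width.

    The bar areas (density x width) then sum to 1.
    """
    total = sum(counts)
    return [c / (total * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]

# Hypothetical histogram: four bins of width 2 holding 20 observations.
edges = [0, 2, 4, 6, 8]
counts = [2, 5, 8, 5]

rel = relative_frequencies(counts)
dens = densities(edges, counts)
area = sum(d * (hi - lo) for d, lo, hi in zip(dens, edges, edges[1:]))
print(rel)
print(dens)
print(area)
```

Dividing by the bin width matters when bins are unequal; with equal bins, relative frequency and density differ only by a constant factor, so the shape of the histogram does not change.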
We do not discuss frequency analysis of categorical data, such as sediment composition plotted as a bar graph, nor do we explicitly discuss the difference between a histogram and a bar graph. We briefly discuss how bin size affects the way data appear in a histogram, but this topic can be very important and often involves subfield-specific conventions. These might include unevenly sized bins or outliers that lie far from the rest of the data (e.g., one data point an order of magnitude larger than all the others). This module also does not cover other ways of visualizing distributions, such as stem-and-leaf plots, box plots, or violin plots.
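A quick way to show students why bin size matters is to count the same data with two bin widths. In this Python sketch, the function `bin_counts` and the data values are hypothetical, chosen to form two clusters. The bins are equal-width and start at a fixed origin, which is only one of several possible conventions.

```python
import math

def bin_counts(data, bin_width, start):
    """Count values in equal-width bins beginning at `start`."""
    n_bins = max(1, math.ceil((max(data) - start) / bin_width))
    counts = [0] * n_bins
    for x in data:
        # Clamp so the maximum value lands in the last bin.
        i = min(int((x - start) // bin_width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical values forming two clusters, made up for illustration.
data = [1.2, 1.5, 1.8, 2.1, 2.4, 5.6, 5.9, 6.1, 6.3, 6.8]

for width in (1, 4):
    print(f"bin width {width}: {bin_counts(data, width, start=0)}")
```

With a width of 1 the counts are [0, 3, 2, 0, 0, 2, 3], and the empty bins reveal a gap between two clusters; with a width of 4 they collapse to [5, 5], and the gap disappears.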
Likewise, we do not include a detailed discussion of graphing basics or definitions of basic statistics. Instructors may refer to the module Introductory Statistics for explanations of mean, median, and standard deviation, and to Basic Graphing Skills for a more general discussion of plotting data.
Lastly, we do not cover cumulative histograms or circular histograms (rose diagrams). These specialized histogram plots are very important in Earth and environmental science but are beyond the scope of this module. The existing examples could, however, be extended fairly easily into cumulative histograms (e.g., take problem XX and explain how to make it cumulative in Excel).
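As a hint of how small that extension is, a cumulative histogram is just a running total of the ordinary bar heights, the same thing a running-total column in a spreadsheet computes. This Python sketch uses hypothetical counts; dividing by the final total turns the running counts into cumulative fractions that end at 1.

```python
from itertools import accumulate

# Hypothetical bin counts for four consecutive bins, made up for illustration.
counts = [2, 5, 8, 5]

# Each cumulative value is the number of observations in that bin or any
# bin to its left: a running total of the ordinary histogram's bar heights.
cumulative = list(accumulate(counts))
cumulative_fraction = [c / cumulative[-1] for c in cumulative]

print(cumulative)
print(cumulative_fraction)
```

The running counts here are [2, 7, 15, 20], so the last bar always equals the total number of observations.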
Instructor Resources
Support for teaching this quantitative skill
- The Math You Need, When You Need It: Introductory Statistics
- Khan Academy Histograms
- SERC collections such as Teaching Quantitative Skills can be a helpful place to look for other resources that help instructors teach this topic.
- This site from labxchange.org has a strong description of reading and interpreting histograms.
- This Complete Guide to Histograms does a strong job of walking through understanding and creating histograms.
Examples of activities that use this quantitative skill
- In this example, students use histograms to determine the frequency of Old Faithful (the geyser at Yellowstone) eruptions. It uses Excel and is part of the Geology of National Parks modules.
- In this lab or in-class exercise, students are introduced to probability and statistics by plotting the weights of pennies of different ages (e.g., pennies from the 1960s to 1981 have a different weight than modern pennies). They plot the histograms by hand.
- In this activity, students use carbon dioxide data from the DOE to explore different visualization methods in Excel (e.g., histogram vs. pie chart) and track CO2 emissions from different countries.
- In this exercise, students use histograms in Excel to visualize volatile organic compound (VOC) data from different geographic locations in the United States.
- In this advanced exercise, students use MATLAB to plot data on the organic content of sediment, fit curves to the data, and use statistical tests (e.g., a t-test) to interpret the results.
- In this complex exercise, students are introduced to using MATLAB to visualize Pokemon Go data. It includes assessing distribution fits, descriptive statistics, and normalizing data.
- In this advanced example, students use histograms in Excel to visualize the macroevolution of the Cambrian Fauna.