# Guiding students through histograms

*An instructor's guide to Histograms*

Freddi-Jo Bruschke (California State University Fullerton)

Alejandra Ortiz (Colby College)

## What should students get out of this module?

After completing this module, a student should be able to:

- Construct a histogram with meaningful bin sizes given a set of data
- Plot histograms by hand and using a spreadsheet
- Compare descriptive statistics of two histograms (e.g., is the average of histogram A larger or smaller than the average of histogram B?)
- Identify a unimodal versus bimodal histogram and a left-skewed versus right-skewed histogram

## Why are these math skills challenging to incorporate into courses?

- Students often do not understand the reason for plotting data and have never experienced the need to visualize a large set of numbers.
- Often in math classes students are asked to generate points from a function in order to determine what a graph of that function looks like; plotting points from data (which rarely is a perfect function!) may be newer for them than we realize.
- Histograms are rarely constructed by hand, so using software is imperative and can entail a first introduction to the basic software environment before histograms are even discussed.
- How plots are made varies with the software used, and bar graphs are frequently the only chart type that is easy to create.
- This may be a completely new concept - while students may have seen histograms before, it is rare for them to have had histograms explained.
- Understanding the relationships between basic descriptive statistics and histograms (e.g., mean vs. median) can entail understanding what those statistics actually mean.
- When extracting information from histograms, students typically need to look for patterns on the y-axis and then read the bin value off the x-axis (the opposite of the order typically asked of students when interpreting graphs).
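
For instructors who want to demonstrate the "y-axis first, then x-axis" reading order, the following is a minimal sketch in Python. The data values, the bin settings, and the helper name `histogram` are all made up for illustration and are not taken from the module.

```python
from statistics import mean, median

def histogram(data, start, width, n_bins):
    """Count how many values fall in each equal-width bin [left, right)."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical data: 12 made-up measurements with a long right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 14]
start, width, n_bins = 0, 3, 5   # bins: 0-3, 3-6, 6-9, 9-12, 12-15
counts = histogram(data, start, width, n_bins)

# Step 1: look along the y-axis for the tallest bar.
tallest = max(counts)
# Step 2: read the bin belonging to that bar off the x-axis.
i = counts.index(tallest)
modal_bin = (start + i * width, start + (i + 1) * width)

print("counts per bin:", counts)
print("tallest bar covers:", modal_bin)
# A mean larger than the median is consistent with a right skew.
print("mean:", mean(data), "median:", median(data))
```

With these made-up values the tallest bar sits over the 3-6 bin, and the mean exceeds the median, which matches the long tail toward larger values.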

## What don't we include in the page?

Although histograms are related to probability distributions, we do not include an in-depth discussion of probability or of the wide range of distribution types. We do briefly touch upon unimodal vs. bimodal distributions and the role of skew and symmetry in normal distributions. We have provided links to more in-depth discussions of different probability distributions. If an instructor wishes to talk more about probability, we recommend discussing how histograms can be transformed from counts to frequency of occurrence. This would also naturally segue into probability density functions and work well with the advanced section including cumulative distributions.
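
As one possible illustration of the counts-to-frequency transformation, here is a minimal sketch in Python. The bin counts and bin width are hypothetical, and the variable names are ours.

```python
# Hypothetical bin counts and bin width (same units as the data).
counts = [3, 6, 1, 1, 1]
width = 3

total = sum(counts)

# Frequency of occurrence: the fraction of all observations in each bin.
rel_freq = [c / total for c in counts]

# Density: relative frequency per unit of x, so that bar AREAS sum to 1.
density = [f / width for f in rel_freq]
area = sum(d * width for d in density)

print("relative frequencies:", rel_freq)
print("sum of relative frequencies:", sum(rel_freq))
print("total area under the density bars:", area)
```

Dividing by the total makes histograms of different sample sizes comparable; dividing again by the bin width is the step that connects a histogram to a probability density function.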

We do not discuss frequency analysis of categorical data, such as composition of sediment plotted as a bar graph, nor do we explicitly discuss the difference between a histogram and a bar graph. While we do briefly discuss the impact of varying bin size on the visualization of data in a histogram, this is an area that can be very important and can involve more subfield-specific conventions. It might extend to unevenly sized bins or to dealing with outliers that are far beyond the range of the rest of the data (e.g., one point that is an order of magnitude larger than all the other data). Additionally, this module does not discuss more advanced visualizations of probability such as stem-and-leaf plots or box and violin plots.
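
To show students how much bin size can change the picture, an instructor could bin the same data two ways. The sketch below uses Python with made-up data and a hypothetical helper named `bin_counts`; it is only one way to set up such a comparison.

```python
def bin_counts(data, start, width, n_bins):
    """Count values in equal-width bins [left, right), starting at `start`."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical data with two clusters (near 2-3 and near 7-8).
data = [1, 2, 2, 3, 3, 3, 4, 6, 7, 7, 7, 8, 8, 9]

narrow = bin_counts(data, start=0, width=2, n_bins=5)  # bins 0-2, 2-4, ..., 8-10
wide = bin_counts(data, start=0, width=5, n_bins=2)    # bins 0-5, 5-10

# Draw each histogram sideways as rows of '#' characters.
for label, counts in (("width 2", narrow), ("width 5", wide)):
    print(label)
    for c in counts:
        print("  " + "#" * c)
```

With the narrower bins the two clusters show up as separate peaks; with the wider bins both bars have the same height and the bimodal shape disappears.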

Likewise, we do not include a detailed discussion of graphing basics or definitions of basic statistics. Instructors may refer to the module Introductory Statistics for an explanation of mean, median, and standard deviation and to Basic Graphing Skills for a more general discussion of plotting data.

Lastly, we do not get into cumulative histograms or circular histograms (rose diagrams). These are very important types of specialized histogram plots in earth and environmental science but are beyond the scope of this module. We do feel that the existing examples could be expanded to segue into cumulative histograms rather easily (e.g., take problem XX and explain how to make it cumulative in Excel).
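
For an instructor who wants to try that extension, a cumulative histogram is just a running total of the bin counts. The following is a minimal Python sketch with hypothetical counts; in a spreadsheet the same idea would be a running-sum column next to the counts.

```python
from itertools import accumulate

# Hypothetical bin counts from an ordinary histogram.
counts = [3, 6, 1, 1, 1]

# Running total: each entry is the number of observations in this bin
# and in every bin to its left.
cumulative = list(accumulate(counts))

# Dividing by the final total gives the cumulative fraction (ends at 1).
cumulative_fraction = [c / cumulative[-1] for c in cumulative]

print("cumulative counts:", cumulative)
print("cumulative fraction:", cumulative_fraction)
```

The last cumulative count always equals the total number of observations, which gives students a quick check on their work.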

## Instructor resources

### Support for teaching this quantitative skill

- The Math You Need, When You Need It: Introductory Statistics
- Khan Academy Histograms
- One or more other resources that can **help instructors teach** about this topic. SERC collections such as Teaching Quantitative Skills can be a helpful place to look.
- This site gives a strong description of reading and interpreting histograms
- This Complete Guide to Histograms does a strong job of walking through understanding and creating histograms

### Examples of activities that use this quantitative skill

- In this example, students use histograms to determine the frequency of Old Faithful (the geyser at Yellowstone) eruptions. It uses Excel and is part of the Geology in National Parks
- In this lab or in-class exercise, students are introduced to probability and statistics by plotting the weights of pennies of different ages (e.g., pennies from the 1960s to 1981 have a different weight than modern pennies). They plot the histograms by hand.
- In this activity, students use carbon dioxide data from the DOE to explore the role of different visualization methods in Excel (e.g., histogram vs. pie chart) and track CO2 emissions from different countries.
- In this exercise, students use histograms in Excel to visualize VOCs from different geographic locations in the US.
- In this advanced exercise, students use MATLAB to plot the organic content of sediment data, fit curves to the data, and use statistical tests (e.g., t-test) to interpret the results.
- In this complex exercise, students are introduced to using MATLAB to visualize Pokemon Go data. It includes assessing distribution fits, descriptive statistics, and normalizing data.
- In this advanced example, students use histograms in Excel to visualize the macroevolution of Cambrian Fauna.