Histograms: Practice Problems
Solving Earth science problems with data binning
This module is available for public use, but it is undergoing revision after classroom implementation with the Math Your Earth Science Majors Need project.
Creating histograms by hand
Problem 1: In fluvial geomorphology, it is often important to know the dominant sediment size in a river bed, so students in the field might do pebble counts. Using the data below, collected at Mill Creek in the San Bernardino Mountains, CA, draw three histograms of the pebble counts, one for each of the three transects on this creek.
Figure X: Geology students conducting a pebble count on Mill Creek, CA.
Provenance: FreddiJo Bruschke, California State University-Fullerton
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.

Class      Size (cm)       Transect 1   Transect 2   Transect 3
Cobbles    9.6-12.8        17           5            14
Cobbles    12.8-19.2       9            20           11
Cobbles    19.2-25.6       10           14           5
Boulders   25.6-38.4       6            4            4
Boulders   38.4-51.2       2            3            6
Boulders   51.2-102.4      0            0            1
Can you draw a histogram for each transect and determine what is the most common grain size in each transect given the table of pebble counts?
 Determine the bin size for your histogram
In pebble counts, the bins are predefined by the sizes measured, so you can use each row of the size column as a bin (e.g., 9.6-12.8 cm, 12.8-19.2 cm, 25.6-38.4 cm, etc.).
 Determine the count of values in each bin
Because the data are pre-binned, we already have the counts in each size bin. If we use the six bins found in step 1, we can use the values already tabulated; e.g., for Transect 1, the second-largest size bin (38.4-51.2 cm) has a count of 2.
 Plot the histogram for each transect
Now create three graphs with the same x-axis, with the bins labeled from 9.6 cm to 102.4 cm. The y-axes should range from zero to the maximum value in your dataset (~20). For Transect 1, make a bar for each row, where the height of the bar equals the count in that row (Figure 1). Repeat this for each transect.
Figure 1: Hand-drawn histograms of 3 transects of pebble counts for a stream in the San Bernardino Mountains.
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Calculate the modal grain size for each transect.
Remember the mode is defined as the value that occurs most often in the dataset. For histograms, we look at which bar is the highest (i.e., has the largest frequency of occurrence or count) and then read the x-value (bin) for the mode. In this case, for Transect 1 the most common sediment is the smallest size of cobbles (9.6-12.8 cm). For Transect 2, it is medium-sized cobbles (12.8-19.2 cm), and for Transect 3, it is again the smallest size of cobbles (9.6-12.8 cm), the same as Transect 1 (Figure 2).
Figure 2: Modal grain size for each transect
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
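The mode lookup for Problem 1 can also be checked with a short script. The Python sketch below hard-codes the pebble counts from the Problem 1 table and reports the bin with the highest count for each transect, along with a text version of each histogram. The variable and function names are illustrative and are not part of the module.

```python
# Bin labels (cm) and pebble counts copied from the Problem 1 table.
bins_cm = ["9.6-12.8", "12.8-19.2", "19.2-25.6",
           "25.6-38.4", "38.4-51.2", "51.2-102.4"]
counts = {
    "Transect 1": [17, 9, 10, 6, 2, 0],
    "Transect 2": [5, 20, 14, 4, 3, 0],
    "Transect 3": [14, 11, 5, 4, 6, 1],
}


def modal_bin(bin_labels, bin_counts):
    """Return the label of the bin with the largest count (the mode)."""
    tallest = max(range(len(bin_counts)), key=lambda i: bin_counts[i])
    return bin_labels[tallest]


modes = {name: modal_bin(bins_cm, c) for name, c in counts.items()}

for name, c in counts.items():
    print(name, "mode:", modes[name], "cm")
    # Text version of the histogram: one '#' per pebble counted.
    for label, n in zip(bins_cm, c):
        print(f"  {label:>10} | {'#' * n} {n}")
```

If two bins tie for the highest count, this sketch returns the first one; a fuller version would report both.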
Problem 2: Sedimentologists use grain size distributions to help identify the possible origins of sediment samples. Using the sieve data below, draw a histogram for each sample and decide which sediment sample has the smallest average grain size.

Grain Size (mm)         Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
> 2 mm                  5.00       0.03       2.86       5.00       0.00
1-2 mm                  2.00       39.98      25.47      5.00       1.00
0.5-1 mm                1.92       11.63      20.40      27.27      6.45
0.25-0.5 mm             43.77      10.00      4.38       0.04       41.74
0.125-0.25 mm           4.16       0.11       0.23       19.38      8.00
0.063-0.125 mm          1.00       0.00       0.02       1.69       5.00

Total Sample Mass (g)   57.85      61.75      53.36      58.37      62.19
 Determine the bin size for your histogram
In sediment sieves, the bins are predefined by the sizes of the sieves used, so you can use each row of grain size as a bin (e.g., >2 mm, 1-2 mm, 0.5-1 mm, etc.).
 Determine the frequency of values in each bin
The data are pre-binned by mass. However, because the total sample mass varies from sample to sample, we need to normalize the data and plot by frequency of occurrence. To do this, take the mass retained in each sieve size, divide it by the total sample mass (the last row), and multiply by 100 to get a percentage. Your table should look like the one below. We will plot our y-values using this percent mass of the sample (you can also think of it as a frequency of occurrence: in Sample 1, X% of the sediment falls in a specific grain size bin).
Percent Mass

Grain Size (mm)    Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
> 2 mm             8.64       0.05       5.35       8.57       0.00
1-2 mm             3.46       64.74      47.74      8.57       1.61
0.5-1 mm           3.32       18.84      38.23      46.71      10.37
0.25-0.5 mm        75.66      16.19      8.21       0.06       67.12
0.125-0.25 mm      7.19       0.18       0.43       33.20      12.86
0.063-0.125 mm     1.73       0.00       0.04       2.89       8.04

Total Mass         100        100        100        100        100
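The normalization step can be sketched in a few lines of Python. The masses below are copied from the Problem 2 sieve table; the names `mass_g` and `percent_mass` are illustrative. Because the tabulated masses are rounded, the computed percentages may differ from the percent-mass table in the second decimal place.

```python
# Sieve masses in grams, copied from the Problem 2 table (coarsest bin first).
bins_mm = [">2", "1-2", "0.5-1", "0.25-0.5", "0.125-0.25", "0.063-0.125"]
mass_g = {
    "Sample 1": [5.00, 2.00, 1.92, 43.77, 4.16, 1.00],
    "Sample 2": [0.03, 39.98, 11.63, 10.00, 0.11, 0.00],
    "Sample 3": [2.86, 25.47, 20.40, 4.38, 0.23, 0.02],
    "Sample 4": [5.00, 5.00, 27.27, 0.04, 19.38, 1.69],
    "Sample 5": [0.00, 1.00, 6.45, 41.74, 8.00, 5.00],
}


def percent_mass(masses):
    """Convert the mass in each bin to a percentage of the sample total."""
    total = sum(masses)
    return [100.0 * m / total for m in masses]


percent = {name: percent_mass(m) for name, m in mass_g.items()}

for name, p in percent.items():
    print(name, [round(v, 2) for v in p])
```

Dividing by each sample's own total is what makes the five histograms comparable even though the samples have different masses.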
 Plot the histogram for each sample
Now create five graphs with the same x-axis, with the bins labeled from 0.063 mm to 2 mm. The y-axes should range from 0 to 100. For Sample 1, make a bar for each row, where the height of the bar equals the percent mass in that row (Figure XX). Repeat this for each sample.
Hand-drawn histograms of 5 sieved sediment samples
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Determine the center of the dataset (the average)
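For these histograms, we know that the smallest sediment size is on the left side of the x-axis and grain size increases moving right, so we are looking for a histogram with most of the percent mass on the left. Samples 1 and 5 both have a mode of 0.25-0.5 mm. Of the two, Sample 5 has less mass in the coarse bins (0% above 2 mm, versus 8.64% for Sample 1) and more mass in the two finest bins (20.90% versus 8.92%), so the percent-mass table indicates that Sample 5 has the smallest average grain size.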
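One rough way to put a number on "average grain size" is a mass-weighted mean of the bin midpoints. The sketch below applies this to the percent-mass values from the Problem 2 table. The midpoints are assumptions made for illustration; in particular, the open-ended > 2 mm bin is given a hypothetical midpoint of 3 mm, so the results are estimates only.

```python
# Percent mass per bin, copied from the percent-mass table (coarsest bin first).
percent = {
    "Sample 1": [8.64, 3.46, 3.32, 75.66, 7.19, 1.73],
    "Sample 2": [0.05, 64.74, 18.84, 16.19, 0.18, 0.00],
    "Sample 3": [5.35, 47.74, 38.23, 8.21, 0.43, 0.04],
    "Sample 4": [8.57, 8.57, 46.71, 0.06, 33.20, 2.89],
    "Sample 5": [0.00, 1.61, 10.37, 67.12, 12.86, 8.04],
}

# Bin midpoints in mm. The 3 mm midpoint for the open-ended ">2 mm" bin is
# an assumption made only for this illustration.
midpoints_mm = [3.0, 1.5, 0.75, 0.375, 0.1875, 0.094]


def weighted_mean_size(percents, midpoints):
    """Mass-weighted mean grain size (mm) estimated from binned percentages."""
    return sum(p * m for p, m in zip(percents, midpoints)) / sum(percents)


estimates = {name: weighted_mean_size(p, midpoints_mm)
             for name, p in percent.items()}

# Print the samples from finest to coarsest estimated mean size.
for name, size in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{name}: about {size:.2f} mm")
```

Sedimentologists often work on a logarithmic (phi) scale instead; the arithmetic mean of midpoints is used here only because it is the simplest estimate to read off a histogram.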
Creating histograms in Excel
Problem 3: Geologists can classify volcanic eruptions based on the VEI, or volcanic explosivity index, a measure of the relative explosiveness of volcanic eruptions. It accounts for how much volcanic material is ejected, the height of the material thrown into the atmosphere, and how long the eruptions last. The scale is logarithmic (base 10); therefore, an increase of 1 on the scale indicates an eruption 10 times more powerful than the number before it on the scale.
Using the attached Excel file VEI_1600_2023_AllData.xlsx, find the modal elevation of all the volcanic eruptions recorded in the dataset. Also, is the VEI right- or left-skewed? Given the skew of the VEI data, do we expect the mean to be larger or smaller than the median?
 Open the file in Excel
Download the Excel file and save it to your computer. Double-click the file to open it.
 Create a histogram of the volcano elevations
Select the elevation column, then click insert and choose the histogram chart type.
 Edit the histogram for readability
For Macs: Right-click a bar in the histogram chart and select format data series. In the options, switch from auto bins to bin width, enter a reasonable bin width (perhaps 500 m), and use the overflow and underflow bins to clean up the image.
For PCs: Click on the plus sign in the upper right-hand corner of the new plot to open plot options, then select axes, then more axis options.
Histogram of elevation of volcanoes
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Find the modal elevation.
Look for the largest peak (the tallest bar) on the histogram, then read down to the x-axis and find the elevation range. If you use a bin width of 500 m, the modal range is 1,500-2,000 m, with 120 volcanoes in that bin.
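For readers who want to see what a fixed bin width does to the data, here is a minimal Python sketch of fixed-width binning. The elevations are made-up placeholder values, not rows from VEI_1600_2023_AllData.xlsx, and the function name is illustrative. Excel may treat values that fall exactly on a bin edge differently from this sketch.

```python
import math
from collections import Counter


def bin_counts(values, width):
    """Count values in fixed-width bins; keys are each bin's lower edge."""
    return Counter(math.floor(v / width) * width for v in values)


# Hypothetical elevations in metres (placeholders, not the real dataset).
elevations_m = [1580, 1720, 1990, 2334, 813, 1500, 3726, 1117, 1850, 2631]

counts = bin_counts(elevations_m, 500)
# The modal bin is the one with the largest count.
modal_edge, modal_count = max(counts.items(), key=lambda kv: kv[1])

for edge in sorted(counts):
    print(f"{edge:>5}-{edge + 500:<5} m | {'#' * counts[edge]}")
print(f"Modal bin: {modal_edge}-{modal_edge + 500} m ({modal_count} values)")
```

Changing the `width` argument shows how the modal range depends on the bin width chosen, which is why the answer above is stated for a 500 m bin width.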
 Create a new histogram of the VEI values
Select the VEI column, then click insert and choose the histogram chart type.
VEI histogram
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Edit the VEI value histogram for readability
For Macs: Right-click a bar in the histogram chart and select format data series. In the options, switch from auto bins to bin width, enter a reasonable bin width (1), and use the overflow and underflow bins to clean up the image.
For PCs: Click on the plus sign in the upper right-hand corner of the new plot to open plot options, then select axes, then more axis options.
 Interpret the histogram
Looking at the graph of VEI, we see that the graph is unimodal (one dominant peak at 2), but the data are not symmetrically distributed around this peak. There are more values (a longer tail) on the right side of the peak, so these data are right-skewed.
 Descriptive Statistics
In our histogram, we can clearly see that the mode for VEI is 2. We also know that for right-skewed histograms the median is smaller than the mean, because the mean is "pulled" up by the few large VEI values in the right tail.
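A small numerical example makes the mean-versus-median relationship concrete. The values below are a made-up right-skewed list with a mode of 2, chosen only for illustration; they are not taken from the VEI dataset.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed VEI values (placeholders, not the real dataset).
vei = [0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6]

print("mode:  ", mode(vei))
print("median:", median(vei))
print("mean:  ", round(mean(vei), 2))
```

In this list the few large values (4, 5, 6) raise the mean above the median, while the median stays at the middle value of the sorted list.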
Problem 4: Through automated analyses of Landsat satellite imagery, the size of over 1500 atoll islands was collected and saved as an Excel file, AtollIslands_ACO_Landsat.xlsx. When we look at the atoll island width, are the data uniform or skewed? Can you tell from the histogram whether the median or the mean atoll island width is the smaller value?
 Open the file in Excel
Download the Excel file and save it to your computer. Double-click the file to open it.
 Create a histogram of the atoll island width
Select the atoll island width column, then click insert and choose the histogram chart type.
 Edit the histogram for readability
For Macs: Right-click a bar in the histogram chart and select format data series. In the options, switch from auto bins to bin width, enter a reasonable bin width (perhaps 25 m), and use the overflow and underflow bins to clean up the image.
For PCs: Click on the plus sign in the upper right-hand corner of the new plot to open plot options, then select axes, then more axis options.
Histogram of Atoll Island Width
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Interpret the histogram
Looking at the graph of the atoll island width, we see that the graph is unimodal (one dominant peak at 170-196 m), but the data are not symmetrically distributed around this peak. There are more values (a longer tail) on the right side of the peak, so these data are right-skewed.
 Descriptive Statistics
In our histogram, we can clearly see that the mode for atoll island width is 170-196 m. We also know that for right-skewed histograms the median is smaller than the mean, because the mean is "pulled" up by the larger atoll island width values in the right tail.
Reading histograms
Problem 5: In the early 2000s, hydraulic fracturing became a common method to retrieve fossil fuels "trapped" in rocks like shale. One concern about this practice was the potential to induce earthquakes. Examine and compare the two histograms graphically displaying the magnitude of earthquakes in Oklahoma before and after 2008.
Earthquakes from 1984-2008
Provenance: IRIS/SAGE https://www.iris.edu/hq/inclass/lesson/can_humans_cause_earthquakes
Reuse: This item is in the public domain and may be reused freely without restriction.
Histogram of earthquake magnitude in Oklahoma 2009-2017
Provenance: IRIS/SAGE https://www.iris.edu/hq/inclass/lesson/can_humans_cause_earthquakes
Reuse: This item is in the public domain and may be reused freely without restriction.
 What is the shape of the distribution of data? Is it symmetric, skewed, uniform, or bimodal?
Do you see one or two "highest" bars? Are the data distributed evenly on each side of the highest bars?
Each of these histograms is unimodal (one highest value). The pre-2008 histogram shows earthquake magnitudes skewed toward the highest values (right-skewed), but this is likely because earthquakes below magnitude 2.5 are not presented in the data set. The post-2008 histogram shows earthquake magnitudes distributed symmetrically about the most frequent value. Note that you would expect the distribution of earthquake magnitudes to be skewed, as the frequency of earthquakes should decrease as the magnitude increases.
 Where is the center of the data? What is the average value of the data?
The median (center) of the data is found on the histogram by counting how many data points you have in total and then locating the bin that contains the middle data point.
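This median lookup can be sketched as a running total over the bars: add up the counts from left to right and stop at the bin where the running total first reaches the middle data point. The bin labels and counts below are hypothetical placeholders, not values read off the Oklahoma histograms, and the function name is illustrative.

```python
def median_bin(bin_labels, bin_counts):
    """Return the label of the bin that contains the middle data point."""
    total = sum(bin_counts)
    middle = (total + 1) / 2  # position of the middle point, counting from 1
    running = 0
    for label, n in zip(bin_labels, bin_counts):
        running += n
        if running >= middle:
            return label
    raise ValueError("histogram is empty")


# Hypothetical magnitude bins and counts (placeholders, not from the figures).
labels = ["2.5-3", "3-3.5", "3.5-4", "4-4.5"]
counts = [70, 25, 10, 3]

print("Median falls in bin:", median_bin(labels, counts))
```

When the total count is even and the two middle points fall in different bins, this sketch returns the upper of the two bins. A histogram only locates the median to within a bin, not to an exact value.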
 What is the most common earthquake magnitude before 2008 and after 2008?
Identify the largest bars on each histogram
Pre-fracking, there were ~72 earthquakes of magnitude 2.5-3; after 2008, there were over 3000 earthquakes of magnitude 2.5-3. You can see that magnitude 2.5-3 earthquakes are the most common in both time periods (the mode), but the number of earthquakes has increased significantly.
 What is the spread of the data? What is the range of earthquake magnitudes before and after 2008?
Identify the highest and lowest value of each histogram.
Before 2008, earthquake magnitudes ranged from 2.5 to 4.5. After 2008, earthquake magnitudes ranged from 0.5 to 6. Although they are not as frequent as the lower magnitude earthquakes, the larger earthquakes began occurring only after 2008.
Problem 6: Below, we have two histograms of the measured pH of coal-mine discharges in Pennsylvania. How would you describe the shape of these histograms?
USGS report on histogram of pH measurements on discharge from coal mines in PA
Provenance: USGS https://pubs.usgs.gov/wri/1999/4018c/report.pdf
Reuse: This item is in the public domain and may be reused freely without restriction.
Coal mine drainage into Shamokin Creek
Provenance: By Jakec - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=42792987
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
 Is it a uniform distribution?
Do the bars all have roughly the same height, or does the frequency (y-axis) vary from bin to bin? In this case, we can see clear variations from 0-20% frequency across the pH values shown, so it is NOT a uniform distribution.
 Is it unimodal, bimodal, or something else?
How many peaks do you see? In both graph A and graph B, we see two distinct peaks. We would call this bimodal.
 What is the skew?
How are the data distributed around the peaks? In this case, it seems like it's about an even distribution of data on either end of both peaks in both graphs, so this is a roughly symmetric graph.
 What is the most common measured pH? Find the mode of each histogram
Remember the mode is the most repeated value in the dataset; on a histogram, that is the bin with the tallest bar. In this case, we can see that the anthracite coalfield (A) has a mode of 3.25-3.75 pH and the surface coal mines (B) have a mode of 6.25-6.75 pH.
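Counting peaks by eye can be mimicked with a simple rule: a bin is a peak if it is taller than both of its neighbours and tall enough to matter. The sketch below applies that rule to a made-up bimodal histogram. The bin edges, frequencies, and the 5% height threshold are all hypothetical placeholders and are not read from the USGS figures.

```python
def find_peaks(percents, min_height=5.0):
    """Indices of bins taller than both neighbours and at least min_height."""
    padded = [0.0] + list(percents) + [0.0]  # lets edge bins count as peaks
    peaks = []
    for i in range(1, len(padded) - 1):
        here = padded[i]
        if here >= min_height and here > padded[i - 1] and here > padded[i + 1]:
            peaks.append(i - 1)  # convert back to an index into percents
    return peaks


# Hypothetical pH bins (lower edges) and percent frequencies (placeholders).
ph_lower_edges = [2.75, 3.25, 3.75, 4.25, 4.75, 5.25, 5.75, 6.25, 6.75]
frequency_pct = [8, 18, 12, 6, 4, 7, 14, 20, 11]

peaks = find_peaks(frequency_pct)
print("Number of peaks:", len(peaks))
print("Peak bins start at pH:", [ph_lower_edges[i] for i in peaks])
```

This rule is deliberately crude: it ignores flat-topped peaks and can be fooled by noisy bars, so it is a way to check your reading of a histogram rather than a replacement for looking at it.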
Histograms of first 4 bands of Landsat 7 image.
Provenance: EGU journal, CC 4.0 license https://npg.copernicus.org/articles/24/141/2017/
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Problem 7: Scientists use satellite data to classify land cover and track changes to Earth's landscape at the global scale. Histograms are a useful tool for automatically classifying land cover to highlight changes in a landscape (say, post-fire or post-flood). In this Landsat 7 image, which histogram is unimodal? Which histogram do you expect to have the smallest mode?
 Is the histogram for each band uniform or are there distinct peaks?
For each histogram, check how many peaks there are. Only the near-infrared band (the bottom histogram) has a single dominant peak; all the other bands (blue, green, and red) are bimodal or multimodal.
 Find the mode of each histogram
Remember the mode is the most repeated value in the dataset; on a histogram, that is the bin with the tallest bar. In this case, we can see that the unimodal near-infrared band has the smallest mode.
Hypsometric interval of Earth
Provenance: FreddiJo Bruschke, California State University-Fullerton
Reuse: This item is in the public domain and may be reused freely without restriction.
Problem 8: The hypsometric interval of the entire Earth presents the percentage of the Earth's surface in a given elevation range. You can see in this histogram the percentage of the Earth's surface above (positive values) and below (negative values) mean sea level, as well as the distribution of elevations. Note that this histogram displays the percent of values in each bin, rather than the count.
 What is the shape of this distribution? Is this a uniform distribution? Is there a single most likely value? Are the data skewed?
If you're having trouble visualizing the shape of the histogram, trace an outline of the histogram by connecting the top of each bar. There should be two peaks visible, so we know this histogram is bimodal and it is not uniform. The single most likely value, or mode, is found at the tallest peak (-4.5 km). Looking at the distribution, the histogram is skewed slightly to the right (it is easier to see this if you outline your histogram!).
 Can you estimate the median elevation of the Earth? The average of the data?
The median elevation of the Earth is -1 km (or 1 km below sea level). The average of the data is likely similar and also close to -1 km. You may notice that the median and mean are not close to either peak. An alternative approach to this problem would be to separate the bimodal distribution into two separate unimodal distributions and then estimate the median and mean of each unimodal distribution. With this approach, the median and mean for each unimodal distribution would be similar to the mode for that distribution.
 What is the most common value (mode)?
Following similar logic to the median and mean in a bimodal histogram (above), we report the mode of each peak. The mode is the most repeated value in the data set, or the tallest bar. Here, the modes can be found at -4.5 km and 0.5 km. If strictly considering the whole histogram, the mode is -4.5 km.
 What is the range of elevations on the Earth?
The range is found by subtracting the lowest number on the x-axis of the histogram from the highest number. Here, the highest number is 4.5 km and the lowest is -6.5 km. Thus, the range is found by the following equation: 4.5 km - (-6.5 km) = 11 km.
Bulk heat in place in three different geologic units.
Provenance: TMYN - Majors Project, adapted from Figure 6 in Probabilistic Assessment of Geothermal Resource Bases (Wellmann, Poh, & Regenauer-Lieb, 2012).
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for noncommercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Problem 9: These three histograms display the bulk heat in place of three different geologic units (pink is the Early Permian, orange is the Lesueur Sandstone, and blue is the Yarragadee). The stored heat energy in these units, which is potentially recoverable as geothermal energy, is measured in petajoules. One petajoule is equal to 278 million kilowatt-hours of energy! By comparing the histograms, you can characterize the differences between the units. Note: 2.6E+5 = `2.6xx10^5`.
 Compare the distribution of values in each unit. Are they the same shape? If they are skewed, are they skewed in the same direction?
Each of these three histograms is unimodal; however, they differ in skew. The first (a) is skewed to the left, (b) is skewed to the right, and (c) is symmetric.
 If you wanted to use the median as a single value to represent each unit, would you expect the medians to be the same or different?
We would expect the medians to be different due to this difference in skew. In a symmetric distribution, the median is typically the same value as the mode. Histograms that skew right have medians that fall to the right of the mode, and histograms that skew left have medians that fall to the left of the mode.
 What is the mode of each unit? Are they the same?
The mode is the tallest bar of each histogram. For (a), this is `2.6xx10^5`, for (b) it is `6xx10^4` and for (c) it is `2xx10^5`. They are not the same.
 Is the spread of the data the same for each geologic unit or different? What is the range?
The spread is visualized by the x-axis. We can see that the spread is different for each unit. The range is calculated by taking the smallest value on the x-axis and subtracting it from the greatest value. For (a), the greatest value is `2.9xx10^5` and the smallest value is `2.2xx10^5`. We can calculate the range for (a) using the following equation: `2.9xx10^5 - 2.2xx10^5 = 0.7xx10^5`. We can repeat this process for (b): `1.4xx10^5 - 2xx10^4 = 1.2xx10^5`. We do this once more for (c): `3xx10^5 - 1xx10^5 = 2xx10^5`.
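The three subtractions can be checked with a short script, which also shows how scientific notation such as 2.6E+5 is typed in code. The x-axis limits are the ones quoted in the text; the units are assumed to be petajoules, as the problem statement suggests, and the variable names are illustrative.

```python
# Smallest and greatest x-axis values for each unit, as quoted in the text
# (bulk heat in place, assumed to be in petajoules).
x_limits_pj = {
    "a": (2.2e5, 2.9e5),
    "b": (2.0e4, 1.4e5),
    "c": (1.0e5, 3.0e5),
}

# Range = greatest value minus smallest value.
ranges_pj = {unit: high - low for unit, (low, high) in x_limits_pj.items()}

for unit, r in ranges_pj.items():
    print(f"({unit}) range = {r:.1e} PJ")
```

Note that for (b) the two limits have different powers of ten, so they must be written on the same scale before subtracting by hand; the script handles this automatically.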
Next Steps
TAKE THE QUIZ!!
I think I'm competent with histograms and I am ready to take the quiz! This link takes you to WAMAP. If your instructor has not given you instructions about WAMAP, you may not have to take the quiz. Or you can go back to the Histogram explanation page.