Initial Publication Date: August 11, 2023

Histograms - Practice Problems
Solving Earth science problems with data binning

This module is available for public use, but it is undergoing revision after classroom implementation with the Math Your Earth Science Majors Need project.

Creating histograms by hand

Problem 1: In fluvial geomorphology, it is often important to know the dominant sediment size in a river bed, so students in the field might do pebble counts. Using the data below, collected at Mill Creek in the San Bernardino Mountains, CA, construct three histograms of pebble counts, one for each of three transects on this creek.

Class	Size (cm)	Transect 1	Transect 2	Transect 3
Cobbles	9.6-12.8	17	5	14
Cobbles	12.8-19.2	9	20	11
Cobbles	19.2-25.6	10	14	5
Boulders	25.6-38.4	6	4	4
Boulders	38.4-51.2	2	3	6
Boulders	51.2-102.4	0	0	1

Using the table of pebble counts, can you draw a histogram for each transect and determine the most common grain size in each transect?

Determine the bin size for your histogram

Determine the count of values in each bin

Plot the histogram for each transect

Now create three graphs that share the same x-axis, with the bins labeled from 9.6 cm to 102.4 cm. The y-axes should range from zero to the maximum count in your dataset (20). For Transect 1, draw a bar for each row of the table, where the height of the bar equals the number of pebbles counted in that size bin (Figure 1). Repeat this for each transect.

Figure 1: Hand-drawn histograms of pebble counts from 3 transects of a stream in the San Bernardino Mountains.
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
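To check your hand-drawn bars, you could also print a rough text version of each histogram. The Python sketch below does that from the Mill Creek table; like the hand-drawn figure, it draws one bar per size class even though the classes have unequal widths. The names `BINS`, `COUNTS`, and `text_histogram` are illustrative choices, not part of the original exercise.

```python
# Pebble counts from the Mill Creek table (one count per size class, in cm).
BINS = ["9.6-12.8", "12.8-19.2", "19.2-25.6", "25.6-38.4", "38.4-51.2", "51.2-102.4"]
COUNTS = {
    "Transect 1": [17, 9, 10, 6, 2, 0],
    "Transect 2": [5, 20, 14, 4, 3, 0],
    "Transect 3": [14, 11, 5, 4, 6, 1],
}


def text_histogram(bins, counts, symbol="#"):
    """Return one text line per bin: the bin label, then one symbol per pebble."""
    width = max(len(label) for label in bins)
    return [f"{label:>{width}} cm | {symbol * n} ({n})" for label, n in zip(bins, counts)]


if __name__ == "__main__":
    for transect, counts in COUNTS.items():
        print(transect)
        for line in text_histogram(BINS, counts):
            print(line)
        print()
```

The tallest row of symbols in each printout should match the tallest bar in your drawing.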
Calculate the modal grain size for each transect.

Remember that the mode is the value that occurs most often in a dataset. For a histogram, we find the tallest bar (the bin with the largest frequency of occurrence, or count) and read its x-value (bin) as the mode. For Transect 1, the most common sediment falls in the smallest cobble size class (9.6-12.8 cm). For Transect 2, it is the medium-sized cobbles (12.8-19.2 cm), and for Transect 3, it is again the smallest cobble size class (9.6-12.8 cm), the same as Transect 1 (Figure 2).

Figure 2: Modal grain size for each transect
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
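The "find the tallest bar" rule for the mode can also be written in a few lines of Python. This is a minimal sketch that assumes the counts are already binned as in the Mill Creek table; `modal_bin` is a hypothetical helper name. On a tie it simply reports the first (smallest) bin, so a tie deserves a closer look, since it may hint at a bimodal distribution.

```python
# Pebble counts from the Mill Creek table (one count per size class, in cm).
BINS = ["9.6-12.8", "12.8-19.2", "19.2-25.6", "25.6-38.4", "38.4-51.2", "51.2-102.4"]
COUNTS = {
    "Transect 1": [17, 9, 10, 6, 2, 0],
    "Transect 2": [5, 20, 14, 4, 3, 0],
    "Transect 3": [14, 11, 5, 4, 6, 1],
}


def modal_bin(bins, counts):
    """Return the label of the tallest bar; on a tie, the first (smallest) bin wins."""
    tallest = max(range(len(counts)), key=lambda i: counts[i])
    return bins[tallest]


if __name__ == "__main__":
    for transect, counts in COUNTS.items():
        print(f"{transect}: modal grain size {modal_bin(BINS, counts)} cm")
```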

Problem 2: Sedimentologists use grain size distributions to help identify the possible origins of sediment samples. Using the sieve data below, draw a histogram for each sample. Which sediment sample has the smallest average grain size?

Grain Size (mm)	Sample 1 (g)	Sample 2 (g)	Sample 3 (g)	Sample 4 (g)	Sample 5 (g)
> 2 mm	5.00	0.03	2.86	5.00	0.00
1-2 mm	2.00	39.98	25.47	5.00	1.00
0.5-1 mm	1.92	11.63	20.40	27.27	6.45
0.25-0.5 mm	43.77	10.00	4.38	0.04	41.74
0.125-0.25 mm	4.16	0.11	0.23	19.38	8.00
0.063-0.125 mm	1.00	0.00	0.02	1.69	5.00

Total Sample Mass (g)	57.85	61.75	53.36	58.37	62.19

Determine the bin size for your histogram
Determine the frequency of values in each bin

The data are already binned by sieve size, but because the total mass of each sample differs, we need to normalize the data and plot the frequency of occurrence instead of the raw mass. To do this, divide the mass retained on each sieve by the total sample mass (the last row) and multiply by 100 to express it as a percent. Your table should look like the one below. We will plot these percent-mass values on the y-axis. You can think of each value as a frequency of occurrence: for example, 75.66% of the mass of Sample 1 falls in the 0.25-0.5 mm size class.

Percent Mass (% of total sample mass)
Grain Size (mm)	Sample 1	Sample 2	Sample 3	Sample 4	Sample 5
> 2 mm	8.64	0.05	5.35	8.57	0.00
1-2 mm	3.46	64.74	47.74	8.57	1.61
0.5-1 mm	3.32	18.84	38.23	46.71	10.37
0.25-0.5 mm	75.66	16.19	8.21	0.06	67.12
0.125-0.25 mm	7.19	0.18	0.43	33.20	12.86
0.063-0.125 mm	1.73	0.00	0.04	2.89	8.04

Total (%)	100	100	100	100	100
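The normalization step is simple arithmetic, so it is easy to check with a short script. This Python sketch divides each sieve mass by the total sample mass reported in the table and multiplies by 100; `percent_mass` is an illustrative helper, and a few results may differ from the table above in the last decimal place because of rounding.

```python
# Mass (g) retained in each sieve class, coarsest first, from the sieve table.
SIZES_MM = ["> 2", "1-2", "0.5-1", "0.25-0.5", "0.125-0.25", "0.063-0.125"]
MASS_G = {
    "Sample 1": [5.00, 2.00, 1.92, 43.77, 4.16, 1.00],
    "Sample 2": [0.03, 39.98, 11.63, 10.00, 0.11, 0.00],
    "Sample 3": [2.86, 25.47, 20.40, 4.38, 0.23, 0.02],
    "Sample 4": [5.00, 5.00, 27.27, 0.04, 19.38, 1.69],
    "Sample 5": [0.00, 1.00, 6.45, 41.74, 8.00, 5.00],
}
# Total sample masses (g) as reported in the last row of the sieve table.
TOTAL_G = {"Sample 1": 57.85, "Sample 2": 61.75, "Sample 3": 53.36, "Sample 4": 58.37, "Sample 5": 62.19}


def percent_mass(masses, total):
    """Convert the mass in each sieve class into percent of the total sample mass."""
    return [100.0 * m / total for m in masses]


if __name__ == "__main__":
    for name, masses in MASS_G.items():
        percents = percent_mass(masses, TOTAL_G[name])
        print(name, dict(zip(SIZES_MM, (round(p, 2) for p in percents))))
```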

Plot the histogram for each sample

Now create five graphs that share the same x-axis, with the bins labeled from 0.063 mm to 2 mm. The y-axes should range from 0 to 100%. For Sample 1, draw a bar for each row of the table, where the height of the bar equals the percent mass in that size class (see the hand-drawn histograms below). Repeat this for each sample.

Hand-drawn histograms of 5 sieved sediment samples
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Determine the center of the dataset (the average)
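One common way to estimate an average from binned data is to assign every grain in a class the size of that class's midpoint and then take a mass-weighted mean. The Python sketch below does this; it is a stand-in estimation technique, not necessarily the method the original answer uses. It needs an assumed representative size for the open-ended > 2 mm class (`OPEN_CLASS_MM`, a hypothetical 3.0 mm), and it works in millimeters, although sedimentologists often use the logarithmic phi scale for this kind of calculation.

```python
# Mass (g) retained in each sieve class, coarsest first, from the sieve table.
MASS_G = {
    "Sample 1": [5.00, 2.00, 1.92, 43.77, 4.16, 1.00],
    "Sample 2": [0.03, 39.98, 11.63, 10.00, 0.11, 0.00],
    "Sample 3": [2.86, 25.47, 20.40, 4.38, 0.23, 0.02],
    "Sample 4": [5.00, 5.00, 27.27, 0.04, 19.38, 1.69],
    "Sample 5": [0.00, 1.00, 6.45, 41.74, 8.00, 5.00],
}
# Sieve class edges in mm, coarsest first; None marks the open-ended "> 2 mm" class.
BIN_EDGES_MM = [(2.0, None), (1.0, 2.0), (0.5, 1.0), (0.25, 0.5), (0.125, 0.25), (0.063, 0.125)]
# ASSUMPTION: the "> 2 mm" class has no upper edge, so this hypothetical 3.0 mm value
# stands in for its grains. Try other values to see whether the ranking changes.
OPEN_CLASS_MM = 3.0


def class_midpoints(edges, open_value):
    """Midpoint of each class; the open-ended class uses the assumed value instead."""
    return [open_value if hi is None else (lo + hi) / 2 for lo, hi in edges]


def binned_mean(midpoints, weights):
    """Mass-weighted mean of the class midpoints, an estimate of the mean grain size."""
    return sum(m * w for m, w in zip(midpoints, weights)) / sum(weights)


if __name__ == "__main__":
    mids = class_midpoints(BIN_EDGES_MM, OPEN_CLASS_MM)
    for name, masses in MASS_G.items():
        print(f"{name}: about {binned_mean(mids, masses):.2f} mm")
```

With these data, Sample 5 comes out finest (roughly 0.39 mm). It has no mass in the > 2 mm class, so any assumed value of 2 mm or more for that class leaves its estimate unchanged and keeps it the finest sample.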

Creating histograms in Excel

Problem 3: Geologists can classify volcanic eruptions using the Volcanic Explosivity Index (VEI), a measure of the relative explosiveness of an eruption. The VEI is based on how much volcanic material is ejected, how high the material is thrown into the atmosphere, and how long the eruption lasts. The scale is logarithmic (base 10): an increase of 1 on the scale indicates an eruption that ejects roughly 10 times more material than the step below it.

Using the attached Excel file VEI_1600_2023_AllData.xlsx, find the modal elevation of all the volcanic eruptions recorded in the dataset. Also, is the VEI distribution right- or left-skewed? Given the skew of the VEI data, do we expect the mean to be larger or smaller than the median?

Open the file in Excel
Create a histogram of the volcano elevations
Edit the histogram for readability

On a Mac: right-click a bar in the histogram and select Format Data Series. In the options, switch from automatic bins to Bin width, enter a reasonable bin width (perhaps 500 m), and use the overflow and underflow bins to clean up the chart. On a PC: click the plus sign in the upper-right corner of the chart to open the chart options, then select Axes, then More Axis Options, where you can set the bin width and the overflow and underflow bins in the same way.

Histogram of volcano elevations
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Find the modal elevation.
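Excel's bin width, underflow, and overflow settings amount to fixed-width binning with two catch-all bins at the ends. The Python sketch below mimics that idea so you can see how the counts, and the modal bin, change with bin width. The elevations in the example are made up for illustration (they are not from VEI_1600_2023_AllData.xlsx), the helper names are hypothetical, and the sketch uses left-closed bins, which may not match Excel's exact edge convention.

```python
import math


def bin_with_overflow(values, width, underflow, overflow):
    """Count values in fixed-width bins between `underflow` and `overflow`.

    Values below `underflow` go into one underflow bin and values at or above
    `overflow` go into one overflow bin. Regular bins are left-closed: [lo, lo + width).
    """
    n_bins = math.ceil((overflow - underflow) / width)
    counts = [0] * (n_bins + 2)  # index 0 = underflow, last index = overflow
    for v in values:
        if v < underflow:
            counts[0] += 1
        elif v >= overflow:
            counts[-1] += 1
        else:
            counts[1 + int((v - underflow) // width)] += 1
    labels = [f"< {underflow}"]
    labels += [f"{underflow + i * width}-{underflow + (i + 1) * width}" for i in range(n_bins)]
    labels.append(f">= {overflow}")
    return list(zip(labels, counts))


def modal_bin(binned):
    """Label of the bin with the largest count (the first one wins on a tie)."""
    return max(binned, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Hypothetical elevations in meters, NOT taken from VEI_1600_2023_AllData.xlsx.
    elevations = [-50, 120, 480, 510, 760, 1100, 1240, 1490, 1500, 1720, 2300, 2950, 3400, 4100, 5800]
    binned = bin_with_overflow(elevations, width=500, underflow=0, overflow=4000)
    for label, n in binned:
        print(f"{label:>10} m: {n}")
    print("modal bin:", modal_bin(binned), "m")
```

In this made-up example the modal bin is 1000-1500 m; rerunning it with `width=250` or `width=1000` is a quick way to see how the choice of bin width can shift the apparent mode.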
Create a new histogram of the VEI values

Select the VEI column, then click Insert and choose the Histogram chart type.

VEI histogram
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Edit the VEI value histogram for readability
Interpret the histogram
Descriptive Statistics
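A quick numerical check on the skew is to compare the mean and the median. The Python sketch below uses Pearson's second skewness coefficient, 3 x (mean - median) / standard deviation, as a simple rule of thumb; the VEI values in the example are hypothetical, not taken from the attached spreadsheet. Excel's Descriptive Statistics tool reports a skewness value computed differently, so its number will generally differ from this one.

```python
import statistics


def skew_direction(values):
    """Classify skew with Pearson's second coefficient, 3 * (mean - median) / stdev.

    Returns (label, mean, median), where label is "right", "left", or "symmetric".
    Needs at least two values, because the sample standard deviation does.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return "symmetric", mean, median  # all values identical
    coefficient = 3 * (mean - median) / spread
    if coefficient > 0:
        label = "right"  # long tail toward high values: mean > median
    elif coefficient < 0:
        label = "left"  # long tail toward low values: mean < median
    else:
        label = "symmetric"
    return label, mean, median


if __name__ == "__main__":
    # Hypothetical VEI values, NOT read from VEI_1600_2023_AllData.xlsx.
    vei = [0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5]
    label, mean, median = skew_direction(vei)
    print(f"{label}-skewed: mean {mean:.2f}, median {median}")
```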

Problem 4: Using automated analysis of Landsat satellite imagery, the sizes of more than 1,500 atoll islands were measured and saved in an Excel file (AtollIslands_ACO_Landsat.xlsx). Looking at atoll island width, are the data uniform or skewed? Based on the histogram, can you tell whether the median or the mean atoll island width is the smaller value?

Open the file in Excel
Create a histogram of the atoll island width
Edit the histogram for readability

On a Mac: right-click a bar in the histogram and select Format Data Series. In the options, switch from automatic bins to Bin width, enter a reasonable bin width (perhaps 25 m), and use the overflow and underflow bins to clean up the chart. On a PC: click the plus sign in the upper-right corner of the chart to open the chart options, then select Axes, then More Axis Options, where you can set the bin width and the overflow and underflow bins in the same way.

Histogram of Atoll Island Width
Provenance: Alejandra Ortiz, Colby College
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Interpret the histogram
Descriptive Statistics

Reading histograms

Problem 5: In the early 2000s, hydraulic fracturing became a common method for retrieving fossil fuels "trapped" in rocks such as shale. One concern about this practice was its potential to induce earthquakes. Examine and compare the two histograms of earthquake magnitudes in Oklahoma before and after 2008.

What is the shape of the distribution of data? Is it symmetric, skewed, uniform, or bimodal?

Do you see one or two "highest" bars? Are the data distributed evenly on each side of the highest bars?

Each of these histograms is unimodal (it has one highest bar). The pre-2008 histogram has a tail extending toward higher magnitudes (it is right-skewed), but this is likely because earthquakes below magnitude 2.5 are not included in the dataset. The post-2008 histogram shows earthquake magnitudes distributed symmetrically about the most frequent value. Note that you would expect the distribution of earthquake magnitudes to be skewed, because the frequency of earthquakes should decrease as magnitude increases.
Where is the center of the data? What is the average value of the data?
What is the most common earthquake magnitude before 2008 and after 2008?
What is the spread of the data? What is the range of earthquake magnitudes before and after 2008?
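Reading the range off a histogram means taking the upper edge of the highest non-empty bin and subtracting the lower edge of the lowest non-empty bin. The Python sketch below encodes that rule; the magnitude bins and counts in the example are hypothetical, not read from the Oklahoma histograms. Because the exact minimum and maximum values are hidden inside their bins, the estimate can overstate the true range by up to one bin width at each end.

```python
def histogram_range(edges, counts):
    """Estimate the data range from a histogram.

    `edges` holds the bin boundaries in ascending order (one more entry than `counts`).
    The estimate runs from the lower edge of the first non-empty bin to the upper
    edge of the last non-empty bin.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        raise ValueError("histogram is empty")
    return edges[occupied[-1] + 1] - edges[occupied[0]]


if __name__ == "__main__":
    # Hypothetical magnitude bins and earthquake counts, for illustration only.
    edges = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
    counts = [40, 22, 9, 3, 0, 1]
    print("estimated range:", histogram_range(edges, counts))
```

Note that an empty bin in the middle (4.5-5.0 here) does not shorten the range; only the outermost occupied bins matter.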

Problem 6: Below, we have two histograms of the measured pH of coal-mine discharges in Pennsylvania. How would you describe the shape of these histograms?

Is it a uniform distribution?
Is it unimodal, bi-modal, or something else?
What is the skew?
What is the most common measured pH? Find the mode of each histogram.

Problem 7: Scientists use satellite data to classify land cover and track changes to Earth's landscape at the global scale. Histograms are a useful tool for automatically classifying land cover and highlighting changes in a landscape (for example, after a fire or flood). Of the band histograms for this Landsat 7 image, which is unimodal? Which histogram do you expect to have the smallest mode?

Is the histogram for each band uniform or are there distinct peaks?
Find the mode of each histogram

Problem 8: The hypsometry of the entire Earth describes the percentage of the Earth's surface in each elevation range. In this histogram, you can see the percentage of the Earth's surface above (positive values) and below (negative values) mean sea level, as well as the distribution of elevations. Note that this histogram displays the percent of values in each bin, rather than the count.

What is the shape of this distribution? Is this a uniform distribution? Is there a single most likely value? Are the data skewed?

If you're having trouble visualizing the shape of the histogram, trace an outline of it by connecting the tops of the bars. Two peaks should be visible, so this histogram is bimodal, not uniform. The single most likely value, or mode, is at the tallest peak (-4.5 km). Looking at the overall distribution, the histogram is skewed slightly to the right (this is easier to see if you outline your histogram!).
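Counting peaks by eye can also be done programmatically by looking for bars that are taller than both of their neighbors. This Python sketch is a deliberately simple version: flat-topped peaks count once, and a bar at either end counts if it is taller than its single neighbor. The percentages in the example are made up, not the real hypsometry. Real histograms often have small wiggles, so in practice you might ignore very small bumps or smooth the bars first; also, a perfectly flat (uniform) histogram counts as one peak under this rule, so check for uniformity separately.

```python
def count_peaks(heights):
    """Count local maxima in a list of bar heights (a rough unimodal/bimodal check).

    Runs of equal heights are collapsed first, so a flat-topped peak counts once.
    The ends are padded with -infinity, so an end bar counts if it beats its neighbor.
    """
    collapsed = [h for i, h in enumerate(heights) if i == 0 or h != heights[i - 1]]
    padded = [float("-inf")] + collapsed + [float("-inf")]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


if __name__ == "__main__":
    # Made-up percent-per-bin values with two humps, NOT the real hypsometry data.
    percents = [1, 3, 9, 20, 14, 6, 5, 8, 12, 7, 2, 1]
    n = count_peaks(percents)
    print(n, "peak(s):", "bimodal" if n == 2 else "unimodal" if n == 1 else "other")
```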
Can you estimate the median elevation of the Earth? The average of the data?

The median elevation of the Earth is -1 km (1 km below sea level). The average of the data is likely similar, also close to -1 km. Notice that the median and mean are not close to either peak. An alternative approach would be to split the bimodal distribution into two unimodal distributions and then estimate the median and mean of each. With this approach, the median and mean of each unimodal distribution would be similar to its mode.
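For a histogram that shows percent per bin, as this one does, the median sits where the cumulative percent reaches 50%. The Python sketch below finds that bin and interpolates linearly inside it, which assumes values are spread evenly within each bin. The bin edges and percentages in the example are made up for illustration; they are not read from the Earth hypsometry histogram, so the printed value says nothing about the real median.

```python
def median_from_percent_histogram(edges, percents):
    """Estimate the median from a percent-per-bin histogram.

    `edges` are bin boundaries in ascending order (one more entry than `percents`).
    Walks the cumulative percent until it reaches half the total, then interpolates
    linearly within that bin (assumes values are spread evenly inside a bin).
    """
    if len(edges) != len(percents) + 1:
        raise ValueError("need exactly one more edge than bins")
    target = sum(percents) / 2
    cumulative = 0.0
    for i, p in enumerate(percents):
        if p > 0 and cumulative + p >= target:
            fraction = (target - cumulative) / p
            return edges[i] + fraction * (edges[i + 1] - edges[i])
        cumulative += p
    raise ValueError("histogram is empty")


if __name__ == "__main__":
    # Made-up elevation bins (km) and percent of surface per bin, for illustration only.
    edges_km = [-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3]
    percents = [2, 10, 20, 15, 8, 5, 5, 20, 10, 5]
    print(f"estimated median: {median_from_percent_histogram(edges_km, percents):.2f} km")
```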
What is the most common value (mode)?
What is the range of elevations on the Earth?

Problem 9: These three histograms display the bulk heat in place of three different geologic units (pink is the Early Permian, orange is the Lesueur Sandstone, and blue is the Yarragadee). The stored heat energy in these units, which is potentially recoverable as geothermal energy, is measured in petajoules (PJ). One petajoule is equal to about 278 million kilowatt-hours of energy! By comparing the histograms, you can characterize the differences between the units. Note: 2.6E+5 = `2.6xx10^5`.
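The 278 million kilowatt-hour figure follows from two standard unit definitions, 1 PJ = 10^15 J and 1 kWh = 3.6 x 10^6 J; the conversion works out as:

```latex
1\ \text{PJ} = 10^{15}\ \text{J}, \qquad 1\ \text{kWh} = 3.6 \times 10^{6}\ \text{J}

1\ \text{PJ} = \frac{10^{15}\ \text{J}}{3.6 \times 10^{6}\ \text{J/kWh}}
  \approx 2.78 \times 10^{8}\ \text{kWh} \approx 278\ \text{million kWh}
```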

Compare the distribution of values in each unit. Are they the same shape? If they are skewed, are they skewed in the same direction?
If you wanted to use the median as a single value to represent each unit, would you expect the medians to be the same or different?
What is the mode of each unit? Are they the same?
Is the spread of the data the same for each geologic unit or different? What is the range?

The spread is shown along the x-axis, and we can see that it is different for each unit. The range is calculated by subtracting the smallest value on the x-axis from the greatest value. For (a), the greatest value is `2.9xx10^5` and the smallest value is `2.2xx10^5`, so the range for (a) is `2.9xx10^5 - 2.2xx10^5 = 0.7xx10^5`. We repeat this process for (b): `1.4xx10^5 - 2xx10^4 = 1.2xx10^5`, and once more for (c): `3xx10^5 - 1xx10^5 = 2xx10^5`.