Histograms - Practice Problems
Solving Earth science problems with data binning

Students, you can download the questions [insert file here] if you would like to work through them on a separate sheet of paper.

Creating histograms by hand

Problem 1: In fluvial geomorphology, it is often important to know the dominant sediment size in a river bed, so students in the field might do pebble counts. Using the data below, collected at Mill Creek in the San Bernardino Mountains, CA, draw three histograms of the pebble counts, one for each of the three transects on this creek.

          Size (cm)       Transect 1   Transect 2   Transect 3
          9.6 - 12.8      17           5            14
Cobbles   12.8 - 19.2     9            20           11
          19.2 - 25.6     10           14           5
          25.6 - 38.4     6            4            4
Boulders  38.4 - 51.2     2            3            6
          51.2 - 102.4    0            0            1

 

Given the table of pebble counts, can you draw a histogram for each transect and determine the most common grain size in each?
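One way to check a hand-drawn histogram is to tally the same table in a few lines of Python. The sketch below copies the counts from the Mill Creek table, prints a text bar for each size class, and reports the class with the highest count. The names `size_classes`, `pebble_counts`, and `modal_class` are made up for this illustration.

```python
# Size classes (cm) and pebble counts copied from the Mill Creek table.
size_classes = [
    "9.6-12.8 cm",
    "12.8-19.2 cm",
    "19.2-25.6 cm",
    "25.6-38.4 cm",
    "38.4-51.2 cm",
    "51.2-102.4 cm",
]
pebble_counts = {
    "Transect 1": [17, 9, 10, 6, 2, 0],
    "Transect 2": [5, 20, 14, 4, 3, 0],
    "Transect 3": [14, 11, 5, 4, 6, 1],
}


def modal_class(counts):
    """Return the size class with the highest count (first one wins on a tie)."""
    return size_classes[counts.index(max(counts))]


for transect, counts in pebble_counts.items():
    print(transect)
    for label, count in zip(size_classes, counts):
        # One '#' per pebble gives a sideways histogram.
        print(f"  {label:>14} | {'#' * count} ({count})")
    print(f"  most common size class: {modal_class(counts)}")
```

Note that the size classes are not equal in width, so each bar here shows a raw count rather than a count per centimeter.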

Problem 2: Sedimentologists use grain size distributions to help identify the possible origins of sediment samples. Using the sieve data below, draw a histogram for each sample and decide which sediment sample has the smallest average grain size.

  Grain Size (mm)         Sample 1   Sample 2   Sample 3   Sample 4   Sample 5
  > 2                     5.00       0.03       2.86       5.00       0.00
  1 - 2                   2.00       39.98      25.47      5.00       1.00
  0.5 - 1                 1.92       11.63      20.40      27.27      6.45
  0.25 - 0.5              43.77      10.00      4.38       0.04       41.74
  0.125 - 0.25            4.16       0.11       0.23       19.38      8.00
  0.063 - 0.125           1.00       0.00       0.02       1.69       5.00

  Total Sample Mass (g)   57.85      61.75      53.36      58.37      62.19
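Because the samples have different total masses, it can help to convert each sieve mass to a percentage of its sample before comparing histograms. The sketch below does that with the values from the table, assuming each entry is the mass in grams retained in that size class (the columns add up to the listed totals, within rounding). The names `grain_classes`, `sieve_mass_g`, and `weight_percent` are illustrative.

```python
# Mass in each size class (g), copied from the sieve table.
grain_classes = [
    "> 2 mm",
    "1-2 mm",
    "0.5-1 mm",
    "0.25-0.5 mm",
    "0.125-0.25 mm",
    "0.063-0.125 mm",
]
sieve_mass_g = {
    "Sample 1": [5.00, 2.00, 1.92, 43.77, 4.16, 1.00],
    "Sample 2": [0.03, 39.98, 11.63, 10.00, 0.11, 0.00],
    "Sample 3": [2.86, 25.47, 20.40, 4.38, 0.23, 0.02],
    "Sample 4": [5.00, 5.00, 27.27, 0.04, 19.38, 1.69],
    "Sample 5": [0.00, 1.00, 6.45, 41.74, 8.00, 5.00],
}


def weight_percent(masses):
    """Convert masses to percentages of the sample's total mass."""
    total = sum(masses)
    return [100.0 * m / total for m in masses]


for sample, masses in sieve_mass_g.items():
    print(sample)
    for label, pct in zip(grain_classes, weight_percent(masses)):
        # One '#' per 2% keeps the bars short enough to read.
        print(f"  {label:>15} | {'#' * round(pct / 2)} {pct:.1f}%")
```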


Creating histograms in Excel

Problem 3: Geologists can classify volcanic eruptions using the Volcanic Explosivity Index (VEI), a measure of the relative explosiveness of volcanic eruptions. It accounts for how much volcanic material is ejected, how high the material is thrown into the atmosphere, and how long the eruption lasts. The scale is logarithmic (base 10); therefore, an increase of 1 on the scale indicates an eruption 10 times more powerful than the step below it.
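To see what a base-10 scale implies, the short sketch below turns a difference in VEI into a multiplying factor, taking the factor of 10 per step stated above at face value. The function name `vei_ratio` is made up for this example.

```python
def vei_ratio(vei_low, vei_high):
    """Factor separating two VEI values, assuming a factor of 10 per step."""
    return 10 ** (vei_high - vei_low)


print(vei_ratio(4, 5))  # 10
print(vei_ratio(2, 6))  # 10000
```

A difference of four steps therefore corresponds to a factor of 10,000, not 40.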

Using the attached Excel file VEI_1600_2023_AllData.xlsx (Excel 2007 (.xlsx) 41kB Jun7 23), find the modal elevation of all the volcanic eruptions recorded in the dataset. Also, are the VEI data right- or left-skewed? Given the skew of the VEI data, do we expect the mean to be larger or smaller than the median?
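The link between skew and the mean and median can be explored with a small experiment. The sketch below uses made-up numbers (not values from the spreadsheet) with a long tail toward high values, then mirrors them to put the tail on the low side, and compares the mean and median in each case.

```python
import statistics

# Made-up values with a long tail toward high numbers (right-skewed).
# These are NOT taken from the VEI spreadsheet.
toy_values = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]

mean = statistics.mean(toy_values)
median = statistics.median(toy_values)
mode = statistics.mode(toy_values)
print(f"right-skewed toy data: mean={mean}, median={median}, mode={mode}")

# Mirroring the values moves the long tail to the low side (left-skewed).
mirrored = [6 - v for v in toy_values]
mirrored_mean = statistics.mean(mirrored)
mirrored_median = statistics.median(mirrored)
print(f"left-skewed toy data: mean={mirrored_mean}, median={mirrored_median}")
```

In both cases the mean is pulled toward the long tail, while the median stays closer to the bulk of the data.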

Problem 4: Through automated analysis of Landsat satellite imagery, the sizes of over 1500 atoll islands were measured and saved as an Excel file. When we look at atoll island width, are the data uniform or skewed? Based on the histogram, can you tell whether the median or the mean atoll island width is the smaller value? AtollIslands_ACO_Landsat.xlsx (Excel 2007 (.xlsx) 75kB Jun7 23)

Reading histograms

 

Problem 5: In the early 2000s, hydraulic fracturing became a common method to retrieve fossil fuels "trapped" in rocks like shale. One concern about this practice was the potential to induce earthquakes. Examine and compare the two histograms showing the magnitudes of earthquakes in Oklahoma before and after 2008.


  1. What is the shape of the distribution of data? Is it symmetric, skewed, uniform, or bimodal?
  2. Where is the center of the data? What is the average value of the data?
  3. What is the most common earthquake magnitude before 2008 and after 2008?    
  4. What is the spread of the data? What is the range of earthquake magnitudes before and after 2008?  
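The four questions above can also be answered numerically once bin edges and bar heights have been read off a histogram. The sketch below is one way to do that; the function name `summarize_histogram` and the example edges and counts are hypothetical and are not read from the Oklahoma figures.

```python
def summarize_histogram(bin_edges, counts):
    """Estimate the modal bin, mean, and range from a histogram's bins.

    bin_edges has one more entry than counts; bin i spans
    bin_edges[i] to bin_edges[i + 1].
    """
    if len(bin_edges) != len(counts) + 1:
        raise ValueError("need exactly one more bin edge than counts")
    total = sum(counts)
    midpoints = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    # Modal bin: the tallest bar.
    tallest = counts.index(max(counts))
    modal_bin = (bin_edges[tallest], bin_edges[tallest + 1])
    # Approximate mean: treat every value as sitting at its bin midpoint.
    approx_mean = sum(m * c for m, c in zip(midpoints, counts)) / total
    # Range: low edge of the first occupied bin to high edge of the last.
    occupied = [i for i, c in enumerate(counts) if c > 0]
    data_range = (bin_edges[occupied[0]], bin_edges[occupied[-1] + 1])
    return modal_bin, approx_mean, data_range


# Hypothetical bin edges and counts, for illustration only.
edges = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
counts = [0, 12, 6, 2, 0]
print(summarize_histogram(edges, counts))
```

The mean and range recovered this way are only as precise as the bin width, which is why they are estimates rather than exact values.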

 

Problem 6: Below, we have two histograms of measured pH of coal-mine discharges in Pennsylvania. How would you describe the shape of these histograms?


  1. Is it a uniform distribution?
  2. Is it unimodal, bi-modal, or something else?
  3. What is the skew?
  4. Is each histogram uniform, or are there distinct peaks?
  5. What is the most common measured pH? Find the mode of each histogram.

 

Problem 7: Scientists use satellite data to classify landcover and track changes to Earth's landscape at the global scale. Histograms are a useful tool for automatically classifying landcover and highlighting changes in a landscape (say, after a fire or flood). In this Landsat 7 image, which histogram is unimodal? Which histogram do you expect to have the smallest mode?


  1. Is the histogram for each band uniform or are there distinct peaks?
  2. Find the mode of each histogram

 

Problem 8: The hypsometric interval of the entire Earth shows the percentage of the Earth's surface in given elevation ranges.


You can see in this histogram the percent of the surface of the Earth above (positive values) and below (negative values) mean sea level as well as the distribution of elevations. Note that this histogram is displaying the percent of values in each bin, rather than the count.

  1. What is the shape of this distribution? Is this a uniform distribution? Is there a single most likely value? Are the data skewed?
  2. Can you estimate the median elevation of the Earth? The average of the data?
  3. What is the most common value (mode)?
  4. What is the range of elevations on the Earth?
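When a histogram shows percentages rather than counts, the median can be estimated by adding up the bars from the lowest bin until the running total reaches 50%. The sketch below illustrates the idea with made-up bins in arbitrary units; the names `counts_to_percent` and `median_bin` are hypothetical, and the numbers are not read from the hypsometric figure.

```python
def counts_to_percent(counts):
    """Convert bin counts to the percentage of all values in each bin."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def median_bin(bin_edges, percents):
    """Return the bin in which the running total first reaches 50%."""
    running = 0.0
    for i, p in enumerate(percents):
        running += p
        if running >= 50.0:
            return bin_edges[i], bin_edges[i + 1]
    raise ValueError("percentages sum to less than 50")


# Made-up bins (arbitrary units) and counts, for illustration only.
toy_edges = [-2, -1, 0, 1, 2]
toy_counts = [5, 15, 20, 10]
toy_percents = counts_to_percent(toy_counts)
print(toy_percents)
print(median_bin(toy_edges, toy_percents))
```

The result is the bin that contains the median, so the estimate is only as fine as the bin width.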

Problem 9: These three histograms display the bulk heat in place of three different geologic units (pink is the Early Permian, orange is the Lesueur Sandstone, and blue is the Yarragadee). The stored heat energy in these units, measured in petajoules, is potentially recoverable as geothermal energy. One petajoule is equal to 278 million kilowatt hours of energy! By comparing the histograms, you can characterize the differences between the units.

  1. Compare the distribution of values in each unit. Are they the same shape? If they are skewed, are they skewed in the same direction?
  2. If you wanted to use the median as a single value to represent each unit, would you expect the medians to be the same or different?
  3. What is the mode of each unit? Are they the same?
  4. Is the spread of the data the same for each geologic unit or different? What is the range?
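The petajoule conversion quoted in Problem 9 can be checked with a short calculation, using the standard definitions of the prefix peta (10^15) and of the kilowatt hour (1 kW sustained for 3600 s). The constant and function names below are chosen for this illustration.

```python
JOULES_PER_PETAJOULE = 1e15  # peta = 10^15
JOULES_PER_KWH = 3.6e6       # 1000 W sustained for 3600 s


def petajoules_to_kwh(petajoules):
    """Convert an energy in petajoules to kilowatt hours."""
    return petajoules * JOULES_PER_PETAJOULE / JOULES_PER_KWH


million_kwh = petajoules_to_kwh(1) / 1e6
print(round(million_kwh))  # 278
```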


Next Steps

If you feel comfortable with this topic, you can go on to the assessment. Or you can go back to the Histogram explanation page.