Introductory Statistics: Practice Problems
Solving Earth Science Problems with Mean, Median, Mode, and Standard Deviation
This module is available for public use, but it is undergoing revision after classroom implementation with the Math You Need project.
Problem 1: Strike and dip measurements along an outcrop (calculating a mean)
The strike and dip of a bedrock unit describe the orientation and inclination.
As a structural geologist, you are tasked with producing a geologic map of an area. You track a long outcrop of shale and take strike and dip measurements in 8 locations. Below is the table from your field notebook. Find the mean strike and the mean dip of the measurements.
| Strike (°) |
Dip (°) |
| 015 |
34 |
| 017 |
39 |
| 020 |
31 |
| 016 |
33 |
| 021 |
37 |
| 018 |
32 |
| 020 |
38 |
| 014 |
33 |
Part 1: Find the mean strike
Answer:
Step 1. Add all the values together (in this example, strike values).
`15 + 17 + 20 + 16 + 21 + 18 + 20 + 14 = 141`
Step 2. Divide the sum by the number of data points.
Step 3. Round (if necessary) and report the result, including correct units.
Since the original measurements were to the nearest degree, we round to the nearest degree, `18°`
To calculate the mean in Google Sheets or in Excel, click on an empty cell and type: =AVERAGE( ) then, move your cursor inside the parentheses. Highlight all the cells with the data, and then hit return. The mean will appear in the cell.
Part 2: Find the mean dip
Step 1. Add all the values together.
`34 + 39 + 31 + 33 + 37 + 32 + 38 + 33 = 277`
Step 2. Divide the sum by the number of data points.
Step 3. Round (if necessary) and report the result, including the original units.
Since the original measurements were to the nearest degree, we round to the nearest degree: `35°`
To calculate the mean in Google Sheets or in Excel, click on an empty cell and type: =AVERAGE( ) then, move your cursor inside the parentheses. Highlight all the cells with the data, and then hit return. The mean will appear in the cell.
Part 3: Report the mean strike and dip for the outcrop.
Answer: The mean strike for the outcrop is `18°` and the mean dip is `35°`
Problem 2: Most common hurricane categories (find the mode)
Hurricanes are classified on a scale of 1–5 based on the Saffir-Simpson Hurricane Wind scale. A "major" hurricane is one that is a Category 3, 4, or 5.
What is the most common major hurricane type (Category 3, 4, or 5) to have made landfall in Florida since 1851?
| Florida major hurricanes |
| Name |
Saffir-Simpson Category |
Year of landfall |
| Great Middle Florida |
3 |
1851 |
| Unnamed |
3 |
1871 |
| Unnamed |
3 |
1873 |
| Unnamed |
3 |
1877 |
| Unnamed |
3 |
1882 |
| Unnamed |
3 |
1888 |
| Unnamed |
3 |
1894 |
| Unnamed |
3 |
1896 |
| Unnamed |
3 |
1906 |
| Unnamed |
3 |
1909 |
| Unnamed |
3 |
1917 |
| Unnamed |
4 |
1919 |
| Unnamed |
4 |
1926 |
| Unnamed |
4 |
1928 |
| Unnamed |
3 |
1933 |
| Unnamed |
5 |
1935 |
| Unnamed |
3 |
1944 |
| Unnamed |
4 |
1945 |
| Unnamed |
4 |
1947 |
| Unnamed |
4 |
1948 |
| Unnamed |
4 |
1949 |
| Easy |
3 |
1950 |
| King |
4 |
1950 |
| Donna |
4 |
1960 |
| Betsy |
3 |
1965 |
| Alma |
3 |
1966 |
| Eloise |
3 |
1975 |
| Elena |
3 |
1985 |
| Andrew |
5 |
1992 |
| Opal |
3 |
1995 |
| Charley |
4 |
2004 |
| Ivan |
3 |
2004 |
| Jeanne |
3 |
2004 |
| Dennis |
3 |
2005 |
| Wilma |
3 |
2005 |
| Irma |
4 |
2017 |
| Michael |
5 |
2018 |
| Ian |
5 |
2022 |
Part 1: Determine the statistic that the question is asking for.
The question asks for the most common hurricane Category, so this question is asking for the mode of the hurricane Category variable.
Part 2: Determine the mode in the second column (hurricane category).
Step 1: Put the data points in order from lowest to highest.
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5
Step 2: Find the value that occurs the most often.
There are 23 Category-3 hurricanes, 11 Category-4 hurricanes, and 4 Category-5 hurricanes.
Here, the mode is Category-3, because there are more Category-3 hurricanes than any other in the dataset. That is, the most common type of major hurricane to make landfall in Florida since 1951 is a Category-3 hurricane.
Problem 3: Snow Water Equivalent values in the Northern Gallatin Range, Montana (calculate mean, standard deviation, median, and interpret)
Snow Water Equivalent (SWE) is a measure of precipitation derived from snow. It is the height of water per unit area if the snow were melted. An automated network of snow monitoring stations, called SNOTEL stations, record SWE in the western United States.
Below are the peak SWE values from the Lick Creek SNOTEL station in Montana for the years 2004–2023.
| Water Year |
Peak SWE (in.) |
| 2023 |
14.5 |
| 2022 |
11.9 |
| 2021 |
11.5 |
| 2020 |
14.2 |
| 2019 |
11.1 |
| 2018 |
15.5 |
| 2017 |
8.9 |
| 2016 |
10.5 |
| 2015 |
8.8 |
| 2014 |
18.5 |
| 2013 |
11.8 |
| 2012 |
11.0 |
| 2011 |
18.8 |
| 2010 |
10.9 |
| 2009 |
14.0 |
| 2008 |
14.1 |
| 2007 |
11.9 |
| 2006 |
10.1 |
| 2005 |
9.6 |
| 2004 |
9.9 |
Part 1: Calculate the mean peak SWE.
Step 1: Add all the values together (in this example, SWE)
`14.5 + 11.9 + 11.5 + 14.2 + 11.1 + 15.5 + 8.9 + 10.5 + 8.8 + 18.5 + 11.8 + 11.0 + 18.8 + 10.9 + 14.0 + 14.1 + 11.9 + 10.1 + 9.6 + 9.9 = 247.5` in
Step 2: Divide the sum by the number of data points.
The sum is calculated in Step 1 and there are 20 measurements, so the mean is `247.5 /20=12.375` in
Step 3: Round (if necessary) and report the result, including the original units.
Since the data are measured to a tenth of an inch, round to that and include units: 12.4 in.
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=AVERAGE( )". Within the parentheses, highlight the list of SWE values, and hit return. The average (mean) value will be produced.
Part 2: Calculate the standard deviation of the SWE data.
standard deviation `= sqrt(((value1-mean)^2+(value2-mean)+(value3-mean)+etc)/(n - 1)`
This is seldom done by hand, as it is simpler and avoids errors to use the "=STDEV( )" formula in Sheets or Excel, but the manual method is shown here.
Step 1. Determine the mean of the data values.
This was done above. Mean SWE = 12.4 in.
Step 2. Calculate the square of the difference from the mean for each data value.
Make a table of the values, determine the difference to the mean from each value, and then square of the differences.
`(value-mean)^2`
| Water Year |
Peak SWE (in) |
Difference from mean |
Square of difference from mean |
| 2023 |
14.5 |
2.13 |
4.5156 |
| 2022 |
11.9 |
-0.48 |
0.2256 |
| 2021 |
11.5 |
-0.88 |
0.7656 |
| 2020 |
14.2 |
1.83 |
3.3306 |
| 2019 |
11.1 |
-1.28 |
1.6256 |
| 2018 |
15.5 |
3.13 |
9.7656 |
| 2017 |
8.9 |
-3.48 |
12.0756 |
| 2016 |
10.5 |
-1.88 |
3.5156 |
| 2015 |
8.8 |
-3.58 |
12.7806 |
| 2014 |
18.5 |
6.13 |
37.5156 |
| 2013 |
11.8 |
-0.57 |
0.3306 |
| 2012 |
11.0 |
-1.38 |
1.8906 |
| 2011 |
18.8 |
6.43 |
41.2806 |
| 2010 |
10.9 |
-1.48 |
2.1756 |
| 2009 |
14.0 |
1.63 |
2.6406 |
| 2008 |
14.1 |
1.73 |
2.9756 |
| 2007 |
11.9 |
-0.48 |
0.2256 |
| 2006 |
10.1 |
-2.28 |
5.1756 |
| 2005 |
9.6 |
-2.78 |
7.7006 |
| 2004 |
9.9 |
-2.48 |
6.1256 |
Step 3. Add all the squared differences obtained in Step 2.
`4.5156 + 0.2256 + 0.7656 + 3.3306 + 1.6256 + 9.7656 + 12.0756 + 3.5156 + 12.7806 + 37.5156 + 0.3306 + 1.8906 + 41.2806`
` + 2.1756 + 2.6406 + 2.9756 + 0.2256 + 5.1756 + 7.7006 + 6.1256 = 156.6375`
Step 4. Divide the sum of the squared differences (from Step 3) by the number of data points minus 1.
Step 5. Take the square root of the result you obtained in Step 4.
Step 6. Round (if necessary) and report the result, including the original units.
Since the data are measured to a tenth of an inch, round to that and include units: s = 2.9 in.
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=STDEV( )". Place your cursor within the parentheses, highlight the list of SWE values, and hit return. The standard deviation value will be produced.
Part 3: Calculate the median SWE.
Step 1: Order the values from lowest to highest.
8.8, 8.9, 9.6, 9.9, 10.1, 10.5, 10.9, 11.0, 11.1, 11.5, 11.8, 11.9, 11.9, 14.0, 14.1, 14.2, 14.5, 15.5, 18.5, 18.8 in
Step 2: Find the value in the center of the ordered values.
Because there are 20 values, the middle is between values #10 (11.5) and #11 (11.8). Take the average of those two values
`(11.5 + 11.8)/2` to get the median.
Median = 11.65 in
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=MEDIAN( )". Place your cursor within the parentheses, highlight the list of SWE values, and hit return. The median value will be produced.
Part 4: Interpret your findings. What does the standard deviation tell us about the spread of the SWE data? Would mean or median be a better measure of the central tendency of the SWE data?
The standard deviation of the SWE data is 2.8 in, which is fairly small compared to both the mean (12.4 in) and the median (11.7 in). This shows that the data points are generally close together and that variability in SWE over the 20 years from 2004 to 2023 is low.
When considering whether mean or median is a better measure of the central point of the data, it would be best to graph the distribution of the data in a
histogram and see if it looks like a normal distribution (with a classic bell-shaped curve). Mean and standard deviation are appropriate to use with normal distributions, while medians are appropriate for skewed or bimodal data. The histogram is shown here. It does not show a normal distribution and is slightly
right-skewed. This suggests we probably want to use median instead of the mean to describe this data. The mean (12.4 in) is slightly higher than the median and thus does not lie in the center of the data points.
Problem 4: Permafrost thaw (calculate mean, standard deviation, median, and interpret)
Permafrost thaw is an issue of great concern in the arctic and subarctic regions of the world. The uppermost layer of soil, called the "active layer," thaws in the summers, but the permafrost below stays frozen. The thickness of the active layer has been increasing with climate warming, as more permafrost thaws and becomes part of the active layer.
Your job is to monitor the depth of the active layer at a monitoring plot. The plot (100 m x 100 m) contains 12 measurement sites equipped with frost tubes and soil temperature cables. During peak summer each year, you obtain measurements of the thickness of the active layer to add to a long-term dataset. Your values for the active layer thicknesses at the plot in the most recent year are as follows.
| Site |
Active layer thickness (cm) |
| 1 |
30 |
| 2 |
45 |
| 3 |
120 |
| 4 |
25 |
| 5 |
63 |
| 6 |
12 |
| 7 |
54 |
| 8 |
68 |
| 9 |
55 |
| 10 |
39 |
| 11 |
46 |
| 12 |
67 |
Part 1: Calculate the mean of the data
Step 1. Add all the values together (in this example, layer thickness)
`30 + 45 + 120 + 25 + 63 + 12 + 54 + 68 + 55 + 39 + 46 + 67 = 624`
Step 2. Divide the sum by the number of data points.
Step 3. Round (if necessary) and report the result, including the original units.
Since the measurements are made to the nearest centimeter, no rounding is needed. The answer is 52 cm
To calculate the mean in Google Sheets or in Excel, click on an empty cell and type:
=AVERAGE( ). Then, move your cursor inside the parentheses. Highlight all the cells with the data, and hit return. The mean will appear in the cell.
Part 2: Calculate the standard deviation
standard deviation `= sqrt(((value1-mean)^2+(value2-mean)+(value3-mean)+etc)/(n - 1)`
Step 1: Determine the mean of the data values.
We know this from calculating the mean active layer thickness in the step above. We can reuse that value here. Mean = 52 cm
Step 2. Calculate the square of the difference to the mean for each data point
`(value-mean)^2`.
Take each data value, subtract the mean value, and then square it.
Here we use a list format, rather than a table. Either is fine:
`(30 – 52)^2 = 484`
`(45 – 52)^2 = 49`
`(120 – 52)^2 = 4624`
`(25 – 52)^2 =729`
`(63 – 52)^2 = 121`
`(12 – 52)^2 = 1600`
`(54 – 52)^2 = 4`
`(68 – 52)^2 = 256`
`(55 – 52)^2 = 9`
`(39 – 52)^2 = 169`
`(46 – 52)^2 = 36`
`(67 – 52)^2 = 225`
Step 3. Add all the squared differences obtained in Step 2.
`484 + 49 + 4624 + 729 + 121 + 1600 + 4 + 256 + 9 + 169 + 36 + 225 = 8306` cm
Step 4. Divide the sum of the squared differences (from Step 3) by the number of data points minus 1.
`8306/(12 – 1) = 755.09` cm
Step 5. Take the square root of the result you obtained in Step 4.
`sqrt(755.09) = 27.479` cm
Step 6. Round (if necessary) and report the result, including the original units.
The data is in centimeters, so round to the nearest whole centimeter.
s = 27 cm
That is, most of the data fall within 27 cm of the mean value of 52 cm
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of active layer thickness values. In a new cell, type: "=STDEV( )". Place your cursor within the parentheses, highlight the list of thickness values, and hit return. The standard deviation value will be produced.
Part 3: Calculate the median of the data
Step 1. Order the values from lowest to highest.
12, 29, 30, 39, 45, 46, 54, 55, 63, 67, 68, 120 cm
Step 2. Find the value in the center of the ordered values.
Here, there are 12 temperature measurements. Because this is an even number, the median would be the mean of the two middle values.
Take the mean of the two middle numbers: `(46 + 54)/2 = 50` cm
The median thickness of the active layer is 50 cm
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of active layer thickness values. In a new cell, type: "=MEDIAN( )". Place your cursor within the parentheses, highlight the list of active layer thickness values, and hit return. The median value will be produced.
Part 4: Which is a more appropriate statistic for the active layer data, mean or median?
Answer: Here, we have a small dataset with one very noticeable outlier (120 cm). We know that data sets with outliers are better described using the median as a measure of the central tendency of the data. Surprisingly, the mean and median for this data set are very similar. This happened by chance, and we would not expect that to be true of all datasets with outliers. So, using the median would be the best for the permafrost data.
Next Steps
TAKE THE QUIZ!!
I think I'm competent with introductory statistics and I am ready to take the quiz! This link takes you to WAMAP. If your instructor has not given you instructions about WAMAP, you may not have to take the quiz.Or you can go back to the Introductory Statistics explanation page.