Introductory Statistics - Practice Problems
Solving Earth science problems with mean, median, mode, and standard deviation
This module is available for public use, but it is undergoing revision after classroom implementation with the Math Your Earth Science Majors Need project.
Problem 1: Strike and dip measurements along an outcrop (Calculating a Mean)
The strike and dip of a bedrock unit describe its orientation (strike) and inclination (dip).
As a structural geologist, you are tasked with producing a geologic map of an area. You track a long outcrop of shale and take strike and dip measurements at 8 locations. Below is the table from your field notebook. Find the mean strike and the mean dip of the measurements.
Provenance: Sonia Nagorski, University of Alaska Southeast
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
Provenance: From https://opengeology.org/historicalgeology/tools-of-historical-geology/geologic-maps/
Reuse: This item is in the public domain and may be reused freely without restriction.
Strike (°) | Dip (°)
015 | 34
017 | 39
020 | 31
016 | 33
021 | 37
018 | 32
020 | 38
014 | 33
Part 1: Find the mean strike
Answer:
Step 1. Add all the strike values together
15 + 17 + 20 + 16 + 21 + 18 + 20 + 14 = 141
Step 2. Divide the sum by the number of measurements.
141 / 8 = 17.625
Step 3. Round (if necessary) and include original units.
Since the original measurements were recorded to the nearest degree, round to the nearest degree: 18° (written as 018° in strike notation).
To calculate the mean in Google Sheets or Excel, click on an empty cell and type:
=AVERAGE( )
Then move your cursor inside the parentheses, highlight all of the cells with the strike data, and hit return. The mean will appear in the cell.
Part 2: Find the mean dip
Step 1. Add all the dip values together.
34 + 39 + 31 + 33 + 37 + 32 + 38 + 33 = 277
Step 2. Divide the sum by the number of measurements.
277 / 8 = 34.625
Step 3. Round (if necessary) and include original units.
Since the original measurements were recorded to the nearest degree, round to the nearest degree: 35°
To calculate the mean in Google Sheets or Excel, click on an empty cell and type:
=AVERAGE( )
Then move your cursor inside the parentheses, highlight all of the cells with the dip data, and hit return. The mean will appear in the cell.
Part 3: Report the mean strike and dip for the outcrop.
Answer: The mean strike for the outcrop is 018° and the mean dip is 35°
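For readers who prefer a script to a spreadsheet, Parts 1 and 2 can be sketched in Python with the standard-library `statistics` module. This is a minimal sketch: the variable names are our own, and it assumes (as is true for this outcrop) that the strikes are all far from 000°/360°, so a plain arithmetic mean of the azimuths is meaningful.

```python
from statistics import mean

# Field-notebook measurements from the table above, in degrees.
# Assumes no strike straddles 000°/360°; a plain average would fail there.
strikes = [15, 17, 20, 16, 21, 18, 20, 14]
dips = [34, 39, 31, 33, 37, 32, 38, 33]

mean_strike = mean(strikes)  # 141 / 8 = 17.625
mean_dip = mean(dips)        # 277 / 8 = 34.625

# Round to the nearest degree, matching the precision of the field measurements.
# Neither mean ends in exactly .5, so Python's round-half-to-even rule does not matter here.
print(f"Mean strike: {round(mean_strike):03d}°")  # zero-padded to three digits, as in the notebook
print(f"Mean dip: {round(mean_dip)}°")
```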
Problem 2: Most common hurricane categories (find the mode)
Hurricanes are classified on a scale of 1 to 5 using the Saffir-Simpson Hurricane Wind Scale. A "major" hurricane is one that is a category 3, 4, or 5.
What is the most common major hurricane type (category 3, 4, or 5) to have made landfall in Florida since 1851?
Florida major hurricanes
Name | Saffir-Simpson Category | Year of landfall
Great Middle Florida | 3 | 1851
Unnamed | 3 | 1871
Unnamed | 3 | 1873
Unnamed | 3 | 1877
Unnamed | 3 | 1882
Unnamed | 3 | 1888
Unnamed | 3 | 1894
Unnamed | 3 | 1896
Unnamed | 3 | 1906
Unnamed | 3 | 1909
Unnamed | 3 | 1917
Unnamed | 4 | 1919
Unnamed | 4 | 1926
Unnamed | 4 | 1928
Unnamed | 3 | 1933
Unnamed | 5 | 1935
Unnamed | 3 | 1944
Unnamed | 4 | 1945
Unnamed | 4 | 1947
Unnamed | 4 | 1948
Unnamed | 4 | 1949
Easy | 3 | 1950
King | 4 | 1950
Donna | 4 | 1960
Betsy | 3 | 1965
Alma | 3 | 1966
Eloise | 3 | 1975
Elena | 3 | 1985
Andrew | 5 | 1992
Opal | 3 | 1995
Charley | 4 | 2004
Ivan | 3 | 2004
Jeanne | 3 | 2004
Dennis | 3 | 2005
Wilma | 3 | 2005
Irma | 4 | 2017
Michael | 5 | 2018
Ian | 5 | 2022
Part 1: Determine the statistic that the question is asking for.
Hurricane categories are categorical data, and the question asks for the most common category, so the statistic needed is the mode of the data.
Part 2: Determine the mode in the second column (hurricane category).
Step 1: Put the data points in order from lowest to highest.
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5
This gives 23 category 3 hurricanes, 11 category 4 hurricanes, and 4 category 5 hurricanes.
Step 2: Find the value that occurs the most often.
Here, the mode is 3, because category 3 hurricanes are the most common in the dataset. That is, the most common type of major hurricane to make landfall in Florida since 1851 is a category 3 hurricane.
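The tally in Step 1 can be checked with a short Python sketch. It assumes the categories are typed in the same order as the table (the list name and grouping comments are our own) and uses `collections.Counter` so that the count for every category is visible; `statistics.mode` would return the same single answer without the tallies.

```python
from collections import Counter

# Saffir-Simpson category at landfall for each Florida major hurricane in the table,
# in order of landfall year (1851-2022)
categories = [
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,  # 1851-1917
    4, 4, 4, 3, 5, 3, 4, 4, 4, 4,     # 1919-1949
    3, 4, 4, 3, 3, 3, 3, 5, 3,        # Easy (1950) to Opal (1995)
    4, 3, 3, 3, 3, 4, 5, 5,           # Charley (2004) to Ian (2022)
]

counts = Counter(categories)                       # tally of each category
mode_category, mode_count = counts.most_common(1)[0]

for category in sorted(counts):
    print(f"Category {category}: {counts[category]} hurricanes")
print(f"Mode: category {mode_category} ({mode_count} of {len(categories)} landfalls)")
```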
Problem 3: Snow Water Equivalent values in the Northern Gallatin Range, Montana (Calculate mean, standard deviation, median, and interpret)
Snow Water Equivalent (SWE) is a measure of precipitation derived from snow. It is the depth of water that would result if the snow were melted. An automated network of snow monitoring stations, called SNOTEL stations, records SWE in the western United States.
Below are the peak SWE values from the Lick Creek SNOTEL station in Montana for the years 2004-2023.
- Part 1: Calculate the mean peak SWE for the years 2004-2023
- Part 2: Calculate the standard deviation of the SWE data.
- Part 3: Calculate the median and mode for the twenty years of SWE data.
- Part 4: Interpret your findings. What does the standard deviation tell us about the spread of the SWE data? Would the mean or the median be a better measure of the typical peak SWE for these years? Why?
Water Year | Peak SWE (in.)
2023 | 14.5
2022 | 11.9
2021 | 11.5
2020 | 14.2
2019 | 11.1
2018 | 15.5
2017 | 8.9
2016 | 10.5
2015 | 8.8
2014 | 18.5
2013 | 11.8
2012 | 11.0
2011 | 18.8
2010 | 10.9
2009 | 14.0
2008 | 14.1
2007 | 11.9
2006 | 10.1
2005 | 9.6
2004 | 9.9
Part 1: Calculate the mean peak SWE.
Step 1: Add all of the SWE values together
(14.5 + 11.9 + 11.5 + 14.2 + 11.1 + 15.5 + 8.9 + 10.5 + 8.8 + 18.5 + 11.8 + 11.0 + 18.8 + 10.9 + 14.0 + 14.1 + 11.9 + 10.1 + 9.6 + 9.9 ) = 247.5 in.
Step 2: Divide the sum by the number of data points (years).
The sum is calculated in Step 1 and there are 20 measurements, so the mean is 247.5 in. / 20 = 12.375 in.
Step 3: Round (if necessary) and report the result, including the original units.
Since the data are measured to the tenth of an inch, round to that and include units: 12.4 in.
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=AVERAGE( )". Within the parentheses, highlight the list of SWE values, and hit return. The average (mean) value will be produced.
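The same mean can be sketched in Python as a check on the hand calculation. This is a minimal example; the list name `peak_swe` is our own, and the values are typed in table order (2023 first).

```python
from statistics import mean

# Peak SWE (in.) at the Lick Creek SNOTEL station, water years 2023 back to 2004 (table order)
peak_swe = [14.5, 11.9, 11.5, 14.2, 11.1, 15.5, 8.9, 10.5, 8.8, 18.5,
            11.8, 11.0, 18.8, 10.9, 14.0, 14.1, 11.9, 10.1, 9.6, 9.9]

swe_mean = mean(peak_swe)  # 247.5 / 20 = 12.375 in.

# Keep the unrounded mean for later steps; report it to the tenth of an inch
print(f"Mean peak SWE: {swe_mean:.3f} in. (report as {swe_mean:.1f} in.)")
```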
Part 2: Calculate the standard deviation of the SWE data.
This is seldom done by hand, because the "=STDEV( )" formula in Sheets or Excel is simpler and less error-prone, but the manual method is shown here.
Step 1. Determine the mean of the data values.
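This was done above. Use the unrounded mean, 12.375 in., so that rounding errors do not carry into the later steps; the distances in the table below are computed from 12.375 in.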
Step 2. Calculate the square of the distance to the mean.
Make a table of the values, their distances to the mean (value - mean), and the squares of those distances.
Water Year | Peak SWE (in.) | Distance to mean (in.) | Squared distance to mean (in.²)
2023 | 14.5 | 2.13 | 4.5156
2022 | 11.9 | -0.48 | 0.2256
2021 | 11.5 | -0.88 | 0.7656
2020 | 14.2 | 1.83 | 3.3306
2019 | 11.1 | -1.28 | 1.6256
2018 | 15.5 | 3.13 | 9.7656
2017 | 8.9 | -3.48 | 12.0756
2016 | 10.5 | -1.88 | 3.5156
2015 | 8.8 | -3.58 | 12.7806
2014 | 18.5 | 6.13 | 37.5156
2013 | 11.8 | -0.58 | 0.3306
2012 | 11.0 | -1.38 | 1.8906
2011 | 18.8 | 6.43 | 41.2806
2010 | 10.9 | -1.48 | 2.1756
2009 | 14.0 | 1.63 | 2.6406
2008 | 14.1 | 1.73 | 2.9756
2007 | 11.9 | -0.48 | 0.2256
2006 | 10.1 | -2.28 | 5.1756
2005 | 9.6 | -2.78 | 7.7006
2004 | 9.9 | -2.48 | 6.1256
Step 3. Add all of the squared distances obtained in Step 2.
4.5156 + 0.2256 + 0.7656 + 3.3306 + 1.6256 + 9.7656 + 12.0756 + 3.5156 + 12.7806 + 37.5156 + 0.3306 + 1.8906 + 41.2806 + 2.1756 + 2.6406 + 2.9756 + 0.2256 + 5.1756 + 7.7006 + 6.1256 = 156.6375
Step 4. Divide the sum of the squared distances by the number of data points minus 1.
156.6375 / (20 - 1) = 156.6375 / 19 ≈ 8.244 in.²
Step 5. Take the square root of the answer to Step 4.
√8.244 ≈ 2.87 in.
Step 6. Round (if necessary) and report the result, including the original units.
Since the data are measured to the tenth of an inch, round to that and include units: SD = 2.9 in.
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=STDEV( )". Within the parentheses, highlight the list of SWE values, and hit return. The standard deviation value will be produced.
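A Python sketch of Steps 1-5 follows, as a cross-check on the table and on "=STDEV( )". It assumes the unrounded mean (12.375 in.) is used, as in the table; the variable names are our own. Python's `statistics.stdev` divides by n - 1, matching Step 4 and the spreadsheet STDEV function, while `statistics.pstdev` divides by n and gives a slightly smaller value (about 2.80 in.).

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Peak SWE (in.), water years 2023 back to 2004 (table order)
peak_swe = [14.5, 11.9, 11.5, 14.2, 11.1, 15.5, 8.9, 10.5, 8.8, 18.5,
            11.8, 11.0, 18.8, 10.9, 14.0, 14.1, 11.9, 10.1, 9.6, 9.9]
n = len(peak_swe)

# Step 1: unrounded mean (12.375 in.)
swe_mean = mean(peak_swe)

# Steps 2-3: squared distance of each value from the mean, then their sum
squared_distances = [(x - swe_mean) ** 2 for x in peak_swe]
sum_sq = sum(squared_distances)   # about 156.6375 in.^2

# Steps 4-5: divide by n - 1 (sample variance), then take the square root
variance = sum_sq / (n - 1)       # about 8.244 in.^2
sd_manual = sqrt(variance)        # about 2.87 in.

print(f"Sum of squares: {sum_sq:.4f}")
print(f"SD (by hand):   {sd_manual:.2f} in.")
print(f"SD (stdev):     {stdev(peak_swe):.2f} in.")   # also divides by n - 1
print(f"SD (pstdev):    {pstdev(peak_swe):.2f} in.")  # divides by n instead
```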
Part 3: Calculate the median and mode of the SWE data.
Step 1: Put the data points in order from lowest to highest.
8.8 in., 8.9 in., 9.6 in., 9.9 in., 10.1 in., 10.5 in., 10.9 in., 11.0 in., 11.1 in., 11.5 in., 11.8 in., 11.9 in., 11.9 in., 14.0 in., 14.1 in., 14.2 in., 14.5 in., 15.5 in., 18.5 in., 18.8 in.
Step 2: Find the value in the center of the data points.
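Because there are 20 values, the center of the data falls between value #10 (11.5 in.) and value #11 (11.8 in.). Take the average of those two values, (11.5 + 11.8) / 2, to get the median.
Median = 11.65 in.
Mode: 11.9 in. is the only value that occurs twice (2007 and 2022); every other value occurs once, so the mode is 11.9 in.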
To calculate this in Google Sheets and Microsoft Excel: make a column with the list of SWE values. In a new cell, type: "=MEDIAN( )". Within the parentheses, highlight the list of SWE values, and hit return. The median value will be produced.
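As a check on Part 3, here is a minimal Python sketch using the standard-library `statistics` module (the variable names are our own). `statistics.median` averages the two middle values when the count is even, and `statistics.multimode` returns every value tied for the highest count.

```python
from statistics import median, multimode

# Peak SWE (in.), water years 2023 back to 2004 (table order)
peak_swe = [14.5, 11.9, 11.5, 14.2, 11.1, 15.5, 8.9, 10.5, 8.8, 18.5,
            11.8, 11.0, 18.8, 10.9, 14.0, 14.1, 11.9, 10.1, 9.6, 9.9]

print(sorted(peak_swe))          # Step 1: order from lowest to highest
swe_median = median(peak_swe)    # mean of the 10th and 11th sorted values
swe_modes = multimode(peak_swe)  # all values tied for most frequent

print(f"Median: {swe_median:.2f} in.")
print(f"Mode(s): {swe_modes} in.")
```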
Part 4: Interpret your findings. What does the standard deviation tell us about the spread of the SWE data? Would mean or median be a better measure of the central tendency of the SWE data?
The standard deviation of the SWE data is 2.9 in., which is fairly small compared to both the mean (12.4 in.) and the median (11.7 in.); it is less than a quarter of the mean. This shows that the data points are generally close together and that year-to-year variability in peak SWE over the 20 years from 2004-2023 is relatively low.
When considering whether the mean or the median is a better measure of the center of the data, it is best to graph the distribution of the data in a histogram and see whether it looks like a normal distribution (with a classic bell-shaped curve). The mean and standard deviation are appropriate for normal distributions, while the median is more appropriate for skewed or bimodal data. The histogram is shown above. It does not show a normal distribution and is slightly right-skewed, which suggests using the median instead of the mean to describe these data. In addition, the median splits the data evenly: 10 values lie below 11.65 in. and 10 lie above it. The mean (12.4 in.) is slightly higher than the median and does not lie in the center of the data; 13 of the 20 values fall below it. Finally, this dataset contains only 20 data points, so it is a small data set. For this data set, then, the median is the more appropriate measure of the central tendency of the data.
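The histogram referred to above is not reproduced here, but the checks in this paragraph can be sketched in Python: how many values fall below the mean and below the median, plus a rough text histogram. The 2-in. bins starting at 8 in. are our own choice for illustration, so the shape may differ somewhat from the original figure.

```python
from statistics import mean, median

# Peak SWE (in.), water years 2023 back to 2004 (table order)
peak_swe = [14.5, 11.9, 11.5, 14.2, 11.1, 15.5, 8.9, 10.5, 8.8, 18.5,
            11.8, 11.0, 18.8, 10.9, 14.0, 14.1, 11.9, 10.1, 9.6, 9.9]
swe_mean, swe_median = mean(peak_swe), median(peak_swe)

# Count values strictly below each measure of center
below_mean = sum(x < swe_mean for x in peak_swe)
below_median = sum(x < swe_median for x in peak_swe)
print(f"Below the mean ({swe_mean:.2f} in.): {below_mean} of {len(peak_swe)}")
print(f"Below the median ({swe_median:.2f} in.): {below_median} of {len(peak_swe)}")

# Rough text histogram; bin width and starting point are assumptions for illustration
bin_width, start = 2, 8
counts = [0] * 6  # bins 8-10, 10-12, 12-14, 14-16, 16-18, 18-20 in.
for x in peak_swe:
    counts[int((x - start) // bin_width)] += 1
for i, c in enumerate(counts):
    lo = start + i * bin_width
    print(f"{lo:>2}-{lo + bin_width:<2} in. | {'#' * c}")
```

With these bins the output shows a tall bar at 10-12 in., an empty bin at 12-14 in., and a thin tail out to 18-20 in., which is not the bell shape expected of a normal distribution.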
Problem 4: Permafrost thaw (Calculate mean, standard deviation, and median)
Permafrost thaw is an issue of great concern in the Arctic and subarctic regions of the world. The uppermost layer of soil, called the "active layer," thaws each summer, but the permafrost below stays frozen. The thickness of the active layer has been increasing with climate warming, as more permafrost thaws and becomes part of the active layer.
Your job is to monitor the depth of the active layer at a set of monitoring plots. Each plot (100 m x 100 m) contains 12 measurement sites equipped with frost tubes and soil temperature cables. During peak summer each year, you measure the thickness of the active layer to add to a long-term dataset. Your active layer thickness values for the plot in the most recent year are as follows.
- Part 1: Calculate the mean.
- Part 2: Calculate the standard deviation.
- Part 3: Calculate the median of the values.
- Part 4: Do you think the mean or the median is a better measure of the central tendency of the data in this case?
Site | Active layer thickness (cm)
1 | 30
2 | 45
3 | 120
4 | 25
5 | 63
6 | 12
7 | 54
8 | 68
9 | 55
10 | 39
11 | 46
12 | 67
Part 1: Calculate the mean of the data
Step 1. Add all the active layer thickness values together
30 + 45 + 120 + 25 + 63 + 12 + 54 + 68 + 55 + 39 + 46 + 67 = 624
Step 2. Divide the sum by the number of data points.
624 / 12 = 52
Step 3. Round (if necessary) and report the result, including the original units.
The mean is a whole number, so no rounding is needed. The answer is 52 cm.
To calculate the mean in Google Sheets or in Excel, click on an empty cell and type:
=AVERAGE( ). Then, move your cursor inside the parentheses. Highlight all of the cells with the data, and hit return. The mean will appear in the cell.
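The same calculation as a minimal Python sketch (the list name `thickness` is our own; values are in site order 1-12):

```python
from statistics import mean

# Active layer thickness (cm) at sites 1-12
thickness = [30, 45, 120, 25, 63, 12, 54, 68, 55, 39, 46, 67]

print(f"Sum: {sum(thickness)} cm")    # Step 1
print(f"Mean: {mean(thickness)} cm")  # Step 2: sum divided by 12
```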
Part 2: Calculate the standard deviation
Step 1: Determine the mean of the data values.
We know this from calculating the mean active layer thickness in the step above. We can reuse that value here. Mean = 52 cm
Step 2. Calculate the square of the distance to the mean for each data point, (value - mean)^2. That is, take each data value, subtract the mean, and then square the result.
Squared distances to the mean (here we use a list format rather than a table; either is fine):
(30 - 52)^2 = 484, (45 - 52)^2 = 49, (120 - 52)^2 = 4624, (25 - 52)^2 = 729, (63 - 52)^2 = 121, (12 - 52)^2 = 1600, (54 - 52)^2 = 4, (68 - 52)^2 = 256, (55 - 52)^2 = 9, (39 - 52)^2 = 169, (46 - 52)^2 = 36, (67 - 52)^2 = 225
Step 3. Add all of the squared values.
(484 + 49 + 4624 + 729 + 121 + 1600 + 4 + 256 + 9 + 169 + 36 + 225) = 8306
Step 4. Divide the sum of the squared values by the number of data points minus 1.
8306 / (12 - 1) = 8306 / 11 ≈ 755.1
Step 5. Take the square root of the quotient.
√755.1 ≈ 27.48 cm
Step 6. Round (if necessary) and report the result, including the original units.
The data are in centimeters, so round to the nearest whole centimeter: SD = 27 cm.
That is, most of the data lie within 27 cm of the mean of 52 cm: 10 of the 12 sites fall between 25 cm and 79 cm.
To use Excel/Sheets to calculate the standard deviation of the active layer data
Click on an empty cell, type:
=STDEV( )
Then, move your cursor inside the parentheses. Using your mouse or touchpad, highlight all of the cells with the active layer data, and then hit return. The standard deviation will appear in the empty cell.
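Steps 1-5 can also be sketched in Python, with `statistics.stdev` as a cross-check (it also divides by n - 1). The last line counts how many sites lie within one standard deviation of the mean, which checks the interpretation in Step 6. The variable names are our own.

```python
from math import sqrt
from statistics import mean, stdev

# Active layer thickness (cm) at sites 1-12
thickness = [30, 45, 120, 25, 63, 12, 54, 68, 55, 39, 46, 67]

m = mean(thickness)                          # Step 1: 52 cm
squared = [(x - m) ** 2 for x in thickness]  # Step 2: (value - mean)^2 for each site
ss = sum(squared)                            # Step 3: 8306 cm^2
variance = ss / (len(thickness) - 1)         # Step 4: 8306 / 11
sd = sqrt(variance)                          # Step 5: about 27.48 cm
within_one_sd = [x for x in thickness if abs(x - m) <= sd]

print(f"Sum of squares: {ss}")
print(f"SD: {sd:.2f} cm (rounded: {round(sd)} cm); stdev() gives {stdev(thickness):.2f} cm")
print(f"{len(within_one_sd)} of {len(thickness)} sites lie within 1 SD of the mean")
```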
Part 3: Calculate the median of the data
Step 1. Put the active layer thickness in order from lowest to highest.
12 cm, 25 cm, 30 cm, 39 cm, 45 cm, 46 cm, 54 cm, 55 cm, 63 cm, 67 cm, 68 cm, 120 cm
Step 2. Find the value in the center of the data points.
Here, there are 12 active layer thickness measurements. Because this is an even number, the median is the mean of the two middle values (the 6th and 7th).
Take the mean of the two middle numbers: (46 + 54)/2 = 50
The median thickness of the active layer is 50 cm
To use Excel/Sheets to calculate the median of the active layer data
Click on an empty cell, and type
=MEDIAN( )
Then, move your cursor inside the parentheses. Highlight all of the cells with the active layer data, and hit return. The median will appear in the cell.
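A minimal Python sketch of Steps 1 and 2 (the names are our own). For an even number of values, `statistics.median` averages the two middle values, just as in Step 2.

```python
from statistics import median

# Active layer thickness (cm) at sites 1-12
thickness = [30, 45, 120, 25, 63, 12, 54, 68, 55, 39, 46, 67]

ordered = sorted(thickness)  # Step 1: lowest to highest
half = len(ordered) // 2
middle_pair = ordered[half - 1 : half + 1]  # the 6th and 7th values

print(ordered)
print(f"Middle values: {middle_pair}, median = {median(thickness)} cm")
```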
Part 4: Which is a more appropriate statistic for the active layer data, mean or median?
Answer: Here, we have a small data set with one very noticeable outlier (120 cm). Small data sets and data sets with outliers are generally better described using the median as a measure of central tendency. Somewhat surprisingly, the mean (52 cm) and the median (50 cm) for this data set are very similar. This happened by chance, and we would not expect it to be true of all data sets of this size. So the median is the better choice for the permafrost data.