How Do I Use Mean, Median, Mode, and Standard Deviation?
Introduction to Descriptive Statistics in the Earth Sciences
This module is available for public use, but it is undergoing revision after classroom implementation with the Math You Need project.
Provenance: Sonia Nagorski, University of Alaska Southeast
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
An Introduction to Calculating Mean, Median, Mode, and Standard Deviation
Geoscientists collect tremendous amounts of data to describe, measure, and monitor the natural environment. It is necessary to summarize values rather than awkwardly listing long strings of data. For example, if you are planning a river rafting trip to investigate sandbar dynamics, you would look up the typical flow level for a certain week instead of looking at a huge list of the flow levels measured every fifteen minutes all year long. Summary data are also needed to compare datasets to each other; for example, the number of landslides on rainy days versus the number of landslides on dry days. All such data are dependent on multiple measures that need to be summarized.
What Are Mean, Median, Mode, and Standard Deviation?
There are several ways to describe the central tendency of the data.
Mean is the average of the values in a dataset.
Standard deviation (SD) describes how much variation there is around the mean (not the median or mode). A low SD means the data are mostly close to the mean, while a high SD means the data are spread out. Note that the SD is not the same as the full range (minimum to maximum) of the data.
Median is the midpoint of the values in a dataset. Half of the values in the data set will fall below the median, half will be above it.
Mode is the most commonly occurring value in the dataset. If all values are unique, there is no mode. There can be multiple modes in a dataset.
How Do I Calculate Mean, Median, Mode, and Standard Deviation?
Here are steps to take to calculate the mean, median, mode, and standard deviation. For these calculations, we will use minimum January temperatures in Bozeman, Montana, for the years 2016–2020.
Year |
Min. Jan. Temp. (°F) |
2020 |
21.6 |
2019 |
15.0 |
2018 |
18.8 |
2017 |
6.5 |
2016 |
16.4 |
Part 1: Calculate mean:
To calculate the mean, add all of the data points together and divide by the total by the number of data points.
Step 1. Add all the temperature values together
21.6° + 15.0° + 18.8° + 6.5° + 16.4° = 78.3°
Step 2. Divide the sum by the number of data points:
There are 5 data points so you divide 78.3°/5. 78.3°/5 = 15.66°.
Step 3. Round (if necessary) and report the result, including the original units.
The answer above has more precision than the original values, so many instructors will want you to round to the tenths, which would be 15.7. So the mean minimum temperature in January in Bozeman is 15.7° F
How to use Excel/Sheets to calculate the mean minimum January temperature.
To calculate the mean in Google Sheets or in Excel, click on an empty cell and type:
=AVERAGE( )
then, move your cursor inside the parentheses. Highlight all the cells with the temperature data, and then hit return. The mean will appear in the cell.
Part 2: Calculate standard deviation
Calculating standard deviation by hand is more difficult than calculating the mean, median, or mode. It is often more efficient to calculate this statistic using Excel or Google Sheets, but it is important to understand the math behind calculating standard deviation.
Step 1: Determine the mean of the data values.
This was done above. Mean temperature = 15.7°F
Step 2: Calculate the square of the distance to the mean.
Take each data value, subtract the mean value, and then square it. ((value -mean)^2). You can make a table of the data and the squared distances to the mean, or you can list them.
squared distances to the mean: (16.4 – 15.7)^2 = 0.49, (6.4 – 15.7)^2 = 86.49, (18.8 – 15.7)^2 = 9.61, (15 – 15.7)^2 = 0.49, (21.6 – 15.7)^2) = 34.81
squared distances to the mean: 0.49, 86.49, 9.61, 0.49, 34.81
Step 3: Add all the squared distances obtained in Step 2.
0.49 + 86.49 + 9.61+ 0.49 + 34.81 = 131.89
Step 4: Divide the sum of the squared distances by the number of data points minus 1.
Step 5: Take the square root of the result you obtained in Step 4.
sqrt(32.9725) = 5.74217°F
Step 6: Round (if necessary) and report the result, including the original units.
Since the original data went to the tenths of a degree, we round 5.74217°F to SD = 5.7°F. That is, most of the data fall within 5.1°F of the mean value of 15.7°F.
How to use Excel/Sheets to calculate the standard deviation of the temperature data
Click on an empty cell, and type:
=STDEV( )
Then, move your cursor inside the parentheses. Using your mouse or touchpad, highlight all the cells with the temperature data, and then hit return. The standard deviation will appear in the empty cell.
Standard deviation of a sample vs. a population
There are actually two ways to calculate a standard deviation. The first, the standard deviation of a sample, is for when you don't have all the data that you might have. This is typically what is used in Earth science classes and what we will use throughout this module. If you have a complete set of data, for instance the height of every single student in a class, then you would use the standard deviation of a population, which will be slightly less. In spreadsheets, these two standard deviations are calculated by the =STDEV.S() and the =STDEV.P() functions. If you just use =STDEV() you will get the sample standard deviation.
Part 3: Calculate median:
To calculate the median, put the data values in order from smallest to greatest. If there is an odd number of data points, find the value located in the middle of the data points. If there is an even number of data points, take the mean (average) of the two middle data points.
Step 1: Put the data points in order from lowest to highest.
6.5°, 15.0°, 16.4°, 18.8°, 21.6°
Step 2: Find the value in the center of the data points.
Here, there are 5 temperature measurements. Because 5 is an odd number, the median is the middle value. Here, that is 16.4°F.
If there were an even number of temperature measurements (for example, if there were 6 measurements), we would take the two center measurements (data points 3 and 4) and average those.
How to use Excel/Sheets to calculate the median of the temperature data.
To calculate the median in Google Sheets or in Excel, click on an empty cell and type:
=MEDIAN( )
Then, move your cursor inside the parentheses. Using your mouse or touchpad, highlight all the cells with the temperature data, and then hit return. The median will appear in the empty cell.
Part 4: Calculate mode:
To find the mode, put your data points in order from smallest to largest and identify the value that appears most commonly. This is the mode. It is possible that there is no mode (if all values are unique). There may also be multiple modes if there are many repeats in the data, so make sure to look carefully at the ordered data. If your data are not numerical but categorical (e.g., soil type, atmospheric pollutant type, sand dune type), provide a numeric value (e.g. 1–5) for each type, and calculate the mode according to the instructions above.
Step 1: Put the data points in order from lowest to highest.
6.5°, 15.0°, 16.4°, 18.8°, 21.6°
Step 2: Find the value that occurs the most often.
Look for the value that occurs the most often. Here, all the temperatures are different, so there is no mode.
How to use Excel/Sheets or calculator to calculate the mode of the temperature data.
Click on an empty cell, type:
=MODE( )
Then, move your cursor inside the parentheses. Using your mouse or touchpad, highlight all the cells with the temperature data, and then hit return. The mode will appear in the empty cell.
When Do I Use Mean, Median, Mode, and Standard Deviation?
Mean is the most commonly used measure of the center (or average) of a data set. Most people know it as an "average." It is most useful when you have a large dataset (>20 samples) without strong outliers.
- the mean density of a mineral based on 30 samples of it
- the mean rate of plate motion based on multiple measured and dated segments of the plate's oceanic crust
- the mean amount of rainfall each year in a particular climate based on hourly or daily data
- the mean monthly concentrations of carbon dioxide and sulfur dioxide being emitted from volcanic vents based on hourly measurements
Median is used for small datasets (< 20 samples) and/or those datasets that are not normally distributed (that is, they have large outliers). Medians don't get disproportionately influenced by particularly large or small values in the dataset. It can be helpful to look at the distribution (shape) of the data to check if it is normally distributed or if there are outliers. To learn about and practice working with histograms, see How Do I Interpret and Create Histograms.
- the median annual streamflow in a river (which may have abnormally high or low flows for short periods of time)
- median snow depth data (which can produce large outliers from unusual underlying topography)
- median concentrations of trace metals in streams (which are very expensive to analyze, and so datasets are typically small)
- median sizes of atoll islands in the Pacific Ocean (there are many more small ones than large ones)
- grain size distribution of sediment samples from a lakebed (typically, grains are well sorted in lacustrine and fluvial systems but contain outliers)
Mode is used when you need the most common value(s) in the dataset. The mode is particularly appropriate when there are a lot of repeated values in a dataset, such as when there is a binary (yes/no, present/absent, high/low, or 0/1 type of response), or when the data are not numerical but categorical. If they are categorical, the category types need to be coded with a value.
- Hurricane category level (1–5) at landfall in the United States over the past decade
- Soil texture (sand, loamy sand, sandy loam, sandy clay loam, loam, silt loam, silt, silty clay loam, clay, clay loam, sandy clay, and silty clay)
- Drainage pattern (dendritic, trellis, radial, rectangular)
- Sorting of river sediment by grain size (clay, silt, sand, gravel, pebble)
- Landslide type within a region (translational slide, rotational slide, debris flow, rockfall, mudflow . . .)
Standard deviation is used in any data set that reports a mean. It is a measure of how large the variability is around the single mean value. In a perfectly normal distribution (with a bell curve), 1 SD includes 68% of the dataset, not the entire range of data. When a median is reported, a different measure of variability, such as the Interquartile Range, is appropriate.
Example problem: Summarizing ocean depth data
Ocean depth is needed to calculate the speed at which a tsunami travels across the ocean. The speed of a tsunami is greater when the water is deeper. In this problem, you will work toward finding the depth value.
Problem: An megathrust M 9.1 earthquake strikes off of Kodiak, Alaska, and causes a massive displacement of water, generating a tsunami. To calculate the arrival time of a tsunami in Hilo, Hawaii, which is 4090 km away, one first needs to know the speed that the tsunami is traveling across the ocean, and that is dependent on the mean depth of the ocean.
This shows the tsunami arrival times (in hour) in the Pacific Ocean from a destructive tsunami generated by the 1964 Great Alaska Earthquake near Anchorage (~650km away from the tsunami in the problem presented here). Numbers represent tsunami travel arrival times (in hours). Source: https://www.ncei.noaa.gov/products/natural-hazards/tsunamis-earthquakes-volcanoes/tsunamis/travel-time-maps
Provenance: NCEI, the World Data Service for Geophysics (including Tsunamis), and the UNESCO-IOC International Tsunami Information Center https://www.ncei.noaa.gov/products/natural-hazards/tsunamis-earthquakes-volcanoes/tsunamis/travel-time-maps
Reuse: This item is in the public domain and maybe reused freely without restriction.
From Google Earth, you can obtain values of ocean depth between Kodiak and Hilo. Marking 20 evenly spaced values of ocean depth between the two locations, you find the following depths:
-4531m, -4607m, -4927m, -5051m, -5286m, -5276m, -4899m, -5609m, -5475m, -5255m, -5582m, -5564m, -5407m, -5525m, -5444m, -5525m, -5630m, -4662m, -4992m, and -1879m
a) Find the mean, standard deviation, median, and mode of the water depth measurements.
Part 1: Calculate the mean depth. The mean depth of the ocean between Kodiak and Hilo is a key value to know when to expect a tsunami to strike the Hawaiian island.
Step 1. Add up all the depth values
(-4531 + -4531 + -4607 + -4927 + -5051 + -5286 + -5276 + -4899 + -5609 + -5475 + -5255 + -5582 + -5564 + -5407 + -5525 + -5444 + -5525 + -5630 + -4662 + -4992 + -1879) = -101,126 m
Step 2: Divide by the number of data points.
There are 20 measurements, so divide -101,126 m by 20
-101,126/20 = -5056.3
Step 3: Keeping track of your significant figures, you round and report your final value including the original units.
Because the original set of values was only measured to the meter, you will round to the closest meter, so -5056.3 m rounds to -5056 m (be sure to include the units!)
To do this in Google Sheets and Microsoft Excel: make a column with the list of depths. In a new cell, type: "=AVERAGE( )". Within the parentheses, highlight the list of depth values, and hit return. The average (mean) value will be produced.
Part 2: Calculate the standard deviation of the ocean depth data. This is seldom done by hand, as it is simpler and avoids errors to use the "=STDEV( )" formula in Sheets or Excel, but the manual method is shown here.
Step 1. Determine the mean of the data values.
This was done above. Mean depth = -5056m
Step 2: Calculate the square of the distance to the mean.
Take each data value, subtract the mean value (-5056m), and then square the result. ((mean-value)
2) (Note that in this example you are subtracting a negative number, which is the same as adding it). Here a table is easier to use than a list, since there are 20 data points. Make a table of the values, distances to the mean value, and the square of the distances to the mean:
Ocean depth (m) |
Distance to the mean (-5056m) |
Square of distance to the mean |
-4531 |
525 |
275625 |
-4607 |
449 |
201601 |
-4927 |
129 |
16641 |
-5051 |
5 |
25 |
-5286 |
-230 |
52900 |
-5276 |
-220 |
48400 |
-4899 |
157 |
24649 |
-5609 |
-553 |
305809 |
-5475 |
-419 |
175561 |
-5255 |
-199 |
39601 |
-5582 |
-526 |
276676 |
-5564 |
-508 |
258064 |
-5407 |
-351 |
123201 |
-5525 |
-469 |
219961 |
-5444 |
-388 |
150544 |
-5525 |
-469 |
219961 |
-5630 |
-574 |
329476 |
-4662 |
394 |
155236 |
-4992 |
64 |
4096 |
-1879 |
3177 |
10093329 |
Step 3: Add all of the squares of the distances values obtained in Step 2.
275625 + 201601 + 16641 + 25 + 52900 + 48400 + 24649 + 305809 + 175561 + 39601 + 276676 + 258064 + 123201 + 219961 + 150544 + 219961 + 329476 + 155236 + 4096 + 10093329 = 12,971,356
Step 4: Divide the sum of the values you for in Step 3 by the number of data points minus 1.
12,971,356 /19 = 682,702.9
Step 5: Take the square root of the value you obtained in Step 4.
Step 6: Report your answer using the same number of significant figures as are in the original data and make sure to include units.
You found the value in Step 5 to be 826.26, but the original set of values was only measured to the meter, so you will round to the closest meter. SD = 826 m.
Note: to calculate this in Google Sheets and Microsoft Excel: make a column with the list of depths. In a new cell, type: "=STDEV( )". Within the parentheses, highlight the list of depth values, and hit return. The standard deviation value will be produced.
Part 3: Calculate the median depth
Step 1. Order the values from lowest to highest.
-5630m, -5609m, -5582m, -5564m, -5525m, -5525m, -5475m, -5444m, -5407m, -5286m, -5276m, -5255m, -5051m, -4992m, -4927m, -4899m, -4662m, -4607m, -4531m, -1879m
Step 2. Find the middle of the ordered values.
Because there are an even number of values, you will need to take the average of the 2 middle values.
There are 20 values, so the middle line will go through values #10 (-5286m) and #11(-5276m). Take the average of those two values ((-5286m + -5276)/2) to get the median.
Median = -5281m.
Note: to calculate this in Google Sheets and Microsoft Excel: make a column with the list of depths. In a new cell, type: "=MEDIAN( )". Within the parentheses, highlight the list of depth values, and hit return. The median value will be produced.
Part 4: Calculate the mode of the depth data
Step 1: Put the data points in order from lowest to highest.
-5630m, -5609m, -5582m, -5564m, -5525m, -5525m, -5475m, -5444m, -5407m, -5286m, -5276m, -5255m, -5051m, -4992m, -4927m, -4899m, -4662m, -4607m, -4531m, -1879m
Step 2: Look for the value that occurs the most often.
In this dataset, there are 2 depths that are identical: (-5525m). All the other depths are unique. Thus, the mode is -5525m.
Note: to calculate this in Google Sheets and Microsoft Excel: make a column with the list of depths. In a new cell, type: "=MODE( )". Within the parentheses, highlight the list of depth values, and hit return. The mode value will be produced.
b) Compare the usefulness of mean, median, and mode in this case:
The formula for calculating the tsunami velocity uses the mean ocean depth, and so the mean is most useful here. However, the median would be useful if there were some strong outliers in the depth data (e.g., if the depth data happen to include the Aleutian trench, which is very narrow and deep or a very tall seamount; these features are unusual along the transect of the seafloor and might bias the mean). In this dataset, the mode, which included only 2 of the 20 values, is not particularly informative; it just says that the same depth happened to be measured twice. The standard deviation is quite small relative to the mean, signifying that the dataset has a fairly small amount of variability and the ocean depth is fairly uniform between Alaska and Hawaii.
c) How long will it take the tsunami get to Hilo, Hawaii, from Kodiak, Alaska ?
Now that you know the mean ocean depth, you can calculate the speed at which the tsunami is traveling. The velocity of a tsunami is V= √(9.8 m/s2·D) where D is the mean ocean depth.
Since the mean depth is 5056m (you calculated that above),
V= √(9.8 m/s2·5056m) = 219 m/s, or 788 km/hr (look here for a lesson on unit conversions)
Knowing the speed and the distance, you can then calculate how long it will take the tsunami to arrive in Hilo: (using the formula, time= distance/velocity);
time= 4090 km/ 788 km/hr = 5.19 hours (5 hours, 11.4 minutes)
Where Do You Calculate Mean, Median, Mode, and Standard Deviation in Earth Science?
These calculations are used in virtually every area of Earth science, including
- Volcanology
- Oceanography
- Meteorology and atmospheric sciences
- Planetary geology
- Sedimentology
Next Steps
I am ready to PRACTICE!
If you think you have a handle on the steps above, click on this bar to try practice problems with worked answers.
Or, if you want even more practice, see 'More help' below.More Help
Pages written by Sonia Nagorski, University of Alaska Southeast, and Robyn Gotz, Montana State University.