How do I use mean, median, mode, and standard deviation?
Introduction to descriptive statistics in the Earth sciences
Robyn Gotz (Montana State University, Bozeman)
Sonia Nagorski (University of Alaska Southeast, Juneau)
An introduction to calculating mean, median, mode, and standard deviation.
Geoscientists collect tremendous amounts of data to describe, measure, and monitor the natural environment. It is necessary to summarize values rather than awkwardly listing long strings of data. For example, if you are planning a river rafting trip to investigate sandbar dynamics, you would look up the typical flow level for a certain week instead of looking at a huge list of the flow levels measured every 15 minutes all year long. Summary data are also needed to compare datasets to each other; for example, the number of landslides on rainy days versus the number of landslides on dry days. Comparisons like these rest on many individual measurements that need to be summarized.
What are mean, median, mode, and standard deviation?
There are several ways to describe the central tendency of a dataset (mean, median, and mode) and one common way to describe its spread (standard deviation).
Mean is the average of the values in a dataset.
Standard deviation (SD) describes how much variation there is around the mean (not the median or mode). A low SD means the data are mostly close to the mean, while a high SD means the data are spread out. Note that the SD is not the same as the full range (minimum to maximum) of the data.
Median is the midpoint of the values in a dataset. Half of the values in the data set will fall below the median, half will be above it.
Mode is the most commonly occurring value in the dataset. If all values are unique, there is no mode. There can be multiple modes in a dataset.
How do I calculate mean, median, mode, and standard deviation?
Here are steps to take to calculate the mean, median, mode, and standard deviation. For these calculations, we will use minimum January temperatures in Bozeman, Montana for the years 2016-2020.
|Year||Min. Jan. Temp. (°F)|
Part 1: Calculate mean:
To calculate the mean, add all of the data points together and divide the total by the number of data points.
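The two steps above (add, then divide) can be sketched in a few lines of Python. The temperatures below are made-up placeholder values chosen only to show the arithmetic; they are not the Bozeman record.

```python
# Hypothetical placeholder temperatures in °F (NOT the Bozeman data).
temps_f = [-5, 3, -12, 0, 4]

total = sum(temps_f)       # step 1: add all of the data points together
count = len(temps_f)       # the number of data points
mean_temp = total / count  # step 2: divide the total by the count

print(mean_temp)  # -2.0
```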
Part 2: Calculate standard deviation
Calculating standard deviation by hand is more difficult than calculating the mean, median, or mode. It is often more efficient to calculate this statistic using Excel or Google Sheets, but it is important to understand the math behind calculating standard deviation.
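As a reminder of the math involved, one common form of the calculation is written out below. It assumes the sample standard deviation, which divides by n − 1 (the convention used by spreadsheet functions such as STDEV.S); the page does not state which form it intends, and the population form divides by n instead. Here x_i are the individual data points, n is the number of data points, and x̄ is the mean.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```

In words: subtract the mean from each value, square each difference, add the squares, divide by n − 1, and take the square root.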
Part 3: Calculate median:
To calculate the median, put the data values in order from smallest to largest. If there is an odd number of data points, the median is the value located in the middle of the ordered data. If there is an even number of data points, take the mean (average) of the two middle data points.
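The odd and even cases can be sketched as a small Python function. The function name and the example values are illustrative assumptions, not part of the original exercise.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # put the values in order, smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of points: take the single middle value.
        return ordered[mid]
    # Even number of points: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical example values.
odd_example = [7, 1, 4, 9, 3]  # ordered: 1, 3, 4, 7, 9
even_example = [7, 1, 4, 9]    # ordered: 1, 4, 7, 9

print(median(odd_example))   # 4
print(median(even_example))  # 5.5
```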
Part 4: Calculate mode:
To find the mode, put your data points in order from smallest to largest and identify the value that appears most often. This is the mode. It is possible that there is no mode (if all values are unique). There may also be multiple modes if several values are tied for the most repeats, so make sure to look carefully at the ordered data. If your data are not numerical but categorical (e.g., soil type, atmospheric pollutant type, sand dune type), assign a numeric code (e.g., 1-5) to each type, and find the mode according to the instructions above.
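A minimal Python sketch of the same idea follows, covering the no-mode, multiple-mode, and categorical cases. The function name and all example values are hypothetical. Note that this sketch counts category labels directly, which gives the same answer as coding them with numbers first.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the most common value(s); empty if there is no mode."""
    if not values:
        return []
    counts = Counter(values)        # tally how often each value occurs
    highest = max(counts.values())  # the largest tally
    if highest == 1:
        return []                   # every value is unique: no mode
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical examples.
print(modes([2, 3, 3, 5, 5, 8]))  # [3, 5]  (two modes)
print(modes([1, 2, 3]))           # []      (no mode)
print(modes(["sand", "clay", "silt", "clay", "loam"]))  # ['clay']
```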
When do I use mean, median, mode, and standard deviation?
Mean is the most commonly used measure of the center (or average) of a data set. Most people know it as an "average." It is most useful when you have a large dataset (>20 samples) without strong outliers.
Median is used for small datasets (< 20 samples) and/or those that are not normally distributed (for example, datasets with large outliers). The median is not disproportionately influenced by particularly large or small values in the dataset. It can be helpful to look at the distribution (shape) of the data to check whether they are normally distributed or whether there are outliers. To learn about and practice working with histograms, see How Do I Interpret and Create Histograms.
Mode is used when you need the most common value(s) in the dataset. The mode is particularly appropriate when there are a lot of repeated values in a dataset, such as when the response is binary (yes/no, present/absent, high/low, or 0/1), or when the data are not numerical but categorical. If they are categorical, the category types need to be coded with a value.
Standard deviation is used in any dataset that reports a mean. It is a measure of how large the variability is around the single mean value. In a perfectly normal distribution (a bell curve), about 68% of the values fall within 1 SD of the mean, so 1 SD does not span the entire range of the data. When a median is reported, a different measure of variability, such as the interquartile range, is appropriate.
Example problem: Summarizing ocean depth data
Ocean depth is needed to calculate the speed at which a tsunami travels across the ocean. The speed of a tsunami is greater when the water is deeper. In this problem, you will work towards finding the depth value.
Problem: A megathrust M 9.1 earthquake strikes off of Kodiak, Alaska, and causes a massive displacement of water, generating a tsunami. To calculate the arrival time of the tsunami in Hilo, Hawaii, which is 4090 km away, one first needs to know the speed at which the tsunami is traveling across the ocean, which depends on the mean depth of the ocean.
From Google Earth, you can obtain values of ocean depth between Kodiak and Hilo. Marking 20 evenly-spaced values of ocean depth between the two locations, you find the following depths:
-4531m, -4607m, -4927m, -5051m, -5286m, -5276m, -4899m, -5609m, -5475m, -5255m, -5582m, -5564m, -5407m, -5525m, -5444m, -5525m, -5630m, -4662m, -4992m, and -1879m
a) Find the mean, standard deviation, median, and mode of the water depth measurements.
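One way to check a hand calculation for part (a) is with Python's built-in statistics module. This is a sketch, not the page's own solution: it assumes the sample standard deviation (dividing by n − 1), and the variable names are illustrative. The depths are the 20 values listed above.

```python
import statistics

# The 20 ocean depths (in meters) listed in the problem; negative = below sea level.
depths_m = [-4531, -4607, -4927, -5051, -5286, -5276, -4899, -5609, -5475, -5255,
            -5582, -5564, -5407, -5525, -5444, -5525, -5630, -4662, -4992, -1879]

mean_depth = statistics.mean(depths_m)        # -5056.3
sd_depth = statistics.stdev(depths_m)         # sample SD (n - 1), roughly 826 m
median_depth = statistics.median(depths_m)    # -5281.0
mode_depths = statistics.multimode(depths_m)  # [-5525], the only repeated value

print(f"mean   = {mean_depth:.1f} m")
print(f"SD     = {sd_depth:.1f} m")
print(f"median = {median_depth:.1f} m")
print(f"mode   = {mode_depths}")
```

The single shallow value (-1879 m) pulls the mean toward the surface and inflates the standard deviation, while the median barely moves; this is worth keeping in mind for part (b).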
b) Compare the usefulness of mean, median, and mode in this case:
c) How long will it take the tsunami to get to Hilo, Hawaii, from Kodiak, Alaska?
Now that you know the mean ocean depth, you can calculate the speed at which the tsunami is traveling. The velocity of a tsunami is V = √(9.8 m/s² · D), where D is the mean ocean depth in meters and V is in meters per second.
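The calculation for part (c) can be sketched as follows. The sketch assumes D is entered as a positive depth (the listed values are negative because they are below sea level), uses the mean of the 20 depths above, and treats the 4090 km path as having that single mean depth throughout. Variable names are illustrative.

```python
import math
import statistics

# The 20 ocean depths (in meters) listed in the problem.
depths_m = [-4531, -4607, -4927, -5051, -5286, -5276, -4899, -5609, -5475, -5255,
            -5582, -5564, -5407, -5525, -5444, -5525, -5630, -4662, -4992, -1879]

g = 9.8                                   # gravitational acceleration, m/s^2
mean_depth_m = abs(statistics.mean(depths_m))  # positive depth D, 5056.3 m
distance_m = 4090 * 1000                  # Kodiak to Hilo, converted from km to m

speed_m_per_s = math.sqrt(g * mean_depth_m)   # V = sqrt(g * D)
travel_time_s = distance_m / speed_m_per_s    # time = distance / speed
travel_time_h = travel_time_s / 3600          # convert seconds to hours

print(f"speed = {speed_m_per_s:.1f} m/s")
print(f"travel time = {travel_time_h:.1f} hours")
```

With these inputs the speed comes out a little over 220 m/s and the travel time is roughly 5 hours.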
Where do you calculate mean, median, mode, and standard deviation in Earth science?
These calculations are used in virtually every area of Earth science including
- Meteorology and atmospheric sciences
- Planetary Geology
- Statistics Intro: Mean, median, and mode from Khan Academy is a basic video tutorial.
- Range, variance, and standard deviation: measures of spread from Khan Academy includes information about standard deviation and other measures of the dispersion of data.
- Mode in Wolfram MathWorld describes the mode in a very general (non-applied) way but is more rigorous than our treatment of this topic.
Pages written by Sonia Nagorski (University of Alaska Southeast) and Robyn Gotz (Montana State University).