How do I use probability to predict geologic events?
Probability in the Earth sciences
This module is undergoing classroom implementation with the Math Your Earth Science Majors Need project. The module is available for public use, but it will likely be revised after classroom testing.
Provenance: https://pixabay.com/photos/earthquake-rubble-laquila-collapse-1665870/
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
An introduction to probability
What is the likelihood that there will be a major earthquake in my town during my life?
What are the chances that my home will be impacted by a category 5 hurricane this year?
To answer these types of questions, we need probability. We use probability to quantify the likelihood that an event will occur. This method is particularly useful for quantifying the likelihood of hazards such as earthquakes and floods, or for determining the likelihood of success when probing the Earth for natural resources. This module will introduce the mathematical procedures needed to calculate the probability of geoscientific events.
Types of problems in this module
Probability can be used in a lot of ways, but in this module, we will focus on three specific types of probability problems:
- Determine the probability of occurrence. This type of problem is used when you want to determine the probability that an event will occur, such as the probability of having a high-category hurricane in a given year, or the probability that a mine will yield valuable minerals. For example, based on the historical record, what's the probability of having a major flood in a given year?
- Make predictions about the probability of occurrence over an interval. This type of problem is used when you want to determine the likelihood that there will be a major earthquake over a 30 year period, or that at least one of 20 boreholes will strike oil. For example, what's the probability of having at least one category 5 hurricane in the next 10 years?
- Determine the interval needed to achieve a certain probability. This type of problem is used when you want to know what length of interval will yield a certain probability or risk level. For example, over what time period can you expect to have a 99% chance of experiencing a major drought? How many rocks do you need to analyze to have a 50% chance of finding a meteorite?
How do I determine the probability of occurrence for geologic events from data?
What is the likelihood of a major earthquake, flood, meteor impact, or finding an economic mineral deposit to fuel the energy transition? To make predictions about geological events such as earthquakes, storms, landslides, and finding valuable minerals, we must first use existing observations to find the probability of these events. To do so, we need a set of observations and an outcome of interest, which we define. The set of observations could be the magnitude of the largest earthquake each year in a region or the compositions of rocks recovered from mining boreholes. The outcome of interest could be having a damaging earthquake (magnitude $\geq$7) in a given year, or finding gold in a given borehole. We then divide the number of outcomes of interest by the total number of observations:
$P(\text{outcome of interest}) = \frac{\text{number of observations with the outcome of interest}}{\text{total number of observations}}$
The probability of the outcome of interest is thus the proportion of observations that result in the outcome of interest. For example, in the last 150 years, there have been 23 years with a magnitude $\geq$7 earthquake in California. Using the equation above, we can determine the probability of having a magnitude $\geq$7 earthquake in a given year using the following steps:
Step 1. Determine the type of problem.
This type of problem is finding the probability of occurrence.
Step 2. Determine the probability of occurrence of the outcome of interest.
Now we can plug numbers into the probability of occurrence equation.
$P(\text{M }\geq\text{7 earthquake in one year}) = \frac{23 \text{ years with M }\geq\text{7 earthquakes}}{150 \text{ years}} = 0.15$
This means there is a 15% chance of having a magnitude $\geq$7 earthquake in a given year.
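The calculation above can be expressed as a short Python sketch. The helper name `probability_of_occurrence` is hypothetical, introduced here only for illustration; the counts (23 years out of 150) are the California values from the text.

```python
def probability_of_occurrence(n_outcomes_of_interest: int, n_observations: int) -> float:
    """Proportion of observations that produced the outcome of interest."""
    if n_observations <= 0:
        raise ValueError("need at least one observation")
    if not 0 <= n_outcomes_of_interest <= n_observations:
        raise ValueError("outcome count must be between 0 and the number of observations")
    return n_outcomes_of_interest / n_observations


# 23 years with an M >= 7 earthquake in a 150-year California record
p_quake = probability_of_occurrence(23, 150)
print(f"P(M>=7 earthquake in one year) = {p_quake:.2f}")  # prints 0.15
```

The result is a value between 0 and 1, which is the form to carry into any further calculation.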
How to identify probability of occurrence problems: You can identify this type of problem by the fact that it asks about a single observation. For example, the probability of an earthquake in a single year, or the probability of finding gold in a single borehole.
A few notes about probability: Probability values range from 0 to 1. We can express a probability as a percent, which is intuitive for most people. For example, a probability of 0.5 is equivalent to 50%. However, when we perform calculations that involve probability, it is important to use the standard probability value between 0 and 1. If we use percentages, the calculations give wrong answers: for example, the probability of two independent 50% events both occurring is $0.5 \times 0.5 = 0.25$, but multiplying the percentages gives $50 \times 50 = 2500$, which is not a valid probability.
Probabilities are often expressed using the notation $P(\text{some specific event or outcome})$, which can be read, "the probability of some specific event or outcome." For example, $P(\text{M }\geq\text{7 earthquake in one year})$ can be read, "the probability of a magnitude $\geq$7 earthquake in one year."
How do I make predictions about the probability of occurrence over an interval?
In the Earth sciences, we often want to determine the likelihood that an event will occur within some given time interval. For example, what are the chances of having a magnitude $\geq$7 earthquake in the next 30 years? We can use calculated probabilities, together with several mathematical techniques, to make these sorts of predictions.
Two important probability tools
To make these predictions, we will make use of two rules of probability: the complement rule and the exponent rule. These rules serve as tools that will help us answer the questions above.
The complement rule states that the probabilities of all possible outcomes must sum to 1. This means that for an outcome of interest:
$P(\text{non-occurrence outcome}) = 1 - P(\text{outcome of interest})$
For the earthquake example above, where the probability of a magnitude $\geq$7 earthquake in a given year is 0.15, the probability of not having a magnitude $\geq$7 earthquake is 1 - 0.15 or 0.85.
The exponent rule relates to the probability of multiple independent events of equal probability. Events are independent if the occurrence of one does not affect the probability of the other. We assume this is true for the Earth science problems we consider here. The probability of multiple independent events all occurring is the product of the probability of each event. Therefore, if the probability of each event is equal, the total probability equals the probability of each event raised to the power of the number of events:
$P(\text{outcome of interest in each of } n \text{ events}) = P(\text{outcome of interest})^{n}$
For the earthquake example, we can use the exponent rule to find the probability of having magnitude $\geq$7 earthquakes in multiple years. For example, where the probability of a magnitude $\geq$7 earthquake in a given year is 0.15, the probability of having a magnitude $\geq$7 earthquake in each of the next 3 years would be $0.15^{3} = 0.0034$. If we wanted to compute the probability of having a large earthquake every year over a 10 year period, we would calculate $0.15^{10} = 5.8 \times 10^{-9}$.
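Both rules can be checked numerically with a few lines of Python. This is an illustrative sketch, not part of the original module; it uses the rounded annual probability of 0.15 from the text and assumes, as the exponent rule requires, that years are independent.

```python
p_year = 0.15  # P(M >= 7 earthquake in one year), rounded as in the text

# Complement rule: P(no event) = 1 - P(event)
p_none_year = 1 - p_year

# Exponent rule (assumes independent years with equal probability):
# P(event in each of n years) = P(event in one year) ** n
p_every_year_3 = p_year ** 3
p_every_year_10 = p_year ** 10

print(f"P(no M>=7 quake in a year)      = {p_none_year:.2f}")      # 0.85
print(f"P(M>=7 quake in each of 3 yrs)  = {p_every_year_3:.4f}")   # 0.0034
print(f"P(M>=7 quake in each of 10 yrs) = {p_every_year_10:.1e}")  # 5.8e-09
```

In Python, `**` is the exponent operator, so `p_year ** 3` is $0.15^{3}$.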
With these two rules, let's move on to predicting geologic events.
Calculating the probability of an occurrence over a defined interval
As Earth scientists, we are often interested in the probability of an event over some defined interval. This could be a time interval, a spatial interval, or simply a set of experimental observations. For instance, how likely is it that my house will flood during the course of my 30-year mortgage? How likely is it that a new, gloriously cheesy geoscience disaster movie will be released in the next 10 years? To solve these types of questions, we will work through an example.
What are the chances of having a magnitude $\geq$7 earthquake in California in the next 30 years?
San Andreas Fault, California
Provenance: USGS, https://pubs.usgs.gov/gip/dynamic/San_Andreas.html
Reuse: This item is in the public domain and may be reused freely without restriction.
We can use the probability rules introduced above to break this question down in terms of the probability of the outcome of interest.
Our approach will use the fact that, by the complement rule, the probability of at least one occurrence of an event over some interval is one minus the probability of no occurrences of that event:
$P(\text{at least one M}\geq\text{7 earthquake in 30 years}) = 1 - P(\text{no M}\geq\text{7 earthquakes in 30 years})$
We can further break this down by remembering that, by the exponent rule, the probability of not observing a large earthquake over 30 years is equal to the probability of not observing a large earthquake in any given year to the power of 30:
$P(\text{at least one M}\geq\text{7 earthquake in 30 years}) = 1 - P(\text{no M}\geq\text{7 earthquakes in 1 year})^{30}$
Finally, again by the complement rule, the probability of not observing a large earthquake in a given year is one minus the probability of observing the earthquake in that year:
$P(\text{at least one M}\geq\text{7 earthquake in 30 years}) = 1 - [1 - P(\text{M }\geq\text{7 earthquake in one year})]^{30}$
The series of steps below outlines how to solve this type of problem.
Step 1. Determine the type of problem.
This type of problem is determining the probability of occurrence over an interval.
Step 2. Determine the probability of occurrence of the outcome of interest.
Based on the fact that there were 23 years with magnitude $\geq$7 earthquakes over the past 150 years, we found above that the probability of a magnitude $\geq$7 earthquake in a given year in California is $\frac{23}{150} = 0.15$.
Step 3. Determine the probability of a non-occurrence outcome using the complement rule.
By the complement rule,
$P(\text{no M}\geq\text{7 earthquakes in 1 year}) = 1 - P(\geq\text{1 M}\geq\text{7 earthquake in 1 year}) = 1 - 0.15 = 0.85$
In a given year, there is an 85% chance of not having a magnitude $\geq$7 earthquake.
Step 4. Determine the probability of non-occurrence over the interval using the exponent rule.
By the exponent rule,
$P(\text{no M}\geq\text{7 earthquakes in 30 years}) = P(\text{no M}\geq\text{7 earthquakes in 1 year})^{30} = 0.85^{30} = 0.0076$
This means that there is a 0.76% chance of not experiencing a magnitude $\geq$7 earthquake in a 30 year interval.
Step 5. Determine the probability of at least one occurrence over the interval using the complement rule.
By the complement rule,
$P(\text{at least one M}\geq\text{7 earthquake in 30 years}) = 1 - P(\text{no M}\geq\text{7 earthquakes in 30 years}) = 1 - 0.0076 = 0.9924$
Thus, there is a 99.24% chance of observing a magnitude $\geq$7 earthquake in a 30 year interval. This is a very high value, close to a probability of 1 or 100% chance. This means you can be highly confident that at least one magnitude $\geq$7 earthquake will occur in the next 30 years in California.
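The five steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original module; it uses the rounded annual probability of 0.15 so that its output matches the values in the text, and it assumes independent years as the exponent rule requires.

```python
p_year = 0.15   # Step 2: P(M >= 7 earthquake in one year), rounded as in the text
years = 30      # length of the interval

p_none_year = 1 - p_year                 # Step 3: complement rule
p_none_interval = p_none_year ** years   # Step 4: exponent rule (independent years)
p_at_least_one = 1 - p_none_interval     # Step 5: complement rule again

print(f"P(no M>=7 quake in {years} yrs)      = {p_none_interval:.4f}")  # 0.0076
print(f"P(at least one M>=7 in {years} yrs) = {p_at_least_one:.4f}")   # 0.9924
```

If you use the unrounded value 23/150 instead of 0.15, the result is about 0.993 rather than 0.9924. Rounding intermediate values changes the last digits but not the conclusion.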
How to identify probability of occurrence over an interval problems: These questions ask about the probability of an outcome of interest over a definite number of observations, such as the probability of observing at least one flood in an area over 10 years.
Calculating the interval needed to achieve a certain probability
It can also be useful to determine how long an interval must be before a certain probability is reached. For instance, you might be interested in knowing how many mining claims you would need to establish to have an overall >50% probability of finding economic mineral deposits. Or perhaps you are a storm chaser and want to know how many days you should plan for your vacation to have a >50% probability of observing a tornado. Let's work through an example.
Over how many years is there an overall >50% chance of experiencing a magnitude $\geq$7 earthquake in California?
This type of problem is similar to finding the probability of an event over a definite interval, and we will again make use of the complement rule and the exponent rule. The main difference is that in this type of problem, we are given a probability (in this case, 50% or 0.5) and are solving for the length of the interval, which we denote using the variable N. To solve this type of problem, we use the exponent rule slightly differently, treating the length of the interval as the unknown exponent N.
Provenance: Alexander Tye, Utah Tech University
Reuse: This item is in the public domain and may be reused freely without restriction.
From here, we can solve for N.
In the example above, we were using a 30 year interval and solving for the probability of observing a magnitude $\geq$7 earthquake:
$P(\text{at least one M}\geq\text{7 earthquake in 30 years}) = 1 - [1 - P(\text{M }\geq\text{7 earthquake in one year})]^{30}$
In this example, the interval we are interested in is unknown, rather than being fixed at 30 years, and is represented by N. In addition, whereas above we were solving for $P(\text{at least one M}\geq\text{7 earthquake in 30 years})$, we are now given $P(\text{at least one M}\geq\text{7 earthquake in N years})$, which we are told is 0.5 in our example problem. Let's work on finding a general equation for this type of problem by first replacing '30' with 'N':
$P(\text{at least one M}\geq\text{7 earthquake in N years}) = 1 - [1 - P(\text{M }\geq\text{7 earthquake in one year})]^{N}$
To solve for N, we need to rearrange this equation and take the logarithm of both sides:
$P(\text{at least one M}\geq\text{7 earthquake in N years}) - 1 = - [1 - P(\text{M }\geq\text{7 earthquake in one year})]^{N}$
$1 - P(\text{at least one M}\geq\text{7 earthquake in N years}) = [1 - P(\text{M }\geq\text{7 earthquake in one year})]^{N}$
$log_{[1 - P(\text{M }\geq\text{7 earthquake in one year})]}[1 - P(\text{at least one M}\geq\text{7 earthquake in N years})] = N$
Note that to solve for N, $P(\text{at least one M}\geq\text{7 earthquake in N years})$ must be given or assumed. $P(\text{M }\geq\text{7 earthquake in one year})$ has already been calculated from an existing dataset.
The general equation we can use to solve for $N$ in this type of problem is
$N = log_{P(\text{non-occurrence outcome})}(1 - P(\geq\text{1 outcome of interest over interval N}))$
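The general equation can be wrapped in a small Python function. This is a sketch under stated assumptions: the name `interval_needed` is hypothetical, the intervals are assumed independent with equal probability, and the result is rounded up with `math.ceil` because fractional intervals are not meaningful for these problems. `math.log(x, base)` from the Python standard library computes a logarithm with an arbitrary base.

```python
import math


def interval_needed(p_event: float, target: float) -> int:
    """Smallest whole number of intervals N with P(at least one event) >= target.

    Solves N = log_{1 - p_event}(1 - target), then rounds up because
    fractional intervals (e.g. fractional years) are not meaningful here.
    Assumes independent intervals with equal probability p_event.
    """
    if not 0 < p_event < 1:
        raise ValueError("p_event must be strictly between 0 and 1")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    p_none = 1 - p_event                     # complement rule
    n_exact = math.log(1 - target, p_none)   # logarithm with base p_none
    return math.ceil(n_exact)


print(interval_needed(0.15, 0.5))   # 5   (California earthquake example)
print(interval_needed(0.048, 0.5))  # 15  (Tymochtee Creek flood example)
```

One design note: if the exact N happens to be a whole number, the probability at that N equals the target exactly rather than exceeding it.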
To solve this type of problem, use the following steps:
A FEMA earthquake risk map.
Step 1. Determine the type of problem.
This type of problem is determining the interval needed to achieve a certain probability.
Step 2. Determine the probability of occurrence of the outcome of interest.
This is the same as Step 2 in the previous problem. Based on the fact that there were 23 years with magnitude $\geq$7 earthquakes over the past 150 years, we found above that the probability of at least one magnitude $\geq$7 earthquake in a given year in California is $\frac{23}{150} = 0.15$.
Step 3. Determine the probability of a non-occurrence outcome using the complement rule.
This is the same as Step 3 in the previous problem. By the complement rule,
$P(\text{no M}\geq\text{7 earthquakes in 1 year}) = 1 - P(\geq\text{1 M}\geq\text{7 earthquake in 1 year}) = 1 - 0.15 = 0.85$
In a given year, there is an 85% chance of not having a magnitude $\geq$7 earthquake.
Step 4. Plug the single-year probability of non-occurrence and the target probability into the logarithm equation we found above.
The non-occurrence probability for this problem is 0.85, meaning that there is an 85% chance of not having a magnitude $\geq$7 earthquake in a given year. The problem states that the target probability of observing at least one magnitude $\geq$7 earthquake over the interval is 0.5. Substituting these values into the equation gives
$N = log_{0.85}(1 - 0.50) = log_{0.85}(0.50)$
Step 5. Evaluate the logarithm to find N.
The logarithm can be evaluated using a calculator, a computer, or the Google search bar. Because calculators typically provide only base-10 or natural logarithms, we use the change-of-base rule
$\log_a b = \frac{\log_{10} b}{\log_{10} a}$
To evaluate $\log_{0.85}(0.50)$, for example, enter log(0.5)/log(0.85) into a calculator or the Google search bar.
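The change-of-base rule can also be checked in Python. This short sketch, not part of the original module, evaluates $\log_{0.85}(0.50)$ three ways using only the standard `math` module; all three should agree.

```python
import math

# Three equivalent ways to evaluate log base 0.85 of 0.50
n_base10 = math.log10(0.50) / math.log10(0.85)   # change of base using log10 (like a calculator's "log")
n_natural = math.log(0.50) / math.log(0.85)      # change of base using the natural log
n_direct = math.log(0.50, 0.85)                  # math.log also accepts an optional base argument

print(f"{n_base10:.1f} {n_natural:.1f} {n_direct:.1f}")  # 4.3 4.3 4.3
```

The choice of base in the numerator and denominator does not matter, as long as the same base is used for both.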
Evaluating this logarithm gives
$N = log_{0.85}(0.50) = 4.3$
Thus, there is a >50% chance of observing a magnitude $\geq$7 earthquake over time intervals of 5 years or greater. We round the decimal answer (4.3) up to the nearest integer (5) because fractional years are not meaningful for this sort of problem. Therefore, rounding the decimal answer up gives the number of whole years over which the probability of observing at least one magnitude $\geq$7 earthquake is >50%.
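A brute-force check confirms the rounding step: try N = 1, 2, 3, ... until the probability of at least one earthquake first exceeds 0.5. This is an illustrative sketch, not part of the original module, using the non-occurrence probability of 0.85 from Step 3 and assuming independent years.

```python
p_none_year = 0.85   # P(no M >= 7 earthquake in one year), from Step 3
target = 0.5         # desired probability of at least one M >= 7 earthquake

# Try N = 1, 2, 3, ... until P(at least one quake in N years) exceeds the target
n_years = 1
while 1 - p_none_year ** n_years <= target:
    n_years += 1

# Show the probability just below and at the answer
for n in (n_years - 1, n_years):
    print(n, round(1 - p_none_year ** n, 3))
# 4 0.478
# 5 0.556
```

At 4 years the probability is still below 0.5; at 5 years it first exceeds 0.5, matching the rounded-up logarithm.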
How to identify interval needed to achieve a certain probability problems: These questions will ask for the interval over which there is a specified probability of observing at least one outcome of interest. Importantly, the length or size of the interval is what is being solved for; it is not given by the problem.
Earth sciences application: Flood probability and prediction
Peak annual discharge measurements from Tymochtee Creek, Ohio, for each year from 1961 to 2023.
In this problem, we want to predict the likelihood of high magnitude flood events, which is important for preparing impacted communities. Flood magnitude is often measured in cubic feet per second (cfs), reflecting the river discharge, or the volume of water that passes a point per second. The plot above shows the maximum (or peak) annual discharge for Tymochtee Creek, Ohio, from 1961 to 2023. You can see that only a few years had flood events that exceeded 7,000 cfs discharge, whereas many years had flood events that exceeded 2,000 cfs discharge. In general, larger floods are rare and smaller floods are relatively common.
Let's imagine that you are a planner assessing flood hazards for the community of Crawford, Ohio, located near Tymochtee Creek. You are tasked with determining some probabilities related to flood magnitudes to guide future development of the community. Assume that discharge over 7,000 cfs corresponds with damaging flooding in this area. The 10 largest peak annual discharges from the 63 year record are shown below.
Table of top 10 peak annual discharge measurements from Tymochtee Creek, Ohio, from 1961-2023. Note that the total flood record covers 63 years, but only the top 10 are shown here for brevity.
Your supervisor wants to know:
(A) What is the probability of having a flood event that is greater than 7,000 cfs in a given year?
USGS technician taking measurements during a major flood event.
Provenance: USGS, https://www.usgs.gov/media/images/usgs-hydro-tech-taking-measurement-during-major-flood
Reuse: This item is in the public domain and may be reused freely without restriction.
Step 1. Determine the type of problem.
This problem type is the probability of occurrence. You can tell because it asks for the probability of an event over a single unit or observation (in this case, one year).
Step 2. Determine the probability of occurrence.
We calculate the probability of high magnitude flood occurrence by dividing the number of large floods by the number of years. From the table above, floods that exceeded the 7,000 cfs threshold occurred in 2008 (8,220 cfs), 2011 (7,530 cfs), and 2013 (7,280 cfs), meaning that the number of large flood occurrences is 3. We can also see that there are 3 events greater than 7,000 cfs from looking at the plot. There are 63 years in the flood record, so the number of total outcomes is 63. The probability of the outcome of interest is therefore
$P(\text{flood > 7,000 cfs in one year}) = \frac{3}{63} = 0.048$.
This can be calculated using a scientific calculator or Google by typing 3/63 into the calculator or Google search bar.
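The same count can be sketched in Python. This is an illustration, not part of the original module: the dictionary holds only the three exceedances named in the text (2008, 2011, 2013), not the full 63-year record, and the record length is computed from the stated span 1961 to 2023.

```python
# Peak annual discharges above the 7,000 cfs damage threshold, from the table above
large_floods_cfs = {2008: 8220, 2011: 7530, 2013: 7280}
threshold_cfs = 7000
record_years = 2023 - 1961 + 1   # 63 years of record (1961-2023 inclusive)

n_damaging = sum(1 for q in large_floods_cfs.values() if q > threshold_cfs)
p_flood_year = n_damaging / record_years
print(n_damaging, record_years, round(p_flood_year, 3))  # 3 63 0.048
```

With the full series of 63 annual peaks, the same comprehension would count exceedances directly from all the data.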
(B) What is the probability of observing at least one damaging flood ($\geq$7,000 cfs) over 10 years?
We can break this problem into several steps.
Step 1. Determine the type of problem.
This problem type is the probability of occurrence over an interval. You can tell because it asks about the probability of at least one occurrence of an event over a specified interval (in this case, 10 years).
Step 2. Determine the probability of occurrence.
This is the same as Step 2 in the previous problem. We calculate the probability of high magnitude flood occurrence by dividing the number of large floods by the number of years. From the table above, floods that exceeded the 7,000 cfs threshold occurred in 2008 (8,220 cfs), 2011 (7,530 cfs), and 2013 (7,280 cfs), meaning that the number of large flood occurrences is 3. We can also see that there are 3 events greater than 7,000 cfs from looking at the plot. There are 63 years in the flood record, so the number of total outcomes is 63. The probability of the outcome of interest is therefore
$P(\geq\text{1 damaging flood in one year}) = \frac{3}{63} = 0.048$.
This can be calculated using a scientific calculator or Google by typing 3/63 into the calculator or Google search bar.
Step 3. Determine the probability of non-occurrence using the complement rule.
First we need to determine the probability of not observing a major flood in one year. We can do this by subtracting the probability of a major flood from 1.
$P(\text{no damaging flood in one year}) = 1 - P(\geq\text{1 damaging flood in one year}) = 1 - 0.048 = 0.952$.
So there's a 95% chance of not having a damaging flood in a given year.
Step 4. Determine the probability of non-occurrence over an interval using the exponent rule.
We can determine the likelihood of not experiencing a major flood over 10 years using the following equation:
$P(\text{no damaging floods in 10 years}) = P(\text{no damaging flood in 1 year})^{10} = 0.952^{10} = 0.61$
There is a 61% chance that there will be no damaging flood on Tymochtee Creek in 10 years.
Step 5. Determine the probability of occurrence of at least one event using the complement rule.
Recall that the probability of an occurrence within some interval is related to the non-occurrence within that interval:
$P(\geq\text{1 damaging flood in 10 years}) = 1 - P(\text{no damaging flood in 10 years})$. Using this equation, we can solve for the probability of having a damaging flood event in 10 years:
$P(\geq\text{1 damaging flood in 10 years}) = 1 - (0.952)^{10} = 1 - 0.61 = 0.39$
Thus, there is a 39% chance that a flood of $\geq$7,000 cfs will occur in Tymochtee Creek over a 10 year period.
This equation can be solved for $P(\geq\text{1 damaging flood in 10 years})$ by calculating the successive steps shown above in a scientific calculator or the Google search bar, or the equation 1 - (1 - 0.048)^10 can be entered directly into a calculator or the Google search bar. The caret (^) signals that the number following it is an exponent of the quantity preceding it. The parentheses ensure that the calculator or Google will evaluate 1 - 0.048 before evaluating the exponent.
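The flood calculation follows the same pattern as the earthquake example, as this Python sketch shows. It is an illustration, not part of the original module; it uses the rounded annual probability of 0.048 from the text and assumes independent years.

```python
p_flood_year = 0.048   # Step 2: P(damaging flood in one year), rounded as in the text
years = 10

p_none_year = 1 - p_flood_year           # Step 3: complement rule -> 0.952
p_none_interval = p_none_year ** years   # Step 4: exponent rule -> about 0.61
p_at_least_one = 1 - p_none_interval     # Step 5: complement rule -> about 0.39

print(round(p_none_interval, 2), round(p_at_least_one, 2))  # 0.61 0.39

# The same result as a single expression, mirroring 1 - (1 - 0.048)^10
print(round(1 - (1 - 0.048) ** 10, 2))  # 0.39
```

Python uses `**` where a calculator or Google uses the caret (^).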
A USGS stream gauge that records river discharge.
(C) Over how many years does the probability of observing at least one damaging flood exceed $0.5$?
Step 1. Determine the type of problem.
This type of problem is determining the interval needed to achieve a certain probability. You can tell because it asks about the length of the interval required to achieve a certain probability of an event.
Step 2. Determine the probability of the event of interest.
This is the same as Step 2 in the previous problem. As before, we calculate the probability of high magnitude flood occurrence by dividing the number of large floods by the number of years. From the table above, floods that exceeded the 7,000 cfs threshold occurred in 2008 (8,220 cfs), 2011 (7,530 cfs), and 2013 (7,280 cfs), meaning that the number of large flood occurrences is 3. We can also see that there are 3 events greater than 7,000 cfs from looking at the plot. There are 63 years in the flood record, so the number of total outcomes is 63. The probability of the outcome of interest is therefore
$P(\geq\text{1 damaging flood in one year}) = \frac{3}{63} = 0.048$.
This can be calculated using a scientific calculator or Google by typing 3/63 into the calculator or Google search bar.
Step 3. Determine the probability of non-occurrence using the complement rule.
This is the same as Step 3 in the previous problem. We need to determine the probability of not observing a major flood in one year. We can do this by subtracting the probability of a major flood from 1.
$P(\text{no damaging flood in one year}) = 1 - P(\geq\text{1 damaging flood in one year}) = 1 - 0.048 = 0.952$.
So there's a 95% chance of not having a damaging flood in a given year.
Step 4. Plug the single-year probability of non-occurrence and the target probability into the logarithm equation.
In this question, we need to find the number of years N that gives us a 50% chance of having a damaging flood over N years. To do that, we use the logarithmic relationship derived earlier on this page:
$N = log_{P(\text{non-occurrence outcome})}(1 - P(\geq\text{1 outcome of interest over interval N}))$
In this case, the probability of non-occurrence, or the chance of not having a flood in a given year, is 0.952. Our probability of interest is 0.5, or 50%. We can plug in those values to solve for N:
$N = log_{0.952}(1-0.5) = log_{0.952}(0.5)$
Step 5. Evaluate the logarithm to find N.
Use your favorite calculator or computer to solve the logarithmic expression above:
$N = log_{0.952}(0.5) = 14.09$
We round the fractional answer (14.09) up to 15 because a fractional number of years is not meaningful for this problem. This means that over a 15-year period, there is a greater than 50% chance of experiencing at least one damaging flood on Tymochtee Creek.
This equation can be solved for N by using a calculator or the Google search bar. As in the earthquake example, we use the change-of-base rule $\log_a b = \frac{\log_{10} b}{\log_{10} a}$, so to solve the equation as a whole, you would enter log(0.5)/log(0.952) into a calculator or the Google search bar.
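The flood interval can be computed the same way in Python. This sketch is an illustration, not part of the original module; it uses the rounded non-occurrence probability of 0.952 from Step 3 and the change-of-base rule with natural logarithms.

```python
import math

p_none_year = 0.952   # P(no damaging flood in one year), from Step 3
target = 0.5          # desired probability of at least one damaging flood

n_exact = math.log(1 - target) / math.log(p_none_year)   # change of base
n_years = math.ceil(n_exact)                              # round up to whole years
print(f"{n_exact:.2f} -> {n_years} years")                # 14.09 -> 15 years
```

Using the unrounded annual probability 3/63 gives N of about 14.2 instead, which still rounds up to 15 years.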
Where do you use probability in Earth science?
- Flooding, hurricane, and drought prediction
- Earthquake risk analysis
- Mineral exploration
- Geochronology
- Sea level rise scenarios
Next steps
I am ready to PRACTICE!
If you think you have a handle on the steps above, try the practice problems with worked answers.
Or, if you want even more practice, see More help (resources for students).
Pages written by Emma MacKie (University of Florida) and Alex Tye (Utah Tech University).