Linear Regression - Practice Problems

Initial Publication Date: August 11, 2023

Solving Earth science problems with regression

This module is available for public use, but it is undergoing revision after classroom implementation with the Math Your Earth Science Majors Need project.

Working with ecological data

Ecology explores relationships between organisms and other living (biotic) things or nonliving (abiotic) components in their environment. Certain factors may impact the abundance, distribution, or physiology of organisms, including abiotic factors such as temperature, moisture, or sunlight, and biotic factors such as the presence of predators or competitors. Linear regressions can be used to quantify these relationships and predict organism responses to various levels of the abiotic or biotic factor.

Problem 1: Ecologists tested the grazing pressure of green crabs on clams. They constructed `1 m^2` cages and planted 300 clams in each cage. Two days later they counted the number of remaining clams and recorded the results in the table below.

Number of Crabs	Number of Clams Remaining
2	137
4	70
2	184
5	0
4	35
0	297
3	122
5	1
1	253
3	150

Problem 1A: First, perform a linear regression using the step-by-step instructions for calculating `m` (slope) and `b` (intercept) of the regression line. What is the full equation for the regression line that you calculated?

Step 1. Decide which variable is the independent variable and which is the dependent variable.

Step 2. Calculate `x^2` for every value of `x`, and `y^2` for every value of `y`. For each pair of values, calculate `x xx y`.

Step 3. Calculate `Sigmax`, `Sigmay`, `Sigma(x^2)`, `Sigma(y^2)`, and `Sigma(x xx y)`.

`Sigmax = 2 + 4 + 2 + 5 + 4 + 0 + 3 + 5 + 1 + 3 = 29`

`Sigmay = 137 + 70 + 184 + 0 + 35 + 297 + 122 + 1 + 253 + 150 = 1249`

`Sigma(x^2) = 4 + 16 + 4 + 25 + 16 + 0 + 9 + 25 + 1 + 9 = 109`

`Sigma(y^2) = 18769 + 4900 + 33856 + 0 + 1225 + 88209 + 14884 + 1 + 64009 + 22500 = 248353`

`Sigma(x xx y) = 274 + 280 + 368 + 0 + 140 + 0 + 366 + 5 + 253 + 450 = 2136`

`Sigmax`	`Sigmay`	`Sigma(x^2)`	`Sigma(y^2)`	`Sigma(x xx y)`
29	1249	109	248,353	2136
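
The sums in Step 3 can be double-checked with a few lines of code. The sketch below is only an illustration: the list names and the helper `regression_sums` are hypothetical and not part of the module, and the values are copied from the Problem 1 table.

```python
# Crab and clam counts copied from the Problem 1 table.
crabs = [2, 4, 2, 5, 4, 0, 3, 5, 1, 3]                # x, independent variable
clams = [137, 70, 184, 0, 35, 297, 122, 1, 253, 150]  # y, dependent variable


def regression_sums(xs, ys):
    """Return the five sums from Steps 2 and 3 (hypothetical helper)."""
    return {
        "sum_x": sum(xs),
        "sum_y": sum(ys),
        "sum_x2": sum(x * x for x in xs),
        "sum_y2": sum(y * y for y in ys),
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
    }


sums = regression_sums(crabs, clams)
for name, value in sums.items():
    print(name, value)
```

If a printed sum differs from your hand calculation, recheck the matching row of squares or products before moving on to the slope.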

Step 4. Familiarize yourself with the equation of a line: `y=mx+b`.

Step 5. Calculate m (slope): `(n(Sigma(x xx y)) – (Sigmax)(Sigmay))/(n(Sigma(x^2)) – (Sigmax)^2)` .

To simplify this calculation, let's start with the numerator: `(n(Sigma(x xx y)) – (Sigmax)(Sigmay))`

Remember `n`! In this example, `n = 10` because we have 10 values of `x` and 10 values of `y`.

numerator = `(10(2136) – (29)(1249))`

numerator = `(21360 – 36221)`

numerator = `-14861`

Now we can calculate the bottom half of the equation (this is called the denominator): `(n(Sigma(x^2)) – (Sigmax)^2)`

denominator = `(10(109) – (29)^2)`

denominator = `(1090 – 841)`

denominator = `249`

To finish the calculation, divide the numerator by the denominator:

`m = -14861/249 = -59.683`

Here, we have rounded the actual answer from a long decimal string (-59.68273092...) to -59.683.

Step 6. Calculate b (intercept): `(Sigmay – m(Sigmax))/(n)`

Again we can start on the numerator (top half of the equation): `(Sigmay – m(Sigmax))`

numerator = `(1249 – (-59.683)(29))`

numerator = `(1249 – (-1730.807))`

numerator = `2979.807`

Next, we would calculate the denominator (bottom half of the equation), but this is just `n` !

denominator = `10`

To finish the calculation, divide the numerator by the denominator:

`b = 2979.807/10 = 297.98`

Again, we have rounded the actual answer from a longer decimal (297.9807) to 297.98.

Step 7. Substitute your values of `m` and `b` into the line equation: `y=mx+b`.
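
Steps 5 through 7 can be sketched in code as a check on the arithmetic. This is a minimal illustration using the sums from the Step 3 table. The variable names are assumptions, not part of the module.

```python
# Sums taken from the Step 3 table for the crab and clam data.
n = 10
sum_x, sum_y, sum_x2, sum_xy = 29, 1249, 109, 2136

# Step 5: slope = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Step 6: intercept = (Sy - m*Sx) / n
intercept = (sum_y - slope * sum_x) / n

# Step 7: write the line equation with rounded coefficients.
print(f"y = {slope:.3f}x + {intercept:.2f}")  # prints: y = -59.683x + 297.98
```

The code carries the full-precision slope into the intercept calculation. Here that gives the same intercept, to two decimal places, as the hand calculation with the rounded slope.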

Problem 1B: Next, run the linear regression statistics using Excel's Data Analysis Toolpak. Do the values given for `m` and `b` match the values that you calculated in Problem 1A?

Step 1. Enter your data into an Excel spreadsheet in two columns.
Optional: to help visualize the relationship between the variables, create a scatterplot using the Excel graph feature and insert a trendline. A trendline shows you a best fit line for the data on the chart.

Show me how to create a scatterplot with a trend line

Create the scatterplot: Select both columns of your data. In the Excel menu, click on "Insert" and then the chart feature and select the scatter option (with no connecting lines). In the resulting chart, click the chart area to display the Chart Design and Format tabs in the top menu. Update the chart title, axis labels, and more.

Add the trendline: Select the chart area again. Click on "Chart Design" then "Add Chart Element." Select Trendline --> Linear.

Step 2. Use the Regression tool on your data.

Show me how to use this tool

On the Data tab, click "Data Analysis." In the pop-up menu, select "Regression." Click in the box "Input Y range." Then highlight the `y` values in your spreadsheet. Then click in the box "Input X range." Then highlight the `x` values in your spreadsheet. Click the "Output Range" circle, then click in the box to the right of the words "output range." In your spreadsheet, click in the upper left corner of a blank area where you want to place the regression output. Finish by clicking "Ok."

Step 3. Determine the line equation for your data. Write it in the form of `y=mx+b` and look for the `R^2` value.

Show me how to get the line equation and `R^2`

The Summary Output will display three tables of results: Regression Statistics, ANOVA, and a third unlabeled table. The slope, `m`, is in the third table, in the row labeled X Variable 1 and the Coefficients column. The `y` intercept is the value in the Intercept row under Coefficients. The `R^2` value is the second line in the Regression Statistics table (labeled R Square).

The line equation is `y = -59.683x + 297.98`.
`R^2 = 0.96`, which indicates a relatively strong correlation. Note that the relationship is negative (we can see this if we graph the data, or determine it from the negative slope), but because `R^2` is a squared value, it is never negative.

Step 4. Check to see if this line equation matches what you calculated by the least squares method above. Note that the values of `m` and `b` may differ slightly due to different rounding within the calculations, but they should be close. Do they match? Hurray! You now know how to do a linear regression with two different methods.
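
The `R^2` value that Excel reports can also be reproduced from the Step 3 sums, which is where `Sigma(y^2)` finally gets used. The sketch below assumes the standard Pearson correlation formula and the fact that, for a regression with one `x` variable, `R^2` is the square of the correlation coefficient `r`. The variable names are illustrative.

```python
import math

# Sums from the Step 3 table for the crab and clam data.
n = 10
sum_x, sum_y = 29, 1249
sum_x2, sum_y2, sum_xy = 109, 248353, 2136

# Pearson correlation coefficient r, built from the same sums as the slope.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Squaring removes the sign, so R^2 alone cannot show the direction.
r_squared = r ** 2

print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")  # prints: r = -0.98, R^2 = 0.96
```

The sign of `r` matches the sign of the slope, so `r` shows the negative relationship that `R^2` hides.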

Determining a standard (calibration) curve

Earth scientists often try to measure concentrations of chemicals in waters, soils, sediments, rocks, or biological things. Often what is measured must be compared to known samples, called standards. This is accomplished by creating a standard (calibration) curve. Standard curves are not usually curves! They are graphs of data points of measurements from an instrument (on the y axis) based on known concentrations of chemicals in various samples. The concentrations and measurements will ideally have a linear relationship, which can be determined by a linear regression. The line equation can then be used to figure out the concentrations of unknown samples that are analyzed by your instrument.

Problem 2A: You want to analyze some stream water samples for copper to see if an active mine is affecting the water quality. You can use an instrument called an atomic absorption spectrometer (AAS) with a light wavelength of 420 nm for this. You create 4 standards with known amounts of copper in them. The AAS then measures how much light is absorbed by each standard. Beer's Law states that the amount of light absorbed (absorbance) is linearly related to the concentration of copper in each standard and sample.

The table below shows the data from your AAS. Analyze the data with a linear regression to determine the line equation for your standard curve.

Concentration (mg/L)	Absorbance
0	0.003
0.2	0.033
0.4	0.065
0.6	0.098
0.8	0.125

Show me how to get the answer

Step 1. Copy the data into an Excel spreadsheet in two columns, then decide which variable is the independent variable and which is the dependent variable. Then insert a chart (graph) and add a trendline.

Steps 2 and 3. The Data Analysis Toolpak should already be activated and ready for you to use the Regression analysis. Be sure to select the Absorbance column data for the box "Input Y range." Then select the data in the concentration column for the box "Input X range." Click the "Output Range" circle, then click in the box to the right of the words "output range." In your spreadsheet, click in the upper left corner of a blank area where you want to place the regression output. Finish by clicking "Ok."

Step 4. Extract the `m` and `b` values from the regression analysis to get your line equation in the form of `y = mx + b`.
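
As a cross-check on the Excel output, the same least-squares formulas from Problem 1 can be applied to the calibration data. This is a sketch, not the module's method. The helper name `fit_line` is hypothetical, and concentration is treated as `x` and absorbance as `y`, matching the Excel instructions above.

```python
# Standards copied from the Problem 2A table.
concentration = [0.0, 0.2, 0.4, 0.6, 0.8]        # x, mg/L
absorbance = [0.003, 0.033, 0.065, 0.098, 0.125]  # y, instrument reading


def fit_line(xs, ys):
    """Least-squares slope and intercept from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = fit_line(concentration, absorbance)
print(f"absorbance = {slope:.4f} * concentration + {intercept:.4f}")
```

Compare the printed coefficients with the X Variable and Intercept rows of your regression output. They should agree to within rounding.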

Problem 2B: You analyzed two water samples for copper with your spectrophotometer. The absorbance for the Rabbit Run stream water is 0.114 and the absorbance for the Mill Creek stream water is 0.078. What are the copper concentrations in these samples?

Show me how to get the answer

You need to use the line equation you got in Problem 2A. You will solve for `x` by substituting the given absorbance values for `y`.
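
Solving `y = mx + b` for `x` gives `x = (y - b)/m`. The sketch below applies that rearrangement to both samples. The slope and intercept are assumed values, obtained by applying the least-squares formulas to the standards table. Replace them with the coefficients from your own regression output. The function name is hypothetical.

```python
def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert y = m*x + b to recover x = (y - b) / m."""
    return (absorbance - intercept) / slope


# Assumed calibration coefficients (least-squares fit of the standards table);
# substitute the values from your own Problem 2A regression.
slope = 0.1545
intercept = 0.003

samples = {"Rabbit Run": 0.114, "Mill Creek": 0.078}
concentrations = {
    name: concentration_from_absorbance(a, slope, intercept)
    for name, a in samples.items()
}

for name, value in concentrations.items():
    print(f"{name}: {value:.2f} mg/L")
```

Both sample absorbances fall between the lowest and highest standard readings (0.003 and 0.125), so these estimates are interpolations within the calibrated range.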

Geochemical variation diagrams

Harker diagrams are geochemical variation diagrams commonly used in Earth science to represent the chemical constituents in a rock as a proportion of silica `(SiO_(2))`. Some of these relationships are linear and can be represented with linear regression.

Problem 3: The figure to the right shows some examples of Harker Diagrams from Montserrat, Lesser Antilles volcanic arc. You are given some of the data for `CaO` and `SiO_(2)` in the table below. Units are weight percent (wt %).

SiO₂ (wt %)	CaO (wt %)
52	8
69	2
56	7
53	8
62	5
74	1
60	5
53	9
47	12
55	9

Problem 3A: First, perform a linear regression using the step-by-step instructions for calculating `m` (slope) and `b` (intercept) of the regression line. What is the full equation for the regression line that you calculated?

Step 1. Decide which variable is the independent variable and which is the dependent variable.

Step 2. Calculate `x^2` for every value of `x`, and `y^2` for every value of `y`. For each pair of values, calculate `x xx y`.

Step 3. Calculate `Sigmax`, `Sigmay`, `Sigma(x^2)`, `Sigma(y^2)`, and `Sigma(x xx y)`.

`Sigmax = 52 + 69 + 56 + 53 + 62 + 74 + 60 + 53 + 47 + 55 = 581`

`Sigmay = 8 + 2 + 7 + 8 + 5 + 1 + 5 + 9 + 12 + 9 = 66`

`Sigma(x^2) = 2704 + 4761 + 3136 + 2809 + 3844 + 5476 + 3600 + 2809 + 2209 + 3025 = 34373`

`Sigma(y^2) = 64 + 4 + 49 + 64 + 25 + 1 + 25 + 81 + 144 + 81 = 538`

`Sigma(x xx y) = 416 + 138 + 392 + 424 + 310 + 74 + 300 + 477 + 564 + 495 = 3590`

`Sigmax`	`Sigmay`	`Sigma(x^2)`	`Sigma(y^2)`	`Sigma(x xx y)`
581	66	34373	538	3590

Step 4. Familiarize yourself with the equation of a line: `y=mx+b`.

Step 5. Calculate m (slope): `(n(Sigma(x xx y)) – (Sigmax)(Sigmay))/(n(Sigma(x^2)) – (Sigmax)^2)` .

To simplify this calculation, let's start with the numerator: `(n(Sigma(x xx y)) – (Sigmax)(Sigmay))`

Remember `n`! In this example, `n = 10` because we have 10 values of `x` and 10 values of `y`.

numerator = `(10(3590) – (581)(66))`

numerator = `(35900 – 38346)`

numerator = `-2446`

Now we can calculate the bottom half of the equation (this is called the denominator): `(n(Sigma(x^2)) – (Sigmax)^2)`

denominator = `(10(34373) – (581)^2)`

denominator = `(343730 – 337561)`

denominator = `6169`

To finish the calculation, divide the numerator by the denominator:

`m = -2446/6169 = -0.396`

Here, we have rounded the actual answer from a long decimal string (-0.39649862...) to -0.396.

Step 6. Calculate b (intercept): `(Sigmay – m(Sigmax))/(n)`

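Again we can start on the numerator (top half of the equation): `(Sigmay – m(Sigmax))`. Use the unrounded slope here, because any rounding error in `m` is multiplied by `Sigmax = 581`.

numerator = `(66 – (-0.39649862)(581))`

numerator = `(66 – (-230.366))`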

numerator = `296.366`

Next, we would calculate the denominator (bottom half of the equation), but this is just `n` !

denominator = `10`

To finish the calculation, divide the numerator by the denominator:

`b = 296.366/10 = 29.64`

Again, we have rounded the actual answer from a longer decimal (29.6365699) to 29.64.

Step 7. Substitute your values of `m` and `b` into the line equation: `y=mx+b`.
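
Because the slope is multiplied by `Sigmax = 581` in Step 6, the number of digits carried in `m` has a visible effect on `b` in this problem. The following sketch, with illustrative variable names, compares the intercept computed from the full-precision slope with the one computed from the slope rounded to three decimal places.

```python
# Sums from the Step 3 table for the SiO2 and CaO data.
n = 10
sum_x, sum_y, sum_x2, sum_xy = 581, 66, 34373, 3590

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept using the slope at full precision.
b_full = (sum_y - slope * sum_x) / n

# Intercept using the slope rounded to three decimal places first.
b_rounded = (sum_y - round(slope, 3) * sum_x) / n

print(f"full-precision slope: b = {b_full:.2f}")     # prints b = 29.64
print(f"rounded slope:        b = {b_rounded:.2f}")  # prints b = 29.61
```

Carrying the extra digits through Step 6 and rounding only at the end keeps the hand calculation in agreement with spreadsheet output.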

Problem 3B: Next, run the linear regression statistics using Excel's Data Analysis Toolpak. Do the values given for `m` and `b` match the values that you calculated in Problem 3A?

Step 1. Enter your data into an Excel spreadsheet in two columns.
Optional: to help visualize the relationship between the variables, create a scatterplot using the Excel graph feature and insert a trendline. A trendline shows you a best fit line for the data on the chart.

Show me how to create a scatterplot with a trend line

Create the scatterplot: Select both columns of your data. In the Excel menu, click on "Insert" and then the chart feature and select the scatter option (with no connecting lines). In the resulting chart, click the chart area to display the Chart Design and Format tabs in the top menu. Update the chart title, axis labels, and more.

Add the trendline: Select the chart area again. Click on "Chart Design" then "Add Chart Element." Select Trendline --> Linear.

Step 2. Use the Regression tool on your data.

Show me how to use this tool

On the Data tab, click "Data Analysis." In the pop-up menu, select "Regression." Click in the box "Input Y range." Then highlight the `y` values in your spreadsheet. Then click in the box "Input X range." Then highlight the `x` values in your spreadsheet. Click the "Output Range" circle, then click in the box to the right of the words "output range." In your spreadsheet, click in the upper left corner of a blank area where you want to place the regression output. Finish by clicking "Ok."

Step 3. Determine the line equation for your data. Write it in the form of `y=mx+b` and look for the `R^2` value.

Show me how to get the line equation and `R^2`

The Summary Output will display three tables of results: Regression Statistics, ANOVA, and a third unlabeled table. The slope, `m`, is in the third table, in the row labeled X Variable 1 and the Coefficients column. The `y` intercept is the value in the Intercept row under Coefficients. The `R^2` value is the second line in the Regression Statistics table (labeled R Square).

The line equation is `y = -0.396x + 29.64`.
`R^2 = 0.95`, which indicates a relatively strong correlation. Note that the relationship is negative (we can see this if we graph the data, or determine it from the negative slope), but because `R^2` is a squared value, it is never negative.

Step 4. Check to see if this line equation matches what you calculated by the least squares method above. Note that the values of `m` and `b` may differ slightly due to different rounding within the calculations, but they should be close.
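
One way to see what `R^2` measures is to compute it from the residuals, the differences between each observed `CaO` value and the value the line predicts. The sketch below is an illustration only. It uses the mean-centered form of the least-squares formulas, which is algebraically equivalent to the summation form in Steps 5 and 6, and the variable names are assumptions.

```python
# Data copied from the Problem 3 table (weight percent).
silica = [52, 69, 56, 53, 62, 74, 60, 53, 47, 55]  # x, SiO2
cao = [8, 2, 7, 8, 5, 1, 5, 9, 12, 9]              # y, CaO

n = len(silica)
mean_x = sum(silica) / n
mean_y = sum(cao) / n

# Mean-centered least squares (equivalent to the summation formulas).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(silica, cao))
sxx = sum((x - mean_x) ** 2 for x in silica)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 = 1 - (unexplained variation) / (total variation).
predicted = [slope * x + intercept for x in silica]
ss_res = sum((y - p) ** 2 for y, p in zip(cao, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in cao)
r_squared = 1 - ss_res / ss_tot

print(f"y = {slope:.3f}x + {intercept:.2f}, R^2 = {r_squared:.2f}")
```

An `R^2` close to 1 means the residuals are small compared with the overall spread of the `CaO` values.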

Changes over time

Many things change over time: trees get taller, Earth's plates move, water evaporates. But are these changes linear? If a quantity changes by a constant amount in each time period, then it has a linear relationship with time as the independent variable. If it increases consistently over time, that's a positive correlation; if it decreases consistently over time, that's a negative correlation. Linear regression is used to model these changes over time and to help us make predictions about past or future time points.

Problem 4A: When glaciers retreat, they leave behind bare land with little to no soil left. Often loose, unconsolidated material called till is left behind and it will slowly become soil. About 11,000 years ago, glaciers retreated from Wisconsin, and soil has been forming since then. In the year 2000 in one remote area, the soil thickness was measured to be 32 inches. Scientists used carbon-14 dating to estimate that the soil thickness after 5000 years was 13 inches and increased to 30 inches after 10,000 years. Using these four time points (call the glacial retreat time 0 and today 11,000 years), determine the line equation that relates time and soil thickness.

Show me how to get the line equation

Step 1. Enter the data into an Excel spreadsheet in two columns, then insert a chart (graph) and add a trendline.

Steps 2 and 3. The Data Analysis Toolpak should already be activated and ready for you to use the Regression analysis. Select the Soil Depth column data for the box "Input Y range." Then select the data in the time column for the box "Input X range."

Step 4. Extract the `m` and `b` values from the regression analysis to get your line equation in the form of `y = mx + b`.
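
The same fit can be sketched in code as a check on the spreadsheet. Note the assumption: the problem states three measured thicknesses, so the fourth point takes the soil thickness at time 0 to be 0 inches (bare land after glacial retreat). If you choose a different starting thickness, the coefficients will change. The variable names are illustrative.

```python
# Time since glacial retreat (years) and soil thickness (inches).
# ASSUMPTION: thickness at time 0 is taken as 0 inches.
years = [0, 5000, 10000, 11000]
thickness = [0, 13, 30, 32]

n = len(years)
sum_x, sum_y = sum(years), sum(thickness)
sum_x2 = sum(x * x for x in years)
sum_xy = sum(x * y for x, y in zip(years, thickness))

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(f"thickness = {slope:.5f} * years + {intercept:.3f}")
```

The slope has units of inches per year, which is why it is such a small number.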

Problem 4B: How long does this model predict it would take to form a new inch of soil in this area of Wisconsin?

Show me how to calculate this

Step 1. Determine your unknown and known values.

Step 2. Put those values into the line equation from Problem 4A.
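
Because the slope is the change in thickness per year, the time needed for one additional inch is `1/m`. The intercept cancels when two points on the line are subtracted. The sketch below assumes a slope of 0.00298 inches per year, which comes from fitting the four time points with the thickness at time 0 taken as 0 inches. Use the slope from your own regression instead. The function name is hypothetical.

```python
def years_per_inch(slope_inches_per_year):
    """Time for soil thickness to grow by one inch along the fitted line."""
    return 1.0 / slope_inches_per_year


# Assumed slope (inches per year); replace with your Problem 4A result.
slope = 0.00298

years = years_per_inch(slope)
print(f"about {years:.0f} years per inch of soil")
```

The answer is the same at any point on the line, because a linear model assumes a constant rate of soil formation.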

Next steps

TAKE THE QUIZ!!

I think I'm competent with linear regression and I am ready to take the quiz! This link takes you to WAMAP. If your instructor has not given you instructions about WAMAP, you may not have to take the quiz.

Or you can go back to the Linear Regression explanation page.
