Guiding students through linear regression calculations
An instructor's guide to linear regression
Dr. Melanie Szulczewski (University of Mary Washington)
Dr. Laura Treible (Savannah State University)
What should students get out of this module?
After completing this module, a student should be able to:
- Show how a least squares calculation results in a linear regression line equation
- Calculate the slope (`m`) and intercept (`b`) of a linear regression when given a dataset of two paired variables
- Use a spreadsheet program (Excel, Google Sheets, etc.) to perform a linear regression
- Predict the value of an unknown variable for a given value of another variable using a linear regression line equation
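Instructors who want to check students' hand calculations can sketch the least squares slope and intercept from the outcomes above in a few lines of Python. This is a minimal illustration, not the module's own method: the function name `least_squares_line` is hypothetical, and the data are made-up values rather than the module's dataset.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired values")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx              # slope
    b = y_mean - m * x_mean    # intercept: the line passes through the means
    return m, b

# Made-up paired data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = least_squares_line(xs, ys)
print(f"y = {m:.3f}x + {b:.3f}")
```

Students can compare the printed slope and intercept with the trendline equation their spreadsheet reports for the same data.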
Why are these math skills challenging to incorporate into courses?
Challenge: Students don't always understand what a linear regression really shows or why it is important. A related difficulty is judging whether their line equation or calculated unknown actually makes sense.
Challenge: Using a linear regression to predict the value of one variable from a known value of the other is harder when the known value is a `y` and the unknown to solve for is an `x`.
For example, after making a standard curve (e.g., absorbance vs. concentration) to determine the concentration of a chemical and performing a linear regression, students should be able to use the line equation to determine the concentration of an unknown solution from its measured absorbance. Students tend to automatically substitute the known value for `x` in the equation `y=mx+b` instead of substituting it for `y` and solving for `x`.
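The standard curve case can be sketched as follows, contrasting the correct rearrangement `x = (y - b) / m` with the common mistake. The slope, intercept, and measured absorbance are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def concentration_from_absorbance(absorbance, m, b):
    """Solve y = m*x + b for x, where y is the measured absorbance."""
    if m == 0:
        raise ValueError("slope is zero; cannot solve for x")
    return (absorbance - b) / m

# Hypothetical standard curve: absorbance = 0.25 * concentration + 0.02
m, b = 0.25, 0.02
measured_absorbance = 0.52  # the known value is y, not x

# Correct: substitute the known value for y and solve for x
conc = concentration_from_absorbance(measured_absorbance, m, b)

# Common mistake: substituting the known absorbance for x
wrong = m * measured_absorbance + b

print(f"concentration = {conc:.2f}")
print(f"mistaken result = {wrong:.2f}")
```

Asking students whether each result is plausible for the range of concentrations on their standard curve is one way to prompt the "does this make sense?" check.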
Challenge: Basic spreadsheet organization, including data formatting and graph labeling.
For example, students may enter data into spreadsheets in rows instead of columns, which makes visualization and analyses more challenging.
What don't we include in the page?
- It is important that students understand that the line equation is a predictive best fit and may not exactly reflect "real" life or the actual data points. The initial volcano example shows this: the calculated `y`-intercept is negative (you can't have a negative age!).
- As noted, the desired value of the coefficient of determination (`R^2`, the square of the correlation coefficient in a simple linear regression) varies with context and discipline, so every instructor should tell students what is considered a "good" `R^2` value. Be sure to emphasize that "correlation does not equal causation."
- Other statistical results are provided with the linear regression output. The `p` value, for example, is an important qualifier of `R^2`, but its meaning is not discussed here.
- Log transformations are often used with Earth science data. Interpreting the resulting linear regression is more complex than explained in this module.
- Linear regression can be done in Google Sheets, but the methods differ from those described for Excel. Google Sheets has no data analysis tool, but regression statistics can be calculated with the `LINEST` function (this also works in Excel, although the Data Analysis Toolpak is more straightforward). The results from this method are unlabeled. To perform a linear regression in Google Sheets with the `LINEST` function and see annotations for the results, follow these instructions.
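To illustrate the log transformation point in the list above: fitting a line to log-transformed data recovers a power-law relationship, so the slope is an exponent and the intercept must be back-transformed. The sketch below assumes base-10 logs and uses made-up data that follow `y = 3x^2` exactly. The helper `fit_line` is a hypothetical name, and real data would not fit this cleanly.

```python
import math

def fit_line(xs, ys):
    """Least squares slope and intercept for paired lists xs, ys."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

# Made-up data following y = 3 * x**2 exactly (illustration only)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * v ** 2 for v in x]

# Regress log10(y) on log10(x): log10(y) = slope * log10(x) + intercept
log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
slope, intercept = fit_line(log_x, log_y)

# Back-transform: y = coefficient * x**exponent
exponent = slope
coefficient = 10 ** intercept
print(f"y = {coefficient:.2f} * x^{exponent:.2f}")
```

The slope of the log-log line is a power, not a rate of change in the original units, which is the main interpretive step students tend to miss.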
Instructor resources
Support for teaching this quantitative skill
- Project EDDIE (Environmental Data-Driven Inquiry and Exploration) from SERC provides many classroom resources, including statistical vignettes for instructors and classroom lessons and activities on Linear Regression and the Correlation Coefficient.
- What is linear regression? is an IBM webpage that clearly explains what linear regression is, why it's important, and what its assumptions are. It also lists a variety of examples.
Examples of activities that use this quantitative skill
- This "Arctic Sea Ice Extent" activity uses data from the National Snow and Ice Data Center to calculate a linear regression and make predictions.
- This "What is the Relationship between Lava Flow Length and Effusion Rate at Mt Etna?" activity works with a log-log linear relationship using data on lava flow length and effusion rate.
- This "From Isotopes to Temperature: Working With A Temperature Equation" activity includes multiple analyses, including a linear regression, while looking at a dataset of ocean water temperature and oxygen isotope values of two coral species.