How Do I Construct a Straight Line through Data Points?
Best-Fit Lines
Best-Fit Lines Can Also Be Called:
Linear regression
Trend lines
Most scientists use a computer program to plot a best-fit line for a set of data, but constructing one for yourself is a good way to learn how it's done. Because a computer isn't doing it, you may find that your "best-fit" line is slightly different from your lab partner's. In most cases, that is okay, as long as you've mimicked the trend of the data.
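For comparison with a hand-drawn line, here is a minimal sketch of what a computer program commonly does. It assumes the program uses ordinary least squares, which picks the line that minimizes the sum of the squared vertical distances from the points to the line. The page does not say which program or method is meant, so treat that as an assumption. The function name `least_squares_line` and the sample values are hypothetical and are not the Lassen Peak measurements.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical illustrative values (assumption, not real data)
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 2.9, 4.2, 4.8, 6.1]
slope, intercept = least_squares_line(x_values, y_values)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

A line drawn by eye will rarely match these numbers exactly, which is fine as long as it follows the same trend.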
Why (and When) Should I Use a Best-Fit Line?
In introductory geoscience, most exercises that ask you to construct a best-fit line are about recognizing relationships among variables on Earth or predicting the behavior of a system (in this case the Earth system). We might want to know if there is a relationship between the amount of nitrogen in the water and the intensity of an algal bloom, or how one chemical component of a rock relates to another. For predictive purposes, we might want to know how often an earthquake is likely to occur on a particular fault or how likely a very large flood is on a given river. All of these applications use best-fit lines on scatter plots (x-y graphs with just data points, no lines).
If you find yourself faced with a question that asks you to draw a trend line, linear regression, or best-fit line, you are almost certainly being asked to draw a line through data points on a scatter plot. You may also be asked to approximate the trend, or sketch in a line that mimics the data. This page is designed to help you complete any of these types of questions. Work through it and the sample problems if you are unsure of how to complete questions about trends and best-fit lines.
How Do I Construct a Best-Fit Line?
A best-fit line is meant to mimic the trend of the data. In many cases, the line may not pass through very many of the plotted points. Instead, the idea is to get a line that has equal numbers of points on either side. Most people start by eyeballing the data.
- Take a look at the data and ask yourself these questions:
- Does the data look like a line or like a big blob? Try to picture the general trend of the data in your mind (even if it's just a blob).
- Does the trend of the points look positively correlated (the points rise up to the right) or negatively correlated (the points start high near the y-axis and get lower as they move to the right along the x-axis)? Your trend line (when you're finished with the next steps) should mimic that correlation.
- If you blur your eyes, can you see a thick line trending in one direction or another? This is another way to visualize the trend of the data.
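If you want to check your eyeball judgment of the trend, the sketch below computes the Pearson correlation coefficient r, a standard measure that is positive when points rise to the right, negative when they fall to the right, and near zero for a blob. The function name `pearson_r` and the sample values are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustrative values (assumption, not real data)
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_rising = [2.1, 2.9, 4.2, 4.8, 6.1]   # rises up to the right
y_falling = [6.0, 5.1, 3.9, 3.2, 1.8]  # starts high, gets lower to the right

r_up = pearson_r(x_values, y_rising)
r_down = pearson_r(x_values, y_falling)
print("rising data:  r =", round(r_up, 2))
print("falling data: r =", round(r_down, 2))
```

The sign of r tells you which way your best-fit line should slope. The closer r is to +1 or -1, the more the points look like a line rather than a blob.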
- Now that you have an idea of the general trend of the data, there are two possible ways to construct a best-fit line by eye. You may use either of them; both are correct and relatively easy ways to get a pretty accurate representation of a best-fit line. Pick the one that makes the most sense to you. The first method involves enclosing the data in an area:
- Begin by plotting all your data. For this example, we will use some geochemical data from Lassen Peak, a volcano in Northern California that last erupted in 1915 (the data was collected by an undergraduate research student at University of Wisconsin, Oshkosh!). Here is a plot of sodium oxide (Na2O) vs. silica (SiO2) from the 1915 eruption of Lassen Peak. You can download and print this plot (Acrobat PDF, 171 kB) to use with this exercise.
- Draw a shape that encloses all of the data (try to make it smooth and relatively even).
- Draw a line that divides the area that encloses the data into two even-sized areas. In other words, bisect the area with a line that goes from one edge of the plot to the other.
- Congratulations! You have just constructed a best-fit line through the data!
- For another example of the same method, we begin by plotting Al2O3 vs. SiO2 data. You can download and print this plot (Acrobat PDF, 164 kB) to use while you work through this exercise.
- Draw a shape that encloses all of the data.
- Draw a line that divides the area that encloses the data into two even-sized areas. In other words, bisect the area with a line that goes from one edge of the plot to the other.
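One way to check a line you have drawn is to count how many points fall on either side of it, since a best-fit line should leave roughly equal numbers above and below. The sketch below does that count for a candidate line y = slope * x + intercept. The function name `count_sides`, the candidate line, and the sample values are all hypothetical.

```python
def count_sides(xs, ys, slope, intercept):
    """Count points above, below, and exactly on the line y = slope*x + intercept."""
    above = below = on_line = 0
    for x, y in zip(xs, ys):
        line_y = slope * x + intercept  # height of the line at this x
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
        else:
            on_line += 1
    return above, below, on_line

# Hypothetical illustrative values (assumption, not real data)
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [2.3, 2.6, 4.4, 4.7, 6.2, 6.9]

# Candidate line read off a hand-drawn plot: slope 1, intercept 1
above, below, on_line = count_sides(x_values, y_values, 1.0, 1.0)
print(above, "above,", below, "below,", on_line, "on the line")
```

If the counts are badly lopsided, shift or tilt your line and count again.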
- The second method involves dividing the data into two groups:
- Begin by plotting all of your data. For this exercise, we'll use the Na2O data from above.
- Draw a dotted line that divides the data into two groups, with equal numbers of points on either side of the line.
- Place an x (or a + or a dot) at what you judge to be the center of the data on each side of the line.
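The dividing-line steps above can be sketched in code under a few assumptions that the text does not spell out: the dotted line is taken to be vertical (splitting the points at the middle x value), the "center" of each group is taken to be the average of its x and y values, and the best-fit line is taken to run through the two centers. The function name `two_center_line` and the sample values are hypothetical.

```python
def two_center_line(xs, ys):
    """Split points into left and right halves by x, then join the two centers.

    Returns (slope, intercept). Assumes the two centers have different x values.
    """
    points = sorted(zip(xs, ys))           # order points from left to right
    half = len(points) // 2
    left = points[:half]
    right = points[len(points) - half:]    # middle point is skipped if count is odd

    def center(group):
        # Assumption: the "center" is the mean x and mean y of the group
        n = len(group)
        return (sum(p[0] for p in group) / n, sum(p[1] for p in group) / n)

    (x1, y1), (x2, y2) = center(left), center(right)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

# Hypothetical illustrative values (assumption, not real data)
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [2.3, 2.6, 4.4, 4.7, 6.2, 6.9]
slope, intercept = two_center_line(x_values, y_values)
print(f"y = {slope:.2f} x + {intercept:.2f}")
```

Averaging is only one way to define the center. When you work by eye, your own judgment of where each group is centered plays the same role.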