Related quantitative concepts: Basic graphing skills, Interpolation/Extrapolation, Functions, Graph Significance, Graphing confusion
by Jennifer M. Wenner, Geology Department, University of Wisconsin-Oshkosh
There are four main concepts that students struggle with but that are essential for teaching quantitatively about trends on plots and graphs. They include:
- visualizing and describing data and trends,
- visually determining or estimating "best-fit" lines,
- calculating slope,
- understanding whether a trend exists.
Visualization and description of data
In geoscience, we often present graphs of data and ask students to describe the trend the data shows. Sometimes, as in the case of a linear plot, that information is easy for students to extract from a simple scatter plot. However, many plots in geoscience are far more complicated. They may have an exponential or logarithmic nature, or there may be a significant amount of "noise" in the data that students have to "filter out" to see the trends.
An example of data with "noise" is climate data - a topic of great interest to many students and faculty. The plot at the right shows data from the Vostok ice core. We can measure changes in CO2 in the atmosphere over thousands of years by looking at the composition of air caught in layer upon layer of ice. From that information and information from the composition of isotopic carbon, we can infer changes in temperature for over 100 thousand years. Many geoscientists look at this plot and say, "Both temperature and CO2 decreased from the period 140 ka to 20 ka, and have rapidly and steadily increased in the last 20 ka." Students, on the other hand, may recognize the trend from 20 ka to present but might have a difficult time recognizing the "trend" for 140-20 ka - they might say that the data is "all over the place" or "goes down and then up and then down for a little while and then up....".
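One way to make "filtering out the noise" concrete is to smooth a series numerically. The sketch below uses a centered moving average, which is a common smoothing technique but not something this page prescribes. The function name `moving_average` and the numbers are invented for illustration; they are not Vostok measurements.

```python
def moving_average(values, window):
    """Return the centered moving average of a list of numbers.

    The window must be odd so that it is centered on a data point.
    Points near the ends, where a full window does not fit, are dropped.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if window > len(values):
        raise ValueError("window is longer than the data")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Invented noisy values (illustration only): an upward trend with wiggles.
noisy = [10, 12, 9, 13, 11, 15, 12, 16, 14, 18]
smoothed = moving_average(noisy, 5)

print("raw:     ", noisy)
print("smoothed:", [round(v, 1) for v in smoothed])
```

Students can compare the raw and smoothed lists (or plots of them) and discuss which one makes the overall direction of the data easier to describe, and what information the smoothing hides.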
Estimation of trends/"best fit" lines
In introductory geoscience courses, we often expect students to be able to estimate trends in data using scatter plots/graphs. A number of activities in introductory geoscience laboratory manuals require students to plot data and draw a trend in the data (or a "best-fit" line). Most often, line-fitting involves interpolation or is used for extrapolation of data for prediction of future catastrophic events. Students are often confused by the request to sketch in the trend of the data; they are unsure how to construct an appropriate line. Many of them "connect the dots" and try to extrapolate from there (which is darn near impossible).
Nearly every widely published introductory geoscience lab manual has an exercise in which students are asked to plot flood recurrence interval and the discharge of floods (flood frequency). Then they are asked to use this information to predict the discharge of a given flood (or determine the recurrence interval of a given discharge). It is essential that students understand the purpose of the assignment (to "predict" or "forecast" the likelihood of a flood).
How do we know whether a flood will hit on a given river? To be honest, we don't.
But we can begin to predict the likelihood of a large flood by understanding the behavior of that river. Today you will plot information about a stream's discharge and the recurrence interval of a flood. When data from the table in your book are plotted, they will make a nice linear array (fall close to a line). You should NOT connect the dots (as shown in the figure on the left). Instead your job is to construct a "best fit line" for the data (shown in the figure to the right) - this line should be straight (in fact, I recommend using a ruler) and should go through the middle of the data.
Why doesn't connecting the dots help you? What are we trying to accomplish? We are not interested in the exact value of a given data point, instead we are interested in the relationship of all the data points to one another - the "trend" of the data. We want to be able to use that relationship to extrapolate information - to predict the recurrence interval (or probability) of an even larger flood. In other words, you want to be able to extend the line past the biggest flood that we know about! If you connect the dots, you don't know where to extend the line!
It is difficult to construct a perfect best fit line visually. Most spreadsheet programs will construct a line using a mathematical formula. However, it is good practice to see whether you can approximate the trend of a bunch of data points using just your eyes and a ruler! As a general rule, I try to draw the line so that the same number of dots are above the line as are below the line. Because we all see things slightly differently, your line might not be exactly the same as the person next to you. Why not? If your line is different, what will that mean about your answers?
This is an excellent opportunity to talk about the propagation of error, the biases of individuals and the need for uniformity in our calculation of trends. It can also serve to show the power of mathematics - if we were all to calculate the trend of the data, we should come up with the same line and thus exactly the same answers when asked to extrapolate. Mathematics can level the playing field so that we all come up with the same answer.
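To show what "calculating the trend" can look like, here is a minimal sketch of an ordinary least-squares line fit, which is one standard formula for a best-fit line (spreadsheet trendlines typically do something similar, though that is an assumption about any particular program). The function names `fit_line` and `predict` and the data values are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations and sum of x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x (interpolation or extrapolation)."""
    return slope * x + intercept


# Invented values for illustration only (not real stream data).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x_values, y_values)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# Extend the line past the largest x in the data, as in the flood exercise.
print(f"extrapolated y at x = 7: {predict(slope, intercept, 7.0):.2f}")
```

Because the formula is deterministic, every student who runs it on the same data gets the same line, which makes a useful contrast with the range of lines drawn by eye.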
Calculating slope
Once students have approximated a best-fit line, we sometimes ask them to calculate the slope of the line or to describe the trend. Most students are familiar with the idea of "rise over run" from high school mathematics. However, if they have constructed their own graph, they may not understand which variable is the "rise" and which is the "run". They may also be asked to calculate slope using only the data in tables, or using topographic maps and distance data, which are difficult tasks for students with number phobia or math anxiety. An example of a slope calculation used in introductory geology is the calculation of the slope of the water table using contoured topographic maps.
We want to calculate the slope of a hillside (or the water table). How are we going to do that? Does everyone remember the phrase, "rise over run" as a guideline for calculating slope? What does that phrase mean? We can use it on a graph, where rise is the difference in extreme values on the vertical axis (y-axis) and run is the difference on the horizontal (x-) axis. In mathematics, we write it as the change in y over the change in x, or Δy/Δx. How do we do it with information on a topographic map (or a contour map)? Let's think about the kind of information represented on a topographic map. We have elevation information (that can be our rise, right? It represents a change vertically). We also have a scale that tells us about distance (the horizontal value or x-axis). Thinking about this information, can you calculate a slope? What are the units on your slope? (Feet per mile? meters per km?) Can you calculate your slope so that it is unitless?
This template can be used any time you have students calculate slope - helping them to see that "rise over run" is just a simple arithmetic calculation. Calculating slope may also help students to visualize and describe a set of data. They are likely familiar with the ideas of positive and negative slope. Giving them a number of representations (graphical, numerical, symbolic) may help them to get better at recognizing and understanding trends in a variety of ways.
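The arithmetic behind "rise over run" on a map can be sketched as follows. The elevations and distance are hypothetical readings, not values from any particular map, and the function name `slope_from_rise_and_run` is invented for this example. The only outside fact used is that one mile is 5,280 feet.

```python
FEET_PER_MILE = 5280.0


def slope_from_rise_and_run(rise, run):
    """Return rise divided by run; the units are (rise units) per (run units)."""
    if run == 0:
        raise ValueError("run must be nonzero")
    return rise / run


# Hypothetical map readings (illustration only).
elevation_high_ft = 1200.0  # contour value at the upper point
elevation_low_ft = 1040.0   # contour value at the lower point
distance_mi = 2.0           # horizontal distance measured with the map scale

rise_ft = elevation_high_ft - elevation_low_ft

# Slope with mixed units: feet of rise per mile of run.
slope_ft_per_mi = slope_from_rise_and_run(rise_ft, distance_mi)

# Unitless slope: convert the run to feet so the units cancel.
unitless_slope = slope_from_rise_and_run(rise_ft, distance_mi * FEET_PER_MILE)

print(f"slope = {slope_ft_per_mi:.1f} ft/mi")
print(f"unitless slope = {unitless_slope:.4f}")
```

Writing the unit conversion as its own step helps students see why a slope in feet per mile and a unitless slope describe the same hillside with very different numbers.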
Recognizing trends in graphed data
Do all natural data have trends? Geoscientists approximate trends in many data sets using just our eyes and we can certainly describe the graph of a data set (even if it is completely random). Trends in a large data set can be non-linear or approximate some other function (exponential, periodic, etc). However, most often we want students to be able to understand linear trends. Does it matter what kind of data we have? Does increasing the number of data points change our interpretation? Let's look at an example that illustrates pitfalls of random sampling and trying to fit a curve to the data.
I am going to give you some CO2 data from the Vostok Ice core (from the NOAA website). This data goes back about 160,000 years and can tell us about the "hothouse" conditions of long ago.
If we plot a random sample of the data with points from close to the present to about 160,000 years ago, we might say that carbon dioxide has steadily decreased over the past 160 kyr. We might say that the data has a slightly positive slope. And the correlation (as y increases, x increases) is pretty good (not great, but good). Is this what we have heard about global warming? What does the entire data set tell us?
If, instead, we take all the data from the Vostok Ice Core, that nicely correlated trend is gone. With more data, the curve takes on a completely different shape. We see that the changes in CO2 over the past 160,000 years are far more complicated than indicated on the first plot. We might even try to recognize three distinct trends: an increase from 160-120 kyr, a steady decrease from 120-20 kyr and then a dramatic increase from 20 kyr to present! This is an excellent example of the importance of a significant amount of data.
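The effect of sparse sampling on an apparent correlation can be sketched numerically. The series below is synthetic: a made-up piecewise-linear curve with the same three-part shape described above, not the Vostok measurements. The function names, the curve values and the hand-picked sample ages are all assumptions for illustration. The code compares Pearson's correlation coefficient r for a handful of points against r for the full synthetic series.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def synthetic_co2(age_kyr):
    """Made-up CO2-like value versus age (kyr before present); NOT real data."""
    if age_kyr >= 120:
        return 280.0 - 2.0 * (age_kyr - 120)   # oldest part: lower values at greater age
    if age_kyr >= 20:
        return 190.0 + 0.9 * (age_kyr - 20)    # long middle part: values fall toward 20 kyr
    return 280.0 - 4.5 * age_kyr               # youngest part: sharp rise to the present


# "Full" synthetic record: one point every 2 kyr from 0 to 160 kyr.
all_ages = list(range(0, 161, 2))
all_values = [synthetic_co2(a) for a in all_ages]

# A sparse sample of five ages, picked by hand for illustration.
sparse_ages = [20, 50, 80, 110, 130]
sparse_values = [synthetic_co2(a) for a in sparse_ages]

r_sparse = pearson_r(sparse_ages, sparse_values)
r_all = pearson_r(all_ages, all_values)

print(f"r from {len(sparse_ages)} sampled points: {r_sparse:.2f}")
print(f"r from all {len(all_ages)} points: {r_all:.2f}")
```

With these invented numbers, the five sampled points give a much stronger linear correlation than the full series does. Students can change `sparse_ages` to see how much the apparent trend depends on which points happen to be sampled.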
Sometimes a set of data is completely random (or not complete enough to evaluate the true trend). It is important to help students see that they can rely on their eyes to evaluate trends and that it is perfectly reasonable to suggest that there is NO trend in some data. They often need to increase their self-confidence in terms of quantitative skills. Reinforcing their innate abilities is key to helping students to understand trends.
Examples and resources
- Carbon Dioxide Exercise
This exercise by Randy Richardson has students plot random bits of CO2 data from Mauna Loa. He then puts each set of plotted data together to show the importance of sampling interval and the seasonal/photosynthetic variation in CO2 concentration. An in-class activity for introductory courses.
- Regression by Eye
This student resource has an applet that allows the student to estimate the trend of a large set of data. The student can draw a "best-fit" line, can estimate the value of r (Pearson's correlation) and can try to minimize the mean square error.
- Regressing a line through Rb-Sr data - an isochron exercise
Science Courseware's Geology Labs Online has a unit on virtual dating that allows a student to work on regressing a line visually (it also allows you to change the intercept and shows error). An excellent resource for students to work on best-fit lines.