Understanding Trends
Related quantitative concepts: Basic graphing skills, Interpolation/Extrapolation, Functions, Graph Significance, Graphing confusion
by Jennifer M. Wenner, Geology Department, University of Wisconsin-Oshkosh
Essential Concepts
Students struggle with four main concepts that are essential for teaching quantitatively about trends on plots and graphs:
- visualizing and describing data and trends,
- visually determining or estimating "best-fit" lines,
- calculating slope,
- understanding whether a trend exists.
Visualization and description of data
In geoscience, we often present graphs of data and ask students to describe the trend the data show. Sometimes, as in the case of a linear plot, that information is easy for students to extract from a simple scatter plot. However, many plots in geoscience are far more complicated. They may have an exponential or logarithmic nature, or there may be a significant amount of "noise" in the data that students have to "filter out" to see the trends.
Estimation of trends/"best fit" lines
In introductory geoscience courses, we often expect students to be able to estimate trends in data using scatter plots/graphs. A number of activities in introductory geoscience laboratory manuals require students to plot data and draw a trend in the data (or a "best-fit" line). Most often, line fitting is used to interpolate between data points or to extrapolate beyond them, for example to predict future catastrophic events. Students are often confused by the request to sketch in the trend of the data; they are unsure how to construct an appropriate line. Many of them "connect the dots" and try to extrapolate from there (which is nearly impossible).
Nearly every widely published introductory geoscience lab manual has an exercise in which students are asked to plot flood recurrence interval against the discharge of floods (flood frequency). Then they are asked to use this information to predict the discharge of a given flood (or determine the recurrence interval of a given discharge). It is essential that students understand the purpose of the assignment (to "predict" or "forecast" the likelihood of a flood).
How do we know whether a flood will hit on a given river? To be honest, we don't.
But we can begin to predict the likelihood of a large flood by understanding the behavior of that river. Today you will plot information about a stream's discharge and the recurrence interval of a flood. When data from the table in your book are plotted, they will make a nice linear array (fall close to a line). You should NOT connect the dots (as shown in the figure on the left). Instead, your job is to construct a "best fit line" for the data (shown in the figure to the right). This line should be straight (in fact, I recommend using a ruler) and should go through the middle of the data.
Why doesn't connecting the dots help you? What are we trying to accomplish? We are not interested in the exact value of a given data point; instead, we are interested in the relationship of all the data points to one another - the "trend" of the data. We want to be able to use that relationship to extrapolate information - to predict the recurrence interval (or probability) of an even larger flood. In other words, you want to be able to extend the line past the biggest flood that we know about! If you connect the dots, you don't know where to extend the line!
It is difficult to construct a perfect best fit line visually. Most spreadsheet programs will construct a line using a mathematical formula. However, it is good practice to see whether you can approximate the trend of a bunch of data points using just your eyes and a ruler! As a general rule, I try to draw the line so that the same number of dots are above the line as are below the line. Because we all see things slightly differently, your line might not be exactly the same as that of the person next to you. Why not? If your line is different, what will that mean about your answers?
This is an excellent opportunity to talk about the propagation of error, the biases of individuals and the need for uniformity in our calculation of trends.
It can also serve to show the power of mathematics - if we were all to calculate the trend of the data, we should come up with the same line and thus exactly the same answers when asked to extrapolate. Mathematics can level the playing field so that we all come up with the same answer.
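One way to make the "same line for everyone" point concrete is an ordinary least-squares fit, one common formula for a calculated trend line (the text does not say which formula a particular spreadsheet uses, so treat this as an assumption). The sketch below is a minimal pure-Python version; the function names `fit_line` and `predict` and the data values are made up for illustration and do not come from any lab manual.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x about its mean; zero means every x is the same.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # How x and y vary together about their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a value off the fitted line, including beyond the data (extrapolation)."""
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up illustration values, not measurements from a real stream.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_line(xs, ys)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
    print(f"extrapolated y at x = 8: {predict(slope, intercept, 8):.2f}")
```

Everyone who runs the same calculation on the same table gets the same slope and intercept, which is the contrast with ruler-drawn lines. If the plot uses a logarithmic axis, as many flood-frequency plots do, the same function could be applied to the logarithm of that variable.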
Slope calculation
Once students have approximated a best-fit line, we sometimes ask them to calculate the slope of the line or to describe the trend. Most students are familiar with the idea of "rise over run" from high school mathematics. However, if they have constructed their own graph, they may not understand which variable is the "rise" and which is the "run". Or they may be asked to calculate slope using only the data in the tables, or using topographic maps and distance data; these are difficult tasks for students with number phobia or math anxiety. An example of calculation of slope used in introductory geology is the calculation of the slope of the water table using contoured topographic maps.
This template can be used any time you have students calculate slope - helping them to see that "rise over run" is just a simple arithmetic calculation. Calculating slope may also help students to visualize and describe a set of data. They are likely familiar with the ideas of positive and negative slope. Giving them a number of representations (graphical, numerical, symbolic) may help them to get better at recognizing and understanding trends in a variety of ways.
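As a numerical representation of "rise over run", the short sketch below computes a slope from two points. The function name `slope` and the well elevations and spacing are hypothetical values chosen for illustration, not taken from a real map.

```python
def slope(point_a, point_b):
    """Rise over run between two (x, y) points."""
    x1, y1 = point_a
    x2, y2 = point_b
    run = x2 - x1    # horizontal change (the "run")
    if run == 0:
        raise ValueError("run is zero; the slope is undefined (vertical line)")
    rise = y2 - y1   # vertical change (the "rise")
    return rise / run


if __name__ == "__main__":
    # Hypothetical water-table readings: (distance along the ground in m, elevation in m).
    well_a = (0.0, 250.0)
    well_b = (500.0, 240.0)
    gradient = slope(well_a, well_b)
    # A negative value means the water table drops from well A toward well B.
    print(f"water-table slope = {gradient}")
```

Here the rise is 240 - 250 = -10 m and the run is 500 - 0 = 500 m, so the slope is -10/500 = -0.02, a drop of 2 m for every 100 m of horizontal distance.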
Recognizing trends in graphed data
Do all natural data have trends? Geoscientists approximate trends in many data sets using just our eyes and we can certainly describe the graph of a data set (even if it is completely random). Trends in a large data set can be non-linear or approximate some other function (exponential, periodic, etc.). However, most often we want students to be able to understand linear trends. Does it matter what kind of data we have? Does increasing the number of data points change our interpretation? Let's look at an example that illustrates pitfalls of random sampling and trying to fit a curve to the data.
If, instead, we take all the data from the Vostok Ice Core, that nicely correlated trend is gone. With more data, the curve takes on a completely different shape. We see that changes in CO2 over the past 160,000 years are far more complicated than indicated on the first plot. We might even try to recognize three distinct trends: an increase from 160-120 kyr, a steady decrease from 120-20 kyr and then a dramatic increase from 20 kyr to present! This is an excellent example of the importance of a significant amount of data.
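The effect of sample size on an apparent trend can be sketched with Pearson's correlation coefficient r. The series below is synthetic and deliberately simple; it is not the Vostok record, and the function name `pearson_r` is an assumption made for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y does not vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Synthetic series that rises, falls, rises and falls again (NOT real ice-core data).
    time = list(range(9))
    value = [0, 2, 4, 2, 0, 2, 4, 2, 0]

    # Three samples that happen to line up look like a strong linear trend.
    picked = [0, 5, 6]
    r_few = pearson_r([time[i] for i in picked], [value[i] for i in picked])

    # Using every point, the linear correlation essentially vanishes.
    r_all = pearson_r(time, value)

    print(f"r from 3 samples: {r_few:.2f}")
    print(f"r from all 9 points: {r_all:.2f}")
```

With only three points the correlation looks strong, while the full series shows essentially no linear relationship even though it has an obvious pattern. A low r therefore means "no linear trend", not "no trend".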
Examples and resources
- Carbon Dioxide Exercise: This exercise by Randy Richardson has students plot random bits of CO2 data from Mauna Loa. He then puts each set of plotted data together to show the importance of sampling interval and the seasonal/photosynthetic variation in CO2 concentration. An in-class activity for introductory courses.
- Regression by Eye: This student resource has an applet that allows the student to estimate the trend of a large set of data. The student can draw a "best-fit" line, can estimate the value of r (Pearson's correlation) and can try to minimize the mean square error.
- Regressing a line through Rb-Sr data - an isochron exercise: Science Courseware's Geology Labs Online has a unit on virtual dating that allows a student to work on regressing a line visually (it also allows you to change the intercept and shows error). An excellent resource for students to work on best-fit lines.