Influence of Outliers on Correlation
This activity has undergone anonymous peer review.
This activity was anonymously reviewed by educators with appropriate statistics background according to the CAUSE review criteria for its pedagogic collection.
This page first made public: May 9, 2007
This material is replicated on a number of sites as part of the SERC Pedagogic Service Project
- Identify points that would be considered outliers on a scatterplot.
- Given a scatterplot with an outlier marked, determine whether the outlier increases the correlation, decreases it, or leaves it unchanged.
- Students should be able to construct a scatterplot with a low correlation coefficient and then add an outlier that will increase it.
- Students should be able to construct a scatterplot with a high correlation coefficient and then add an outlier that will decrease it.
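The last two goals can also be checked numerically away from the applet. The following is a minimal sketch, not part of the original activity: the data sets and the helper name `pearson_r` are hypothetical, chosen only to show one added point raising a weak correlation and one added point lowering a strong one.

```python
import math

def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a cloud of points with little linear pattern.
weak = [(1, 3), (2, 1), (3, 4), (4, 2), (5, 3), (2, 4), (4, 1)]
# Hypothetical data: points lying close to a rising line.
strong = [(1, 1.1), (2, 2.0), (3, 2.9), (4, 4.2), (5, 5.0)]

r_weak = pearson_r(weak)
# An outlier far from the cloud in both x and y pulls the correlation up.
r_weak_out = pearson_r(weak + [(15, 15)])

r_strong = pearson_r(strong)
# An outlier far below the trend pulls the correlation down.
r_strong_out = pearson_r(strong + [(6, 0)])

print("weak cloud:", round(r_weak, 3), "with outlier:", round(r_weak_out, 3))
print("strong trend:", round(r_strong, 3), "with outlier:", round(r_strong_out, 3))
```

Students can compare these values with what the applet reports when they place similar points by hand.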
Context for Use
Prerequisites for this activity:
- Students will have been taught about the correlation coefficient and will have seen several examples that match a correlation coefficient to its scatterplot.
- Students will have discussed outliers in a one-variable setting.
Description and Teaching Materials
Teaching Notes and Tips
Using the applet is fairly straightforward. The steps are outlined here and can also be read on the applet's webpage.
- When the "Add Data" button is selected, click anywhere on the plot area to add a point to the scatterplot. A point may be dragged anywhere on the plot area by clicking and holding the left mouse button and then moving the mouse. To erase a single point, drag it to the trash icon.
- When enough points have been placed, note the correlation value in the upper right portion of the applet. You may also display the least squares regression line by checking the "Show Least Squares Line" box, and the x and y means by checking the "Show mean X and mean Y" box.
- When the "Draw line" button is selected, click-drag-release on the plot to draw a line. Drag the end-points of the line to change its slope, and drag the center point of the line to move it up or down.
- The blue/green bar above the plot area gives the sum of squares for your drawn line. The smallest possible sum of squares is represented by the blue portion of the bar. If your line matches the least squares regression line, only blue will be seen; otherwise, a green portion will appear, representing the increase in the sum of squares for your drawn line.
- Finally, if you wish to start over with new points, simply double-click the trash icon.
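The quantities behind the blue/green bar can be reproduced by hand. The sketch below is an illustration under stated assumptions, not the applet's actual code: the data points, the drawn line, and the function names `least_squares_line` and `sum_of_squares` are all hypothetical. It computes the sum of squared vertical residuals for the least squares line (the blue portion) and the extra amount incurred by a hand-drawn line (the green portion).

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_of_squares(points, slope, intercept):
    """Sum of squared vertical distances from the points to a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Hypothetical scatterplot points.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

ls_slope, ls_intercept = least_squares_line(points)
ss_min = sum_of_squares(points, ls_slope, ls_intercept)  # "blue" portion

# Hypothetical hand-drawn line: y = 0.5x + 3.
drawn_slope, drawn_intercept = 0.5, 3.0
ss_drawn = sum_of_squares(points, drawn_slope, drawn_intercept)
excess = ss_drawn - ss_min  # "green" portion; zero only for the least squares line

print("least squares line: slope", round(ls_slope, 3), "intercept", round(ls_intercept, 3))
print("minimum sum of squares:", round(ss_min, 3))
print("drawn line sum of squares:", round(ss_drawn, 3), "excess:", round(excess, 3))
```

No other straight line can give a smaller sum of squares than the least squares line, which is why the green portion disappears only when the drawn line coincides with it.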