The Evolution of Pearson’s Correlation Coefficient/Exploring Relationships between Two Quantitative Variables
This activity has undergone anonymous peer review.
This activity was anonymously reviewed by educators with appropriate statistics background according to the CAUSE review criteria for its pedagogic collection.
This page first made public: Jan 10, 2008
This material is replicated on a number of sites as part of the SERC Pedagogic Service Project
Context for Use
Description and Teaching Materials
Teaching Notes and Tips
The first part of this activity is an interactive lecture using whole group discussion of a scatterplot to understand association. Below is an example of this discussion including scatterplots, questions and prompts.
Today we will examine a problem from anthropometrics, the statistical study of the human body and the relationships between different human characteristics. Specifically, we will explore the following statistical question:
Is there a relationship between arm span and height?
Prompts for discussion:
- What do you think?
- Do short people generally have short arms?
- Can short people have long arms?
- Do tall people generally have long arms?
- Can tall people have short arms?
You might collect data on height and arm span for the students in your class or you can download data (Acrobat (PDF) 24kB Oct18 07) for Example 1.
You can create a scatterplot for the data collected in class or you can download the scatterplot (Acrobat (PDF) 53kB Oct30 07) for Example 1.
Based on the scatterplot how would you describe the relationship between height and arm span? Some specific questions to address include:
- How would you characterize the arm span for the shorter people in this study?
- How would you characterize the arm span for the taller people in this study?
- Based on the scatterplot, is it always true that if one person is taller than another person, he/she will have longer arms? Explain.
- Is the plot of the data perfectly linear? Is it generally linear?
- How strong is the relationship between height and arm span?
The Quadrant Count Ratio: A First Measure for Strength of Association
The quadrant count ratio (QCR) provides a measure of the strength of association between two quantitative variables. To determine the QCR, the scatterplot of the data is divided into four "quadrants" based on the mean values of the two variables. This idea is illustrated in the scatterplot (Acrobat (PDF) 89kB Oct31 07) for the height and arm span data.
The QCR is defined to be:
[(The Number of Points in Quadrants I and III) - (The Number of Points in Quadrants II and IV)] / [The Total Number of Points]
From the definition, the value of the QCR is guaranteed to be between -1 and 1, inclusive, and the QCR does not depend on the units of measurement for the two variables.
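The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original activity: the function name `quadrant_count_ratio`, the sample heights and arm spans (in centimeters), and the decision to count a point lying exactly on a dividing line as 0 are all assumptions made for the example.

```python
from statistics import mean

def quadrant_count_ratio(xs, ys):
    """Return the QCR for paired data, with quadrants split at the two means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    score = 0
    for x, y in zip(xs, ys):
        product = (x - x_bar) * (y - y_bar)
        if product > 0:        # Quadrant I or III
            score += 1
        elif product < 0:      # Quadrant II or IV
            score -= 1
        # product == 0: the point sits on a dividing line and adds 0 (assumption)
    return score / len(xs)

# Hypothetical measurements in centimeters (not the Example 1 data).
heights = [152, 160, 166, 170, 178, 185]
arm_spans = [150, 163, 171, 172, 175, 188]

qcr = quadrant_count_ratio(heights, arm_spans)

# Converting both variables to inches leaves the quadrant counts unchanged.
qcr_inches = quadrant_count_ratio([h / 2.54 for h in heights],
                                  [a / 2.54 for a in arm_spans])
print(qcr, qcr_inches)
```

In this made-up sample, five points fall in Quadrants I and III and one falls in Quadrant II, so the QCR is (5 - 1)/6, and the value is the same in centimeters and in inches.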
Additional properties of the QCR are best explored through scatterplots. In the following examples, there is no context. The illustrations are designed simply to demonstrate various properties of the QCR. Each scatterplot has been divided into the four quadrants based on the means. Each scatterplot is followed by questions that should be addressed by students. A discussion related to these questions is provided following each scatterplot. As properties of the QCR evolve, they will be noted.
Example 2 (Acrobat (PDF) 53kB Oct23 07)
If the points are predominately in Quadrants I and III, then the QCR will be positive. If the points are predominately in Quadrants II and IV, then the QCR will be negative.
If the association is positive, then the QCR will be positive. If the association is negative, then the QCR will be negative.
The stronger the association, the closer to ±1 the QCR will be.
Example 3 (Acrobat (PDF) 54kB Oct23 07)
When the association between the two variables is weak, the QCR is close to 0.
When the relationship is perfectly linear, the QCR will be ±1.
When there is a strong positive (negative) association, the QCR is close to +1 (-1). When there is little association, the QCR is close to 0. When all the points are on a line, the QCR will be ±1. Consequently, a QCR close to ±1 suggests a strong association, while a QCR close to 0 suggests a weak association.
Thus, the QCR appears to behave the way we want in terms of characterizing the direction, form, and strength of the relationship between two variables. Unfortunately, since the QCR is a rather crude measure, it does not always provide the information we seek from a correlation coefficient as the following examples illustrate.
A QCR of ±1 does not mean the relationship between the two variables is perfectly linear.
Note that in this example, the data satisfy Y = X^2.
When all the points are in Quadrants I and III, the QCR will be 1. When all the points are in Quadrants II and IV, the QCR will be -1. Consequently, the QCR can be ±1 even when the relationship between Y and X is not exact.
In Example 7, two scatterplots are compared to illustrate the primary weakness of the QCR. Note that the two scatterplots have the same scale.
Pearson's Correlation Coefficient
Example 7 Revisited (Acrobat (PDF) 47kB Oct23 07)
This example contrasts Pearson's r with the QCR and illustrates that points in Quadrants I and III have more weight in the determination of r.
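The contrast between the two measures can be sketched in code. The snippet below uses the standard formula for Pearson's r (the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), so each point contributes according to the size of its deviations rather than a flat +1 or -1. The function names `qcr` and `pearson_r` and both small data sets are hypothetical; they are not the Example 7 data.

```python
from math import sqrt
from statistics import mean

def qcr(xs, ys):
    """Quadrant count ratio: each point counts +1, -1, or 0 (on a dividing line)."""
    x_bar, y_bar = mean(xs), mean(ys)
    signs = []
    for x, y in zip(xs, ys):
        p = (x - x_bar) * (y - y_bar)
        signs.append((p > 0) - (p < 0))   # +1, -1, or 0
    return sum(signs) / len(xs)

def pearson_r(xs, ys):
    """Pearson's r: each point contributes the product of its two deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / sqrt(sxx * syy)

# Two hypothetical data sets on the same scale.
xs = [1, 2, 3, 4, 5, 6]
tight = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8]   # points hug a line
loose = [3, 1, 2, 6, 4, 5]               # same quadrants, more scatter

qcr_tight, qcr_loose = qcr(xs, tight), qcr(xs, loose)
r_tight, r_loose = pearson_r(xs, tight), pearson_r(xs, loose)
print(qcr_tight, qcr_loose)                  # identical quadrant counts
print(round(r_tight, 3), round(r_loose, 3))  # r separates the two data sets
```

Both data sets place every point in Quadrants I and III, so the QCR cannot tell them apart, while r is above 0.99 for the tight set and about 0.66 for the loose one.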
Example 1 Revisited (Acrobat (PDF) 44kB Oct23 07)
Example 2 Revisited (Acrobat (PDF) 41kB Oct23 07)
In Example 1, there is a fairly strong positive linear trend, and r is fairly close to 1.
In Example 2, there is a fairly strong negative linear trend, and r is fairly close to -1.
Property 1: -1 ≤ r ≤ 1. When the general trend is negative, Pearson's r will be negative. When the general trend is positive, Pearson's r will be positive.
Example 3 Revisited (Acrobat (PDF) 41kB Oct23 07)
There is little trend and r is fairly close to 0.
Property 2: When the association between the two variables is weak, Pearson's r will be close to 0.
Example 4 Revisited (Acrobat (PDF) 40kB Oct23 07)
The relationship is perfectly linear and Pearson's r is 1.
Property 3: When the relationship is perfectly linear, Pearson's r will be ±1. Note that properties 1, 2, and 3 suggest that Pearson's r will always be between -1 and +1, which is true.
In Example 5, there appears to be a perfect relationship (Y = X^2), but the relationship is not linear and Pearson's r is not 1.
In Example 6, all the points are in Quadrants I and III; however, the relationship is not perfectly linear. Although the QCR is 1, Pearson's r is less than 1.
Property 4: Pearson's r = ±1 if and only if the relationship between Y and X is perfectly linear.
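Property 4 can be checked numerically on small data sets. The sketch below is an illustration under stated assumptions: `pearson_r` is a hypothetical helper implementing the standard formula for r, and the x values 1 through 4 are made up (they are not the data from Examples 4 through 6).

```python
from math import sqrt, isclose
from statistics import mean

def pearson_r(xs, ys):
    """Standard Pearson correlation computed from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4]                       # hypothetical x values

r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # exact line, positive slope
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # exact line, negative slope
r_square = pearson_r(xs, [x ** 2 for x in xs])    # exact but curved: Y = X^2

print(isclose(r_up, 1.0), isclose(r_down, -1.0))
print(round(r_square, 3))               # close to 1, but strictly below it
```

The two exact lines give r equal to +1 and -1 (up to floating-point rounding), while the exact but curved relationship Y = X^2 gives an r of about 0.98, which is high yet short of 1.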
When Pearson's r is positive (negative) this suggests a positive (negative) association.
When Pearson's r is close to 0 this suggests a weak linear relationship.
As Pearson's r moves away from 0 and gets closer to ±1, this suggests a stronger association.
A value of r close to ±1 suggests a linear relationship. A value of r equal to either extreme, ±1, will occur only if the points are all on a line.
Summary of Activity
This activity provides a developmental sequence for understanding Pearson's correlation coefficient. Pearson's correlation coefficient is a measure of the direction and strength of the linear relationship between two quantitative variables.