Simulating a P-value for Testing a Correlation with Fathom

Robin H. Lock, St. Lawrence University

This activity has undergone anonymous peer review.

This activity was anonymously reviewed by educators with an appropriate statistics background, according to the CAUSE review criteria for its pedagogic collection.


This page first made public: May 17, 2007

This material was originally developed through CAUSE
as part of its collaboration with the SERC Pedagogic Service.

Summary

Students use simulation to test whether the capacity of Major League Baseball parks and the average attendance at games have a positive association. After creating a plot and finding the correlation for a sample consisting of values for all teams in the 2006 season, students use the Fathom software package to scramble the capacities and see how the sample correlation behaves when there is no association between the variables.

[Figure: Scatterplot of Ballpark Capacity vs. Attendance]

Learning Goals

The main goal is to give students experience with seeing the p-value of a hypothesis test as the chance, when the null hypothesis is true, of seeing data as extreme as (or more extreme than) the data observed in the original sample.
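In symbols (notation introduced here for illustration, not taken from the handout), let r_obs be the correlation in the original sample and r* the correlation computed after the capacities are scrambled. For the one-sided alternative of a positive association, the p-value and its estimate from N independent scrambles can be written as:

```latex
\[
\text{p-value} = P\left(r^{*} \ge r_{\text{obs}} \mid H_0\right)
\approx \frac{\#\{\, i = 1, \dots, N : r^{*}_{i} \ge r_{\text{obs}} \,\}}{N}
\]
```

Here H_0 is "no association between capacity and attendance," and the estimate simply counts how many scrambled correlations are at least as large as the observed one.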

Context for Use

This activity is designed to help students understand the idea of a p-value within the context of hypothesis testing. It assumes that students are already familiar with correlation as a measure of association between two quantitative variables and have had some experience setting up null and alternative hypotheses. Otherwise, it could be placed at any point in the development of hypothesis testing, including as an early activity before students see a standardized test statistic. Ideally, students (individually or in groups) have access to computers, although the activity can also be adapted as a classroom demonstration from an instructor's station. The instruction handout assumes students will use Fathom, but it could be modified for other software that can permute the data and collect the sample correlations. Assuming students are already somewhat familiar with the software, the activity takes about 15-20 minutes.

Description and Teaching Materials

The instructions on the handout walk students through what amounts to an approximate permutation test for a correlation. Data on ballpark capacity and average attendance for all teams in one season are provided as a Fathom file. Students start by using the data to examine a plot of capacity vs. average attendance and compute the sample correlation (thinking of these data as a sample from all teams and seasons). Is this correlation evidence of a clear positive association between capacity and attendance (Ha), or could the variables be unrelated (Ho) and still produce a correlation this large by chance?

To investigate this question, students use Fathom to create a new dataset in which the capacities are scrambled so that they have no association with the attendance values. They record the correlation for the scrambled sample, then re-scramble several more times and note the correlation each time. Once they have a feel for how scrambling produces correlations under the null hypothesis of no association, students use Fathom to automate the process and collect the correlations from 1,000 simulated re-scramblings. They view a plot of the simulated correlations and find that very few are as large as the correlation in the original sample. By counting these "extreme" correlations, they compute an approximate p-value: the chance of seeing a correlation as large as the one observed in the original sample when there is actually no association between the variables. This value should be quite small, leading to the conclusion that it is not reasonable to believe capacity and attendance are unrelated.

Handout for students to work through (Microsoft Word, 63 kB, May 3, 2007)
Fathom file with ballpark data (8 kB, May 3, 2007)
Ballpark data as a tab-delimited text file (710 bytes, May 3, 2007)

Teaching Notes and Tips

Assessment

Formal: A multiple-choice exam question asks students to interpret a p-value, offering several of the standard misconceptions (e.g., the probability that Ho is true) as possible answers.
Informal: Ask students when doing other hypothesis tests "What does that p-value you just found actually measure?"

References and Resources

The original data on ballpark capacity and attendance can be found at ESPN's website http://sports.espn.go.com/mlb/attendance?sort=home_avg&year=2006&seasonType=2
