# Simulating a P-value for Testing a Correlation with Fathom

### This activity has undergone anonymous peer review.

This activity was anonymously reviewed by educators with appropriate statistics background according to the CAUSE review criteria for its pedagogic collection.

This page first made public: May 17, 2007

This material is replicated on a number of sites as part of the SERC Pedagogic Service Project

#### Summary

Students use simulation to test whether the capacity of major league baseball parks and average attendance at games have a positive association. After creating a plot and finding the correlation for a sample consisting of values for all teams in the 2006 season, students use the Fathom software package to scramble the capacities to see how the sample correlation behaves when there is no association between the variables.
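Fathom handles the scrambling inside the software. For readers who want to see the same logic written out, here is a minimal Python sketch of the randomization test, using only the standard library. It shuffles the capacity values, which breaks any link to attendance, recomputes the correlation each time, and reports the one-tailed simulated p-value as the fraction of scrambled correlations at least as large as the observed one. The function names `pearson_r` and `permutation_p_value` are hypothetical, and the numbers in the usage lines are invented for illustration; they are not the 2006 attendance data.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(capacity, attendance, n_shuffles=1000, seed=None):
    """One-tailed randomization test for a positive association.

    Returns the observed correlation and the fraction of scrambled
    correlations that are greater than or equal to it.
    """
    rng = random.Random(seed)
    observed = pearson_r(capacity, attendance)
    scrambled = list(capacity)  # copy so the caller's data are not reordered
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(scrambled)  # a random re-pairing of capacities with attendances
        if pearson_r(scrambled, attendance) >= observed:
            count += 1
    return observed, count / n_shuffles


# Hypothetical values for illustration only (NOT the 2006 MLB data).
capacity = [40_000, 42_500, 45_000, 38_000, 50_000, 43_000, 41_000, 47_500]
attendance = [28_000, 33_000, 36_000, 25_000, 41_000, 30_000, 31_000, 35_000]
r, p = permutation_p_value(capacity, attendance, n_shuffles=5000, seed=1)
print(f"r = {r:.3f}, simulated one-tailed p = {p:.4f}")
```

Fixing `seed` makes a run repeatable; leaving it as `None` mimics the classroom situation where each group gets slightly different results.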

## Learning Goals

## Context for Use

## Description and Teaching Materials

## Teaching Notes and Tips

- Assuming students are working individually or in small groups, encourage them to compare their results with those of nearby students (groups). Although the answers will not match exactly, they should be very similar, especially in the general shape of the plot of correlations simulated under the assumption of no association. This helps motivate the idea that, in practice, we can carry out the test with a good approximating distribution rather than always relying on simulation.
- As with any simulation, emphasize that the results are approximations and will differ (usually only slightly) from one simulation to the next; using more scrambles reduces this variability.
- If students have already seen the traditional test for a correlation, they can carry it out in Fathom and check that its p-value (0.0058 for a one-tailed test) is consistent with the approximation from the simulation.

- Data for subsequent years may be obtained from ESPN's website. The original data give average attendance and percentage of capacity, from which the capacities were computed. One point for discussion is whether data from a single season are reasonable for estimating the correlation for a "population" of all seasons.
- For additional motivation, the original question arose at a student presentation in Economics, where a faculty member suggested using ballpark capacity (which is relatively stable) as a proxy for attendance (which changes from year to year).
- Although many students enjoy the baseball context, a sports example might not be appropriate for some classes, and other data can easily be substituted. A moderately significant correlation works best, and it helps if the sample units (e.g., the teams in the baseball example) are identified.
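For reference, the traditional test for a correlation mentioned in the tips can be written out as below. This is the standard t-test of $H_0\colon \rho = 0$, which assumes the two variables are roughly bivariate normal; $r$ is the sample correlation and $n$ is the number of teams (30 for a full MLB season, giving 28 degrees of freedom). The second formula is the usual binomial approximation to the Monte Carlo standard error of a simulated p-value $\hat p$ based on $N$ scrambles; the symbols $N$ and $\hat p$ are notation introduced here, not part of the original activity.

$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{one-tailed p-value} = P\left(T_{n-2} \ge t\right)
$$

$$
\widehat{\mathrm{SE}}(\hat p) \approx \sqrt{\frac{\hat p\,(1-\hat p)}{N}}
$$

For instance, if 1,000 scrambles gave $\hat p$ near 0.0058, the standard error would be about $\sqrt{0.0058 \times 0.9942 / 1000} \approx 0.0024$, so differences of a few thousandths between groups are to be expected.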

## Assessment

Informal: Ask students when doing other hypothesis tests "What does that p-value you just found actually measure?"