Using an Applet to Demonstrate Sampling Distributions of Regression Coefficients

This page was authored by Roger Woodard, Steve Stanislav, Jennifer Gratton, and Pam Arroway, NC State University, based on an applet by Stefan Michiels and Bert Raeymaekers, University Center for Statistics, Katholieke Universiteit Leuven.

This material is replicated on a number of sites as part of the SERC Pedagogic Service Project

Summary

This visualization activity combines student data collection with the use of an applet to enhance the understanding of the distributions of slope and intercept in simple linear regression models. The applet simulates a linear regression plot and the corresponding intercept and slope histograms. The program allows the user to change settings such as slope, standard deviation, sample size, and more. Students will then see theoretical distributions of the slope and intercept and how they compare to the histograms generated by the simulated linear regression lines.
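The idea behind the applet can be sketched in a few lines of Python. This is not the applet's own code: the function names `fit_line` and `simulate`, and the settings used (true slope 0, standard deviation 2, ten fixed x values, 5000 samples), are assumptions chosen for illustration. The sketch draws many samples from a known line plus normal noise, fits a least-squares line to each, and compares the spread of the fitted slopes with the theoretical standard deviation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate(true_intercept, true_slope, sigma, xs, n_samples, seed=0):
    """Draw n_samples data sets at the fixed xs; return fitted (intercepts, slopes)."""
    rng = random.Random(seed)
    intercepts, slopes = [], []
    for _ in range(n_samples):
        # Each response is the true line plus independent normal noise.
        ys = [true_intercept + true_slope * x + rng.gauss(0, sigma) for x in xs]
        b0, b1 = fit_line(xs, ys)
        intercepts.append(b0)
        slopes.append(b1)
    return intercepts, slopes


# Illustrative settings (assumed): no linear relationship, sigma = 2, n = 10.
sigma = 2.0
xs = list(range(1, 11))
intercepts, slopes = simulate(5.0, 0.0, sigma, xs, n_samples=5000)

# Theoretical standard deviation of the sample slope: sigma / sqrt(Sxx).
x_bar = statistics.fmean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)
theoretical_sd_slope = sigma / sxx ** 0.5

print("mean of sample slopes:", round(statistics.fmean(slopes), 3))
print("SD of sample slopes:  ", round(statistics.stdev(slopes), 3))
print("theoretical SD:       ", round(theoretical_sd_slope, 3))
```

Although the true slope is zero here, the individual fitted slopes are not. Changing `sigma` or the number of x values mimics adjusting the corresponding applet settings.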





Learning Goals

Students should:
  • Recognize that even if there is no linear relationship between two variables, the slope of a sample regression line can be non-zero.
  • Explain that the slope and intercept of a regression line are random quantities and will vary from sample to sample.
  • Explain that the sampling distributions of the slope and the intercept of the regression line are normal.
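For reference, the last goal can be stated precisely. The notation below is an assumption, since the page itself gives no formulas: it uses the standard simple linear regression model with fixed x values and independent normal errors, under which the sample slope b_1 and sample intercept b_0 are exactly normal.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \sigma^2) \ \text{independently}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

b_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right), \qquad
b_0 \sim N\!\left(\beta_0,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)
```

When there is no linear relationship, \beta_1 = 0, so the sample slopes are centered at zero but still vary with standard deviation \sigma / \sqrt{S_{xx}}.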

Context for Use

This activity is appropriate for all class sizes and is best used as an interactive activity for an advanced introductory or second college statistics course. It is best used as an introduction to the concept of inference for regression parameters. This demonstration can be completed within a 50-minute class period.

Prerequisites for this activity:
  • Students should have some understanding of the concept of fitting simple linear regression models. That is, the idea that a scatterplot leads to a sample slope and a sample intercept.
  • Students should also be familiar with the general concept of a sampling distribution. This setting may not be an appropriate place to introduce sampling distributions for the first time.

Description and Teaching Materials

This activity links a real-world data collection with a simulation. Students collect data on two variables that should have little or no relationship (the height and age of the students). Working in small groups, students calculate a sample slope and intercept. Groups then compare their results across the entire class and begin forming a sampling distribution of the regression coefficients. The instructor then demonstrates the applet, showing students how the sampling distribution would develop for a larger number of samples.

By teaming a hands-on activity with the use of an applet, the instructor can help students better understand the idea of a sampling distribution in the regression setting.
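The calculation each group performs can be sketched as follows. The data are made up for illustration (they are not from the activity worksheet), and the function name `least_squares` is hypothetical; the formulas are the usual sums-based least-squares expressions that students would evaluate on a calculator.

```python
def least_squares(ages, heights):
    """Return (intercept, slope) using the usual hand-calculation sums."""
    n = len(ages)
    sum_x = sum(ages)
    sum_y = sum(heights)
    sum_xy = sum(x * y for x, y in zip(ages, heights))
    sum_xx = sum(x * x for x in ages)
    # Slope = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # The least-squares line passes through (x-bar, y-bar).
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Made-up data for one group of five students (age in years, height in inches).
ages = [18, 19, 19, 20, 22]
heights = [66, 70, 64, 68, 67]

intercept, slope = least_squares(ages, heights)
print(f"height = {intercept:.2f} + {slope:.3f} * age")
```

Note that the slope is undefined if every student in a group reports the same age, because the denominator is then zero; groups may need at least two distinct ages for the calculation to work.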

Teaching Notes and Tips

Instructors should stress that each sample the students took in the activity plays the same role as one of the samples generated in the applet. Students (including those in an advanced class) often have trouble understanding the idea of a sampling distribution.

A little bit of careful planning will make this demonstration go smoothly.

  • Students should bring a calculator to class to aid in regression line calculations. Although a calculator may not be strictly necessary, it streamlines the activity.
  • Students will work in groups of five on a worksheet. Worksheets should be pre-numbered to facilitate calling on the groups.
  • As the students are calculating their regression lines, the instructor should prepare histograms on the board or a transparency for the sample intercepts and slopes.
  • The instructor should open the applet while the students are working.
Detailed instructions and tips for applet usage are given in the attached document.

Assessment

Upcoming lectures can revisit this material when discussing confidence intervals and hypothesis tests for the regression slope.

Assessments on this topic should focus on students' understanding of the sampling distributions of the coefficients. Students should be asked to explain why a confidence interval or hypothesis test is needed for a regression parameter. Many students in basic regression courses or advanced introductory classes can calculate confidence intervals and hypothesis tests but cannot explain their meaning. Questions that ask for calculations of intervals and tests should be followed by questions that ask for explanations of their meaning.
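For instructors writing such questions, the interval and test for the slope take the standard textbook form shown here. The symbols are assumptions, since the page defines none: s is the residual standard deviation, S_{xx} is the sum of squared deviations of x, and t^{*}_{n-2} is the critical value from a t distribution with n - 2 degrees of freedom.

```latex
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2, \qquad
SE(b_1) = \frac{s}{\sqrt{S_{xx}}}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

b_1 \pm t^{*}_{n-2}\, SE(b_1), \qquad
t = \frac{b_1 - 0}{SE(b_1)} \ \text{ for } H_0\colon \beta_1 = 0
```

Both rest on the sampling distribution of b_1: the standard error measures how much the sample slope would vary from sample to sample, which is exactly what the class histograms and the applet display.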

References and Resources