# Using Regression Models to Make Predictions

**This activity was selected for the Teaching Computation in the Sciences Using MATLAB Exemplary Teaching Collection**

Resources in this collection must have scored Exemplary or Very Good in all five review categories, and must also rate as “Exemplary” in at least three of the five categories. The five categories included in the peer review process are:

- Computational, Quantitative, and Scientific Accuracy
- Alignment of Learning Goals, Activities, and Assessments
- Pedagogic Effectiveness
- Robustness (usability and dependability of all components)
- Completeness of the ActivitySheet web page

For more information about the peer review process itself, please see https://serc.carleton.edu/teaching_computation/materials/activity_review.html.

This page first made public: Aug 23, 2019

#### Summary

This activity introduces students to prediction and confidence intervals for a simple linear regression model using a MATLAB Live Script. To draw a connection to confidence intervals for an unknown population mean, the activity begins with an explanation of how the true regression line is simply a line of average values. The concepts of a point estimate and confidence interval for the mean response *E[Y]*, and of a prediction, prediction error, and prediction interval for a new observation *Y<sub>i</sub>*, are explained mathematically and illustrated graphically. At the conclusion of the activity, the student will understand the key differences between confidence intervals and prediction intervals for simple linear regression models.

## Learning Goals

At the conclusion of this activity, the student will:

- Understand that the **true regression line** is simply a line of **average** values.
- Understand the **graphical representation** of the distribution of *Y* for different values of *x*.
- Understand the concept of a **point estimate** and **confidence interval** for the mean response *E[Y]*.
- Understand the concept of a **prediction**, **prediction error**, and **prediction interval** for a new observation *Y<sub>i</sub>*.

The student will gain the following MATLAB skills:

- Fit a regression line using **fitlm(X,y)**
- Use **predict(mdl,Xnew,'Prediction','curve')** to add the 95% confidence interval for the true mean response *E[Y]*
- Use **predict(mdl,Xnew,'Prediction','observation')** to add the 95% prediction interval for the prediction of a new (future) value of *Y*

## Context for Use

Although this activity was designed for graduate-level students, simple linear regression is a basic (undergraduate) statistics and data analysis topic, so the activity would also be appropriate in an undergraduate statistics course that covers linear regression. It can be conducted individually, in groups of 3-4 students, or collectively as a class. In a classroom setting, students are given the entire class period (50 minutes) to work through the MATLAB Live Script and complete the exercises for further exploration.

The students should have already completed standard instruction on parameter estimation, confidence intervals, and simple linear regression.

## Description and Teaching Materials

### The Simple Linear Model

The MATLAB Using Regression Models to Make Predictions Live Script (MATLAB Live Script 54kB Aug17 19) begins with a look back at the simple linear regression model and mathematically demonstrates that the simple linear regression model is simply a **line of average values**.
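The "line of average values" statement can be written out in one step. The notation below is the standard textbook one and is an assumption here, since this page does not define its symbols: *β<sub>0</sub>* and *β<sub>1</sub>* are the true intercept and slope, and *ε* is the random error term with mean 0.

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 x + \varepsilon, \qquad E[\varepsilon] = 0, \\
E[Y \mid x] &= \beta_0 + \beta_1 x + E[\varepsilon] = \beta_0 + \beta_1 x .
\end{align*}
```

For each fixed *x*, the point on the true regression line is therefore the mean of the distribution of *Y* at that *x*.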

### The Distribution of *Y* for Different Values of *x*

Next, the MATLAB Live Script walks the student through a demonstration of what it means for the error terms to be random variables that are normally distributed with mean 0 and variance σ². The **Further Exploration activity** asks the student to try different values for the parameters to determine their effect on the distribution and to try different numbers of samples to determine what happens as *n* approaches infinity.
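A minimal sketch of this kind of demonstration might look like the following. It is not the Live Script's own code: the parameter values, the seed, and the variable names are all hypothetical choices made for illustration.

```matlab
% Hypothetical true model: Y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2).
% These values are illustrative assumptions, not the ones used in the Live Script.
beta0 = 2; beta1 = 0.5; sigma = 1;
n     = 10000;          % number of simulated observations at each x
xvals = [2 5 8];        % x values at which to examine the distribution of Y

rng(1);                 % fixed seed so the sketch is repeatable
Y = zeros(n, numel(xvals));
for k = 1:numel(xvals)
    % Each column holds draws of Y at one fixed value of x
    Y(:,k) = beta0 + beta1*xvals(k) + sigma*randn(n,1);
end

trueMeans   = beta0 + beta1*xvals;   % points on the true regression line
sampleMeans = mean(Y, 1);            % simulated averages at each x
sampleSDs   = std(Y, 0, 1);          % simulated spreads at each x

% One histogram per x value: same shape and spread, shifted centers
figure; hold on
for k = 1:numel(xvals)
    histogram(Y(:,k), 'Normalization', 'pdf');
end
xlabel('y'); ylabel('estimated density');
legend('x = 2', 'x = 5', 'x = 8');
```

Changing `sigma` widens or narrows every histogram by the same amount, while changing `beta0` or `beta1` moves the centers. Increasing `n` brings `sampleMeans` closer to `trueMeans`.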

### The Relationship between the True Regression Line and our Fitted Regression Line

The third exploratory activity generates a random sample of observations, uses **fitlm(X,y)** to fit a linear regression model and then compares the fitted model to the true regression line so that students can see the relationship. The **Further Exploration activity** asks the students to generate new random samples until they understand the relationship between the true regression line, the sample data, and the estimated (fitted) regression line.
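The comparison described above can be sketched as follows. The true parameters, sample size, and seed are hypothetical, and `fitlm` is assumed to be available through the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical true regression line and sample size (illustrative assumptions)
beta0 = 2; beta1 = 0.5; sigma = 1;
n = 30;

rng(2);                                   % remove this line to draw a new sample each run
x = 10*rand(n,1);                         % predictor values on [0, 10]
y = beta0 + beta1*x + sigma*randn(n,1);   % responses scattered about the true line

mdl = fitlm(x, y);                        % least-squares fit of y on x
b   = mdl.Coefficients.Estimate;          % b(1) = fitted intercept, b(2) = fitted slope

% Plot the sample, the true line, and the fitted line together
xg = linspace(0, 10, 100)';
figure; hold on
plot(x, y, 'o');
plot(xg, beta0 + beta1*xg, 'k-');
plot(xg, b(1) + b(2)*xg, 'r--');
xlabel('x'); ylabel('y');
legend('sample data', 'true regression line', 'fitted regression line', ...
       'Location', 'northwest');
```

Rerunning without the `rng` line produces a different sample and a different fitted line each time, while the true line stays fixed.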

### The Confidence Interval for a Simple Linear Regression Model

The **predict(mdl,Xnew,'Prediction','curve')** command is used to add the 95% confidence interval for the true mean response *E[Y]* that corresponds to the very last sample. The use of the visual display helps to explain the hourglass shape of the confidence interval.
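As a rough illustration of the hourglass shape, the sketch below fits a model to a simulated sample and plots the confidence band. The data-generating values and seed are hypothetical. The `'Alpha'` default of 0.05 in `predict` is what gives the 95% level.

```matlab
% Simulated sample from a hypothetical true line y = 2 + 0.5x (illustrative values)
rng(3);
n = 30;
x = 10*rand(n,1);
y = 2 + 0.5*x + randn(n,1);
mdl = fitlm(x, y);

% Confidence interval for the mean response E[Y] on a grid of x values
Xnew = linspace(min(x), max(x), 101)';
[ypred, yci] = predict(mdl, Xnew, 'Prediction', 'curve');   % 95% by default

ciWidth = yci(:,2) - yci(:,1);     % width of the band at each grid point
[~, iMin] = min(ciWidth);
xNarrowest = Xnew(iMin);           % should sit next to mean(x)

figure; hold on
plot(x, y, 'o');
plot(Xnew, ypred, 'r-');
plot(Xnew, yci, 'b--');            % lower and upper confidence limits
xlabel('x'); ylabel('y');
legend('sample data', 'fitted line', '95% CI for E[Y]', 'Location', 'northwest');
```

The band is narrowest at the sample mean of *x* and widens toward both ends of the data, which is the hourglass shape.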

### The Prediction Interval for a Simple Linear Regression Model

The difference between a confidence interval and a prediction interval for a simple linear regression model is a very difficult concept to grasp, so the next part of the MATLAB Live Script describes the difference mathematically and then uses **predict(mdl,Xnew,'Prediction','observation')** to add the 95% prediction interval for a new (future) value of *Y*. Students can see that the prediction interval is much wider than the confidence interval.
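For reference, the two intervals can be set side by side in standard textbook form. The symbols are assumptions, since this page does not define them: *x\** is the value of the predictor at which the interval is computed, *ŷ = b<sub>0</sub> + b<sub>1</sub>x\** is the fitted value, *s² = SSE/(n − 2)* estimates σ², and *S<sub>xx</sub> = Σ(x<sub>i</sub> − x̄)²*.

```latex
\begin{align*}
\text{CI for } E[Y \mid x^*]: &\quad
  \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}} \\[6pt]
\text{PI for a new } Y \text{ at } x^*: &\quad
  \hat{y} \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}
\end{align*}
```

The only difference is the extra 1 under the square root, which accounts for the variance of the new observation's own error term. That term is why the prediction interval is always wider, and why it does not shrink toward zero width as *n* grows.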

The **Further Exploration activity** asks the student to compare the width of the intervals near the mean value and then again near the max value of the predictor variable. After demonstrating how the width of the prediction interval relates to the sampling distributions of the *y<sub>i</sub>*'s, the student is asked to create an array of values outside the domain of the predictor variables and learn (through exploration) why we don't want to use a fitted regression model to make predictions far outside the range of the predictor variables.
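One way to carry out this comparison is sketched below. The simulated data, the seed, and the choice of an extrapolation point 20 units beyond the largest observed *x* are all hypothetical.

```matlab
% Simulated sample from a hypothetical true line y = 2 + 0.5x (illustrative values)
rng(4);
n = 30;
x = 10*rand(n,1);
y = 2 + 0.5*x + randn(n,1);
mdl = fitlm(x, y);

% Three query points: at the mean of x, at the max of x, and far outside the data
Xnew = [mean(x); max(x); max(x) + 20];

[~, yci] = predict(mdl, Xnew, 'Prediction', 'curve');        % CI for the mean response
[~, ypi] = predict(mdl, Xnew, 'Prediction', 'observation');  % PI for a new observation

ciWidth = yci(:,2) - yci(:,1);
piWidth = ypi(:,2) - ypi(:,1);

% Tabulate the widths side by side
disp(table(Xnew, ciWidth, piWidth));
```

Both widths grow as the query point moves away from the mean of *x*, and the prediction interval is wider than the confidence interval at every point. The interval at the extrapolated point is the widest, and it still assumes that the linear relationship holds out there, which the data cannot confirm.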

## Teaching Notes and Tips

The Using Regression Models to Make Predictions Live Script explores the concepts of confidence intervals and prediction intervals for simple linear regression models from a graphical perspective. Along the way, it introduces the student to the fitlm(X,y) command, which creates a linear regression model, and the predict(mdl,Xnew,Name,Value) command, which predicts the response from that model. The parameters used to illustrate the concepts were chosen arbitrarily and can easily be modified for any situation where the true linear regression model is known.

For reproducibility in support of the in-class demonstrations, the Using Regression Models to Make Predictions Live Script uses the rng(seed, generator) command to control the random generation of the sample data. For the **Further Exploration activities**, the instructor or student will need to remove (or comment out) these lines in the code.
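The effect of these lines can be seen in a short sketch. The seed value 42 and the `'twister'` generator are hypothetical choices, since the values used in the Live Script are not listed on this page.

```matlab
% Seeding the generator makes the "random" sample repeatable
rng(42, 'twister');      % hypothetical seed and generator
a = randn(5,1);

rng(42, 'twister');      % same seed and generator again
b = randn(5,1);

sameSample = isequal(a, b);   % true: both draws are identical

% For further exploration, comment out the rng line so that each run
% continues the random stream and yields a new sample:
% rng(42, 'twister');
c = randn(5,1);
```

With the seed in place, every student sees the same sample and the same fitted line as the instructor. Without it, each run gives a different sample.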

The Using Regression Models to Make Predictions Live Script was intentionally designed to support individual exploration as well as collective exploration. At the undergraduate level, I would recommend walking through the Live Script together as an in-class activity. At a post-graduate level, I encourage students to explore the Live Script individually prior to coming to class and then we collectively discuss the observations during class.

## Assessment

As a stand-alone document, the Using Regression Models to Make Predictions Live Script is intended to serve as an exploratory in-class activity and is not directly assessed. However, students are expected to recall this information in order to answer conceptual questions about the relationship between confidence intervals and prediction intervals for simple linear regression on the final exam.

Additionally, on the computational portion of the final exam, students are expected to use bivariate data to fit a linear model, use the model to make predictions, and then describe the corresponding **confidence interval** and/or **prediction interval** based on whether they are predicting a **mean response** or a **new observation**, respectively.

## References and Resources

**Textbook**:

*Probability and Statistics for Engineering and the Sciences*, 9th Edition, (2016) by Jay Devore. Published by Cengage Learning, Boston.