Introduction to Analysis of Variance (ANOVA)

Michelle Isenhour
Naval Postgraduate School, Operations Research

Summary

This activity introduces graduate-level students to one-way (single-factor) analysis of variance (ANOVA). The students are familiar with MATLAB but have little to no experience with ANOVA. The activity is motivated by a short video about a classic historical ANOVA case, the 1970 military draft lottery. It includes a pre-class assignment, which requires reviewing two MATLAB live scripts and completing an assignment in MATLAB Grader (formerly Cody Coursework), and concludes with an in-class discussion and a practical exercise using a MATLAB live script.

Learning Goals

At the conclusion of this activity, the student will:

  • Understand the concept of single-factor (or one-way) Analysis of Variance (ANOVA).
  • Know how to "manually" compute SSTr, MSTr, SSE, MSE, and SST using MATLAB.
  • Know how to use SSTr, MSTr, SSE, MSE, and SST to manually construct an ANOVA table.
  • Know how to use MATLAB to generate the ANOVA table.
  • Understand the F probability distribution and know how to use an F-test to conduct hypothesis tests for single-factor ANOVA situations.
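As a rough illustration of the "manual" computations named in these goals, the following sketch builds the pieces of a one-way ANOVA table in base MATLAB. The data values are made up for illustration (they are not from the course materials), the groups are assumed to be of equal size, and the variable names I, J, and N are labels chosen here.

```matlab
% Made-up data: each column is one treatment group
% (I = 3 groups, J = 3 observations per group).
X = [ 4  7 10;
      5  8 11;
      6  9 12];

[J, I] = size(X);            % J observations per group, I groups
N = I * J;                   % total number of observations

groupMeans = mean(X, 1);     % 1-by-I row vector of group means
grandMean  = mean(X(:));     % mean of all N observations

% Sums of squares (X - groupMeans relies on implicit expansion, R2016b+)
SSTr = J * sum((groupMeans - grandMean).^2);   % treatment (between groups)
SSE  = sum(sum((X - groupMeans).^2));          % error (within groups)
SST  = sum((X(:) - grandMean).^2);             % total

% Degrees of freedom, mean squares, and the F statistic
dfTr = I - 1;
dfE  = N - I;
MSTr = SSTr / dfTr;
MSE  = SSE / dfE;
F    = MSTr / MSE;

% Print a simple ANOVA table
fprintf('%-10s %4s %8s %8s %8s\n', 'Source', 'df', 'SS', 'MS', 'F');
fprintf('%-10s %4d %8.2f %8.2f %8.2f\n', 'Treatment', dfTr, SSTr, MSTr, F);
fprintf('%-10s %4d %8.2f %8.2f\n', 'Error', dfE, SSE, MSE);
fprintf('%-10s %4d %8.2f\n', 'Total', N - 1, SST);
```

A useful check on any such computation is that SSTr and SSE add up to SST.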

The student will gain the following MATLAB skills:

  • Compute summary statistics organized by group: grpstats(X, group)
  • Use the F-distribution to conduct analysis of variance:
    • Use the F inverse cumulative distribution function to compute a critical value: finv(1-alpha,nu1,nu2)
    • Use the F cumulative distribution function to find a p-value: fcdf(x,nu1,nu2,'upper')
  • Conduct one-way analysis of variance: anova1(y, group)
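The functions listed above might be combined along the following lines. This is a minimal sketch, not the course's live script: it assumes the Statistics and Machine Learning Toolbox is installed, and the data, group labels, and significance level are made up for illustration.

```matlab
% Made-up data in "long" format: one response per row plus a group label.
y     = [4 5 6 7 8 9 10 11 12]';
group = {'A';'A';'A';'B';'B';'B';'C';'C';'C'};

% Summary statistics by group (grpstats returns group means by default)
groupMeans = grpstats(y, group);

% One-way ANOVA; 'off' suppresses the table and box plot figures
[p, tbl, stats] = anova1(y, group, 'off');

% Pull the F statistic and degrees of freedom out of the table.
% Columns of tbl are: Source, SS, df, MS, F, Prob>F.
F   = tbl{2, 5};   % F statistic (Groups row)
nu1 = tbl{2, 3};   % numerator (treatment) degrees of freedom
nu2 = tbl{3, 3};   % denominator (error) degrees of freedom

% Critical value and p-value from the F distribution
alpha   = 0.05;                           % assumed significance level
Fcrit   = finv(1 - alpha, nu1, nu2);      % reject H0 if F > Fcrit
pManual = fcdf(F, nu1, nu2, 'upper');     % should agree with p from anova1

fprintf('F = %.2f, critical value = %.2f, p-value = %.4f\n', F, Fcrit, pManual);
```

Comparing pManual with the p returned by anova1 is one way for students to confirm that the built-in table and the F-distribution functions tell the same story.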

Context for Use

This activity was developed for a graduate-level course in Statistics and Data Analysis. It pairs an hour-long pre-class assignment, similar to what a flipped classroom would require, with one hour of in-class discussion and demonstration. In a traditional classroom setting, the activity could be reproduced by using the first hour of lecture to cover the activities listed in the pre-class assignment.

The students should have already completed standard instruction on descriptive and inferential statistics, as well as hypothesis testing. The MATLAB Grader exercise(s) and MATLAB live script(s) can be adapted for use with any appropriate data set.

Description and Teaching Materials

This activity begins with a pre-class assignment, which is designed to take approximately one hour of the student's time outside class, on average, and to prepare the student for the in-class portion. As part of the pre-class assignment, the student is required to watch a short video, read the text, review the MATLAB live script(s), and then complete at least one example problem in MATLAB Grader. Once finished with the pre-class assignment, the student uses a "discussion board" in the learning management system (LMS) to post a solution and/or any questions prior to class.
  • Pre-Class Assignment
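    • Demonstrate why Analysis of Variance is important through a historical lens by watching the short video on the 1970 military draft lottery (listed under References and Resources).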
    • Complete assigned reading from text:
      • Read: PS4ES9e, Chapter 10, Section 10.1, pages 409-420. Probability and Statistics for Engineering and the Sciences, 9th Edition, (2016) by Jay Devore. Published by Cengage Learning, Boston.
    • Review Single Factor ANOVA Live Script (MATLAB Live Script 11kB Aug14 18)
    • Review F distribution and Analysis of Variance (ANOVA) (MATLAB Live Script 56kB Aug14 18)
    • Complete a single-factor (one-way) ANOVA problem in MATLAB Grader.
    • Use the "Comment" tool in the LMS to submit a solution and/or questions for instructor review prior to class.
Prior to class, I review the student solutions in MATLAB Grader as well as their posts to the discussion board, and I frame the in-class lecture around student performance on the pre-class exercise(s). In a typical 50-minute class period, I spend no more than 20 minutes on the lecture so that the students have the last 30 minutes to work on an in-class practical exercise. Whenever possible, I tie the in-class exercise back to the motivational video. In this lesson, the students create a MATLAB live script and conduct an analysis of variance on the data from the 1970 military draft lottery.
  • In-Class
    • Review student performance on MATLAB Grader; highlight interesting code and/or solution methods discovered by the students.
    • Respond to submissions on the discussion board; field questions from students.

Teaching Notes and Tips

The Single Factor ANOVA live script explores the concepts of Total Sum of Squares, Error Sum of Squares, and Treatment Sum of Squares from a graphical perspective. This introduces the students to the components of the ANOVA table before the script demonstrates the built-in MATLAB ANOVA commands. The data used was selected to best demonstrate the concepts; however, the file could be adapted to other data sets.
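For reference, the three sums of squares can be written as follows. This is a sketch of the standard definitions, assuming I groups with J observations each (equal group sizes); the notation is chosen here and may differ from that used in the live script.

```latex
\begin{align*}
\mathrm{SST}  &= \sum_{i=1}^{I}\sum_{j=1}^{J} \left(x_{ij} - \bar{x}_{\cdot\cdot}\right)^2
  && \text{(total)} \\
\mathrm{SSTr} &= J \sum_{i=1}^{I} \left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right)^2
  && \text{(treatment, between groups)} \\
\mathrm{SSE}  &= \sum_{i=1}^{I}\sum_{j=1}^{J} \left(x_{ij} - \bar{x}_{i\cdot}\right)^2
  && \text{(error, within groups)} \\
\mathrm{SST}  &= \mathrm{SSTr} + \mathrm{SSE}
\end{align*}
```

Here x_ij is the j-th observation in group i, x-bar_i. is the mean of group i, and x-bar_.. is the grand mean of all observations.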

At this point in the course, my students had not been introduced to the F distribution, so the F distribution and Analysis of Variance (ANOVA) live script was necessary to bridge the gap. If students are already familiar with the F distribution, this portion of the pre-class assignment could be omitted.

MATLAB Grader is not integrated with our learning management system. Accordingly, completion of the assignment in MATLAB Grader is optional. However, as you can see from the screenshot above, I have found that most students use MATLAB Grader (especially when learning new topics and skills in MATLAB) because it provides instant feedback.

Assessment

To motivate the students to complete the pre-class assignment, the students are awarded 10 points once they have posted on the discussion board. Over the course of the quarter, these pre-class assignments account for 25% of their overall course grade.

Typically, I do not assign a grade to the in-class practical exercises. The learning objectives and MATLAB skills are assessed on a bi-weekly laboratory assignment and corresponding report.

References and Resources

Textbook: Probability and Statistics for Engineering and the Sciences, 9th Edition, (2016) by Jay Devore. Published by Cengage Learning, Boston. Problems in MATLAB Grader are taken directly from the text.

Article: Rosenbaum, David E. "Statisticians Charge Draft Lottery Was Not Random." The New York Times, The New York Times, 4 Jan. 1970, www.nytimes.com/1970/01/04/archives/statisticians-charge-draft-lottery-was-not-random.html.

Video: Werner, Mark. "1970 Draft Not Random." YouTube, 10 Jan. 2013, https://youtu.be/VJO-NI07yLs.