Creating a Spam Filter

This page was authored by the CATALST Group at the University of Minnesota.

This material is replicated on a number of sites as part of the SERC Pedagogic Service Project.

Summary

This activity asks students to work in teams to develop a set of rules that can be used to program a spam filter for a client. The rules are based on characteristics of the subject lines of emails. Students are given samples of spam and non-spam subject lines to examine. Once their rules are ready, they are given a test set of data and are asked to come up with a numerical measure that quantifies how well their method (model) works. Each team writes a report describing how its model works and how well it performed on the test data. This activity could serve as an introduction to ideas of classification. Alternatively, it could be the basis for introducing students to types of statistical errors.
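
To make the idea of a rule set and a numerical measure concrete, here is a minimal Python sketch. The keywords, the `is_spam` and `percent_correct` functions, and the sample subject lines are all hypothetical and invented for illustration; none of them come from the activity handouts, and student teams are expected to devise their own rules and their own measure.

```python
# Hypothetical rule set: flag a subject line if it trips any simple rule.
SPAM_WORDS = {"free", "winner", "prize"}  # illustrative keywords, not from the handouts

def is_spam(subject: str) -> bool:
    """Return True if the subject line matches any of the illustrative rules."""
    words = {w.strip("!?.,:").lower() for w in subject.split()}
    has_spam_word = bool(words & SPAM_WORDS)
    many_exclamations = subject.count("!") >= 2
    letters = [c for c in subject if c.isalpha()]
    mostly_caps = bool(letters) and sum(c.isupper() for c in letters) / len(letters) > 0.7
    return has_spam_word or many_exclamations or mostly_caps

def percent_correct(labeled):
    """One possible measure: the percentage of subject lines classified correctly."""
    correct = sum(is_spam(subject) == label for subject, label in labeled)
    return 100 * correct / len(labeled)

# Made-up test set: (subject line, True if it really is spam)
test_set = [
    ("You are a WINNER!!", True),
    ("Meeting moved to 3pm", False),
    ("Free prize inside", True),
    ("Lunch on Friday?", False),
    ("Cheap meds online", True),  # no rule catches this one
]
print(percent_correct(test_set))  # 80.0
```

Percent correct is only one candidate measure. It treats a missed spam message and a blocked legitimate message as equally costly, which a client may not agree with.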


Learning Goals

This activity has the following goals:

  1. Expose students to a real-world problem with real data.
  2. Expose students to a type of statistical problem (classification) and create the background needed to learn more formal approaches to statistical classification.
  3. Provide students with a conceptual understanding of types of statistical errors.
  4. Engage students in statistical thinking and working as a team.
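
For goal 3, it may help to see the two kinds of error a spam filter can make. A false positive flags a legitimate email as spam; a false negative lets a spam email through. If the default assumption is that an email is legitimate, these correspond to Type I and Type II errors respectively. The sketch below counts both; the function name `error_counts` and the six results are made up for illustration.

```python
def error_counts(predicted, actual):
    """Count the two kinds of error for a spam filter.

    predicted, actual: equal-length lists of booleans, True meaning "spam".
    Returns (false_positives, false_negatives).
    """
    false_positives = sum(p and not a for p, a in zip(predicted, actual))
    false_negatives = sum(a and not p for p, a in zip(predicted, actual))
    return false_positives, false_negatives

# Made-up results for six emails (not real data)
actual    = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

fp, fn = error_counts(predicted, actual)
print(fp, fn)  # 1 1
```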


Context for Use

This activity:

  • Is appropriate for use at any time in an introductory statistics course.
  • May be adapted for junior high, high school, and college-level instruction.
  • Is most effective when students work in groups of 3-4.
  • Lasts 50-75 minutes. The reading and individual student responses can take place prior to class, and comparison of student reports can take place at a subsequent class or via an online class management system.


Description and Teaching Materials

  1. Media article: Students individually read the media article to become familiar with the context of the problem. Handout: Spam Media Article (Microsoft Word 46kB Sep14 09)
  2. Readiness questions: Students individually answer these questions about the media article to become even more familiar with the context and begin thinking about the problem. Handout: Spam Readiness Questions (Microsoft Word 22kB Sep14 09)
  3. Problem statement: In teams of three or four, students are given the problem statement and work on the problem for 30-45 minutes. The time needed depends on the amount of self-reflection and revision you want the students to do. Handout: Spam Problem Statement (Microsoft Word 23kB Sep14 09)
    The initial set of spam and non-spam subject lines given to students: First Set of Subject Lines (Microsoft Word 31kB Sep14 09)
    After they develop a preliminary model, students are given an additional set of subject lines with which to modify their model: Second Set of Subject Lines (Microsoft Word 28kB Sep14 09)
  4. Process of sharing solutions: Each team writes their solution in a letter or memo to the client. Then, each team presents their solution to the class. Whole class discussion is integrated with these presentations to discuss the different solutions, the statistics involved, and the effectiveness of the different solutions in meeting the needs of the client.
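
The value of holding back the second set of subject lines can be illustrated with a short Python sketch: a rule that fits the first set perfectly may do noticeably worse on lines the team has not yet seen. The single rule, the `accuracy` function, and both sets of subject lines below are hypothetical and are not taken from the handouts.

```python
def rule(subject: str) -> bool:
    """Hypothetical one-line rule: call it spam if it mentions 'free'."""
    return "free" in subject.lower()

def accuracy(classifier, labeled):
    """Fraction of (subject, is_spam) pairs that the classifier gets right."""
    correct = sum(classifier(subject) == label for subject, label in labeled)
    return correct / len(labeled)

# Made-up subject lines standing in for the first and second handout sets
first_set = [
    ("Free gift card", True),
    ("Get it free today", True),
    ("Project update", False),
    ("Team lunch", False),
]
second_set = [
    ("Lose weight fast", True),       # spam the rule misses
    ("Free trial offer", True),
    ("Feel free to call me", False),  # legitimate email the rule flags
    ("Budget review", False),
]

print(accuracy(rule, first_set))   # 1.0
print(accuracy(rule, second_set))  # 0.5
```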

The following supplies and materials are recommended for this activity.

  • Computers with word-processing programs for students to write up their reports.
  • Optional: Computers with programs such as Fathom.
  • Optional: Calculators.
  • Optional: Materials for students to create posters to share their solutions.


Teaching Notes and Tips

  1. The purpose of the media article and the readiness questions is to introduce the students to the context of the problem. Depending on the grade level and/or your instructional purposes, you may want to use a more teacher-directed format or a more student-directed format for going through the article and the questions.
  2. Place the students in teams of three or four. If you already use teams in your classroom, it is best to continue with these same teams, since results may be better when the students have already developed a working relationship.
  3. Encourage (but don't require or assign) the students to select roles such as timer, collector of supplies, writer of letter, etc.
  4. Remind the students that they should share the work of solving the problem.
  5. After it appears that most students have completed Part 1, the instructor can suggest that students move on to Part 2 and provide the First Set of Subject Lines (Microsoft Word 31kB Sep14 09).
  6. Do not pass out the Second Set of Subject Lines (Microsoft Word 28kB Sep14 09) until students have shown you that they have a good set of rules to use.
  7. As students work in groups, the teacher's role should be that of a facilitator and observer. Avoid questions or comments that steer the students toward a particular solution. Try to answer student questions with questions so that the student teams come to their own solutions.
  8. Watch the time and try to urge groups on if they are falling behind.
  9. If students seem to get off task and are not focusing on the data provided, direct them back to the actual data and task.
  10. If more follow-up is desired, after presentations and discussion, allow students to resume their groups and modify their models.


Assessment

Assessment is an integral part of a model-eliciting activity. Each group is required to write a report to a "client" that describes their model, the reasoning that led to the model, and a justification of all decisions that are made based on the model. Group reports may be assessed for their clarity, their completeness, and the soundness of their explanations and justifications. In addition, instructors can decide whether they wish to evaluate the students' presentations. An example rubric and scoring methods for student reports and presentations can be found at: https://engineering.purdue.edu/ENE/Research/SGMM/Problems/CASESTUDIESKIDSWEB/casestudies/airport/tools.htm

Follow-up questions to the activity may be used to assess student learning outcomes. For example:

  • What do you think you learned from this activity?
  • What questions do you have as a result of completing this activity?

Additional assessment items may be used depending on the purpose for using the activity and the nature of the course.


References and Resources