Map-reduce Computing for Introductory Students using WebMapReduce

Professor Richard Brown, St. Olaf College
Professor Libby Shoop, Macalester College

Summary

Designed for the first CS course yet usable in many later courses, this module has students use a web application called WebMapReduce (WMR), together with one of several supported programming languages, to learn how using multiple processes can help solve challenging problems faster. The module emphasizes data-parallel problems and solutions, the so-called 'embarrassingly parallel' problems, in which processing of the input data can easily be split among several parallel processes. Examples in this module include manipulating very large data files, such as counting the frequency of words in large texts, or transforming a collection of numeric data values. Following the widely used map-reduce computational pattern, students write code for mapper and reducer functions, submit them through the WMR web interface, specify the data file to be used as input, and submit the job to run on a computing cluster. The readings, concept presentation material, active in-class exercises, and homework exercises build on material commonly covered in an introductory course, such as iteration over collections, working with strings, and file manipulation.
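To make the mapper/reducer pattern concrete, here is a minimal sketch of word counting in plain Python that runs on a single machine. It does not use WMR's actual programming interface: the generator-based mapper and reducer signatures and the run_job helper are hypothetical stand-ins. It also assumes, for this sketch only, that each input line arrives as the mapper's key with an unused value. On a cluster, the framework performs the grouping ("shuffle") step and spreads the map and reduce work across processes.

```python
from collections import defaultdict

def mapper(key, value):
    # Assumption for this sketch: one line of input text arrives as `key`
    # and `value` is unused. Emit the pair (word, 1) for every word.
    for word in key.split():
        yield word.lower(), 1  # lowercase so "The" and "the" count together

def reducer(key, values):
    # All counts emitted for one word arrive together; add them up.
    yield key, sum(values)

def run_job(lines, mapper, reducer):
    """Hypothetical sequential stand-in for the cluster: map, shuffle, reduce."""
    groups = defaultdict(list)
    # Map phase: on a cluster, different lines could be mapped by different processes.
    for line in lines:
        for k, v in mapper(line, ""):
            groups[k].append(v)  # shuffle: group values by key
    # Reduce phase: each key is reduced independently of every other key.
    results = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            results[out_key] = out_value
    return results

if __name__ == "__main__":
    print(run_job(["the cat sat", "The dog sat"], mapper, reducer))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Students write only the equivalents of mapper and reducer; everything run_job does here is the part a map-reduce system handles for them.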

Module Characteristics

Languages Supported: Python, Scheme, C++, Java
Relevant Parallel Computing Concepts: Data Parallelism, Task Parallelism
Recommended Teaching Level: Introductory
Possible Course Use: Introduction to Computer Science


Learning Goals

  • Students should be able to identify basic forms of data parallelism in computational problems.
  • Students should be able to distinguish between sequential and parallel computation, and identify the practical significance of each.
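One way to see the difference between sequential and data-parallel computation is to compute the same result both ways. In the sketch below (all function names are hypothetical), the sequential version walks the whole list in one loop. The data-parallel version splits the numbers into chunks, computes a partial sum and count for each chunk, and then combines the partial results. Because no chunk depends on any other, each chunk could be handed to a separate process. Here the chunks are processed one after another to keep the example simple and deterministic.

```python
def sequential_mean(values):
    """One loop over all the data, one step at a time."""
    if not values:
        raise ValueError("need at least one value")
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def split_into_chunks(values, n_chunks):
    """Divide the data into at most n_chunks nearly equal pieces."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def partial_stats(chunk):
    """Work for one chunk; it needs no information from other chunks."""
    return sum(chunk), len(chunk)

def data_parallel_mean(values, n_chunks=4):
    """Compute per-chunk partial results (map-like), then combine them (reduce-like)."""
    if not values or n_chunks < 1:
        raise ValueError("need at least one value and at least one chunk")
    partials = [partial_stats(c) for c in split_into_chunks(values, n_chunks)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    data = list(range(1, 11))
    print(sequential_mean(data), data_parallel_mean(data))  # 5.5 5.5
```

Both versions give the same answer, but only the second is organized so that its expensive part can be divided among processes.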

Context for Use

This module is designed for introductory CS courses, using the course's programming language of choice. The activity could be conducted during the second half of the course and takes about one to two class periods.

Description and Teaching Materials

Teaching Notes and Tips

This module uses the Parallel Platform Package WebMapReduce, a web-based interface for creating and running Hadoop map-reduce jobs. You can go to our WebMapReduce page for more information, or visit the WebMapReduce home page on SourceForge.
You will need WebMapReduce installed on some platform resource, such as a cluster or the cloud.

To have students work with very large files, you will need to place some of them on your underlying Hadoop installation.

Assessment

We are in the process of preparing assessment questions for this module, designed to determine how much progress students have made toward the learning goals stated above.
