
Concept: Data Decomposition Pattern

Summary

This module consists of reading material and code examples that illustrate the data decomposition pattern in parallel programming, using a small example of vector addition (sometimes called the "Hello, World" of parallel programming). Code is provided, but students need not execute it to see how the pattern is implemented. The example code begins with an original serial version, then shows how decomposition is defined in MPI, OpenMP, and CUDA.

Learning Goals

Students will be able to explain the data decomposition pattern that is used in many parallel programming solutions.

Context for Use

This module is designed as reading material to be completed before additional activities in which the data decomposition pattern is explored. For instance, see our modules "Multicore programming using openMP", "Visualize Numerical Integration", "GPU Programming", and "Distributed Computing Fundamentals" as follow-ups to this reading.

Description and Teaching Materials

You can visit the module in your browser:

Concept: Data Decomposition Pattern

or you can download the module in either PDF format or LaTeX format.

Teaching Notes and Tips

Assign as a reading. The code examples are supplied, so students can try running them if they are comfortable compiling with the C compiler for each software/hardware combination used (OpenMP on multicore with gcc; MPI on a single machine or cluster with mpicc; CUDA on a GPU co-processor with nvcc). You can concentrate on one, two, or all three of the examples, as each is in a separate section.

Assessment

No assessment instrument available.


Concept: Data Decomposition Pattern -- Discussion

I would like to register an objection. The data decomposition pattern as explained here for MPI uses a type of master-worker strategy: the master creates the array, then scatters and later gathers it. The master-worker paradigm is usually unsuited to MPI since it introduces a sequential component, and also because in realistic cases there is unlikely to be a master that has as much memory as all workers combined.

Suggestion: treat all processes symmetrically, and let them generate their own array. For this they need to translate between local zero-based indexing and global indexing; they could then initialize the arrays with x = somefunction( i+myfirst ). I feel that this would be more idiomatic for MPI.
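The index translation suggested above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the module's code: `block_range` and `init_local_block` are hypothetical helper names, `somefunction` is a placeholder for whatever function of the global index the application uses, and the block layout (the first `n % nprocs` ranks get one extra element) is just one common choice. In an actual MPI program, `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for the comment's "somefunction": any function of the
   GLOBAL index. The formula here is an arbitrary assumption. */
static double somefunction(long global_i)
{
    return 2.0 * (double)global_i;
}

/* Compute which contiguous block of a global array of n elements is
   owned by `rank` out of `nprocs` processes. The first (n % nprocs)
   ranks receive one extra element so the blocks cover all of 0..n-1. */
static void block_range(long n, int rank, int nprocs,
                        long *myfirst, long *mycount)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    long base  = n / nprocs;   /* minimum block size          */
    long extra = n % nprocs;   /* ranks that get one more     */
    *mycount = base + (rank < extra ? 1 : 0);
    *myfirst = rank * base + (rank < extra ? rank : extra);
}

/* Allocate and fill only this process's block. The local zero-based
   index i corresponds to the global index i + myfirst.
   Returns NULL if allocation fails; the caller frees the result. */
static double *init_local_block(long myfirst, long mycount)
{
    double *x = malloc((size_t)(mycount > 0 ? mycount : 1) * sizeof *x);
    if (x == NULL)
        return NULL;
    for (long i = 0; i < mycount; i++)
        x[i] = somefunction(i + myfirst);
    return x;
}
```

Each process would call `block_range` with its own rank and then `init_local_block` on the result, so no process ever holds the full array and no scatter step is needed before the computation starts.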
