Principal Component Analysis

Daniel Zysman
Massachusetts Institute of Technology

Summary

In this exercise we will use principal component analysis (PCA) to cluster images. In particular, we will investigate a data set consisting of images of the handwritten digits 1 and 7. This is a subset of a larger data set of handwritten digits that is often used to test and benchmark classification and learning algorithms (see http://yann.lecun.com/exdb/mnist/).

These handwritten digit images live in a high-dimensional space. However, we can exploit covariance patterns in the pixel intensities to reduce the dimensionality of the data. PCA provides a principled way to find a low-dimensional subspace that retains most of the image variability.

We will use PCA to explore whether digits 1 and 7 form distinguishable clusters in this lower-dimensional representation of the data.
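For instructors who want a quick picture of the workflow, here is a minimal MATLAB sketch of the pipeline. It is an illustration, not the handout's solution: the file name digits17.mat, the data matrix X (one image per row, one pixel per column), and the labels vector are all assumed names that may differ from the actual data set.

    % Minimal PCA sketch (file and variable names are assumptions)
    load('digits17.mat');              % hypothetical file containing X and labels
    Xc = X - mean(X, 1);               % center each pixel dimension (R2016b+ expansion)
    [U, S, V] = svd(Xc, 'econ');       % economy-size singular value decomposition
    scores = Xc * V(:, 1:2);           % project each image onto the first two PCs
    is1 = (labels == 1);               % logical indexing: which rows are digit 1
    plot(scores(is1, 1), scores(is1, 2), 'b.', ...
         scores(~is1, 1), scores(~is1, 2), 'r.');
    xlabel('PC 1'); ylabel('PC 2'); legend('1', '7');

If the two digits form separable clouds in this plot, two principal components already capture enough of the variability to distinguish them.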


Learning Goals

The main objective of this exercise is to have students use all the linear algebra they have learned so far.

The exercise allows them to visualize a relatively complex and large data set. In doing so, they will put into practice the concepts of dot products, projections, orthonormal basis sets, dimensionality reduction, singular value decomposition, and principal component analysis, among others.

It also offers the opportunity to introduce basic machine learning concepts such as linear separability, single-layer perceptrons, and clustering.

Additionally, it opens the door to a discussion of how the brain may perform these operations and what the biological basis of PCA might be.

For those in the neuroscience/computational neuroscience domain, it also presents an opportunity to discuss principal component analysis in the context of neural circuits (1). In fact, Oja's rule is closely related to the power method for obtaining eigenvectors.
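As a rough illustration of that connection, the following MATLAB sketch runs both updates toward the top eigenvector of the pixel covariance matrix. The learning rate, iteration counts, and the assumption that Xc is the centered data matrix (one image per row) are mine, not from Oja's paper.

    % Oja's rule: w <- w + eta*y*(x - y*w), with output y = w'*x
    % (assumes pixel values of order 1; otherwise shrink eta to keep it stable)
    C = cov(Xc);                            % pixel covariance matrix
    w = randn(size(C, 1), 1);  w = w / norm(w);
    eta = 1e-3;                             % assumed learning rate
    for t = 1:5000
        x = Xc(randi(size(Xc, 1)), :)';     % pick a random centered image
        y = w' * x;                         % linear "neuron" output
        w = w + eta * y * (x - y * w);      % Hebbian growth plus normalizing decay
    end
    % Power method on C for comparison
    v = randn(size(C, 1), 1);
    for t = 1:100
        v = C * v;  v = v / norm(v);        % repeated multiplication, renormalize
    end
    abs(w' * v)                             % near 1 when both find the same direction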

Finally, the MATLAB skills needed for this activity are listed below, with a brief sketch of the indexing and reshaping idioms after the list:

  • making plots and matrix visualizations
  • matrix operations and indexing
  • logical indexing
  • reshaping matrices
  • for-loops.
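For instance, the indexing and reshaping skills typically show up in lines such as these (the 28-by-28 image size and the row-per-image layout of X are assumptions about the data set):

    k = 1;                               % pick an image
    img = reshape(X(k, :), 28, 28);      % unroll a 784-pixel row back into a 2-D image
    imagesc(img'); colormap(gray); axis image;   % visualize it as a matrix
    onesOnly = X(labels == 1, :);        % logical indexing: keep only the digit-1 rows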

(1) Oja, E. (1982). "Simplified neuron model as a principal component analyzer." Journal of Mathematical Biology 15(3): 267–273.

Context for Use

I use this exercise as a problem set in 9.40 (Introduction to Neural Computation), a sophomore/junior subject.

We cover linear algebra, PCA/SVD, and perceptrons before students work on this particular problem set.

Alternatively, I use this exact same exercise as an in-class activity in a graduate-level subject (9.014, Quantitative Methods and Computational Models in Neurosciences). In that subject, we have covered linear algebra, covariance matrices, and SVD beforehand.

Description and Teaching Materials


This activity can be assigned as a homework/problem set or run as an in-class activity.

The in-class activity takes roughly 80 minutes. For this format, the best approach is to break the class into small groups and have them work through the questions. After 35 minutes, hold a brief class discussion to summarize results and any programming difficulties that have arisen. Students usually reach question 5 by this time.

Five to ten minutes before the end of class, wrap up by discussing other applications of PCA.

Materials included:

Problem set / handout for the activity in PDF and .doc format: Problem set / handout (Zip Archive 253kB Sep29 16)

Data set as a .mat file: Data set (Matlab .MAT File 121kB Sep29 16)

Two solution sets in .m, .mlx, and PDF (Live Editor) formats, one using svd and one using eig:

svd: Solution SVD (Zip Archive 167kB Sep29 16)

eig: Solution EIG (Zip Archive 166kB Sep29 16)

Teaching Notes and Tips

If you don't cover SVD in your class but you do cover eigenvectors and eigenvalues, you can use the alternative solution with eig.

In that case, you will also need to change the handout to use eig rather than svd.
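A minimal sketch of the eig route, assuming Xc is the centered data matrix with one image per row (the solution set's own code may differ):

    C = cov(Xc);                          % pixel covariance matrix
    [V, D] = eig(C);                      % eigenvectors and eigenvalues of C
    [~, idx] = sort(diag(D), 'descend');  % eig does not guarantee descending order
    V = V(:, idx);                        % reorder eigenvectors by explained variance
    scores = Xc * V(:, 1:2);              % same two-dimensional projection as with svd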

Students usually struggle to understand projections, rotations (the essence of orthonormal basis sets), and reshaping matrices.

I usually show a simple 2-D example at the beginning of class to set the problem up; one possible version is sketched below.
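This is my own sketch of such a warm-up, not the author's classroom example: generate a correlated 2-D point cloud and show that projecting onto the principal components is just a rotation into the axes of maximal variance.

    rng(1);                                   % reproducible toy data
    A = randn(500, 2) * [2 0; 1.2 0.5];       % correlated 2-D point cloud
    A = A - mean(A, 1);                       % center it
    [~, ~, V] = svd(A, 'econ');               % columns of V are the principal axes
    P = A * V;                                % rotate into principal-component coordinates
    subplot(1, 2, 1); plot(A(:, 1), A(:, 2), '.'); axis equal; title('original');
    subplot(1, 2, 2); plot(P(:, 1), P(:, 2), '.'); axis equal; title('PC coordinates');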

Make sure that your TAs know the problem in advance and are willing to help students get through the exercise.

There is also a follow-up problem for extra credit when you use the exercise as a problem set.

Assessment

At the end of the class you can discuss what the V matrix in the SVD (which we did not use) represents. Try to get students to think about what doing PCA in the other space represents.
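One way to frame that discussion (which factor goes "unused" depends on how the data matrix is oriented, so treat this as a sketch rather than the handout's convention): for Xc = U*S*V', projecting with one factor gives PCA over pixels and projecting with the other gives PCA over images.

    [U, S, V] = svd(Xc, 'econ');    % Xc = U*S*V'
    imageScores = Xc  * V;          % = U*S: each image as a point in pixel-PC space
    pixelScores = Xc' * U;          % = V*S: each pixel as a point in the dual space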

If they get this, you can rest assured that the exercise has worked quite well.