Constructing and using a PAM-style scoring matrix

Eliot Bush, Harvey Mudd College

Summary

In this activity, students begin with a set of trusted alignments and use them to create PAM-style scoring matrices. These matrices can then be used to better align several test sequences.


Learning Goals

The goal of this activity is to help students better understand the biology that underlies alignment algorithms. Alignment scoring systems make fundamental assumptions about the process of nucleotide or amino acid change and about the relatedness of the species being compared. By constructing a PAM-style scoring matrix, students learn what such a matrix is and how it works. They should come away with a clear understanding of how substitution matrices are constructed and be able to build one themselves.

Context for Use

This lab module is intended for students learning alignment algorithms. It was developed for a computational biology course taught to upperclassmen. However, it could also be useful in upper-level computer science courses where alignment algorithms are a topic. The basic requirement is that students know some biology and know how to program in Python.

Description and Teaching Materials

This is a programming activity to be done in a computational lab setting. It can also be given as homework. The following document provides some background on scoring matrices. It can be given to students as a pre-lab (Acrobat (PDF) 74kB Mar3 09).

The activity can be carried out in a single 1 hour and 15 minute lab session. Students are given the following pre-lab (Acrobat (PDF) 30kB Mar3 09), which describes their task: modifying a Python program called makeMat.py. Their modified version will produce scoring matrices, which can then be used to align several provided sequences.
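To make concrete what it means to use a scoring matrix in alignment, here is a minimal sketch of how a substitution matrix assigns a score to an already-aligned pair of sequences. It is not the code in align_w_matrix.py or makeMat.py. The function name score_alignment, the toy match/mismatch values, and the linear gap penalty are all assumptions chosen for illustration.

```python
def score_alignment(seq_a, seq_b, matrix, gap_penalty=-4):
    """Sum the column scores of two aligned, equal-length sequences.

    `matrix` maps a pair of letters (a, b) to a score; '-' marks a gap.
    The linear gap penalty is an illustrative assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            total += gap_penalty      # a column containing a gap
        else:
            total += matrix[(a, b)]   # a substitution or match column
    return total


# Hypothetical toy matrix: +2 for a match, -1 for any mismatch.
letters = "ACGT"
toy_matrix = {(a, b): (2 if a == b else -1) for a in letters for b in letters}

alignment_score = score_alignment("ACGT-A", "ATGTCA", toy_matrix)
```

An alignment program searches for the arrangement of gaps that maximizes this kind of total, which is why the choice of matrix affects which alignment is found.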

This exercise makes use of a number of support files, available here as a zip archive (Zip Archive 693kB Jul13 09). The archive contains the following files:

align_w_matrix.py
background-scoringMatrices.odt
background-scoringMatrices.pdf
EXO40_example.mat
exoBio.fa
makeMat.py
scMatExercise.odt
scMatExercise.pdf
spC.fa
spD.fa
trueSpCD_align.fa

The exercise requires Python with the Numeric module installed. It was developed in a Unix environment, but should also work on Mac or Windows.

Teaching Notes and Tips

As set up here, the exercise provides students with the frequencies of alignment columns and letters. All they need to do is recognize what each quantity represents and carry out the necessary matrix multiplication. An instructor who wants students to do more can simply remove more of the provided code from makeMat.py.
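For instructors who want a picture of the computation involved, the following is a rough sketch of one way a PAM-style log-odds matrix can be derived from aligned-pair counts. It is not the contents of makeMat.py. It assumes NumPy (the successor to the Numeric module), a four-letter toy alphabet, and invented counts; the function name pam_style_matrix is hypothetical. The toy counts happen to give a 1% chance of change per step, whereas a real PAM construction would rescale the one-step matrix to reach that rate.

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet chosen for illustration


def pam_style_matrix(pair_counts, n_steps):
    """Return a log-odds scoring matrix from symmetric aligned-pair counts."""
    counts = np.asarray(pair_counts, dtype=float)
    # Background frequency of each letter across all aligned positions.
    letter_freqs = counts.sum(axis=1) / counts.sum()
    # Row-normalize: trans[i, j] estimates P(letter j | letter i) after one step.
    trans = counts / counts.sum(axis=1, keepdims=True)
    # Repeated matrix multiplication extrapolates to n_steps of divergence.
    trans_n = np.linalg.matrix_power(trans, n_steps)
    # Log-odds: probability of seeing j opposite i, relative to chance.
    return np.log2(trans_n / letter_freqs[np.newaxis, :])


# Invented, symmetric counts of aligned pairs (rows and columns follow ALPHABET).
toy_counts = [
    [990, 2, 6, 2],
    [2, 990, 2, 6],
    [6, 2, 990, 2],
    [2, 6, 2, 990],
]

scores = pam_style_matrix(toy_counts, n_steps=40)
```

With these counts, identical letters receive positive scores and mismatches negative ones, and the more frequently observed A/G substitution is penalized less than A/C. Raising n_steps models more distant relatives and pulls all scores toward zero.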

Assessment

The effect of this exercise on student understanding is assessed in three ways:
  • Their code and responses to the lab exercise questions
  • Responses to questions in a course exam
  • Responses to a survey as part of a course evaluation
Assessment details: pdf (Acrobat (PDF) 23kB Jul13 09)

References and Resources

NC Jones and PA Pevzner. An introduction to bioinformatics algorithms. 2004. MIT Press.

SR Eddy. Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, 1035-1036 (2004). http://dx.doi.org/10.1038/nbt0804-1035