
Protein Evolution

This page and activity were authored by Scott Cooper, University of Wisconsin-La Crosse
This material is replicated on a number of sites as part of the SERC Pedagogic Service Project
Initial Publication Date: January 15, 2007


In this activity students explore the evolution of proteins. They first choose or are assigned a protein sequence and find other homologous protein sequences, from orthologs or paralogs. They then align these sequences and superimpose the alignment onto the 3D structure of their protein. Finally they examine the 3D alignment to observe where the conserved residues are found on the structure. They can also speculate on the differences in alignment between orthologs and paralogs.


Learning Goals

  • Students learn how to generate a 3D alignment of protein sequences.
  • Students identify regions of a protein that are conserved in 3D.
  • Students explain why orthologs are more highly conserved than paralogs.

    Context for Use

    This is an upper-level activity that could be used in a biochemistry, molecular biology, or bioinformatics course. Students will need access to a computer and 2-3 hours, in or out of class, to complete the activity. They will also need a good understanding of protein structure and some bioinformatics background. The activity is designed as a demonstration using trypsin, but it could be adapted to any protein. We assign students proteins from a list of 18 that work well.

    Description and Teaching Materials

    The BioWeb page containing the activity is linked through Resources below. We teach this unit in a computer lab. The activity uses three web-based bioinformatics and modeling programs: Biology Workbench, Protein Explorer, and ConSurf.

    Students are first walked through the exercise using trypsin as an example. They then perform a similar analysis of their assigned sequence. First, the amino acid sequence for human trypsin is stored in Biology Workbench. Be sure the students get the sequence from SWISS PDB, and have them record the four-character code (1TRN for trypsin), as that is also the ID for the crystal structure. Next, the students search for paralogs (other human proteins related to trypsin) and orthologs (trypsin from other species). These are imported into Biology Workbench and aligned using ClustalW, with separate alignments for the orthologs and the paralogs. The students examine the alignments to see whether the orthologs or the paralogs appear to be more highly conserved, and whether the conserved regions cluster together. The alignments are then converted to FASTA format and saved as a .txt file.
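
The comparison of the two alignments can also be made quantitative. The following is a minimal Python sketch, not part of the original activity, that reads an alignment saved in FASTA format and scores each column by the fraction of sequences sharing the most common residue. The function names and the toy sequences are assumptions made up for illustration; they are not real trypsin data, and this simple score is not the one ConSurf computes.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def column_conservation(seqs):
    """Score each alignment column as the fraction of all sequences that
    carry the most common residue. Gaps ('-') never count as a match."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    scores = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top = max(residues.count(r) for r in set(residues))
        scores.append(top / n)
    return scores


# Toy alignment, invented for illustration only.
toy_alignment = """>toy_seq_1
GDSGGP
>toy_seq_2
GDSGGP
>toy_seq_3
GDAGG-
"""

records = parse_fasta(toy_alignment)
scores = column_conservation([seq for _, seq in records])
mean_score = sum(scores) / len(scores)
print([round(s, 2) for s in scores])
print(round(mean_score, 2))
```

Running the same functions once on the ortholog alignment and once on the paralog alignment gives two mean scores that students can compare directly.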

    Next the crystal structure of trypsin, 1TRN, is opened and examined using Protein Explorer. The active site residues are identified for future reference.

    Finally, the students use ConSurf to superimpose the 2D alignment onto the 3D structure. To do this they will need the code for the .pdb file and the .txt file of the alignment. Once the aligned sequences are superimposed onto the 3D structure, the students can see how the conserved regions that were scattered along the 2D alignment fold together and cluster around the active site in 3D.
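
To make the idea of projecting an alignment onto a structure concrete, here is a small Python sketch, offered as an illustration rather than as ConSurf's actual method. It maps each alignment column to a position in the ungapped reference sequence and lists the positions where every sequence carries the same residue. The function names and toy sequences are hypothetical, and the positions count along the sequence, which may not match the residue numbering in the .pdb file.

```python
def column_to_residue_index(aligned_ref):
    """Map each alignment column to a 1-based position in the ungapped
    reference sequence, or None where the reference has a gap."""
    mapping = []
    position = 0
    for char in aligned_ref:
        if char == "-":
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping


def conserved_positions(aligned_seqs, ref_index=0):
    """Return reference positions where all sequences share one residue.
    Columns containing any gap are skipped."""
    mapping = column_to_residue_index(aligned_seqs[ref_index])
    conserved = []
    for col, column in enumerate(zip(*aligned_seqs)):
        if "-" not in column and len(set(column)) == 1:
            conserved.append(mapping[col])
    return conserved


# Toy aligned sequences, invented for illustration only.
toy_seqs = ["AC-DE", "ACGDE", "TC-DE"]
positions = conserved_positions(toy_seqs, ref_index=0)
print(positions)
```

The resulting list is the kind of information a viewer needs in order to color conserved residues on the structure of the reference protein.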

    Teaching Notes and Tips

    This is a challenging unit, and students can get stuck conceptually and on the computer in a few places.

    Conceptually, students often assume that paralogs will be more highly conserved because they are all proteins from the same species. In fact, orthologs are typically more conserved because they perform the same function and thus need to have the same structure.

    On the computer side, it is important that students understand what orthologs and paralogs are, so that they choose the proper proteins when finding sequences for the alignment.

    The text file must be saved in FASTA format, or ConSurf won't recognize it. The students will also need to identify which sequence in the alignment (.txt) corresponds to the structure (1TRN). They will also have to pick a chain. The chain is typically identified in the sequence name; for example, 1TRN_A refers to chain A.
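
Because a badly saved alignment file is a common stumbling block, a quick check before uploading can save time. The Python sketch below is an assumption-laden illustration, not a ConSurf tool: it tests only that the text starts with a ">" header and that all sequences have equal length, and it splits an identifier such as 1TRN_A into the structure code and the chain. The function names are hypothetical, and ConSurf may impose requirements that this check does not cover.

```python
def split_structure_id(identifier):
    """Split an identifier such as '1TRN_A' into (pdb_id, chain).
    The chain is None when no '_X' suffix is present."""
    pdb_id, _, chain = identifier.strip().partition("_")
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character structure code")
    return pdb_id.upper(), (chain or None)


def check_aligned_fasta(text):
    """Return a list of problems found; an empty list means the text
    looks like an alignment in FASTA format."""
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        return ["file does not start with a '>' header line"]
    lengths = {}
    current = None
    for ln in lines:
        if ln.startswith(">"):
            current = ln[1:].strip()
            if not current:
                problems.append("a header line has no sequence name")
            if current in lengths:
                problems.append("duplicate sequence name: " + current)
            lengths[current] = 0
        else:
            lengths[current] += len(ln)
    if any(length == 0 for length in lengths.values()):
        problems.append("at least one header has no sequence")
    if len(set(lengths.values())) > 1:
        problems.append("sequences differ in length, so they are not aligned")
    return problems


print(split_structure_id("1TRN_A"))
print(check_aligned_fasta(">1TRN_A\nAC-D\n>other\nACGD\n"))
```

An empty list from `check_aligned_fasta` means no problems were detected by these two basic checks.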


    After we have gone through the demonstration with trypsin, we assign each student their own sequence to analyze. They then turn in their report electronically. The instructions are on the website and are reproduced below.

    Search PDBFINDER using BLASTP for a 3D structure of the protein you were assigned.

    Be sure to import this amino acid sequence so that you can align this sequence with the other sequences.

    Obtain 6-7 amino acid sequences of human proteins from the same family as the protein you were assigned.

    Obtain 6-7 amino acid sequences of the protein you were assigned from different species.

    With each group of proteins, perform a ClustalW alignment as before and use Protein Explorer to create a consensus structure of this protein.

    We would like at least three images from this exercise. Once you have generated your model, you can either right-click on the image to copy it and paste it into your final report, or use "print window" to make a hard copy of the molecule. Unfortunately, these models cannot be saved. These images will be part of your take-home assignment.

    Image 1. The structure of your protein in cartoon form, with the secondary structures colored and the active site residues in space-filling format. You can probably identify the active site residues using PROSEARCH.

    Image 2. The structure of your protein with the aligned sequences of the other family members superimposed. You will make a similar image of the aligned sequences of your protein from other species to illustrate evolutionary changes.

    Image 3. The structure of your protein with the aligned sequences from the other species superimposed, to illustrate evolutionary changes.

    References and Resources

    MERLOT description of the "Biology Workbench" resource that is used in this activity.

    MERLOT description of the "Protein Explorer" resource that is used in this activity.

    Other Resources:

    • The following webpage contains all of the instructions and links to complete this activity. Instructions
    • This webpage contains a list of 17 protein sequences that work well with this assignment. Assignment
    Copyright 1997-2006 MERLOT. All rights reserved.