Comparison of Protein Sequences: BLAST searching and Phylogenetic Tree Construction

Wade H. Powell
Biology Department
Kenyon College


This laboratory exercise is a guided discovery of computational methods for comparing protein sequences. It accompanies several weeks of wet lab work in which students clone cDNAs encoding Cytochrome P450 1A (CYP1A) from animals (primarily fish) collected locally and exposed to pollutants that induce expression of the enzyme. In this exercise, students perform BLAST searches of reported CYP1A sequences and construct phylogenetic trees using CYP1A amino acid sequences from various vertebrate species, especially those with multiple CYP1A paralogs. These guided discovery activities prepare students for subsequent analyses of the novel cDNA sequences that they clone.

Learning Goals

  • Students will distinguish basic relationships between similar members of multi-gene families: paralogy and orthology.
  • Students will gain proficiency in important bioinformatics techniques for identifying evolutionary relationships between protein sequences.
  • Students will grasp the potential importance of gene duplication in evolution.
  • All immediate learning goals equip students with technical and intellectual skills necessary for analyzing the sequence they will clone later in the semester.

Context for Use

Transcription of genes encoding CYP1A enzymes is induced in vertebrate animals by exposure to aromatic hydrocarbon pollutants, including dioxins, PCBs, and many petroleum hydrocarbons. CYP1A expression therefore serves as a sensitive biomarker of exposure. The evolution of CYP1A genes is complex, with at least one, if not several, gene duplications occurring after the divergence of teleost fish from other vertebrates. Thus, evolutionary analysis of CYP1A sequences from many species provides an opportunity to explore the number and timing of gene duplications during vertebrate evolution.
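The distinction between orthologs and paralogs that this analysis relies on can be illustrated with a small sketch. It assumes a gene tree whose internal nodes have already been labeled as speciation or duplication events; the species names, gene names, and topology below are hypothetical placeholders, not real CYP1A data. Two genes are called orthologs if their most recent common ancestor is a speciation node and paralogs if it is a duplication node.

```python
# Toy gene tree. An internal node is (event, left, right); a leaf is
# (species, gene). All names and the topology are hypothetical.
tree = (
    "speciation",
    ("tetrapod", "tetrapod_gene"),
    (
        "duplication",  # assumed duplication in the fish lineage
        ("speciation", ("fish1", "fish1_copyA"), ("fish2", "fish2_copyA")),
        ("speciation", ("fish1", "fish1_copyB"), ("fish2", "fish2_copyB")),
    ),
)


def leaves(node):
    """Return every leaf (species, gene) found under a node."""
    if len(node) == 2:  # a leaf has two fields, an internal node has three
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def relationships(node, out=None):
    """Label each pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if len(node) == 2:
        return out
    event, left, right = node
    label = "orthologs" if event == "speciation" else "paralogs"
    # Genes on opposite sides of this node have this node as their
    # most recent common ancestor.
    for _, gene_a in leaves(left):
        for _, gene_b in leaves(right):
            out[frozenset((gene_a, gene_b))] = label
    relationships(left, out)
    relationships(right, out)
    return out


rel = relationships(tree)
for pair, label in sorted((sorted(p), l) for p, l in rel.items()):
    print(pair, label)
```

In this toy tree the two copies within one fish are paralogs, the same copy in two different fish are orthologs, and the single tetrapod gene is orthologous to both fish copies, which is the pattern expected when a duplication follows a speciation event.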

This activity is part of an intermediate-level lab course (12-14 students) on recombinant DNA and the measurement of mRNA expression. The course consists of a semester-long project in which students clone cDNAs encoding Cytochrome P450 1A (CYP1A) from animals (primarily fish) collected locally and measure changes in mRNA levels following exposure of the animals to aromatic hydrocarbon pollutants that induce expression. Before this exercise, students isolate RNA, assess its integrity by electrophoresis, and design degenerate primers for amplification of partial cDNAs by RT-PCR. This computer exercise, which can be completed in a single 2-3 hour lab session, provides an opportunity to strengthen the students' conceptual framework for the project following several weeks of technique-intensive focus. It also allows adequate time for the procurement of student-designed primers before RT-PCR is performed.

Description and Teaching Materials

Student preparation. Three short readings about gene duplication in evolution and phylogenetic analysis are provided at least one week prior to class. Students are expected to read these thoroughly before the activity. These readings complement earlier readings that specifically concern the cloning and evolutionary analysis of CYP1A genes in fish.

Discussion. (30-40 minutes) Using the assigned reading as a springboard, we have a seminar-type discussion on how genes get duplicated, the importance of gene duplication in evolution, and the interpretation of phylogenetic trees.

Guided Discovery. Students work individually to complete the exercise described in the Student Lab Handout. The handout guides them through accessing and using GenBank and constructing phylogenetic trees. Prompting questions throughout the exercise help students make key distinctions, for example between different types of searches and between the different relationships depicted in phylogenies. Discussion and collaboration between pairs or small groups of students is encouraged as they work through the activity at different paces.

Materials: In addition to the Lab Handout, this activity requires a computer for each student, CLUSTALX software, and TreeView software (see Resources section).
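For readers who want to see what tree construction from an alignment involves, here is a minimal sketch. It is not the handout's procedure and not how CLUSTALX works internally: it assumes the sequences are already aligned, measures distance as the simple fraction of differing ungapped positions, and uses UPGMA clustering as a stand-in because it is short to write (CLUSTALX, as far as I know, draws neighbor-joining trees). The four sequences are invented toy strings, not CYP1A data. The output is a topology-only Newick string, the general kind of tree text that viewers such as TreeView display.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions in common")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(aligned):
    """Cluster aligned sequences by average distance; return a Newick string."""
    # Each cluster maps its Newick text to the number of leaves it contains.
    sizes = {name: 1 for name in aligned}
    dist = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = sorted(min(dist, key=dist.get))
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        new_dist = {
            frozenset((merged, c)): (
                dist[frozenset((a, c))] * size_a
                + dist[frozenset((b, c))] * size_b
            ) / (size_a + size_b)
            for c in sizes
        }
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        sizes[merged] = size_a + size_b
    (newick,) = sizes
    return newick + ";"


# Hypothetical toy alignment (equal-length strings, "-" marks a gap).
aligned = {
    "seqA": "MKTLLVAG-SW",
    "seqB": "MKTLMVAGQSW",
    "seqC": "MRTLIVSGQAW",
    "seqD": "MRSLIVSGQAW",
}

print(upgma(aligned))
```

UPGMA assumes a constant rate of change along all branches, which real gene families often violate, so this sketch is meant only to make the distance-then-cluster idea concrete.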

Assignment. At the end of the exercise, each student is issued a GenPept accession number. In the subsequent 2-3 weeks, they use the computer techniques introduced in this course to learn about their sequence and its orthologs and paralogs. Students write a report that includes a phylogenetic tree and builds a narrative or argument about the number and timing of duplications within the gene family to which their assigned sequence belongs. Instructors should assign accession numbers corresponding to members of multi-gene families; I often use transcription factors related to the regulation of CYP1A expression. This is a challenging assignment that requires students to identify and read papers about their protein as well as to think critically about the related proteins they identify, choosing carefully from among many possibilities so as to construct the most complete, accurate, and effective description of the evolution of the gene family.

Materials: A relevant accession number for each student. Print them on index cards and allow students to draw from a hat. Access to the free software for installation on personal computers is also helpful.
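Students retrieve their assigned sequence through the NCBI web pages, but the same lookup can be scripted. The sketch below is one possible approach, not part of the handout: it assumes the NCBI E-utilities `efetch` endpoint with the `db`, `id`, `rettype`, and `retmode` parameters shown, and `fetch_protein` needs network access. The accession string in the usage comment is a placeholder, not a real record. Because NCBI services change over time, check the URL and parameters against current NCBI documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build a request URL for a protein record in FASTA text format."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return {header: sequence} for every record in FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def fetch_protein(accession):
    """Download one protein record and parse it (requires network access)."""
    with urlopen(efetch_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Usage, with a placeholder accession that must be replaced by a real one:
# records = fetch_protein("ACCESSION_PLACEHOLDER")
```

Keeping the URL builder and the FASTA parser separate from the download step means both can be checked offline, for example against a FASTA file saved from the web interface.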

Multiple Sequence Alignment Lab Handout: DOC format (Microsoft Word 49kB Jul16 07) or PDF format (Acrobat (PDF) 118kB Jul16 07)
CYP1 sequences (Text File 12kB Jul16 07)

Teaching Notes and Tips

Integrating the lab experience.

Rather than correspond to a particular lecture/discussion course, this lab course stands alone. It nonetheless integrates concepts from multiple upper-level biology courses, including Evolution, Developmental Biology, and Environmental Toxicology. A required course for the Biochemistry major, it does not shy away from explicit discussion of the chemistry and enzymology involved in different methodologies. Students develop lab techniques (including scientific writing) that are useful in other lab courses throughout Kenyon's Biology Department.

Tips and troubleshooting.

Well-prepared students readily handle the discussion phase at the beginning of the exercise. It is important to keep this part moving and not let it drag on beyond ~40 minutes. I prefer to emphasize broad concepts rather than procedural detail.

The computer exercise makes use of several features of the NCBI web pages. These are frequently updated, requiring revisions to the descriptions and URLs noted in the Lab Handout. A careful scouting of the sites by the instructor prior to the class meeting can avert unexpected confusion.

The assignment is particularly challenging, and I often allow students 2.5 weeks to complete it. The learning goals are most effectively realized when they work steadily along, seeking advice from each other and from the instructor and continually revising their analysis. I urge each student to at least identify the protein corresponding to his/her accession number prior to leaving class on the day of the exercise. Holding an evening workshop (led by the TA) more than one week before the due date is an effective way to keep students on task and prevent procrastination.


Assessment

Student understanding of these concepts and techniques is assessed in three ways:
  1. Brief answers to prompting questions throughout the Lab Handout. Blanks are provided for students to write their answers. I do not collect the handout, but I monitor their progress throughout the class period.
  2. Lab Report. The lab report is graded carefully according to a rough rubric. I often distribute the rubric along with the assignment instructions.
  3. Final lab report. Students perform similar analyses on the sequence they clone. I often see substantial improvement in understanding and presentation between the analysis of the assigned protein (1st attempt) and the analysis of their own sequence, which is part of the final assignment of the course (2nd attempt).
Report Grading Description (Acrobat (PDF) 27kB Jul16 07)

References and Resources

CLUSTALX Mac Version (site offline)

National Center for Biotechnology Information

Background reading assigned for class discussion associated with this module:

Robinson-Rechavi et al. (2001) Euteleost fish genomes are characterized by expansion of gene families. Genome Research 11:781-788.

Lynch, M. (2002) Gene Duplication and Evolution. Science 297:945-947.

Unda, F. (2007) Introduction to phylogenetics. Science Creative Quarterly.

Relevant background reading from previous weeks:

Morrison, H.G., Weil, E.J., Karchner, S.I., Sogin, M.L., and Stegeman, J.J. (1998) Molecular cloning of CYP1A from the estuarine fish Fundulus heteroclitus and phylogenetic analysis of CYP1 genes: update with new sequences. Comparative Biochemistry and Physiology Part C 121:231-240.

Morrison, H.G., Oleksiak, M.F., Cornell, N.W., Sogin, M.L., and Stegeman, J.J. (1995) Identification of cytochrome P-450 1A (CYP1A) from two teleost fish, toadfish (Opsanus tau) and scup (Stenotomus chrysops), and phylogenetic analysis of CYP1A genes. Biochem. J. 308:97-104.

Berndtson, A. and Chen, T. (1994) Two unique CYP1 genes are expressed in response to 3-methylcholanthrene treatment in rainbow trout. Archives of Biochemistry and Biophysics 310:187-195.

You can also link to the syllabus and description for the full course that included this activity.