Genome Solver: Microbial Comparative Genomics

Gaurav Arora, Gallaudet University
Vinayak Mathur, Cabrini University
Anne Rosenwald, Georgetown University
Location: Washington, DC


Genome Solver began in 2011 as a way to teach bioinformatics tools to undergraduate faculty. As part of the Genome Solver project as a whole, we developed a Community Science Project (CSP) for faculty and students to join. The CSP explores horizontal gene transfer (HGT) between bacteria and the phages that infect them. Students who join this project develop testable hypotheses about the role that HGT between bacteria and phages plays in microbial evolution. Our own work has demonstrated that undergraduates can produce publishable data using this approach.
We invite faculty and their students to participate in the search for additional evidence of this type of HGT by investigating the vast wealth of phage and bacterial sequences currently in databases. All that is needed is a computer, an Internet connection, and enthusiasm for research. Faculty and students can work on an organism of interest or we can help them pick organisms to explore these phenomena. By pooling all of the information from a variety of small projects under the umbrella of the Genome Solver CSP, we will be able to better understand the role of HGT in bacterial evolution.

Student Goals

  1. Understand the purpose of homology searches in identifying identical/similar sequences in phage and bacteria
  2. Identify regions of similarity based on a group of multiple sequences.
  3. Analyze biological data and communicate their work in written English

Research Goals

  1. Identify horizontal gene transfer between phage and bacteria
  2. Understand the extent to which horizontal gene transfer contributes to bacterial diversity


Students with knowledge of biology and the ability to navigate biological websites will be paired to work on a phage-bacteria pair. Two weeks of the CURE will be used to teach the basics required for the course; the remaining time will be used to investigate structural and functional genome annotation, homology, multiple sequence alignments, and phylogenetics. Depending on the class size, students can work either individually or in groups on their phage-bacteria pairs. The CURE will pair students from different disciplines so that they can bring their own knowledge to the project and help fill gaps in areas where other participants are less experienced.

Target Audience: Introductory, Major, Non-major
CURE Duration: Multiple terms

CURE Design

Core Competencies: Analyzing and interpreting data, Using mathematics and computational thinking
Nature of Research:

Tasks that Align Student and Research Goals

Research Goals →
Student Goals ↓
Research Goal 1: Identify horizontal gene transfer between phage and bacteria
Research Goal 2: Understand the extent to which horizontal gene transfer contributes to bacterial diversity

Student Goal 1: Understand the purpose of homology searches in identifying identical/similar sequences in phage and bacteria

Tasks aligned with Research Goal 1:
Identify differences between phage and bacteria
Define horizontal gene transfer
Define homology and its subsets
Define accession number
Download data from biological databases (nucleotide and proteins)
Use homology search tools like BLAST to identify homologous sequences
Analyze BLAST results

Tasks aligned with Research Goal 2:
Understand the criteria used to analyze homology results
Explain differences between strain and species
Identify different bacterial strains that show evidence of homology
Identify role of different strains in human health
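
As an illustration of the tasks "Analyze BLAST results" and "Understand the criteria used to analyze homology results", the sketch below filters BLAST hits by E-value, percent identity, and alignment length. It assumes the hits were saved in BLAST's default tabular format (`-outfmt 6`). The cutoff values, the function name `filter_hits`, and the example rows are illustrative assumptions, not criteria prescribed by the Genome Solver materials.

```python
import csv
import io

# Column names of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, max_evalue=1e-5, min_identity=30.0, min_length=50):
    """Return hits passing all three cutoffs, highest bit score first.

    The default cutoffs are assumptions chosen for illustration only.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if (hit["evalue"] <= max_evalue
                and hit["pident"] >= min_identity
                and hit["length"] >= min_length):
            kept.append(hit)
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)


# Made-up rows for demonstration; these are not real BLAST results.
EXAMPLE = (
    "phage_gene1\tbacterium_A\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-100\t370\n"
    "phage_gene1\tbacterium_B\t45.00\t180\t99\t2\t5\t184\t10\t189\t2e-20\t95.1\n"
    "phage_gene1\tbacterium_C\t28.00\t60\t43\t1\t1\t60\t3\t62\t0.5\t25.0\n"
)

hits = filter_hits(io.StringIO(EXAMPLE))
for hit in hits:
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

With these assumed cutoffs, the third example row is dropped because its E-value and percent identity both fall outside the thresholds. In a class setting, students could vary the cutoffs and discuss how the set of candidate homologs changes.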

Student Goal 2: Identify regions of similarity based on a group of multiple sequences.

Tasks aligned with Research Goal 1:
Define gene annotation
Define an open reading frame
Identify coding sequences (structural annotation)
Define gene ontology (functional annotation)
Construct and analyze multiple sequence alignments
Identify direction of horizontal gene transfer

Tasks aligned with Research Goal 2:
Construct phylogenetic trees from MSA
Analyze phylogenetic trees
Identify regions that are similar/different based on phylogeny
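
To make "regions of similarity" in a multiple sequence alignment concrete, the sketch below scores each alignment column by the fraction of sequences that share its most common residue, then reports runs of fully conserved columns. It is a minimal teaching example under stated assumptions: the alignment is already computed (for example, exported from an alignment tool), gaps are written as `-`, and the function names, threshold, and toy sequences are hypothetical. It is not the scoring method used by MEGA or any other specific program.

```python
from collections import Counter


def column_conservation(alignment):
    """Score each column as the fraction of sequences sharing its most
    common residue. Gap characters ('-') never count as the shared residue."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_regions(scores, threshold=1.0, min_length=3):
    """Return (start, end) index pairs (0-based, end exclusive) for runs of
    at least min_length columns whose score is >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy alignment invented for illustration.
ALIGNMENT = [
    "ATGCCGTA-GGA",
    "ATGCAGTATGGA",
    "ATGCTGTA-GGA",
]

scores = column_conservation(ALIGNMENT)
print(conserved_regions(scores))  # [(0, 4), (5, 8), (9, 12)]
```

In the toy alignment, column 4 differs in every sequence and column 8 is mostly gaps, so they split the alignment into three conserved blocks. Lowering `threshold` would merge blocks that tolerate some variation.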

Student Goal 3: Analyze biological data and communicate their work in written English

Tasks aligned with Research Goal 1:
List the steps involved in identifying the xenologs
Explain two steps involved in identifying homologs

Tasks aligned with Research Goal 2:
List the steps involved in creating phylogenies
Analyze the phylogeny and the results from the tree
Explain the mode of transfer of genes based on phylogeny

Instructional Materials


Lesson I – Introduction to Genome Solver
Introduction to the Genome Solver Project, its goals and the community science project

Lesson II – Databases
Introduction to databases, accession numbers, and sequence information, with a hands-on activity on how to download sequences

Lesson III – Annotation
Introduction to genome sequences and the meaning of the A's, C's, T's and G's in the sequences. Differences between structural and functional annotation.

Lesson IV – Comparative Genomics
Introduction to sequence alignment and models used to compare sequences.

Lesson V – Phylogenetics
Introduction to phylogenetic construction using MEGA and analysis of phylogenies.

Lesson VI – The Community Science Project
Examples of projects on HGT and their results.


Lesson I – Introduction to Genome Solver

Exercise on BLAST

Lesson II – Databases

Exercise on Databases

Lesson III – Annotation

Exercise on Annotation

Lesson IV – Comparative Genomics

Exercise on Multiple Sequence Alignment

Lesson V – Phylogenetics

Exercise on Creating Phylogenies

Complete set of exercises

Instructional Staffing

A single instructor runs the course.


Gaurav Arora, Gallaudet University

Advice for Implementation

No funds, equipment, or supplies are required. A classroom with computers and internet access is the only requirement for this course. Instructors will need to download the latest version of MEGA, which is required to create phylogenetic trees.

For instruction: Start with BLAST instructional materials and activities followed by the lessons in the numbered order.
For research: Pick a phage-bacterial pair of interest or use the suggested list.


The main challenge that students face in this course is the MEGA software. The program is somewhat unstable on Mac platforms and sometimes requires more memory for larger calculations. In such cases, students have found success on computers running Microsoft Windows. Sometimes students may need to pick the relevant sequences and run the analysis on subsets instead of the entire dataset.
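
One way to run an analysis on a subset, as suggested above, is to extract only the sequences of interest into a smaller file before loading it into the analysis software. The sketch below assumes the sequences are stored in FASTA format and that each record's ID is the first word of its header line. The function names and example records are hypothetical and are not part of the Genome Solver materials.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_fasta(handle, wanted_ids):
    """Keep only records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    return [(h, s) for h, s in read_fasta(handle)
            if (h.split() or [""])[0] in wanted]


def write_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Invented records for demonstration only.
EXAMPLE = (
    ">seqA hypothetical phage protein\nATGAAA\nCCCGGG\n"
    ">seqB bacterial homolog\nATGTTT\n"
    ">seqC unrelated sequence\nATGCCC\n"
)

subset = subset_fasta(io.StringIO(EXAMPLE), ["seqA", "seqC"])
print(write_fasta(subset), end="")
```

The printed text can be saved to a new file and analyzed in place of the full dataset, which keeps memory use lower for larger calculations.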

Using CURE Data

Students are not generating data in this CURE, but instead analyzing and interpreting data from existing databases. Student research data has been disseminated at various professional conferences and the projects have led to published papers.


Kyrillos A, Arora G, Murray B and Rosenwald AG. The Presence of Phage Orthologous Genes in Helicobacter pylori Correlates with the Presence of the Virulence Factors CagA and VacA. Helicobacter. 2015; 21(3): 226-233.

Rosenwald AG, Murray B, Toth T, Madupu R, Kyrillos A and Arora G. Evidence for horizontal gene transfer between Chlamydophila pneumoniae and Chlamydia phage. Bacteriophage. 2014; 4(4): e965076.

Rosenwald AG, Russell J and Arora G. The Genome Solver Website: a virtual space fostering high impact practices for undergraduate biology. J. of Microbiol. and Biol. Edu. Dec 2012; 188

Rosenwald AG, Arora G, Madupu R, Roecklin-Canfield J and Russell, J. The Human Microbiome Project: an opportunity to engage undergraduates in research. Procedia Comp. Sci. 2012; 9: 540

Mathur, V, Arora G, McWilliams M, Russell J and Rosenwald AG. The Genome Solver Project: Faculty Training and Student Performance Gains in Bioinformatics. J. of Microbiol. and Biol. Edu. (in press).