Comparison of a Highly Polymorphic Olfactory Receptor Gene Subfamily in Genetically Diverse Dog Breeds

Lois Banta, Norm Bell, and Duane Bailey
Williams College
Williamstown, MA 01267

Summary

In this three- or four-week project, students learn about single nucleotide polymorphisms (SNPs) by amplifying and sequencing a highly polymorphic gene subfamily from a diverse population of subjects (dogs) with which many students have considerable familiarity and affinity. In the first week, students draw on previously acquired knowledge of phylogenetic relationships and experience in sequence alignment to design primers specific for one subfamily of canine olfactory receptor genes. In the second week, each student uses his or her primers in a polymerase chain reaction (PCR) to amplify the corresponding DNA from one dog's cheek cells. During the following lab, PCR products are purified and the yield is confirmed on a gel. In the final week, commercial or in-house sequencing is used to determine the sequence of the PCR product. The data analysis draws on a published microsatellite genotype-based population structure of 85 domestic dog breeds, allowing the students to compare a phylogenetic tree estimated from a single gene with data obtained through a genome-wide analysis. An optional bioinformatics module introduces existing web resources for predicting transmembrane domains and/or gives students a short programming assignment in which they write a Perl script to perform this analysis on an olfactory receptor sequence.


Learning Goals

  • Students will put into practice skills they have learned in previous bioinformatics labs, by analyzing and generating phylogenetic trees and by performing sequence alignments.
  • Students will appreciate the difficulties in obtaining sequence data for an individual gene in a highly conserved family and the value of single-molecule sequencing strategies.
  • Students will gain an understanding of the role of single nucleotide polymorphisms in gene family structure, the evolutionary processes that lead to genetic duplication and diversification, and the limitations in correlating function with specific SNPs.
  • Students without prior wet-lab experience will gain familiarity with the rudiments of molecular biology tools and techniques in a relatively straightforward protocol.

Context for Use

This activity is part of an upper-level elective (12-14 students) in genomics and bioinformatics. It could also be used in a molecular evolution or a genetics course to integrate with classroom-based lessons on SNPs, PCR and/or DNA sequencing. The adaptability to larger classes is limited only by the availability of gel electrophoresis equipment and the cost of the PCR reagents and sequencing; students could also work in pairs to reduce expenses. Ideally, faculty would have access to as many different dogs as there are students or pairs of students; the larger the data set, the more interesting the data analysis. However, multiple students or groups could also use the same dog's cheek cell sample. The lab presupposes no previous hands-on experience in molecular biology, and the only wet-lab manipulations (Weeks 2-3) involve setting up a PCR, purifying the product, and running an agarose gel to confirm the yield. This lab was designed specifically to be a good first introduction to molecular biology, simple gel electrophoresis, and the use of pipettors for computer science, chemistry, physics and math students. The in silico investigation (Week 1) builds on prior experience with Clustal and phylogenetic tree analysis, and the final data analysis reinforces the students' experience with PHYLIP or another tree estimation program (although Clustal can also be used to generate the neighbor-joining tree). The DNA sequencing can be performed in-house if the institution has its own sequencer, but the PCR products can also be sequenced commercially. The data analysis can be completed outside of lab.

Description and Teaching Materials

Week 1: Instructor introduces project, students design primers (computer lab needed)

  • Behind the scenes between weeks 1 and 2: Instructor orders primers

Week 2: Instructor or students collect dog cheek cells from multiple dog donors and prepare genomic DNA (extract can be stored refrigerated for at least 1-2 days)
Students set up PCR reactions

  • Behind the scenes between weeks 2 and 3: Instructor may need to reamplify PCR product to enhance yield

Week 3: Students pour an agarose gel, purify the PCR products using a kit, check yield of their PCR reaction on the gel, and determine concentration of the product

  • Behind the scenes between weeks 3 and 4: Instructor sends purified PCR products out for commercial sequencing (Alternatively, students and/or instructor sequence DNA in-house)

Week 4: Students receive chromatograms and/or retrieve sequences from sequencer, analyze data, construct trees (can be done out of class) (access to computers needed)

Lab handout for week 1 (Microsoft Word 245kB Jul19 07)
Lab handout for week 2 (Microsoft Word 55kB Jul19 07)
Lab handout for week 3 (Microsoft Word 57kB Jul19 07)
Lab handout for week 4 (Microsoft Word 125kB Jul19 07)
Data analysis handout (Microsoft Word 26kB Jul19 07)
Sample class results (Text File 7kB Jul19 07)
Sequence File (Microsoft Word 3.2MB Jul19 07)
Genetic Structure of the Purebred Domestic Dog (Acrobat (PDF) 610kB Jul19 07) (required reading for Data Analysis)
The canine olfactory subgenome (Acrobat (PDF) 629kB Jul19 07) (background reading)
Dog Phylogenetic Tree (an 8579 by 750 pixel JPEG)

TMPred Lab Handout (Microsoft Word 33kB Jun8 11) Handout for prediction of transmembrane domains in an olfactory receptor sequence using the web-based resource TMPred
Cf0184 Sequence file (Microsoft Word 23kB Jun8 11) Amino acid and nucleotide sequences for one canine olfactory receptor
Perl script (Text File 1kB Jun8 11) Perl script for prediction of transmembrane domains
Perl script output (Text File 488 bytes Jun8 11) Sample output of the Perl script for the canine olfactory receptor sequence provided here. Each line indicates the length of one hydrophobic region identified, followed by the characteristics of the sequence immediately preceding the putative TMD, the position and composition of the domain itself, and a score (the fraction of the region that is not hydrophobic; acceptable range is below 0.43). "+" represents non-hydrophobic residues, "-" represents Val, Trp, Phe, Ala, Met, Ile, Leu, or Tyr ("VWFAMILY").

Teaching Notes and Tips

Background: Olfactory receptors (ORs) are G-protein-coupled receptors containing seven membrane-spanning domains. They are encoded by the largest gene family known in the mammalian genome. As students learn from the background reading for this lab (Olender, et al., Genomics 83:361-372 (2004)), OR genes are found in clusters throughout the mammalian genome. A large number of OR genes have been cloned and characterized in dog, rat, mouse, and human; sequencing and comparison of the genes has revealed that they are most polymorphic (contain the highest proportion of single nucleotide polymorphisms, or SNPs) in transmembrane domains IV and V, which are responsible for ligand recognition and binding (Quignon, et al., Genome Biology 6:R83 (2005)). However, it should be noted that there is no information yet correlating specific SNPs with the ability to detect particular odorants. Students may hypothesize that dog breeds reputed to have an especially keen sense of smell will have a more highly polymorphic set of genes, and this is something they can assess (at the level of one gene subfamily) from the class data.
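One simple way to put a number on "more polymorphic" is to count the alignment columns at which the class sequences differ. The Python sketch below is only an illustration of that idea, not part of the lab handouts; the sample names and the three short aligned sequences are made up, and gap or N characters are ignored rather than treated as differences.

```python
def variable_sites(aligned):
    """Return 1-based alignment positions where the sequences disagree.

    `aligned` maps a sample name to an aligned sequence; all sequences
    must have the same length. Gaps ("-") and N calls are skipped.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    sites = []
    for index, column in enumerate(zip(*aligned.values())):
        bases = {base.upper() for base in column if base not in "-Nn"}
        if len(bases) > 1:
            sites.append(index + 1)
    return sites


# Hypothetical toy alignment, NOT real olfactory receptor data.
toy_alignment = {
    "dog_A": "ATGCTGACCA",
    "dog_B": "ATGCTAACCA",
    "dog_C": "ATGTTGACCA",
}

sites = variable_sites(toy_alignment)
print(f"{len(sites)} variable sites at positions {sites}")
```

Comparing the count for one group of breeds against another would need equal sequence lengths and similar sample sizes to be meaningful, so any result from a single class data set should be read cautiously.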

Dog Sampling: Instructors can recruit dogs belonging to colleagues for sampling. Digital photos of the dogs can be assembled into a PowerPoint "facebook" and posted electronically so students can choose the dog they wish to study. In our experience, students take intellectual ownership of "their" dog and frequently want to share their results with the professor or staff member whose dog donated its cheek cells. On a commuter campus, students could sample their own dogs and might feel even more attached to the results. Note that many breeds popular as pets (particularly in certain geographical regions of the country) are clustered at one end of the published population structure. The analysis is most interesting if you are able to get at least a couple of samples from some of the more diverse breeds at the left end of the population map (see Fig. 3 of Parker, et al.).

As outlined above, the faculty member obtains the dog cheek cells and isolates the genomic DNA in advance of the lab. Students could obtain the dog cheek cell samples themselves, but this is time-consuming and logistically difficult on a residential campus. However, if the dogs can be sampled earlier in the day, students could perform the genomic DNA isolation themselves, using a procedure that is frequently employed in high school and college labs to isolate human DNA from cheek cells. Most dogs are relatively amenable to the simple cheek swab, but some dogs are wary and a few prove impossible to sample.

InstaGene matrix is available from BioRad (catalog #732-6030).

Primer design: There is a trade-off between allowing each student or pair of students to order their own primers (maximizing sense of investment and empowerment) and using a common set of primers for all students (minimizing cost while making it possible to directly compare all sequences). The best solution (and the one that incorporates the most student reflection and discussion) seems to be asking pairs of students to work together to design a set of primers, and then working as a class to reach consensus on one or two sets that will be ordered and used by all. One set of primers that worked well for CfOR0184: 5' CCAAGAAAACAGAAAGCAGTATTTG and 5' CATGCATATGGCTCCACTGTTAG.
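When pairs of students compare candidate primers, a few quick numbers (length, GC content, a rough melting temperature) can anchor the class discussion. The Python sketch below computes these for the two CfOR0184 primers listed above. The melting temperature uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption made here for simplicity; it is a coarse estimate intended for short oligonucleotides and is not necessarily what the lab handout or a primer vendor uses.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule.

    Assumption: 2 C per A/T and 4 C per G/C. This ignores salt and
    primer concentration and overestimates Tm for longer primers.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# The two primers quoted in the text, both written 5' to 3'.
PRIMERS = {
    "primer_1": "CCAAGAAAACAGAAAGCAGTATTTG",
    "primer_2": "CATGCATATGGCTCCACTGTTAG",
}

for name, seq in PRIMERS.items():
    print(f"{name}: length={len(seq)}, GC={gc_fraction(seq):.2f}, "
          f"Wallace Tm={wallace_tm(seq)} C")
```

A matched estimate for the two primers in a pair is the point of the comparison; the absolute values should not be used directly as an annealing temperature.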

Quality of the DNA and the sequence data: The genomic DNA prep is of sufficiently high quality to obtain PCR products with many primer sets. However, it is frequently necessary to do a second round of amplification, using 1 uL of the product of the first amplification as a template, to get sufficient product for purification and sequencing. This second amplification could be performed "behind the scenes" by the instructor or TA. By using a couple of alternative pairs of primers, a class can improve the chances of getting usable products, and instructors can use the topic of alternative primers to introduce a discussion of annealing temperatures and optimizing PCR conditions.

PCR: The recipe for the PCR is intentionally left for the students to fill in, as a way to stimulate thought and discussion on how PCR works and how to perform the necessary dilution calculations. However, a prepared master mix could be provided instead in a more introductory-level course.

Sequencing: PCR products must be purified (using a QIAquick kit from Qiagen or another similar product) before being sequenced. Product concentration can be estimated from a gel photo (the faintest band that can be seen is approximately 10 ng of DNA) or determined directly using a NanoDrop spectrophotometer if one is available. Requirements for commercial sequencing are available from the individual sequencing facilities (e.g. http://www2.umaine.edu/dnaseq/index.htm, although other places are less expensive). Typical requirements are 5 uL of product at 10 ng/uL. Note that a primer (one of the same primers that were used to perform the PCR) must be supplied by the customer. Data from a commercial sequencing facility should be obtained as raw chromatograms so that it is possible to see polymorphisms (detected as multiple peaks at a given position). The University of Maine website has links to various sequence viewing programs, although the one listed for Mac does not work on the new Intel machines.
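The dilution needed to meet a typical submission requirement (5 uL at 10 ng/uL) follows from C1 x V1 = C2 x V2. The short Python sketch below works through that arithmetic; the function name and the 40 ng/uL example reading are hypothetical, and the default target values are simply the typical requirement quoted above, which an individual facility may not share.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=10.0, final_ul=5.0):
    """Return (product_ul, water_ul) to reach the target concentration.

    Uses C1 * V1 = C2 * V2. Defaults follow the "5 uL at 10 ng/uL"
    requirement mentioned in the text.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("product is more dilute than the target; "
                         "it cannot be brought up by dilution")
    product_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    water_ul = final_ul - product_ul
    return product_ul, water_ul


# Hypothetical reading of 40 ng/uL for a purified PCR product.
product, water = dilution_volumes(40.0)
print(f"Mix {product:.2f} uL product with {water:.2f} uL water")
```

Volumes near or below 1 uL are hard to pipette accurately, so in practice the same ratio would usually be scaled up to a larger final volume.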

Teachable moments: As the students analyze the existing canine olfactory gene tree and align the sequences (week 1), they discover that the target gene is part of a large subfamily with many highly homologous members. At this point the students realize that they will have to use the alignment tools they have already learned (e.g. Clustal) to choose the most appropriate regions to design their primers. Some students start by aligning only very closely related genes, while others immediately include more distantly related genes. As they start to realize that it will be impossible to amplify just the single gene of interest, they begin to think about the implications of this gene diversification for the sequencing: in any given dog, it is likely that their amplified product will be a heterogeneous mix of genes. This sets the stage for a discussion about the possibility of subcloning individual genes into vectors and amplification through E. coli of each subclone, followed by sequencing of the clonal isolates. It can also serve as the springboard for a discussion of the new single-molecule sequencing technologies (e.g. 454) that would circumvent this problem.

In performing the final data analysis, students discover that the tree they generate with their aggregate class sequence data may not fit very well with the published population structure. This leads to a consideration of the key point that a gene tree is not the same as a genome-wide phylogenetic estimation. The difficulties inherent in estimating trees for populations such as dog breeds, which have been mixed to create new breeds, are a starting point for a more detailed consideration of what a tree does and does not represent.
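It can help students to see what a distance-based method such as neighbor-joining actually starts from: a table of pairwise differences computed from a single gene. The Python sketch below computes uncorrected pairwise distances (the proportion of differing sites, often called p-distance) for a made-up three-sequence alignment. It is an illustration only; tree programs typically apply a substitution model rather than raw proportions, and the sample names and sequences here are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences.

    Columns with a gap ("-") in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy alignment, NOT real class data.
toy = {
    "dog_A": "ATGCTGACCA",
    "dog_B": "ATGCTAACCA",
    "dog_C": "ATGTTGACCA",
}

distances = {
    (x, y): p_distance(toy[x], toy[y]) for x, y in combinations(toy, 2)
}
for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.2f}")
```

Because every number in the table comes from one gene, a tree built from it reflects that gene's history, which is the reason it need not match the genome-wide population structure.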

Optional module on predicting transmembrane domains: As canonical members of the seven transmembrane domain (TMD) family of proteins, olfactory receptors are a natural lead-in to a discussion of the features common to membrane-spanning regions (stretches of approximately 20 hydrophobic amino acids, uninterrupted by charged residues). A brief lesson could include use of a web-based resource such as TMPred to predict the locations of the TMDs in the olfactory receptor sequences under investigation in this lab; this tool also provides predictions about the topology of the protein. For instructors familiar with Perl programming, this module represents an opportunity for an integrated in silico unit in which students are asked to write a Perl script, or modify an existing one, that predicts the location of the TMDs by searching for strings of hydrophobic amino acids. A simple mnemonic that conveniently summarizes the most common residues in a TMD in single-letter amino-acid code is VWFAMILY (LeBlanc and Dyer, Perl for Exploring DNA, Oxford University Press, 2007). We have provided a sample Perl script and output. In our course, very few of the students have any prior programming experience, but all the students have had a brief (one 3-hour lab and some homework) introduction to Perl prior to this lab. The instructor walks the students through developing the initial Perl script before introducing TMPred, and students then work on developing their programming skills by modifying the code to optimize the output so it more closely resembles the TMPred results.
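The search for hydrophobic stretches can be sketched in a few lines. The version below is written in Python rather than Perl and is not the sample script provided with this activity; it is one possible approach, assuming a fixed 20-residue sliding window, the VWFAMILY residue set, and the 0.43 cutoff on the non-hydrophobic fraction described for the script output. The test sequence is invented (polar flanks around a fully hydrophobic core) and is not a real olfactory receptor.

```python
HYDROPHOBIC = set("VWFAMILY")


def nonhydrophobic_fraction(region):
    """Fraction of residues in the region that are not in VWFAMILY."""
    return sum(1 for aa in region if aa not in HYDROPHOBIC) / len(region)


def candidate_tmds(protein, window=20, max_score=0.43):
    """Return merged (start, end) regions, 0-based and end-exclusive.

    Each window whose non-hydrophobic fraction is below max_score is a
    hit; overlapping or adjacent hits are merged into one region.
    """
    protein = protein.upper()
    regions = []
    for start in range(len(protein) - window + 1):
        score = nonhydrophobic_fraction(protein[start:start + window])
        if score < max_score:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Hypothetical toy protein: 15 polar residues, 20 hydrophobic, 15 polar.
flank = "KDERSTNQGPKDERS"
toy_protein = flank + "LIVFALLIVAMLFIAVLLYW" + flank

for start, end in candidate_tmds(toy_protein):
    region = toy_protein[start:end]
    print(f"{start + 1}-{end}: {region} "
          f"(score {nonhydrophobic_fraction(region):.2f})")
```

Because a window may contain several non-hydrophobic residues and still pass the cutoff, the merged region in this sketch runs past both ends of the hydrophobic core. Trimming those boundaries, or penalizing charged residues, is the kind of modification students could try when comparing their output with TMPred.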

Assessment

Throughout the multi-week project, students are asked to answer questions and analyze data (e.g. agarose gel results) in their lab notebooks. Calculations are required at several stages of the wet-lab procedure (weeks 2 and 3) to assess student understanding of underlying concepts and to provide practice with routine dilutions. Primer sequences and these calculations are turned in to and/or checked by the lab instructors. In-class discussion at the end of the primer design process provides an opportunity to ensure that all students understand the constraints on the primer design, and is the time when one or two consensus pairs of primers can be chosen by the group.

A longer written analysis of the data is required at the end of the project. The questions posed are available in the Data Analysis lab handout. Although the format is not specified, this could take the form of a full lab report structured like a scientific paper, or it could be a narrative report without a full materials and methods section.

References and Resources

EMBL's Clustal Omega alignment tool

Background reading (linked file under Teaching Materials):

Olender, T., T. Fuchs, C. Linhart, R. Shamir, M. Adams, F. Kalush, M. Khen, and D. Lancet (2004) The canine olfactory subgenome. Genomics 83:361-372 (Students should be advised not to get bogged down in the details of this paper, but rather to focus on the big picture.)

Reading for data analysis (linked file under Teaching Materials):

Parker, H.G., L.V. Kim, N.B. Sutter, S. Carlson, T.D. Lorentzen, T.B. Malek, G.S. Johnson, H.B. DeFrance, E.A. Ostrander, and L. Kruglyak (2004) Genetic structure of the purebred domestic dog. Science 304:1160-1163 (Students will refer to Figure 3 in comparing their tree to the population structure based on a genome-wide analysis.)

Additional background information for instructors:

Tacher, S., P. Quignon, M. Rimbault, S. Dreano, C. Andre, and F. Galibert (2005) Olfactory receptor sequence polymorphism within and between breeds of dog. J. Heredity 96:812-816 (Tables 2 and 3 present known polymorphisms in various dog breeds and should be appended to the handout for Week 1.)

Firestein, S. (2001) How the olfactory system makes sense of scents. Nature 413:211-218 (This article provides a useful review of the anatomical and neurological basis for olfaction. Figure 2 nicely depicts the concentration of SNPs in transmembrane domains IV and V.)

Quignon, P., E. Kirkness, E. Cadieu, N. Touleimat, R. Guyon, C. Renier, C. Hitte, C. Andre, C. Fraser, and F. Galibert (2003) Comparison of the canine and human olfactory receptor gene repertoires. Genome Biology 4:R80 (This article describes the presumed evolution of the OR gene family by successive duplications, and the apparent expansion of the canine OR repertoire relative to humans.)