Using Metagenomics to Investigate Microbial Diversity

Authored by David J. Esteban, Vassar College, Department of Biology. Collaborators: Lois Banta, Williams College, Department of Biology; Doyle Ward and Bruce Birren, Broad Institute Microbial Genome Sequencing Unit. With assistance from Elizabeth Collins (Vassar College).


Using Winogradsky columns, a soil enrichment culture, students explore microbial diversity through metagenomics.

The Winogradsky column is a complex community of interacting microorganisms. In a community such as the one found in soil, we know that bacteria are abundant, but we do not know the true extent of the community's diversity or its composition. Culture-based techniques are limited to the bacteria that grow in the lab, and most bacteria do not grow under normal lab conditions. Therefore, we need culture-independent techniques to investigate the diversity of this community. In this module, students investigate the microbial diversity in distinct layers of the Winogradsky column using high-throughput sequencing of 16S rRNA genes. Students extract genomic DNA from the Winogradsky column, the 16S rRNA genes are amplified by PCR, and the products are sequenced. The resulting sequences are then classified and analyzed using bioinformatics tools, and metabolic activities in the different regions of the column are inferred from the organisms present.

To make a Winogradsky column, mud is collected from a pond or riverbank and added to a plexiglass cylinder along with a source of cellulose, such as leaf litter, and additional sulfate to enrich for microorganisms involved in the sulfur cycle. Over a period of months, layers of microorganisms with different environmental requirements develop in distinct niches, with distinct populations carrying out diverse metabolic activities. As various metabolites in the column are used, byproducts are produced, and the environment in the column changes. As a result of changing concentrations of oxygen, hydrogen sulfide, and other metabolites, different microbes thrive in their own niches. Bacterial growth is seen as changes in color from the original grey-brown mud to a rich palette of reds and greens. The colored zones are due primarily to phototrophic microbes. In the upper, aerobic layers, we can expect photosynthetic eukaryotic microbes; in the lower, anoxic layers, we can expect purple and green phototrophic bacteria. These phototrophic bacteria use various reduced sulfur compounds as electron donors for anoxygenic photosynthesis. The vast majority of these microorganisms cannot be cultured independently under normal laboratory conditions.

Learning Goals

  • To use metagenomics to examine microbial diversity in a soil enrichment culture (Winogradsky columns)
  • To learn the concepts and principles behind the technology of metagenomics, how it is used and how it is changing microbiology
  • To learn about the metabolic diversity of bacteria, and how different environmental conditions influence the diversity and metabolic activities of bacteria
  • To develop molecular biology laboratory skills including DNA extraction, PCR and gel electrophoresis
  • To develop skills in analyzing DNA sequence data using bioinformatics tools including the Ribosomal Database Project (RDP), phylogenetics software and Perl scripts
  • To learn how DNA sequence data can be used to infer biological activities occurring in specific environments

Context for Use

The module is designed for an intermediate-level Microbiology course. In the two institutions where we have developed and implemented this module, the class consists primarily of sophomores and juniors but also includes a few seniors. Class size is limited to 18-24 students, but larger groups could also be involved. Over time, we hope to collect data from ponds in a variety of ecological settings, and we welcome contributions of datasets from other instructors.

Description and Teaching Materials

The module begins with a discussion of high-throughput DNA sequencing technology and its application to studying microbial diversity. Students then work in the lab to extract DNA from the soil samples and perform a PCR to amplify a portion of the 16S rRNA genes. The samples are sent for sequencing; in the absence of access to large-scale sequencing capabilities, previously sequenced samples (provided here) can be used for the data analysis. Students complete the module by analyzing the resulting sequences and writing a paper or giving a presentation on their results.

This module involves a mix of wet labs, computational labs, and lecture to investigate the community composition of soil using high-throughput DNA sequencing. The course is designed with two 75-minute class periods per week and two 2-hour lab periods per week. Some "lab" activities are done in class time and some "class" activities are done during lab time; since the module attempts to eliminate the traditional separation of lab and class, these time slots are only meaningful in the sense that some activities require more time than others. The module takes 3 weeks (2 weeks for the wet lab component, 1 week for the data analysis). An additional week of class or lab time is needed if student presentations are included. Alternatively, the module can be performed over the course of 3-4 stand-alone lab periods, integrated with lecture/discussion material covering microbial metabolism (e.g. sulfur cycling and photosynthesis) and/or metagenomics.

Wet Lab: Winogradsky columns are prepared several weeks before this lab commences, by the instructor or by one or more students. Distinct layers, composed of microbial populations that thrive in the different local environmental conditions, develop in the Winogradsky column. Students begin this module by extracting total genomic DNA from soil obtained from different layers of the column. A commercial DNA extraction kit is used, following a spin-column approach common to most nucleic acid extraction kits. The purified DNA is then subjected to PCR to amplify the 16S rRNA genes present in the sample. The products are analyzed by gel electrophoresis to verify the success of the procedure. The PCR product, which is a mixture of different 16S rRNA genes, can then be cloned and sequenced by a sequencing center. Barcoded primers would allow analysis using next-generation sequencing technologies.

Data Analysis: Using data obtained in the wet-lab portion of the module, or the dataset provided here, students classify the sequences and analyze the distribution patterns of the microbial community. Sequences are uploaded to the Ribosomal Database Project and classified, and the results files are downloaded. Results files are then manipulated with Perl scripts that separate the data into phyla, genera and other taxonomic levels. Data from different layers of the Winogradsky column are merged using Perl scripts to allow comparison of the communities present. Data are graphed in Microsoft Excel. Using phylogenetic trees, students investigate the relatedness of the members of the microbial community. Although the Perl scripts are run from the command line, no computer science background is needed.
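The merge step described above can be illustrated with a minimal Python sketch. This is not the provided Merge Perl script, and it does not read RDP output files; it assumes the taxon name assigned to each sequence has already been extracted into a plain list per layer. The function names, layer names and genus lists are invented for illustration.

```python
from collections import Counter


def count_taxa(assignments):
    """Count how many sequences were assigned to each taxon name."""
    return Counter(assignments)


def merge_counts(per_sample):
    """Combine per-sample counts into one table.

    per_sample maps a sample name to a Counter of taxon counts.
    Returns a header row and one row per taxon, with 0 where a
    taxon was not observed in a sample.
    """
    samples = sorted(per_sample)
    taxa = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["taxon"] + samples
    rows = [[t] + [per_sample[s].get(t, 0) for s in samples] for t in taxa]
    return header, rows


# Toy, made-up assignments for two hypothetical column layers.
layers = {
    "layer_A": count_taxa(["Chlorobium", "Chlorobium", "Desulfovibrio"]),
    "layer_B": count_taxa(["Desulfovibrio", "Clostridium"]),
}
header, rows = merge_counts(layers)
for row in [header] + rows:
    print("\t".join(str(cell) for cell in row))
```

The tab-separated output can be pasted into a spreadsheet for graphing, which matches the Excel-based workflow described above.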

Assessment: Students are given instruction on the various analysis tools and a series of questions to address in a paper; the questions can be answered using the provided data and tools. These questions focus on the patterns of distribution of microorganisms in the column and their role in nutrient cycling. In addition, instructors can have students give presentations on one or a few of the microorganisms present in the Winogradsky column, describing their distribution, metabolic properties and role in the soil ecosystem.

Module Handout (Microsoft Word 105kB Jun6 11) Background, schedule, mechanics of the module, and details of the experimental procedures in the lab including DNA extraction from soil and PCR.
Data Analysis Handout (Microsoft Word 947kB Jun6 11) Data analysis procedure and the student assignment.
SeparateRanks Perl Script (3kB Jun6 11) A Perl script that takes the taxonomic assignments of 16S rRNA sequences from the Ribosomal Database Project (RDP site) and separates them into the different taxonomic levels. (Specific instructions on purpose and use are described in the Data Analysis document.)
Trim Perl Script (1kB Jun6 11) A Perl script to clean up RDP data output to make analysis easier. (Specific instructions on purpose and use are described in the Data Analysis document.)
Merge Perl Script (2kB Jun6 11) A Perl script to combine the taxonomic data from different samples into a single document to aid in data analysis. (Specific instructions on purpose and use are described in the Data Analysis document.)
Heat-map Perl Script (16kB Jun8 11) A Perl script to generate a heat-map of the relative abundance of different organisms in each sample, starting with the RDP data set. Organisms are listed on the vertical axis and samples are displayed along the horizontal axis.
Heat-map Readme File (Text File 2kB Jun8 11) Documentation for the heat-map Perl script.
Winogradsky Columns Data File (Zip Archive 6.4MB Jun8 11) Sequence data files for the columns described in the data analysis handout.
Preparation of Winogradsky Columns (Microsoft Word 2007 (.docx) 13kB Jun8 11) Protocol for preparation of Winogradsky columns.
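
For readers who want to see what "relative abundance" means numerically before running the heat-map script, here is a small Python sketch. It assumes relative abundance is a taxon's sequence count divided by the total number of classified sequences in that sample; the source does not state how the provided script computes it. The function name and counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw per-taxon counts for one sample into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no meaningful proportions; report zeros.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up counts for a single hypothetical sample.
sample_counts = {"Chlorobium": 3, "Desulfovibrio": 1}
fractions = relative_abundance(sample_counts)
print(fractions)
```

Normalizing each sample separately makes layers with different sequencing depths comparable, which is why a heat map is usually drawn from proportions rather than raw counts.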

Teaching Notes and Tips

Optional Component on Cultivation and Characterization of an Unknown Bacterium from Column Layers:

One of the key features of metagenomic analyses is the ability to identify the vast majority of microbes in a community that cannot be cultivated. An optional component of this lab module involves having the students cultivate microbes from each of the column layers by streaking the mud samples on a variety of media types. The distribution of microbial morphologies on the resulting plates typically looks remarkably similar across all layers (as illustrated in the sample data below); comparison of these results with the output of the 16S rDNA analyses reinforces this important take-home lesson. (The plates can serve as sources of unknowns for traditional biochemical, metabolic, and molecular characterization, e.g.

Cultivation of bacteria from column layers (Microsoft Word 38kB Jun7 11) Lab handout for cultivation of bacteria from column layers on a variety of media.

Photo of bacteria from different layers of a column cultivated on various types of media (Microsoft Word 405kB Jun7 11) A-D indicate layers from the column (top to bottom); abbreviations along the bottom indicate types of bacteriological media (see lab handout on cultivating bacteria from the column layers for full names and descriptions of the media types).

Data Analysis:

Many students need more personal attention for the data analysis than for typical "wet labs."

Perl scripts were designed to manipulate the output from RDP, which is not very user-friendly. The scripts were written by Computer Science students for this purpose. If desired, similar scripts could be written by interested students rather than using the ones provided. We have found some problems with the scripts: there appears to be a counting error such that more sequences are classified into taxonomic groups than there are sequences. (Once we have identified and fixed the error, the corrected scripts will be posted.) Although most students have never used the command line interface, the instructions provided are sufficient to guide students with no experience. Most problems are due to students not typing in the command exactly as shown, or not placing the Perl script in the correct directory.
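
One way to catch the kind of counting error mentioned above is a simple consistency check, sketched here in Python. It does not diagnose or fix the provided Perl scripts, whose bug is not identified in this document; it only flags taxonomic ranks whose summed counts exceed the number of input sequences. It assumes each sequence receives at most one assignment per rank. The function name and the numbers are invented for illustration.

```python
def check_rank_totals(n_sequences, counts_by_rank):
    """Return the ranks whose summed counts exceed the number of sequences.

    counts_by_rank maps a rank name (e.g. "phylum") to a dict of
    taxon -> count. An empty result means no rank is over-counted.
    """
    problems = {}
    for rank, counts in counts_by_rank.items():
        total = sum(counts.values())
        if total > n_sequences:
            problems[rank] = total
    return problems


# Made-up example: 10 sequences, but the genus-level counts sum to 12.
example = {
    "phylum": {"Proteobacteria": 6, "Firmicutes": 4},
    "genus": {"Desulfovibrio": 7, "Clostridium": 5},
}
problems = check_rank_totals(10, example)
print(problems)
```

A check like this could be run by students on their merged tables before graphing, so that an inflated total is noticed early rather than in the final figures.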

For the heat-map generation, the current Perl program lacks two features: printing to multiple pages, and labeling the x-axis of the heat map. Adding the ability to print to multiple pages may not be too difficult for a computer science student, but may require interfacing with the Unix shell to string together multiple PostScript files. In its current form, samples on the heat map are ordered alphanumerically by filename; the information needed to order and label the samples meaningfully is not inherent in the filenames or in the FASTA file itself. Students with programming experience could fairly easily modify the program to label each vertical column: the user would create a text file listing the files in the desired order, each with a short description, and the program would read this file along with the rest of the RDP files and generate the x-axis.
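
The labeling feature suggested above could start from something like the following Python sketch. It is not part of the existing heat-map Perl program. The file format is an assumption made for illustration: one line per sample, with a filename, a tab, and an optional short description, and with lines beginning with `#` treated as comments. All filenames and labels are hypothetical.

```python
def read_manifest(text):
    """Parse sample order and labels from manifest text.

    Each non-empty, non-comment line is "filename<TAB>description".
    Returns (filename, label) pairs in the order listed; a missing
    description falls back to the filename.
    """
    order = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        filename, _, label = line.partition("\t")
        filename = filename.strip()
        order.append((filename, label.strip() or filename))
    return order


# Hypothetical manifest: columns should appear in this order, not A-Z.
manifest_text = "layerB.txt\tgreen band\n# comment line\nlayerA.txt\n"
columns = read_manifest(manifest_text)
print(columns)
```

Keeping the order and labels in a separate text file means students can rearrange or rename heat-map columns without renaming their data files or editing the program.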

Performing the lab as described in this module requires the use of high-throughput DNA sequencing, which is expensive for an undergraduate laboratory. Possible modifications of this module include analyzing data from previously sequenced samples, such as those provided in the file above, or sequencing a smaller number of clones.

Preparation of the Winogradsky Columns:
Several links to resources and background on Winogradsky columns can be found at The protocol provided above directs instructors or students in the preparation of Winogradsky columns, which must be done several weeks to months before the lab begins. We have had even better luck with the kit from Carolina Biological (catalog number 703490), which provides all the necessary reagents at a very reasonable cost and gives reproducible patterns of colors. If you choose to follow the protocol provided here, you can purchase the plastic columns and stoppers separately from Carolina. Individual layers of either type of column can be sampled by drilling through the plexiglass (sterilize the drill bit with ethanol between samples) and using a metal spatula and/or a pipet to remove material, or by freezing the entire column, then sawing through the plexiglass with a hacksaw and thawing each fraction individually. The latter approach allows you to more effectively scrape the inside wall of the column to harvest the photosynthetic organisms.


Assessment

1. Data Analysis: Skills and knowledge are assessed in the assignment. The use of the analysis tools is demonstrated in class, then students individually work on analysis of specific samples.
2. Presentations: Students make a short presentation describing the characterization and metabolic activities of a particular bacterium identified in their sample.
3. Exam Questions: The midterm exam has some questions on material covered in this module.
4. Attitude Assessment: Students were asked nine questions about their attitudes on the use of Bioinformatics/Genomics tools and whether they understood the value of these tools in research in microbial diversity. Metagenomics to investigate Microbial Diversity Attitude Assessment (Acrobat (PDF) 72kB Jun21 09)

References and Resources

Lee, L., Tin, S., and Kelley, S.T. (2007). Culture-independent analysis of bacterial diversity in a child-care facility. BMC Microbiology 7: 27.

Rogan, B., Lemke, M., Levandowsky, M., and Gorrell, T. (2005). Exploring the Sulfur Nutrient Cycle Using the Winogradsky Column. American Biology Teacher 67(6): 348-356.

Sleator, R.D., et al. (2008). Metagenomics. Lett. Appl. Microbiol. 47: 361-366.

Ley et al. (2005). Obesity alters gut microbial ecology. Proc Natl Acad Sci USA 102(31): 11070-11075.

Turnbaugh et al. (2006). An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444(7122): 1027-1031.

Mardis, E. (2008). Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics 9: 387-402.