Teaching Genomics > Inquiry-based Integrated Instructional Units > Genomic word composition applications of dynamic programming

Genomic word composition applications of dynamic programming

Eliot, Tim, Pilar, Daniel
Summary

The activity is under development as part of a July 2007 workshop

This module asks students to identify underrepresented words in a bacterial genome. To do so they must first implement a recursively defined measure of word abundance. As they will discover, this algorithm becomes unacceptably slow at large word sizes. They must then implement a dynamic programming version which runs more quickly.

Learning Goals

Introduce dynamic programming
Compare recursive algorithms to dynamic programming.
Explore genomic applications as a career pathway in computer science.
Use algorithms to infer biological functions.

Context for Use

This module consists of two lectures, a lab and homework. It could be used in an upper level computer science or computational biology course.

Description and Teaching Materials

Lecture1: Set up word frequency problem. Certain words in bacterial genomes can be under or over-represented. Measures of biase, probability, expected freqs.

lab:
Download a bacterial genome
Use genbank file to go over concepts in genomic annotation.
Obtain all noncoding sequence from this genome.
Run compseq to get counts

Lecture2: teach dynamic programming. Present recursive algorithm for bias.

Homework1: What's the most underrep dinucleotide in this genome? Students implement recusive alg. Provide them with some parts of the code to make easier.

Homework2: What is the most underrepresented 6 bp word in this genome. Recursive is too slow. Must try DP.

Teaching Notes and Tips

Assessment

Students would be graded primarily on their programs, but also on their ability to interpret the biological results.

References and Resources

See more Inquiry-based Integrated Instructional Units »