RNA was isolated from shoot tips of different ages in the MN, KS, and OK ecotypes that were originally described by Julie Etterson. Purified mRNA was converted to cDNA and sequenced using Illumina/Solexa. This linked table (Microsoft Word 2007 (.docx) 41kB Jan19 09) contains information on the number of sequences obtained for the different ecotypes.

Puzzled?

Having trouble making sense of the SNP table? Click here for some tips


SNP variants (Text File 52.2MB Jan19 10) were identified. The linked SNP table can be a bit tricky to understand. To get a first feel for the data, open the file in Excel. Be sure to use the Office 2008 version or later, or you will not see all of your data (older versions of Excel stop at 65,536 rows). The first column contains the transcript sequence IDs. Each transcript is given a "contig" or "cf" identifier. You will find three columns numbered 3, 1, and 2: number 3 refers to OK, 1 to MN, and 2 to KS.

To inspire you to tackle this huge amount of data, here is an example table (PowerPoint 2007 (.pptx) 73kB Jan19 09) showing how you can combine candidate gene information and SNP information.

UAG - Stop and Reflect

How might you be able to use the SNP data to look for overall patterns that might help you distinguish the three ecotypes? Respond in your journal


Puzzled?

Curious about how to identify SNPs that show the greatest amount of variation among the ecotypes? Click here for some tips


Synonymous vs nonsynonymous SNPs

Not all SNPs are the same. Some base pair changes result in a change in the amino acid coded for by the nucleotide triplet (nonsynonymous SNPs), while others change the nucleotide sequence but not the peptide sequence (synonymous SNPs).
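If you would like to check a codon change by computer rather than by eye, here is a minimal Python sketch. It assumes the standard genetic code written with DNA letters, and the function name `classify_snp` is made up for this illustration; it is not part of any tool mentioned on this page.

```python
# Standard genetic code, built from the usual compact TCAG ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((a, b, c) for a in BASES for b in BASES for c in BASES), AMINO_ACIDS
    )
}


def classify_snp(codon, position, new_base):
    """Classify a single-base change in one codon.

    codon     three DNA letters, e.g. "CTG"
    position  0, 1, or 2 (which base of the codon changes)
    new_base  the replacement base
    Returns (old amino acid, new amino acid, "synonymous" or "nonsynonymous").
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    old_aa = CODON_TABLE[codon]
    new_aa = CODON_TABLE[mutated]
    kind = "synonymous" if old_aa == new_aa else "nonsynonymous"
    return old_aa, new_aa, kind


if __name__ == "__main__":
    print(classify_snp("CTG", 0, "T"))  # leucine codon, first base changed
    print(classify_snp("GAA", 1, "T"))  # glutamate codon, second base changed
```

Try other codons and positions to see which changes leave the amino acid the same.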


UAG - Stop and Reflect

Using the amino acid codon table, pick an amino acid and show how a SNP could create a synonymous mutation. Then, show how a different SNP could result in a nonsynonymous mutation. Predict at which position in a three-nucleotide codon a SNP is most likely to be synonymous, and explain your reasoning. Respond in your journal


Protein predictions

Nonsynonymous SNPs will have a greater or lesser effect on protein structure depending on the change in the R group of the predicted amino acid. If you would like to try your hand at protein modeling, check out the information on the SERC Protein Structure page. An alternative is to try the protein structure predictor available on Biology Workbench. You will need to set up a Biology Workbench account, which will allow you to use a range of tools at that site. Click here to access a tutorial that will show you how you can use Biology Workbench for protein prediction and visualization.

To work on protein structure prediction, you need to know where the SNP is in a sequence (you have this in the SNP dataset), but you also need the sequence itself.

Assembled Chamaecrista Sequences

These files contain different versions of the transcriptome assemblies. Most of the time you will refer to version 1.1, but 1.0 can be helpful if you can't find a sequence for a gene of interest. The files open in TextEdit or Notepad (also Word). Use them when you want to match a sequence ID with the actual sequence. The find function in the text editor is an efficient way to find a sequence ID.
Descriptions of each assembly version (Text File 5kB Apr6 10)
Version 1.1 cDNA ( 9.8MB Jan11 10)
Version 1.1 peptides ( 3.4MB Jan11 10)
Version 1.0 cDNA ( 31.9MB Jan11 10)
Version 0.5 cDNA ( 33.9MB Jan11 10)
Assembly of 454 cDNA sequences only ( 123.1MB Jan11 10)
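If the files are too large to search comfortably in a text editor, a short script can do the lookup for you. The sketch below assumes the assembly files are in FASTA format (a `>` header line carrying the sequence ID, followed by sequence lines); this page does not state the format, so check a file first. The function name `find_sequence` and the file name in the comment are made up for illustration.

```python
def find_sequence(lines, seq_id):
    """Return the sequence whose header ID matches seq_id exactly, or None.

    lines   any iterable of text lines (an open file works)
    seq_id  the ID to look for, e.g. a "contig" identifier
    Assumes FASTA-style input: ">ID optional description" then sequence lines.
    """
    found = False
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if found:
                break  # reached the next record, so we are done
            header = line[1:].split()
            # Compare the whole first word, so "contig1" will not match "contig10"
            found = bool(header) and header[0] == seq_id
        elif found:
            parts.append(line)
    return "".join(parts) if found else None


# Example use with a hypothetical file name:
# with open("assembly_v1.1_cDNA.txt") as handle:
#     print(find_sequence(handle, "contig1"))
```

Unlike a plain text search, this matches the full ID, which helps when one ID is the beginning of another.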

If you want to look at longer sequences, you might want to compare some Illumina/Solexa sequences with the 454 sequences. For example, you might want to BLAST one dataset against another. You have all these assemblies available at the Blast Chamaecrista site. You'll need a username (Carleton) and password (blastnow) to do your searches there. To compare the sequences, you may want to use an alignment tool. Alignment tools are available on Biology Workbench.

When RNA is translated (remember your cDNA represents mRNA), it is read in frame. The SIXFRAME tool in the nucleic acid tools section on Biology Workbench lets you determine the correct reading frame.
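To see what "six reading frames" means in practice, here is a minimal Python sketch that translates a DNA sequence in all six frames (three on the given strand, three on the reverse complement). It assumes the standard genetic code and uppercase or lowercase A, C, G, T input. It is an illustration only, not the code behind the SIXFRAME tool, and the function names are made up for this example.

```python
# Standard genetic code from the compact TCAG ordering; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon from the first base; ignore leftover bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def six_frames(seq):
    """Return a dict of translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    for name, peptide in six_frames("ATGGCCTAA").items():
        print(name, peptide)
```

Run it on a short sequence and compare the six peptides before you answer the questions below.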

UAG - Stop and Reflect

Why would an algorithm be designed to look at six different reading frames to determine the correct reading frame? What information do you think the SIXFRAME tool uses to determine the correct sequence? Respond in your journal