You may be intrigued by a gene, gene family, or set of genes you've learned about in another species. One approach to genome exploration is to hunt for genes that have DNA or predicted amino acid sequences that are homologous to your genes of interest.
Getting started with a candidate gene approach
- To begin thinking about candidate gene approaches, try the linked exercise that allows you to align the Chamaecrista transcriptome with the sequenced soybean (Glycine max) genome. Once you've completed this exercise, explore the rest of this page and begin to develop your strategy for taking a candidate gene approach to your research.
Soybean Genome Browser Exercise
- Find your candidate gene(s).
- You might find your genes of interest through the literature (see the page on Exploring Chamaecrista biology).
- You might choose to work with the most highly expressed genes in the transcriptome and try to determine their function.
- You might search the National Center for Biotechnology Information (NCBI) Entrez website, which lets you search by keyword across a very large number of databases ranging from literature to sequences.
- As you collect information about your sequences, be sure to copy and paste that information into your notebook in the Chamaecrista Genomics Explorer.
- Here's an example list of candidate genes (Excel 69kB Jan11 10) from Arabidopsis and soybean whose sequences might map to assembly reads for Chamaecrista. See what you can find. For the curious, the PANTHER annotations are algorithmically assigned gene functions.
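- Decide whether you are going to use a nucleotide or an amino acid sequence. If you use a nucleotide sequence, choose the cDNA or mRNA sequence rather than the genomic sequence to search our transcriptome databases, because a transcriptome contains only transcribed sequence and the introns found in genomic sequence will not match.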
- Obtain the sequence for your gene from NCBI using Entrez, the Legume Information System (LIS), or Phytozome.
- Put your sequence(s) in FASTA format in a plain text file (e.g. using TextEdit on a Mac) and save it in your journal in the Chamaecrista Genomics Explorer as well.
- If you are downloading from NCBI, your sequence will be in a database called GenBank. In the upper left region of the menu, you can select the display button and choose FASTA format.
- You can create your own FASTA formatted text very quickly using the directions at NCBI.
- You can string all your sequences together in one text file if you would like. Just remember to start each sequence on a new line with its own header line beginning with ">".
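As a minimal sketch of the FASTA layout, the Python below formats several sequences as one FASTA text (a header line starting with ">" followed by the sequence) and reads them back. The function names, record names, and short sequences are invented for illustration; they are not real gene sequences.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record gets its own header line starting with '>', and the
    sequence is split into lines of at most `width` characters.
    """
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        seq = seq.upper()
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Read FASTA text back into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Placeholder records: invented names and sequences, for illustration only.
example_records = [
    ("example_gene_1 hypothetical cDNA", "atggcttctaaaggtgaacgttaa"),
    ("example_gene_2 hypothetical cDNA", "atgaaaccgttgcatgactag"),
]

fasta_text = to_fasta(example_records, width=10)
print(fasta_text)
```

Reading the text back with `parse_fasta(fasta_text)` is a quick way to check that every sequence kept its own header before you paste the file into a BLAST form.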
UAG - Stop and Reflect: You are interested in finding the Chamaecrista gene that is homologous to the flowering time gene FLOWERING LOCUS T (FT) in Arabidopsis. Would you use the Arabidopsis mRNA/cDNA sequence or the predicted amino acid sequence for that gene in your search? Why? Respond in your journal.
UAG - Stop and Reflect: You decide to translate a cDNA sequence into a peptide sequence. What would you need to consider in predicting the peptide sequence? Respond in your journal.
Once you have your sequence(s), you're ready to use BLAST
- BLAST (Basic Local Alignment Search Tool) is a tool that compares sequences to find similar ones. You can BLAST a single sequence or multiple sequences against a database at one time.
- Click here for a BLAST tutorial.
- You BLAST "against" known data sets to find similar sequences.
- NCBI has a huge repository of sequences from many organisms that can be searched using BLAST on their website.
- LIS focuses on legumes and Phytozome covers plant genomes more broadly; both are searchable with BLAST.
You can BLAST known sequences against the Chamaecrista transcriptome locally at the Blast Chamaecrista site. The dropdown menu lets you choose the database you want to search. Read the description of the different assemblies (Text File 5kB Apr6 10) before selecting one in the local BLAST.
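Which BLAST program applies depends on whether your query and the database hold nucleotide or protein sequences. The sketch below encodes the five standard BLAST program names in a small lookup. The function name `choose_blast_program` is hypothetical, and the local Blast Chamaecrista site may offer only some of these programs.

```python
# Standard BLAST programs keyed by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated
    ("protein", "nucleotide"): "tblastn",  # database is translated
}


def choose_blast_program(query_type, database_type, translated=False):
    """Return the BLAST program name for a query/database combination.

    Set translated=True for a nucleotide query against a nucleotide
    database when both should be compared as translated proteins (tblastx).
    """
    key = (query_type, database_type)
    if translated:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    if key not in BLAST_PROGRAMS:
        raise ValueError("unknown combination: %r" % (key,))
    return BLAST_PROGRAMS[key]


# A transcriptome assembly is a nucleotide database.
print(choose_blast_program("protein", "nucleotide"))     # amino acid query
print(choose_blast_program("nucleotide", "nucleotide"))  # cDNA/mRNA query
```

The two calls at the end cover the cases discussed on this page: an amino acid query or a cDNA/mRNA query searched against a transcriptome.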
Give it a try
Here are two text files you can use to practice BLASTing at the public sites and our local Chamaecrista site. The first FASTA file contains the Arabidopsis FT sequence (Text File 966bytes Jan19 09) and the second contains a number of Arabidopsis shoot genes (Text File 301kB Jan19 09). See what happens when you BLAST a whole bunch of sequences at one time.
You'll probably want more information to interpret how similar the sequences returned by your BLAST search are to your query. Check out NCBI for details.
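One common way to compare hits is to look at the percent identity, alignment length, and E-value reported for each one (a lower E-value means the match is less likely to have arisen by chance). The sketch below assumes you have results in the 12-column tabular format that the command-line BLAST+ programs write with `-outfmt 6`; the web sites mentioned on this page may present results differently. The sample rows, identifiers, and cutoff values are invented for illustration and are not recommendations.

```python
from collections import namedtuple

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
Hit = namedtuple("Hit", COLUMNS)


def parse_tabular(text):
    """Parse tab-separated BLAST hits into a list of Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        if len(f) != len(COLUMNS):
            raise ValueError("expected 12 columns, got %d" % len(f))
        hits.append(Hit(
            f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
            int(f[6]), int(f[7]), int(f[8]), int(f[9]),
            float(f[10]), float(f[11]),
        ))
    return hits


def filter_hits(hits, max_evalue=1e-5, min_identity=0.0):
    """Keep hits that pass both cutoffs, sorted by E-value (smallest first)."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.pident >= min_identity]
    return sorted(kept, key=lambda h: h.evalue)


# Invented sample rows, for illustration only.
SAMPLE = "\n".join([
    "query1\tcontig_A\t78.5\t170\t36\t1\t5\t174\t12\t518\t2e-40\t150",
    "query1\tcontig_B\t35.2\t88\t52\t3\t20\t105\t300\t560\t0.8\t28.1",
    "query2\tcontig_C\t91.0\t200\t18\t0\t1\t200\t40\t239\t1e-95\t340",
])

best = filter_hits(parse_tabular(SAMPLE), max_evalue=1e-5)
for hit in best:
    print(hit.qseqid, hit.sseqid, hit.pident, hit.evalue)
```

The cutoffs are arguments so that you can loosen or tighten them and see how the list of candidate matches changes.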