Whole Genome Data



This section is awaiting the arrival of genome sequence data from Cofactor Genomics.



Transcriptome sequencing gives biologists a quick but incomplete inventory of the protein-coding genes, because it captures only the genes that happen to be expressed under the conditions sampled. A "whole genome" sequence, on the other hand, is in principle a complete inventory of the protein-coding genes, as well as of non-coding features such as promoters, tRNAs, small RNAs, and genomic repeats. The challenge with whole genome sequences lies in 1) the assembly: correctly piecing together the genome from millions or billions of short reads, and 2) the annotation: correctly predicting, for example, the location and structure of the genes.


For Aiptasia, the first questions to address might be practical ones about how well the sequencing effort has covered the genome.
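
One rough way to frame the coverage question before any data arrives is the standard depth calculation: average depth is the total number of sequenced bases divided by the genome size, and under the Lander-Waterman assumption of uniformly random read placement, the fraction of the genome hit by at least one read is about 1 - e^(-depth). The Python sketch below illustrates this arithmetic only. The function names are made up for this example, and the read count, read length, and genome size are hypothetical placeholders, not figures for the Aiptasia project.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of the genome covered by at
    least one read, assuming reads land uniformly at random."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical placeholder values (assumptions for illustration only).
    num_reads = 100_000_000      # number of reads in the run
    read_length = 100            # bases per read
    genome_size = 250_000_000    # assumed genome size in bases

    depth = expected_coverage(num_reads, read_length, genome_size)
    print(f"expected depth: {depth:.1f}x")
    print(f"expected fraction covered: {fraction_covered(depth):.6f}")
```

Real data will fall short of this idealized estimate, since repeats, sequencing bias, and low-quality reads all reduce the effective coverage, so the calculation is best treated as an upper bound on what to expect.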


CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes


Data
Dataset of core genes