What is Transcriptome Sequencing?

An organism's genes are expressed, that is, transcribed from the genome's DNA code into messenger RNA (mRNA), which may subsequently be translated into proteins that function within the organism's cells. Some genes are expressed more than others, some only at particular life stages, and some at different times according to environmental conditions or the state of the organism or of the individual cell. This complex regulation of gene expression allows the organism's cells to respond to their environment and to the organism's development.

We are often interested in how gene expression changes under different conditions, and we may also want to know which parts of the organism's genome encode active genes and which parts are inactive. To find out, we can carry out a laboratory experiment that extracts the mRNA molecules from an organism's cells and then 'sequences' them: that is, determines the order of the bases making up each transcript, and estimates how much of each transcript is present in one condition compared with another.
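
As a rough illustration of estimating relative quantities, the sketch below assumes we already have a count of sequencing reads per gene in each condition (the gene names and counts are hypothetical). It scales each sample to counts per million, so that samples sequenced to different depths can be compared, and then takes a log2 ratio between the two conditions. Dedicated statistical packages use more careful normalization and test whether differences are significant; this is only a simplified sketch.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2 ratio of condition B to condition A for each gene.

    The pseudocount avoids division by zero for genes with no reads.
    """
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }

# Hypothetical read counts for three genes in two conditions.
control = {"geneA": 500, "geneB": 300, "geneC": 200}
stress = {"geneA": 250, "geneB": 300, "geneC": 1450}

lfc = log2_fold_change(counts_per_million(control), counts_per_million(stress))
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```

Note that geneB has the same raw count in both conditions but a negative fold change, because the stress sample contains twice as many reads in total; this is why counts are scaled before they are compared.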

The set of transcripts produced from the genes expressed in any one condition is known as the transcriptome, and the process of determining the sequences of those transcripts, and their relative proportions, is known as transcriptome sequencing.

Extracted mRNA molecules are first used to generate corresponding complementary DNA (cDNA) molecules in a process called reverse transcription. This uses the enzyme reverse transcriptase to copy the RNA sequence back into DNA, reversing the transcription step by which the gene was originally expressed. The cDNA molecules can then be sequenced in the same way as an organism's genomic DNA.
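
The base-pairing logic of reverse transcription can be sketched in a few lines. This assumes a single first cDNA strand copied from the whole transcript and ignores priming, the poly(A) tail and enzyme errors; the function name and the example transcript are hypothetical.

```python
# Base pairing when DNA is synthesised against an RNA template
# (DNA has no uracil, so U on the template pairs with A).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Return the first-strand cDNA for an mRNA, written 5'->3'.

    The new strand is complementary and antiparallel to the template,
    so each base is complemented and the order is reversed.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))

mrna = "AUGGCCUAA"  # hypothetical short transcript, 5'->3'
print(first_strand_cdna(mrna))  # prints TTAGGCCAT
```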

Once a transcriptome has been sequenced, we can use the information to identify which parts of the genome encode active genes. If several transcriptomes have been sequenced under different growth or environmental conditions, we can start to estimate which genes respond to which conditions, and therefore which genes may be important in particular biological processes.
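
One simple way to picture "which parts of the genome encode active genes" is to merge the genomic positions of aligned reads into covered regions. The sketch below assumes the reads have already been aligned to a single chromosome and are given as hypothetical half-open (start, end) coordinates; real pipelines use splice-aware aligners and compare coverage against gene annotations.

```python
def transcribed_regions(read_intervals, min_reads=1):
    """Merge overlapping read alignments into covered genomic regions.

    read_intervals: (start, end) half-open coordinates of reads aligned
    to one chromosome. Returns merged (start, end, read_count) tuples,
    keeping only regions supported by at least min_reads reads.
    """
    regions = []
    for start, end in sorted(read_intervals):
        if regions and start <= regions[-1][1]:
            # Read overlaps (or abuts) the current region: extend it.
            last_start, last_end, count = regions[-1]
            regions[-1] = (last_start, max(last_end, end), count + 1)
        else:
            # Gap since the previous region: start a new one.
            regions.append((start, end, 1))
    return [region for region in regions if region[2] >= min_reads]

# Hypothetical read alignments on one chromosome.
reads = [(100, 150), (120, 170), (160, 210), (500, 550), (900, 950), (910, 960)]
print(transcribed_regions(reads, min_reads=2))  # prints [(100, 210, 3), (900, 960, 2)]
```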

Because the transcripts are the products of transcribed or 'active' genes, we can also use this sequence information to build multiple sequence alignments of transcriptome sequences from different individuals, for example the parents of a mapping population. Computer software is then used to scan these alignments for single nucleotide polymorphisms (SNPs): positions at which the individuals carry different bases.
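
The scanning step can be sketched as a column-by-column comparison of aligned sequences. This assumes the alignment has already been built and that each individual is represented by a single consensus sequence; the sample names and sequences are hypothetical. Real SNP-calling software also weighs read depth, base quality and possible duplicated genes, which this sketch ignores.

```python
def find_snps(alignment, gap="-"):
    """Report alignment columns where sequences carry different bases.

    alignment: dict of sample name -> aligned sequence (equal lengths).
    Columns containing a gap or an 'N' are skipped, since they cannot
    be scored as simple single-base differences.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    snps = []
    for pos in range(length):
        column = {name: alignment[name][pos].upper() for name in names}
        bases = set(column.values())
        if gap in bases or "N" in bases:
            continue
        if len(bases) > 1:
            snps.append((pos + 1, column))  # 1-based alignment position
    return snps

# Hypothetical aligned transcript fragments from two parental lines.
parents = {
    "parent_A": "ATGCTAGC-TTACG",
    "parent_B": "ATGCTGGCATTACG",
}
for position, alleles in find_snps(parents):
    print(position, alleles)  # prints 6 {'parent_A': 'A', 'parent_B': 'G'}
```

The gap at position 9 is skipped because it represents an insertion or deletion rather than a single-base substitution.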

The SNPs identified can then be used to construct a panel of markers that distinguish between the two parental lines. These markers can then be used to screen members of a mapping population derived from a cross between the two parents. The resulting genotype data allow us to construct linkage maps. Because each SNP can be assigned a relative position on a chromosome, the transcript sequence surrounding that SNP can be placed at the same location. If the SNPs were instead derived from genomic DNA sequence that has been assembled into longer contiguous sequences (contigs), those contigs can be anchored onto the linkage map in the same way.
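
The core calculation behind a linkage map can be sketched for a pair of SNP markers. This assumes a backcross or doubled-haploid population in which each progeny is scored as carrying the parent A or parent B allele at each marker (F2 populations, with three genotype classes, need a different estimator). The recombination fraction is the proportion of progeny whose parental origin switches between the two markers, and Haldane's mapping function, which assumes no crossover interference, converts it to a distance in centiMorgans. The calls are hypothetical, and ordering many markers into a full map requires dedicated software.

```python
import math

def recombination_fraction(marker1, marker2):
    """Fraction of progeny whose parental origin switches between two markers.

    Each marker is a list of calls ('A' or 'B', meaning the allele came
    from parent A or parent B), one per progeny in the same order.
    Missing calls (None) are ignored.
    """
    pairs = [(g1, g2) for g1, g2 in zip(marker1, marker2)
             if g1 is not None and g2 is not None]
    if not pairs:
        raise ValueError("no progeny scored at both markers")
    recombinants = sum(1 for g1, g2 in pairs if g1 != g2)
    return recombinants / len(pairs)

def haldane_cm(r):
    """Haldane map distance in centiMorgans for 0 <= r < 0.5."""
    return -50 * math.log(1 - 2 * r)

# Hypothetical calls for ten progeny at two SNP markers.
snp1 = ["A", "A", "B", "B", "A", "B", "A", "B", "A", "B"]
snp2 = ["A", "A", "B", "A", "A", "B", "A", "B", "B", "B"]

r = recombination_fraction(snp1, snp2)
print(f"r = {r:.2f}, distance = {haldane_cm(r):.1f} cM")  # prints r = 0.20, distance = 25.5 cM
```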