What is a Genome Sequence?

Each plant cell's nucleus contains chromosomes, which carry the cell's genetic information in the form of DNA molecules. Additional DNA molecules are also present in the cell's chloroplasts and mitochondria. Genetic information is written in the DNA using an 'alphabet' of four letters: A, T, C and G. These 'letters' stand for the bases adenine, thymine, cytosine and guanine, which form subunits within the overall DNA molecule. The subunits are chemically strung together in a long sequence, in the region of tens or hundreds of millions of individual letters per chromosome.

A DNA sequence is a list of these letters in a particular order, and the 'letters' are called nucleotides or bases. The exact sequence of bases is different in every living organism, with few exceptions. Identical twins are one exception. Plants propagated vegetatively as clones are another, for instance plants produced by division or grown from cuttings. Sexual reproduction combines the DNA sequences from the female and male parents' chromosomes, producing a unique genome in each offspring.

An organism's functional genes make up only a small part of this sequence. Between neighbouring genes there is typically a long stretch of other sequence that has a regulatory function, an unknown function, or no function at all.

Laboratory analysis is making it possible to take the chromosomes from the cells of a plant, or of any other organism, and work out the specific sequence of the bases A, T, C and G that is unique to that organism. This process, called 'sequencing' a genome, is not simple, however, because the chromosomes are much too long to sequence in one go. In fact, only very short stretches of DNA, up to around 1000 bases, can be accurately sequenced at once.
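A rough back-of-the-envelope calculation gives a feel for the scale involved. The sketch below is in Python and rests on three illustrative assumptions. The first is a single chromosome of 100 million bases. The second is reads of about 1000 bases. The third is that each position is read about ten times on average ('10x coverage'), so that neighbouring fragments overlap. The coverage figure is an assumption chosen for illustration, not a fixed standard, and the function name `reads_needed` is hypothetical.

```python
import math

def reads_needed(genome_length: int, read_length: int, coverage: float) -> int:
    """Rough number of reads needed so that, on average, each base is read `coverage` times."""
    return math.ceil(genome_length * coverage / read_length)

# Illustrative (assumed) figures: one 100-million-base chromosome,
# reads of ~1000 bases, and each base read about 10 times on average.
n = reads_needed(genome_length=100_000_000, read_length=1000, coverage=10)
print(f"{n:,} reads")  # 1,000,000 reads
```

Even for a single chromosome, this comes to about a million reads, and a whole genome multiplies the figure further.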

The solution adopted so far is to take many copies of an organism's chromosomes, break them up into millions of short fragments, and sequence each fragment. Wherever the sequence of bases in one fragment overlaps the sequence in another, the two fragments can be joined. By joining overlapping fragments one by one, it is eventually possible to work out the sequence of the whole genome. This process is called genome sequence assembly, and is usually carried out by specialised computer software.
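The overlap idea can be illustrated with a toy Python sketch. It uses a simple 'greedy' strategy, chosen here purely for illustration: it repeatedly merges the two pieces with the longest overlap. Real assembly software is far more sophisticated and must also cope with sequencing errors. The sketch makes four further simplifying assumptions. The genome is a made-up random 300-base sequence. Fragments are cut at regular intervals rather than at random. Reads are error-free. The names `make_reads`, `overlap` and `greedy_assemble` are hypothetical.

```python
import random

def make_reads(genome: str, read_length: int, step: int) -> list[str]:
    """Cut overlapping fragments ('reads') starting every `step` bases.
    Assumes len(genome) >= read_length."""
    last = len(genome) - read_length
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # make sure the final bases are covered too
    return [genome[s:s + read_length] for s in starts]

def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of `a` that is also a prefix of `b`."""
    for length in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_overlap: int = 10) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j and (olen := overlap(a, b)) > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len < min_overlap:
            break  # no convincing overlap left, so stop with several pieces
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(300))  # a made-up 300-base "genome"
reads = make_reads(genome, read_length=30, step=10)
rng.shuffle(reads)  # the assembler is not told where each read came from
contigs = greedy_assemble(reads)
print(len(reads), "reads assembled into", len(contigs), "piece(s)")
print("matches original:", contigs == [genome])
```

Because this made-up sequence is random and contains no long repeats, the longest overlap between two pieces almost always marks fragments that really were neighbours. That is what allows the greedy merging to rebuild the original sequence.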

Sequence assembly is especially complicated in plants because some parts of the genome are repeated copies of other parts. When the software encounters a repeated region, it cannot tell which of the copies a short fragment came from, and so may be unable to assemble the whole genome sequence accurately. This is one of the reasons (other than cost!) why only a small number of plant genome sequences have been published so far.
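A small Python sketch shows why repeats cause trouble. It builds two made-up genomes that differ only in the order of the unique stretches lying between three copies of a repeated element. It then collects every possible read of a given length from each genome, counting how often each read occurs. When every read is short enough to fit inside a repeat copy, the two genomes produce exactly the same collection of reads. In that case, no assembler working from those reads alone could tell which arrangement is correct. Reads long enough to span a whole repeat copy do tell the genomes apart. The sequences, the read lengths and the function name `all_reads` are illustrative assumptions.

```python
from collections import Counter

def all_reads(genome: str, read_length: int) -> Counter:
    """Every possible read of the given length, with how often each one occurs."""
    return Counter(genome[i:i + read_length] for i in range(len(genome) - read_length + 1))

repeat = "GATTACAGC"  # a made-up repeated element, 9 bases long
x, y, z, w = "CCGTAACG", "AAAACCCC", "GGGGTTTT", "CTAGCTAG"  # made-up unique stretches

genome_1 = x + repeat + y + repeat + z + repeat + w
genome_2 = x + repeat + z + repeat + y + repeat + w  # y and z swapped

print(genome_1 == genome_2)                                # False: different genomes
print(all_reads(genome_1, 8) == all_reads(genome_2, 8))    # True: identical 8-base reads
print(all_reads(genome_1, 12) == all_reads(genome_2, 12))  # False: 12-base reads span a repeat
```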