Phase 1: Sequencing, assembly and annotation of the genome

Sequencing of the genome:

The first phase of the study is to determine the order of the A’s, T’s, C’s and G’s in the Agapornis genome. We selected one young male bird for this purpose. Blood was drawn from this bird in 2015, and through several laboratory processes we obtained the sequence of its genome. Of course, not all birds are genetically identical, but the overall composition of their DNA is roughly 99.999% the same. To see how a genome is sequenced, watch the YouTube video below. The video is by TED-Ed and Mark J. Kiel.

Assembly of the genome:

Once the genome is sequenced we have a lot (and I mean a lot!!) of data, but on its own it means absolutely nothing yet. We need to build the puzzle of the genome, and it is like building a million-piece puzzle without having the final picture! The sequencing technology available today only allows us to read around 100 – 300 bases at a time. To get the complete sequence of the genome, thousands of copies of the bird’s genome are chopped randomly into small pieces of around 100 bases, and the sequences of these pieces are called reads. Because there are so many copies of the same genome and they are chopped at random positions, reads from different copies overlap each other. Overlapping reads are joined into longer continuous sequences called contigs. For example:
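
The joining of overlapping reads can be sketched in a few lines of Python. This is only a toy illustration, not the software used in the study: the three reads are made up for the example, the function names `overlap_length` and `merge_reads` are hypothetical, and real assemblers use far more sophisticated methods that cope with sequencing errors and repeated sequences.

```python
def overlap_length(left, right, min_overlap=5):
    """Length of the longest end of `left` that matches the start of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(reads, min_overlap=5):
    """Repeatedly join the two sequences with the largest overlap (greedy toy approach)."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; the remaining pieces stay separate
        # Append only the part of the right-hand sequence that is not already covered.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up reads that overlap each other by seven bases.
reads = ["ATTCCTTGCTATCC", "GCTATCCGGGAATC", "GGGAATCCGATT"]
print(merge_reads(reads))
```

The three short reads collapse into one longer sequence because the end of each read matches the start of the next. Reads that overlap nothing are simply left as separate pieces.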

The overlaps between contigs are used to join them into an even longer sequence called a scaffold, for example:

ATTCCTTGCTATCCGGGAATCCGATTAAATGATCTGGATTACCCTTCCATTCATAAGGGAACCTAGC

Different scaffolds that overlap then give the sequence of the full genome. Unfortunately, there will always be gaps in the assembled genome where no overlaps were found in the data. These gaps, however, make up only a small part of the genome.
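
To give a feel for how such holes can be measured, here is a small Python sketch. It assumes, as is common in assembly files, that a hole is written as a run of the letter N. The scaffold below is made up for the example, and the function name `gap_summary` is hypothetical.

```python
import re


def gap_summary(scaffold):
    """Return the (start, end) position of every run of N and the fraction of bases that are N."""
    gaps = [(m.start(), m.end()) for m in re.finditer(r"N+", scaffold)]
    gap_bases = sum(end - start for start, end in gaps)
    return gaps, gap_bases / len(scaffold)


# Made-up scaffold: known sequence with two holes of unknown bases (N).
scaffold = "ATTCCTTGCTATCC" + "N" * 5 + "GATTAAATGATCTGGATT" + "N" * 3 + "CCATTC"

gaps, gap_fraction = gap_summary(scaffold)
print("Gap positions:", gaps)
print("Fraction of the scaffold that is gaps: {:.1%}".format(gap_fraction))
```

In a real assembly the same kind of count is made over all scaffolds together.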

The amount of data is so large that these computations cannot be done on a normal laptop or desktop, so we use a supercomputer to assemble the genome.

Annotation of the genome:

After the genome is assembled we need to annotate it. Annotation is the process of labelling the genes, markers and other elements in the genome. This is done with the help of genomes that have already been assembled and annotated, such as the chicken and budgie genomes. Many gene sequences are the same or very similar between species, which allows us to label the corresponding genes in the lovebird genome.
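
The idea of labelling genes by similarity can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the annotation pipeline used in the study: the gene names and gene sequences are invented, the functions `percent_identity` and `annotate` are hypothetical, and the 90% threshold is an arbitrary choice. Real annotation tools use proper sequence alignment and additional evidence.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equally long sequences have the same base."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def annotate(scaffold, reference_genes, min_identity=90.0):
    """Label the best-matching stretch of the scaffold for each reference gene."""
    annotations = []
    for name, gene in reference_genes.items():
        best_start, best_score = None, 0.0
        # Slide a window the size of the gene along the scaffold.
        for start in range(len(scaffold) - len(gene) + 1):
            window = scaffold[start:start + len(gene)]
            score = percent_identity(gene, window)
            if score > best_score:
                best_start, best_score = start, score
        # Keep the label only if the match is similar enough.
        if best_start is not None and best_score >= min_identity:
            annotations.append((name, best_start, best_start + len(gene), best_score))
    return annotations


# Example scaffold from the assembly section above.
scaffold = "ATTCCTTGCTATCCGGGAATCCGATTAAATGATCTGGATTACCCTTCCATTCATAAGGGAACCTAGC"

# Invented "already annotated" genes from another species (hypothetical names and sequences).
reference_genes = {
    "gene_A": "GGGAATCCGTTTAAAT",  # differs from the scaffold at one base
    "gene_B": "TTTTTTTTTTGGGGGG",  # has no close match in the scaffold
}

for name, start, end, score in annotate(scaffold, reference_genes):
    print(name, start, end, "{:.2f}%".format(score))
```

Only gene_A receives a label, because it is nearly identical to a stretch of the scaffold. gene_B stays unlabelled because nothing in the scaffold resembles it closely enough.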