What are the objectives of genome annotation?
What are some issues you can face during genome annotation?
Annotating genomes is computationally intensive, in terms of both processing and data storage.
What are some characteristics of genomes that might lead to errors during the annotation process?
What is annotation by simple metrics?
This is the simplest level of annotation; it provides general information about the genome.
What features can you annotate with simple metrics?
What is the average GC content of the human genome, and of its genes?
What is the relationship between the GC content and gene density?
Regions of high GC content (62-68%) have higher relative gene density than regions of lower GC content.
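To relate GC content to gene density, GC content is usually measured in windows along the sequence and the windows are then grouped by GC class and compared with the gene annotation. The sketch below only covers the windowing part; the function names, window size and step are hypothetical choices, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC content of each full window, as (0-based start, GC fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATATATATAT" + "GCGCGGCCGC" + "ATGCATGCAT"
    for start, gc in windowed_gc(toy, window=10, step=10):
        print(start, round(gc, 2))
```

On a real chromosome the windows would be tens to hundreds of kilobases long, and each window's gene count would come from an annotation file.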
What is the relationship between exon and intron length and the GC content?
What other characteristics can we measure besides GC content?
What is an ab initio prediction?
Gene prediction from the genomic sequence alone, without external evidence such as transcripts or homologous proteins.
Why is gene prediction easier in bacteria than in eukaryotes?
How do bacterial gene finders work, and is their core assumption reasonable?
Why is ORF length alone not enough to find genes in genomes with high GC content?
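A minimal sketch of the ORF scan that bacterial gene finders typically start from. Assumptions: forward strand only, ATG as the only start codon, and a hypothetical minimum length `min_codons`; real tools also scan the reverse complement and consider alternative start codons such as GTG and TTG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    Only the first ATG after the previous stop is used, so each stop
    codon yields at most one (the longest) ORF. min_codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # wait for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))
```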
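One way to see the problem: all three stop codons (TAA, TAG, TGA) are AT-rich, so in a GC-rich genome stops are rarer and long ORFs appear by chance. The sketch below assumes a random sequence with independent bases where P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2; the distance to the first stop is then geometric, with mean 1/P(stop) codons.

```python
def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA under the independent-base model."""
    p_at = (1 - gc) / 2  # probability of A (or of T)
    p_gc = gc / 2        # probability of G (or of C)
    taa = p_at ** 3            # T, A, A
    tag = p_at ** 2 * p_gc     # T, A, G
    tga = p_at ** 2 * p_gc     # T, G, A
    return taa + tag + tga


def expected_random_orf_codons(gc: float) -> float:
    """Mean number of codons up to and including the first stop (geometric distribution)."""
    return 1 / stop_codon_probability(gc)


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, round(expected_random_orf_codons(gc), 1))
```

Under this model the mean random ORF grows from roughly 13 codons at 30% GC to about 21 at 50% GC and about 52 at 70% GC, so a fixed length cutoff admits many more spurious ORFs in high-GC genomes.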
What is Prodigal?
What are the steps in Prodigal?
Describe step 1 (Prodigal): constructing a training set of protein-coding genes.
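A training set for a GC-biased genome can be picked out using the GC content at each of the three codon positions. The sketch below assumes that positional GC bias is the selection signal; the helper names are hypothetical and Prodigal's actual selection procedure is more involved than a single per-ORF measurement.

```python
def gc_by_codon_position(orf: str) -> tuple[float, ...]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    orf = orf.upper()
    fractions = []
    for pos in range(3):
        bases = orf[pos::3]  # every third base, starting at this codon position
        gc = sum(1 for b in bases if b in "GC")
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


def dominant_gc_position(orf: str) -> int:
    """1-based codon position with the highest GC fraction (ties go to the lowest position)."""
    fractions = gc_by_codon_position(orf)
    return fractions.index(max(fractions)) + 1


if __name__ == "__main__":
    print(gc_by_codon_position("ATGGCC"), dominant_gc_position("ATGGCC"))
```

A hypothetical training-set filter would keep only long ORFs whose dominant position matches the bias expected for the genome.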
Describe step 2 (Prodigal): building log-likelihood coding statistics from the training data.
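A log-likelihood coding statistic compares how often a word occurs in the training genes with how often it occurs in the genome as a whole. The sketch assumes in-frame hexamers (dicodons) as the words and a pseudocount of 1 for unseen hexamers; the function names are hypothetical, and whether this matches Prodigal's exact counting scheme is an assumption.

```python
import math
from collections import Counter


def hexamer_counts(seqs: list[str], in_frame: bool) -> Counter:
    """Count hexamers; in_frame=True steps by codons, otherwise every position."""
    counts = Counter()
    step = 3 if in_frame else 1
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    return counts


def log_likelihood_table(coding_seqs: list[str], background_seqs: list[str],
                         pseudocount: float = 1.0) -> dict[str, float]:
    """log(P(hexamer | coding) / P(hexamer | background)) for every observed hexamer."""
    coding = hexamer_counts(coding_seqs, in_frame=True)
    background = hexamer_counts(background_seqs, in_frame=False)
    vocab = set(coding) | set(background)
    c_total = sum(coding.values()) + pseudocount * len(vocab)
    b_total = sum(background.values()) + pseudocount * len(vocab)
    return {
        h: math.log(((coding[h] + pseudocount) / c_total)
                    / ((background[h] + pseudocount) / b_total))
        for h in vocab
    }


def coding_score(seq: str, table: dict[str, float]) -> float:
    """Sum of in-frame hexamer scores; unseen hexamers contribute 0."""
    seq = seq.upper()
    return sum(table.get(seq[i:i + 6], 0.0) for i in range(0, len(seq) - 5, 3))
```

Positive scores mean a hexamer is enriched in the training genes relative to the genome background.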
Describe the sharpening of the coding scores.
Describe adding a length factor to the coding score.
A length factor is added to the coding score. This factor is higher in high GC genomes, and lower in low GC genomes.
Describe iterative start training
Describe the final dynamic programming step (it effectively considers every possible combination of genes without enumerating them one by one).
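Dynamic programming finds the best-scoring combination of candidate genes without listing all 2^n subsets. The sketch below is a deliberately simplified stand-in, weighted interval scheduling on one strand with no overlaps allowed; Prodigal's own dynamic programming also handles both strands and permits limited overlaps, so treat this as an illustration of the idea only. The candidate format `(start, end, score)` is hypothetical.

```python
from bisect import bisect_right


def best_gene_set(candidates: list[tuple[int, int, float]]) -> tuple[float, list[tuple[int, int, float]]]:
    """Choose non-overlapping (start, end, score) genes (end exclusive) maximizing total score."""
    genes = sorted(candidates, key=lambda g: g[1])  # sort by end coordinate
    ends = [g[1] for g in genes]
    n = len(genes)
    best = [0.0] * (n + 1)  # best[k] = optimal total using only the first k genes

    def last_compatible(k: int) -> int:
        # Number of genes among the first k-1 that end at or before gene k starts.
        return bisect_right(ends, genes[k - 1][0], 0, k - 1)

    for k in range(1, n + 1):
        score = genes[k - 1][2]
        best[k] = max(best[k - 1], best[last_compatible(k)] + score)

    # Trace back which genes produced the optimum.
    chosen = []
    k = n
    while k > 0:
        j = last_compatible(k)
        if best[j] + genes[k - 1][2] > best[k - 1]:
            chosen.append(genes[k - 1])
            k = j
        else:
            k -= 1
    return best[n], chosen[::-1]


if __name__ == "__main__":
    print(best_gene_set([(0, 300, 5.0), (200, 600, 8.0), (350, 900, 6.0)]))
```

Here the two compatible genes (total 11.0) beat the single highest-scoring gene (8.0); genes with negative scores are never selected.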
What genome does Prodigal use for training?
It trains only on the genome it is analysing; it does not use any external data.
Why are eukaryotic genes more difficult to annotate?
What can you use to describe the functional sites in a eukaryotic genome?