ab initio gene prediction Flashcards

1
Q

bacteria vs eukaryotes

A
  • much easier for bacteria
  • no introns (single coding region)
  • smaller intergenic regions
    • genes easier to find
    • by contrast, only ~2-3% of a eukaryotic genome is genes
  • look for largest ORFs
    • accurate for low-GC genomes
    • high GC → fewer stop codons (TAA, TAG, TGA are AT-rich)
      • many long ORFs will occur by chance
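The "look for largest ORFs" idea can be sketched in a few lines. This is a minimal illustration, not how any particular gene finder works: the function names are made up for this card, only ATG is treated as a start, and ORFs are scanned in all six reading frames.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=1):
    """Return (start, end) for each ATG...stop ORF in the three forward frames.

    The first ATG after the previous stop is used, so each stop codon
    contributes at most one (its longest) ORF. `end` is exclusive and
    includes the stop codon.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def longest_orfs(seq, min_codons=1):
    """Return (strand, start, end) tuples sorted by length, longest first.

    Coordinates on the "-" strand refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = [("+", s, e) for s, e in orfs_in_strand(seq, min_codons)]
    hits += [("-", s, e)
             for s, e in orfs_in_strand(reverse_complement(seq), min_codons)]
    return sorted(hits, key=lambda h: h[2] - h[1], reverse=True)


print(longest_orfs("CCATGAAATTTTAGGG"))  # [('+', 2, 14)]
```

On a high-GC genome this kind of scan returns many long spurious ORFs, which is why length alone is not enough there.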
2
Q

gene finding programs

A
  • artemis - widely used
  • prodigal
    • doesn’t just look for ORFs
    • log-likelihood information
    • accuracy >90%
    • performs well with high GC
3
Q

prodigal

A
  • create training set for protein-coding regions
    • look for G/C bias at each position of ORFs
    • build model of predicted ORFs with positional bias
  • dicodon bias also used
  • penalise ORFs downstream of another larger ORF
    • difference between the two scores is subtracted from the smaller ORF's score
    • add length factor to each
      • higher in genomes with lower GC
  • iteration
  • dynamic programming
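The "G/C bias at each position of ORFs" step can be illustrated by counting GC content separately at codon positions 1, 2 and 3. This is only a sketch of the statistic itself; the function name is hypothetical and the way Prodigal turns these numbers into a training set is not reproduced here.

```python
def gc_by_codon_position(orf):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame ORF.

    Trailing bases that do not complete a codon are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    counts = [0, 0, 0]
    for i in range(usable):
        if orf[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    if n_codons == 0:
        return [0.0, 0.0, 0.0]
    return [c / n_codons for c in counts]


# Codons ATG GCC AAG: position 3 is G, C, G, so its GC fraction is 1.0.
print(gc_by_codon_position("ATGGCCAAG"))
```

In real coding sequence one codon position usually stands out from the other two, and that positional signal is what separates true ORFs from chance ones.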
4
Q

log-likelihood

A
  • dicodon bias in exons vs introns
  • store statistics and look at random chance of them appearing
  • score as log-likelihood of signal to background for each potential gene
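A dicodon (hexamer) log-likelihood score might be computed as below. This is a hedged sketch: the function names, the pseudocount, and the choice to count background hexamers at every offset are assumptions made for illustration, not details from the notes.

```python
import math
from collections import Counter


def hexamer_counts(seqs, step):
    """Count 6-mers in each sequence, moving `step` bases at a time."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    return counts


def log_odds_table(coding_seqs, background_seqs, pseudocount=1.0):
    """Return a function giving log(P(hexamer | coding) / P(hexamer | background)).

    Coding hexamers are counted in frame (step 3); background hexamers at
    every offset (step 1). The pseudocount avoids log(0) for unseen hexamers.
    """
    coding = hexamer_counts(coding_seqs, step=3)
    background = hexamer_counts(background_seqs, step=1)
    n_coding = sum(coding.values()) + pseudocount * 4 ** 6
    n_background = sum(background.values()) + pseudocount * 4 ** 6

    def score(hexamer):
        p_coding = (coding[hexamer] + pseudocount) / n_coding
        p_background = (background[hexamer] + pseudocount) / n_background
        return math.log(p_coding / p_background)

    return score


def coding_score(orf, score):
    """Sum the in-frame dicodon log-odds over a candidate gene."""
    orf = orf.upper()
    return sum(score(orf[i:i + 6]) for i in range(0, len(orf) - 5, 3))


score = log_odds_table(["ATGGCCGCCGCCTAA"], ["ATATATATATATATAT"])
print(coding_score("GCCGCCGCC", score) > 0)  # True: looks like the coding set
print(coding_score("ATATATATA", score) < 0)  # True: looks like background
```

A positive total means the candidate resembles the stored coding statistics more than chance; a negative total means the opposite.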
5
Q

prodigal iteration

A
  • on sequences with coding score above threshold
  • store initiation site with highest score for each ORF
  • examine starts for ATG/GTG/TTG frequency and RBS/SD motifs
    • rescore
  • new set of starts with highest score in each ORF selected
  • continue iterating and refining
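The select, rescore, reselect loop can be shown with a toy model. Everything here is an assumption for illustration: each candidate start is reduced to a (codon, has_rbs, base_score) tuple, start-codon and RBS frequencies are re-estimated from the currently chosen starts, and the loop stops when the choices no longer change. Prodigal's actual scoring is more detailed.

```python
import math
from collections import Counter


def refine_starts(orfs, rounds=10, pseudocount=1.0):
    """Pick one start per ORF by iterative rescoring.

    orfs: list of candidate lists; each candidate is
          (start_codon, has_rbs, base_score).
    Returns the index of the chosen candidate for each ORF.
    """
    # Initial pass: take the start with the highest base (coding) score.
    chosen = [max(range(len(c)), key=lambda i: c[i][2]) for c in orfs]
    n = len(orfs)
    for _ in range(rounds):
        # Re-estimate start statistics from the currently chosen starts.
        codon_counts = Counter(orfs[k][i][0] for k, i in enumerate(chosen))
        rbs_counts = Counter(orfs[k][i][1] for k, i in enumerate(chosen))

        def start_score(cand):
            codon, has_rbs, base = cand
            p_codon = (codon_counts[codon] + pseudocount) / (n + 3 * pseudocount)
            p_rbs = (rbs_counts[has_rbs] + pseudocount) / (n + 2 * pseudocount)
            # Log-odds against uniform expectations (1/3 per codon, 1/2 for RBS).
            return base + math.log(p_codon * 3) + math.log(p_rbs * 2)

        new = [max(range(len(c)), key=lambda i: start_score(c[i])) for c in orfs]
        if new == chosen:
            break  # converged
        chosen = new
    return chosen


orfs = [
    [("ATG", True, 5.0)],
    [("ATG", True, 4.0)],
    [("ATG", True, 6.0)],
    [("TTG", False, 3.2), ("ATG", True, 3.0)],
]
print(refine_starts(orfs))  # [0, 0, 0, 1]
```

In the example the last ORF first picks its TTG start on base score alone, then switches to the ATG start with an RBS once the genome-wide start statistics are taken into account.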
6
Q

prodigal dynamic programming

A
  • performed over all start-stop pairs
  • score each gene on start and dicodon scores
  • allow some overlap
    • opposite strands particularly
    • smaller overlap on same strand
  • determine final gene prediction
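The final selection step resembles weighted interval scheduling with a tolerance for overlap. The sketch below is an assumption-laden illustration: the `Gene` class, the function names, and the overlap limits (60 bp on the same strand, 200 bp on opposite strands) are illustrative values not taken from the notes, and only neighbouring genes in the chain are checked against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int    # inclusive
    end: int      # exclusive
    strand: str   # "+" or "-"
    score: float  # combined start + dicodon score


def can_follow(prev, nxt, same_max=60, opposite_max=200):
    """True if `nxt` may come directly after `prev` in the final gene set."""
    if nxt.start <= prev.start or nxt.end <= prev.end:
        return False
    overlap = max(0, prev.end - nxt.start)
    limit = same_max if prev.strand == nxt.strand else opposite_max
    return overlap <= limit


def best_gene_set(genes):
    """Return the highest-scoring chain of mutually compatible neighbours."""
    genes = sorted(genes, key=lambda g: (g.end, g.start))
    if not genes:
        return []
    best = [g.score for g in genes]   # best chain score ending at each gene
    back = [None] * len(genes)        # predecessor in that chain
    for j, gene_j in enumerate(genes):
        for i in range(j):
            candidate = best[i] + gene_j.score
            if can_follow(genes[i], gene_j) and candidate > best[j]:
                best[j] = candidate
                back[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(genes)), key=best.__getitem__)
    chain = []
    while j is not None:
        chain.append(genes[j])
        j = back[j]
    return chain[::-1]


a = Gene(0, 300, "+", 10.0)
b = Gene(100, 400, "+", 8.0)   # overlaps a by 200 bp on the same strand
c = Gene(280, 600, "+", 7.0)   # overlaps a by only 20 bp
d = Gene(350, 700, "-", 5.0)
print(best_gene_set([d, c, b, a]) == [a, c])  # True
```

Genes b and a cannot both be kept because their same-strand overlap is too large, so the best chain is a followed by c.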
7
Q

eukaryotic gene prediction

A
  • introns and exons
  • variable number and length
  • multiple transcripts
    • mechanism not fully understood
    • look for enhancers etc.
  • large intergenic regions
  • need to identify functional sites and patterns around them
    • PWM
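A position weight matrix (PWM) for a functional site can be built from aligned example sites and slid along a sequence. This is a minimal sketch: the function names are hypothetical, the example sites are made up to resemble a splice-donor-like motif, and a uniform background with a pseudocount of 1 is assumed.

```python
import math


def build_pwm(sites, background=None, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned sites (A/C/G/T only)."""
    background = background or {b: 0.25 for b in "ACGT"}
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background[b])
            for b in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds for a window the same length as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return the start index of the highest-scoring window in `seq`."""
    width = len(pwm)
    return max(range(len(seq) - width + 1),
               key=lambda i: score_window(pwm, seq[i:i + width]))


# Made-up training sites, for illustration only.
sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "CCCGTAAGTCCC"))  # 3
```

Windows scoring above a chosen threshold are candidate functional sites; the surrounding sequence patterns then help decide which candidates are real.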