genome annotation Flashcards

1
Q

annotation

A
  • describing features of raw sequence:
    • genes
    • regulatory features
    • repeats
    • areas of conservation
    • syntenic regions
      • evolutionary relationships between genomes
2
Q

process of annotation

A
  • assign gene functions
    • includes variable splice site identification
  • identify regulatory features and functions
    • can be subtle
  • identify repeats
  • comparative studies with other genomes
  • mainly automated, with manual curation in places (e.g. VEGA)
  • context is important
3
Q

simple metrics

A
  • first step
  • counting and analysing structure
  • includes base confidence scores
  • rolling GC content
  • di/trinucleotide bias
    • patterns present more often than expected by chance → likely to have a functional role
  • codon use
    • increased use → common tRNAs → gene more easily expressed
    • gene with rare codons expressed less
    • housekeeping genes → more common codons
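
A minimal sketch of the metrics above, assuming the sequence is a plain string of A/C/G/T. The function names, window size and step are illustrative choices, not taken from any particular annotation tool.

```python
from collections import Counter

def rolling_gc(seq, window=100, step=50):
    """Return (start, GC fraction) for each window along seq."""
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        out.append((start, gc))
    return out

def dinucleotide_bias(seq):
    """Observed/expected ratio per dinucleotide (1.0 = as expected by chance)."""
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1
    return {p: (c / total) / (base_freq[p[0]] * base_freq[p[1]])
            for p, c in pairs.items()}

def codon_usage(cds):
    """Count codons in an in-frame coding sequence (trailing partial codon ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))
```

A ratio well above 1.0 from `dinucleotide_bias` flags a pattern seen more often than base composition alone would predict. Comparing `codon_usage` between genes gives a rough view of common versus rare codons.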
4
Q

GC content

A
  • important for gene prediction
  • influences survival in the environment
  • varies widely between genomes and within a genome
  • correlation between intron length and GC content
    • much higher GC with shorter introns
    • ~65% GC (introns ~300 bases) vs ~30% GC (introns ~2300 bases)
5
Q

human GC content

A
  • ~41% on average, varies widely
  • between 35% and 60% in a 100 kb fragment
  • in genes:
    • more uniform
    • 45-50%
  • regions of high GC generally have higher gene density
    • used in software-based gene prediction
6
Q

bacterial replication

A
  • replication starts at oriC on both strands → proceeds in both directions
  • reverse strand delayed due to opening and Okazaki fragments
  • ssDNA exposed → deamination of C to T 100x more frequent
  • T-G mismatch → becomes T-A in the next round of replication
    • loss of C on reverse strand
    • relative decrease in G on reverse strand
    • decrease in C on forward strand
  • plot cumulative GC skew (G minus C) along the genome
    • minimum - origin
    • maximum - terminus
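
The skew plot above can be sketched as a running G minus C count. This is an illustration only: it assumes one strand read as a linear string and ignores the circularity of a real bacterial chromosome. The function names are hypothetical.

```python
def cumulative_gc_skew(seq):
    """Running total of +1 per G and -1 per C; entry i covers seq[:i]."""
    skew = [0]
    for base in seq.upper():
        if base == "G":
            step = 1
        elif base == "C":
            step = -1
        else:
            step = 0
        skew.append(skew[-1] + step)
    return skew

def skew_extremes(seq):
    """Return (index of minimum, index of maximum) of the cumulative skew."""
    skew = cumulative_gc_skew(seq)
    return skew.index(min(skew)), skew.index(max(skew))
```

Under this convention the index of the minimum is a candidate origin and the index of the maximum a candidate terminus. On real genomes the curve is noisy, so it is usually inspected as a plot.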
7
Q

GC skew in higher organisms

A
  • multiple origins of replication
  • 3 minima indicate 3 origins, e.g. in some archaea
  • more difficult in eukaryotes
    • yeast - ~400 origins
    • need to look for other patterns (ARS, autonomously replicating sequences)
8
Q

eukaryotic GC skew

A
  • ARS consensus sequence in yeast
9
Q

other measures

A
  • dicodon counts
    • frequency of occurrence of successive codon pairs
  • 3rd base periodicity
    • e.g. same nucleotide at n, n+3, n+6 etc
  • length and occurrence of ORFs
    • stretches between stop codons
  • promoters and TF binding sites (subtle)
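
A rough sketch of the ORF measure above: it lists stop-free stretches in the three forward reading frames. It assumes the standard stop codons and a plain A/C/G/T string, and does not check for a start codon or scan the reverse strand. `open_frames` and `min_codons` are illustrative names.

```python
STOPS = {"TAA", "TAG", "TGA"}

def open_frames(seq, min_codons=1):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    start is inclusive and end is exclusive; the stop codon itself is not included.
    """
    seq = seq.upper()
    results = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if (i - start) // 3 >= min_codons:
                    results.append((frame, start, i))
                start = i + 3
        # Stretch that runs to the end of the sequence without hitting a stop
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            results.append((frame, start, end))
    return results
```

Unusually long stop-free stretches are more likely to be coding. In practice `min_codons` would be set far higher than in a toy example.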
10
Q

subtlety in human genome

A
  • G and C more prevalent in first 50 nt of intron
    • GGG 4x more frequent
  • VWG consensus in exons
    • IUPAC codes: V = not T (A, C or G), W = A or T, then G
    • minimal periodicity of 10 nt
    • weaker in introns
    • phase bending potential towards major groove
      • increased accessibility
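
One way to look for the VWG pattern and its spacing, as a sketch only. The regular expression spells out the IUPAC codes (V = A/C/G, W = A/T). The helper names and the `max_gap` cutoff are assumptions made for illustration.

```python
import re
from collections import Counter

# Lookahead so that overlapping occurrences (e.g. CAGAG) are all reported
VWG = re.compile(r"(?=[ACG][AT]G)")

def vwg_positions(seq):
    """Start positions of every VWG match in seq."""
    return [m.start() for m in VWG.finditer(seq.upper())]

def spacing_counts(positions, max_gap=30):
    """Count distances between all pairs of motif starts up to max_gap."""
    counts = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            gap = q - p
            if gap > max_gap:
                break  # positions are sorted, so later gaps are larger still
            counts[gap] += 1
    return counts
```

A peak in `spacing_counts` near 10, 20 and 30 would be consistent with the roughly 10 nt periodicity noted above. A real analysis would need many sequences and a background model.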