sequencing & assembly Flashcards

Question 1

Q

stages of genome sequencing

Answer

A

fragmentation/cloning
- fragment library, amplification
sequencing
- from both ends to get multiple reads
processing
- base calling, quality assessment, repeat masking
- trim ends (quality drops as polymerase affinity decreases)
assembly
- overlapping reads → contigs
- contigs → scaffolds
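
The "overlapping reads → contigs" step can be illustrated with a minimal sketch. It assumes error-free reads and exact suffix/prefix matches; the function names and the `min_length` parameter are hypothetical. Real assemblers tolerate mismatches and work on overlap or de Bruijn graphs rather than merging pairs directly.

```python
def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge(a: str, b: str, min_length: int = 3):
    """Merge two overlapping reads into one contig, or return None."""
    n = overlap(a, b, min_length)
    if n == 0:
        return None
    # Keep all of `a`, then append only the part of `b` past the overlap.
    return a + b[n:]


contig = merge("ACGTTGCA", "TGCAGGTC")
print(contig)  # ACGTTGCAGGTC
```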

Question 2

Q

PacBio

Answer

A

3rd generation
longest reads, ~20,000 bp
high error rate, but errors are random
- multiple reads → consensus
99.999% accuracy (consensus)
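
Because the errors are random, they rarely hit the same position in every read, so a per-position majority vote recovers the true base. A minimal sketch, assuming the reads are already aligned and of equal length (the function name is hypothetical, and this is not PacBio's actual consensus algorithm):

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over equal-length, pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


reads = ["ACGTAC", "ACGAAC", "TCGTAC", "ACGTAG"]  # each read has one random error
print(consensus(reads))  # ACGTAC
```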

Question 3

Q

phred scores

Answer

A

quality score
estimated confidence in each base call
use to:
- filter and trim reads
- create consensus
- distinguish between variants and errors
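
A sketch of the "filter and trim reads" use, assuming FASTQ quality strings in the common Phred+33 encoding (quality = ASCII code minus 33). The function names and the default cutoff are assumptions for illustration; real trimmers usually use sliding windows rather than a single-base rule.

```python
def decode_phred33(quality_string: str):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(char) - 33 for char in quality_string]


def trim_3prime(sequence: str, quality_string: str, min_quality: int = 30):
    """Drop bases from the 3' end until one meets `min_quality`."""
    scores = decode_phred33(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
print(trim_3prime("ACGTACGT", "IIIII5##", min_quality=20))
```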

Question 4

Q

Q value

Answer

A

given by Phred
QV = -10 × log10(Pe)
- Pe = probability that the base call is an error
ignore call if QV lower than 30
- i.e. <99.9% accuracy
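
The formula can be checked numerically. This sketch converts in both directions; the function names are hypothetical.

```python
import math


def phred_quality(pe: float) -> float:
    """QV = -10 * log10(Pe), where Pe is the base-call error probability."""
    if not 0 < pe <= 1:
        raise ValueError("Pe must be in the interval (0, 1]")
    return -10 * math.log10(pe)


def error_probability(qv: float) -> float:
    """Inverse of the formula above: Pe = 10 ** (-QV / 10)."""
    return 10 ** (-qv / 10)


# Q30 corresponds to a 1 in 1,000 chance of error, i.e. 99.9% accuracy.
for pe in (0.1, 0.01, 0.001):
    print(pe, round(phred_quality(pe)))
```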

Question 5

Q

chastity filter

Answer

A

Illumina base-calling filter
intensity score assigned to each nucleotide at each position
chastity = highest score ÷ (highest + 2nd highest score) for that position
below threshold (0.6) → base marked N
otherwise assign base call
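
The calculation on this card can be sketched as follows. This follows the card's simplified per-position description; the function names and the dictionary input are assumptions, and the production Illumina pipeline may apply the filter differently.

```python
def chastity(intensities: dict) -> float:
    """Highest intensity divided by the sum of the two highest intensities."""
    ranked = sorted(intensities.values(), reverse=True)
    highest, second = ranked[0], ranked[1]
    if highest + second == 0:
        return 0.0  # no signal at all, treat as failing the filter
    return highest / (highest + second)


def call_base(intensities: dict, threshold: float = 0.6) -> str:
    """Return the brightest base, or 'N' if chastity is below the threshold."""
    if chastity(intensities) < threshold:
        return "N"
    return max(intensities, key=intensities.get)


print(call_base({"A": 900, "C": 100, "G": 50, "T": 20}))   # clear signal
print(call_base({"A": 500, "C": 450, "G": 30, "T": 20}))   # ambiguous signal
```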

Question 6

Q

factors affecting quality

Answer

A

end of read deterioration (polymerase affinity decreases)
adaptor sequence still attached to reads
high AT or GC content
- reduced complexity
homopolymeric tracts
- unsure of length
- SNPs ignored (assumed as error)

Question 7

Q

depth of coverage

Answer

A

eliminate errors
depends on genome complexity, read length, sequencer error rate
HGP - 12x or greater
- each base present in 12 reads on average
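
Average depth can be estimated as (number of reads × read length) ÷ genome size. A small sketch of that arithmetic; the function names and example numbers are illustrative, not taken from any real project.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical 10 Mb genome sequenced with 100 bp reads.
print(mean_coverage(1_200_000, 100, 10_000_000))  # 12.0
print(reads_needed(12, 100, 10_000_000))          # 1200000
```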

Question 8

Q

paired end sequencing

Answer

A

sequence both fragment ends
distance known → filter fragments by size
knowing one position anchors the other
better read alignment
important for repeats
improved prediction of structural variations

Question 9

Q

repeats

Answer

A

reads from identical repeat copies can be wrongly assembled together (collapsed)
sequences in between the copies are lost
sequencing across the region may be impossible

Question 10

Q

repeats and paired end reads

Answer

A

pair of overlapping reads, 1 unique, 1 repetitive
map unique read
position second as distance known
enough paired reads allows sequencing across whole repeat region
small repeats only

Question 11

Q

mate pairs

Answer

A

span longer distances than paired ends
- several kb vs ~500 bp
bridge across repeats or structural rearrangements
don’t sequence repeat but don’t lose information
fill gaps with paired ends
helps resolve correct order of repeat fragments

Question 12

Q

scaffolding

Answer

A

resolution of conflicting areas
order non-overlapping contigs into scaffolds
- gaps with known or predicted size
- spanned by N (unknown sequence)
bridge contigs with mate pairs
de novo assembly - gaps remain
- need wet-lab work and paired end reads
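
A sketch of how ordered contigs and their estimated gap sizes become one scaffold, with each gap written as a run of N. The function name is hypothetical, and the contig order and gap sizes are assumed to have been inferred already (for example from mate pairs).

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, filling each gap with the given number of Ns."""
    if not contigs:
        raise ValueError("need at least one contig")
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for contig, gap in zip(contigs[1:], gap_sizes):
        parts.append("N" * gap)  # unknown sequence of estimated length
        parts.append(contig)
    return "".join(parts)


print(build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 2]))
# ACGTNNNNNGGCCNNTTAA
```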

Question 13

Q

rearrangements and paired end reads

Answer

A

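compare read-pair mapping to reference genome
pair maps further apart than expected → deletion (sample region decreased in size)
pair maps closer together than expected → insertion (sample region increased in size)
wrong way round → inversion
maps to different region → translocation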
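
These rules can be written as a small classifier. The sketch works from mapped positions on the reference, so a deletion in the sample makes the pair map further apart than the library insert size, and an insertion makes it map closer together. It assumes a library where concordant mates map to opposite strands; the function name, parameters and tolerance are hypothetical.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  expected_insert, tolerance):
    """Classify one mapped read pair against the expected insert size."""
    if chrom1 != chrom2:
        return "translocation"  # mates map to different regions
    if strand1 == strand2:
        return "inversion"      # one mate is the wrong way round
    span = abs(pos2 - pos1)
    if span > expected_insert + tolerance:
        return "deletion"       # sample is missing sequence present in the reference
    if span < expected_insert - tolerance:
        return "insertion"      # sample carries extra sequence between the mates
    return "concordant"


print(classify_pair("chr1", 1000, "+", "chr1", 3000, "-", 500, 50))  # deletion
```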

Question 14

Q

finishing

Answer

A

fill in gaps, resequencing, different technology or longer reads
design primer probe for PCR to reach end
improve consensus
expensive

Question 15

Q

limitations of assembly

Answer

A

next-gen short read lengths
AT rich genomes
repetitive genomes
de novo sequencing
- no reference
- use multiple technologies

Question 16

Q

Entamoeba histolytica

Answer

A

AT rich
long runs of Ts
many contig ends are T
unknown ploidy
1500 contigs, 20Mb

Question 17

Q

Blumeria graminis

Answer

A

repeat rich
7000 contigs, 120Mb
contig assembly difficult
need longer reads or targeted sequencing

(17 cards)