The human genome
About 25,000 genes in the genome. Two types:
How are males and females different?
Structure of DNA and RNA
Typical structure of a gene
No two genes are the same, but there are some common features
The coding parts of a gene are the exons; in between the exons are introns. Upstream of the gene there are promoters and enhancers, which are involved in gene expression. The whole gene is transcribed, and the introns are then spliced out to give the mature messenger RNA (mRNA).
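As a rough illustration of splicing, the sketch below joins the exons of a toy transcript and discards the intron between them. The sequence and the exon coordinates are invented for the example (exons in upper case, intron in lower case); real transcripts are far longer and usually have many introns.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon slices (0-based, end-exclusive) in order.

    Everything that falls between the listed exons is treated as intron
    and left out of the result.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical toy transcript: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "guaagucag" + "CCAUAA"
exons = [(0, 6), (15, 21)]  # made-up coordinates for the two exons

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```

Only the upper-case exon sequence survives in `mature_mrna`; the lower-case intron is gone.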
The genetic code
Key point – some substitutions change the amino acid (nonsynonymous)
Others result in the same (synonymous) amino acid; e.g. CCA and CCG are both proline
Only relevant to the coding region
The genetic code is redundant: each amino acid is encoded by a three-base codon, and most amino acids are encoded by more than one codon.
This means that the position of a mutation within a codon determines whether the amino acid changes.
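The synonymous / nonsynonymous distinction can be checked with a small lookup against the standard genetic code. This is a minimal sketch: the function name `classify_substitution` is made up for illustration, and codons are written as DNA (T rather than U), as in the CCA / CCG example.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


# CCA and CCG are both proline (P); CAA is glutamine (Q).
print(classify_substitution("CCA", "CCG"))
print(classify_substitution("CCA", "CAA"))
```

Changing the third base of CCA leaves proline unchanged, whereas changing the second base alters the amino acid.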
Genetic Variation
SNPs are fundamental to the entire module
What's a minor allele frequency?
Individuals can be either homozygous (e.g. GG or AA) or heterozygous (GA) at a SNP
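A minor allele frequency can be worked out by counting alleles across genotypes. The sketch below assumes a biallelic SNP and diploid individuals; the five genotypes are invented for the example and `minor_allele_frequency` is an illustrative name, not a function from any library.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return (minor allele, frequency) for a list of diploid genotypes.

    Each genotype is a two-letter string such as "GG", "GA" or "AA".
    Assumes a biallelic SNP; returns (None, 0.0) if only one allele is seen.
    """
    counts = Counter("".join(genotypes))
    if len(counts) < 2:
        return None, 0.0
    total = sum(counts.values())
    allele, count = min(counts.items(), key=lambda item: item[1])
    return allele, count / total


# Hypothetical sample of five individuals: 6 G alleles and 4 A alleles.
genotypes = ["GG", "GA", "AA", "GG", "GA"]
allele, maf = minor_allele_frequency(genotypes)
print(allele, maf)
```

Here A is the rarer allele, carried on 4 of the 10 chromosomes sampled.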
How can SNPs have a greater impact on some proteins than others?
How do we screen genetic variation? Capillary sequencing
How do we screen genetic variation? Next-generation sequencing
Note the log scale
1 Mbp costs about 1/1,000,000 of the price it did in 2001
Moore's law describes how computing power doubles roughly every two years; the cost of DNA sequencing followed Moore's law until 2007, then fell much faster
A million times cheaper is a huge change: we can now do things far more effectively
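To see why the log scale matters, the arithmetic can be sketched as follows. The two-year doubling period and the 20-year window are assumptions chosen for illustration; they are not figures from the cost graph itself.

```python
import math


def orders_of_magnitude(fold_change):
    """Number of steps the change spans on a log10 axis."""
    return math.log10(fold_change)


def halvings(fold_change):
    """How many times the cost must halve to reach this fold change."""
    return math.log2(fold_change)


def moore_fold(years, doubling_period_years=2.0):
    """Fold improvement expected if performance doubles every period (assumed 2 years)."""
    return 2 ** (years / doubling_period_years)


fold = 1_000_000                      # cost reduction per Mbp since 2001
orders = orders_of_magnitude(fold)    # steps on the log10 axis
n_halvings = halvings(fold)           # successive halvings of cost
moore_20y = moore_fold(20)            # assumed 20-year window at Moore's-law pace

print(orders, n_halvings, moore_20y)
```

Under these assumptions a Moore's-law pace gives roughly a thousand-fold improvement over 20 years, about a thousand times less than the million-fold drop in sequencing cost.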
e.g. Illumina sequencing
Illumina sequencing key points
Comparison of sequencing methods
Capillary Sanger sequencing: ddNTP termination and fluorescent detection; Max. read length / bp: 850; Run time: 1; Gb / run / machine: << 0.001; Pros: Accurate, useful for validation of NGS data; Cons: Low throughput, expensive
Illumina NovaSeq X: Polymerase-based sequencing by synthesis; Max. read length / bp: 2 x 150 (paired end); Run time: 2; Gb / run / machine: 8000; Pros: Massive throughput; Cons: Short reads make assembly challenging
PacBio Revio: Single-molecule real-time sequencing; Max. read length / bp: 15,000-20,000; Run time: 1; Gb / run / machine: 90; Pros: Very high throughput, very long reads; Cons: Less throughput than Illumina
Oxford Nanopore PromethION: Single-molecule real-time sequencing; Max. read length / bp: 10,000-100,000; Run time: 3; Gb / run / machine: 50-110; Pros: Very high throughput, possibly longest reads; Cons: Slightly higher error rate, less throughput than Illumina
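One way to put the Gb / run / machine figures in context is to ask how many human genomes a single run could cover. This is a back-of-the-envelope sketch: the genome size of about 3.1 Gb and the 30x depth are assumptions added here, and the Nanopore entry uses the lower end of the 50-110 range.

```python
HUMAN_GENOME_GB = 3.1   # assumption: approximate haploid human genome size in Gb
TARGET_COVERAGE = 30    # assumption: an illustrative whole-genome sequencing depth

# Gb per run per machine, taken from the comparison above
GB_PER_RUN = {
    "Illumina NovaSeq X": 8000,
    "PacBio Revio": 90,
    "Oxford Nanopore PromethION (lower bound)": 50,
}


def genomes_per_run(gb_per_run, genome_gb=HUMAN_GENOME_GB, coverage=TARGET_COVERAGE):
    """Approximate number of genomes sequenced to the target depth in one run."""
    return gb_per_run / (genome_gb * coverage)


estimates = {name: genomes_per_run(gb) for name, gb in GB_PER_RUN.items()}
for name, n in estimates.items():
    print(f"{name}: {n:.2f} genomes per run")
```

Under these assumptions the Illumina figure corresponds to dozens of genomes per run, while the long-read platforms fall just short of one genome at that depth.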
Errors during sequencing?
Exome sequencing
What is sequence capture?
Works by hybridising biotin-labelled 'baits' to the target DNA; the biotin lets the baits, and the DNA bound to them, attach to streptavidin-coated magnetic beads. The captured DNA is then eluted from the beads and sequenced.
Cataloguing human genetic variation
SNP genotyping
The 1000 genomes project
Much more comprehensive description of human variation
Estimate of mutation rates and detection of regions under selection
Loss-of-function mutants detected and shown to be common in all of us
Managed to estimate mutation rates for the first time.
The ENCODE Project