Alignment-Based Methods
Edit distance, introduced by Levenshtein (1965)
Each single change (inserting or deleting a letter, substituting a letter, …) adds 1 to the distance
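The distance rule above can be computed with the standard dynamic-programming recurrence. A minimal Python sketch, assuming unit cost for every insertion, deletion and substitution; the function name `levenshtein` is hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("brid", "bird"))       # 2
```

Only two rows of the table are kept, so memory grows with `len(b)` only. Note that “brid” vs “bird” costs 2 here, because swapping two letters needs two substitutions.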
Damerau
• flipping (transposing) two adjacent letters counts as one change
• brid (Old English) → bird (modern English) → 1 operation
• a mistyping such as “ebya” (for “ebay”) is recognized more easily by web search engines
• also used in biology, spell checking, …
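Adding the flip as a fourth operation gives a Damerau-style distance. The sketch below implements the restricted variant usually called optimal string alignment (no substring is edited more than once); this choice is an assumption, since the unrestricted Damerau-Levenshtein distance can be smaller in rare cases (for “ca” vs “abc” it is 2, while this sketch returns 3). The function name is hypothetical:

```python
def osa_distance(a: str, b: str) -> int:
    """Levenshtein distance plus adjacent transposition as a single edit
    (restricted "optimal string alignment" variant of Damerau-Levenshtein)."""
    n, m = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # flip of two adjacent letters, e.g. "ri" <-> "ir"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance("brid", "bird"))  # 1
print(osa_distance("ebya", "ebay"))  # 1
```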
Global Alignment “Needleman-Wunsch”
• gaps can be scored differently from substitutions (edits)
• an exchange (substitution) matrix gives a score for each pair of letters
• find global alignment –> Needleman-Wunsch
• opening and extending a gap can be penalized differently –> Needleman-Wunsch-Gotoh
• find best local alignment –> Smith-Waterman
• the exchange matrix has smaller penalties for more similar letters
• examples: d/t (both are dental sounds), or leucine/isoleucine (similar biophysical properties)
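A minimal Needleman-Wunsch sketch with a linear gap penalty (not Gotoh's separate opening/extension costs). The exchange scores are a hypothetical toy matrix: identical letters +2, the similar pair d/t +1, all other pairs -1, and each gap -2; real protein work would use a published exchange matrix instead. All names are illustrative:

```python
# Hypothetical toy exchange matrix: identical letters +2, similar pairs +1,
# all other pairs -1. Only d/t (both dental sounds) is marked as similar here.
SIMILAR_PAIRS = {frozenset("dt")}


def exchange(x: str, y: str) -> int:
    if x == y:
        return 2
    if frozenset((x, y)) in SIMILAR_PAIRS:
        return 1
    return -1


def needleman_wunsch(a: str, b: str, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + exchange(a[i - 1], b[j - 1]),  # (mis)match
                F[i - 1][j] + gap,                               # gap in b
                F[i][j - 1] + gap,                               # gap in a
            )
    # Traceback from the bottom-right cell recovers one optimal alignment
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + exchange(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


print(needleman_wunsch("dog", "tog"))  # (5, 'dog', 'tog')
```

Because d/t scores +1 instead of -1, “dog” and “tog” align better than two words differing in an unrelated letter would.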
Smith-Waterman Algorithm
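Smith-Waterman uses the same kind of recurrence as Needleman-Wunsch, but floors every cell at 0 and traces back from the highest-scoring cell, which yields the best local alignment. A sketch, assuming simple match/mismatch/linear-gap scores (+2/-1/-2, chosen only for illustration); the function name is hypothetical:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment; returns (score, local_a, local_b)."""
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # 0 means "start a new local alignment here"
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("xxxACGTyyy", "zzACGTzz"))  # (8, 'ACGT', 'ACGT')
```

The unrelated flanks (`xxx`, `zz`, …) do not lower the score, which is the practical difference from a global alignment.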
Differences FASTA vs BLAST
FASTA: less time-consuming; the first “FAST” algorithm
• FASTA and BLAST both start with short, good (high-scoring) alignments, try to extend them, and finally optimize the best hits
• FASTA is derived from the dot plot:
1) Identify common k-words (nucleotides: 6 letters, amino acids: 2 letters)
2) Score the dot-plot diagonals
3) Rescore, possibly using an exchange matrix
4) Join regions across gaps, with a penalty for each gap
5) Use dynamic programming to finalize the alignments
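Steps 1 and 2 above can be sketched as follows: index the k-words of the query, then count how many shared k-words fall on each dot-plot diagonal (diagonal = query position minus target position). This is a simplified illustration, not the FASTA program itself; steps 3-5 are omitted and the function name is hypothetical:

```python
from collections import Counter, defaultdict


def kword_diagonal_hits(query: str, target: str, k: int = 2) -> Counter:
    """Steps 1-2 only: shared k-words, counted per dot-plot diagonal."""
    # Step 1: index every k-word of the query by its start positions
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    # Step 2: each shared k-word is a dot in the dot plot;
    # dots on the same diagonal have the same offset i - j
    diagonals = Counter()
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            diagonals[i - j] += 1
    return diagonals


print(kword_diagonal_hits("GATTACA", "TTACA"))  # Counter({2: 4})
```

The diagonals with the most hits (`Counter.most_common`) are the candidate regions that steps 3-5 would rescore, join and align.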
➔ BLAST follows a different principle: it first searches for exact (perfect) matches and then for other similar stretches of varying length…
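The seed-then-extend idea can be illustrated with a simplified sketch: exact word matches of length `w` serve as seeds, and each seed is extended without gaps in both directions until the running score falls more than `x_drop` below its best value. All parameters and names here are hypothetical, and real BLAST differs in many details (for proteins it also seeds on similar, not only identical, words that score above a threshold):

```python
def extend(query, target, qi, tj, step, match, mismatch, x_drop):
    """Ungapped extension from (qi, tj) in direction step (+1 or -1).
    Returns (best gain, number of letters needed to reach it)."""
    best_gain, best_len, gain, length = 0, 0, 0, 0
    while 0 <= qi < len(query) and 0 <= tj < len(target):
        gain += match if query[qi] == target[tj] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain > x_drop:
            break  # score dropped too far below its maximum: stop extending
        qi += step
        tj += step
    return best_gain, best_len


def seed_and_extend(query, target, w=3, match=1, mismatch=-1, x_drop=2):
    """Exact w-letter seeds, extended without gaps; returns the best hit."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    best = (0, "", "")
    for j in range(len(target) - w + 1):
        for i in index.get(target[j:j + w], []):
            right, r_len = extend(query, target, i + w, j + w, 1, match, mismatch, x_drop)
            left, l_len = extend(query, target, i - 1, j - 1, -1, match, mismatch, x_drop)
            score = w * match + left + right
            if score > best[0]:
                best = (score,
                        query[i - l_len:i + w + r_len],
                        target[j - l_len:j + w + r_len])
    return best


print(seed_and_extend("QQQMATCHINGPPP", "RRMATCHINGSS"))  # (8, 'MATCHING', 'MATCHING')
```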
Alignment Significance
FASTA Variants
Protein:
• protein-protein FASTA (fasta).
• protein-protein Smith-Waterman (ssearch).
• global protein-protein (Needleman-Wunsch) (ggsearch)
Nucleotide:
• nucleotide-nucleotide (DNA/RNA fasta)
• ordered nucleotides vs nucleotide (fastm)
• unordered nucleotides vs nucleotide (fasts)
Multiple Sequence Alignment
MSA is used to compare homologous sequences
• Homologs: genes related to each other by descent from a common ancestral DNA sequence
- Orthologs: genes in different species that evolved from a common ancestral gene by speciation; they normally retain their function
- Paralogs: genes related by duplication within a genome; they might acquire new functions
An MSA is an alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences can be studied.
By contrast, pairwise sequence alignment tools are used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences.
Progressive Alignments
Iterative Alignment Methods
Clustal Omega
Addresses the problem of being both fast and accurate.
Clustal Omega is a multiple sequence alignment program.
It produces biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary relationships can be seen by viewing cladograms or phylograms.