sequence alignment Flashcards

Question 1

Q

bioinformatics resources

Answer

A

algorithm
- set of rules to perform an operation
- same one can be used by different programs
program
- code that implements an algorithm
- can use stored data, but storing data is not its aim (e.g. PSI-BLAST)
database
- organised searchable source of biological data
- aim is to store data

Question 2

Q

protein evolution

Answer

A

duplication can lead to divergent evolution
- homologous proteins with related sequences and structures
- often but not always related function
analyse by alignment

Question 3

Q

alignment features

Answer

A

identity (:)
gap
- insertion or deletion
substitution
- can be conservative (.)
- same characteristics e.g. hydrophobicity
end gap
- one sequence longer than the other
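
The markers above can be sketched in code. This is a minimal illustration, assuming two already-aligned strings with '-' for gaps. The residue groups in SIMILAR_GROUPS and the function name annotate are hypothetical choices for this example, not a standard definition of conservative substitution.

```python
# Hypothetical residue groups sharing a broad chemical character.
# A real tool would normally derive similarity from a substitution matrix.
SIMILAR_GROUPS = [
    set("AVLIMFW"),  # hydrophobic
    set("DE"),       # acidic
    set("KRH"),      # basic
    set("STNQ"),     # polar, uncharged
]


def annotate(aligned_a, aligned_b):
    """Return a marker line: ':' identity, '.' conservative, ' ' otherwise."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            marks.append(" ")  # gap: insertion or deletion
        elif x == y:
            marks.append(":")  # identity
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            marks.append(".")  # conservative substitution
        else:
            marks.append(" ")  # non-conservative substitution
    return "".join(marks)


if __name__ == "__main__":
    a, b = "ACD-L", "ACEKI"
    print(a)
    print(annotate(a, b))
    print(b)
```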

Question 4

Q

paralogs

Answer

A

homolog created by gene duplication within a species
can result in change of function
- original copy can maintain function
- second copy free to mutate and adopt novel function

Question 5

Q

orthologs

Answer

A

homolog created by speciation
both species now have a single copy of the same gene
only one copy per species
- less likely to change
- function needs to be retained

Question 6

Q

requirements of a pairwise protein sequence alignment

Answer

A

scoring scheme of residue similarity
algorithm to establish the alignment
aim to combine algorithm and scoring scheme to generate the best alignment in biological terms
potential to be extended to database searching

Question 7

Q

scoring schemes

Answer

A

simplest would be 1 for identity and 0 for different
better to include similarity of residues
- conservative substitutions indicate more recent changes
- residues tend to retain chemical properties so that function is modified, not destroyed
gaps also indicate increased distance
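
As a rough sketch of the simplest scheme (1 for identity, 0 otherwise), assuming two equal-length, ungapped sequences; the function name identity_score is made up for this example.

```python
def identity_score(seq_a, seq_b):
    """Score 1 for each identical position and 0 for each different one."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x == y)


if __name__ == "__main__":
    # D/E and L/I are chemically similar pairs, but this scheme gives them 0.
    print(identity_score("ACDL", "ACEI"))
```

This shows the weakness noted above: similar residues score the same as completely unrelated ones.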

Question 8

Q

BLOSUM

Answer

A

blocks substitution matrix
aligned segments of protein families (blocks)
blosum62:
- clustered sequences in blocks where pairwise identity >62%
- most widely used

Question 9

Q

blosum62

Answer

A

substitution matrix
- score for changing one residue to another
- represents chemical similarity
e.g. cys - disulfide formation means high conservation
- presence in both sequences indicates similarity
- high score (9)
negative score if properties change
- e.g. hydrophobic to charged
empirical
gaps considered later
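
A small lookup sketch, using only a four-entry excerpt of BLOSUM62 (the full matrix covers all 20 x 20 residue pairs). The names BLOSUM62_EXCERPT and substitution_score are hypothetical; the matrix is symmetric, so each pair is stored once.

```python
# Excerpt only: four entries from BLOSUM62, keyed by residue pair.
BLOSUM62_EXCERPT = {
    ("C", "C"): 9,   # cysteine identity: highly conserved
    ("W", "W"): 11,  # tryptophan identity
    ("L", "I"): 2,   # hydrophobic to hydrophobic: conservative
    ("L", "D"): -4,  # hydrophobic to charged: penalised
}


def substitution_score(x, y):
    """Look up a pair in either order; raises KeyError outside the excerpt."""
    if (x, y) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(x, y)]
    return BLOSUM62_EXCERPT[(y, x)]


if __name__ == "__main__":
    for pair in [("C", "C"), ("I", "L"), ("D", "L")]:
        print(pair, substitution_score(*pair))
```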

Question 10

Q

affine gap penalty

Answer

A

penalise insertions/deletions
penalty = o + (e × l)
- o = gap opening constant
- e = gap extension constant
- l = length of gap extension
o>e
- gap introduction is the major event
- extending the gap is minor
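
A worked sketch of the formula above. The values o = 10 and e = 1 are assumptions chosen for illustration, not values from the card; the function name affine_gap_penalty is also made up.

```python
def affine_gap_penalty(length, gap_open=10, gap_extend=1):
    """penalty = o + e * l, with o the opening and e the extension constant."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    return gap_open + gap_extend * length


if __name__ == "__main__":
    one_long_gap = affine_gap_penalty(4)
    two_short_gaps = 2 * affine_gap_penalty(2)
    # With o > e, one gap of length 4 costs less than two gaps of length 2.
    print(one_long_gap, two_short_gaps)
```

Because opening dominates, the scheme prefers a few long gaps over many short ones.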

Question 11

Q

protein domains

Answer

A

protein sequences formed of domains
each domain can originate from a different homologous family
domains are the evolutionary unit
methods need to take this into account
- don’t have to align whole sequence

Question 12

Q

local vs global alignment methods

Answer

A

different algorithms
part or all of a query can match part or all of a database sequence
gaps may be needed to get a suitable alignment

Question 13

Q

dotplot

Answer

A

used to assign identities
one sequence on each axis
assign a dot wherever residues match; similar regions show up as diagonal runs
best path has the highest number of dots
need closely related sequences
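
A minimal dotplot sketch, assuming exact residue matches only (no window or threshold filtering). The function names dotplot and render are hypothetical.

```python
def dotplot(seq_a, seq_b):
    """Rows follow seq_a, columns follow seq_b; True marks a matching pair."""
    return [[x == y for y in seq_b] for x in seq_a]


def render(matrix):
    """Draw '*' for a dot and '.' for an empty cell, one row per line."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Identical sequences give an unbroken main diagonal.
    print(render(dotplot("ACDEF", "ACDEF")))
```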

Question 14

Q

needleman wunsch algorithm

Answer

A

dynamic programming
maximises similarity score to give maximum match
- largest number of residues of one sequence that can be matched with another allowing for all possible insertions/deletions
finds best global alignment
iterative matrix method
- 2D array of all possible pairs of residues (bases or amino acids)
- one sequence on each axis
- all possible alignments represented by paths through the array

Question 15

Q

NW similarity values

Answer

A

S_ij = numerical value assigned to every cell in the array
depends on similarity of the 2 residues
value of 1 indicates identity in the simplest scheme

Question 16

Q

NW alignment construction

Answer

A

add S_ij from left to right along a path through the array to get cumulative alignment score
best alignment has highest score = maximum match
- maximum match always in outer row/column
- work backwards from here to construct alignment
- gaps can be inserted
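
A sketch of global alignment by dynamic programming. This is the common textbook variant, not necessarily the formulation on the card: it assumes a linear gap penalty and penalises end gaps, so the traceback starts at the final corner cell. The scores (match 1, mismatch -1, gap -1) and the function name are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one best global alignment."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align the two residues
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    # Work backwards from the corner cell to rebuild the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACDE", "ADE")
    print(score)
    print(top)
    print(bottom)
```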

Question 17

Q

smith waterman algorithm

Answer

A

compares segments of all possible lengths instead of looking at each sequence as a whole
- local alignment
choose whichever maximises similarity measure
allow for gaps
variation on NW
- dynamic programming
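
A sketch of local alignment as a variation on the global method: cell scores are floored at 0 and the traceback starts from the highest-scoring cell, stopping when it reaches 0. It assumes a linear gap penalty; the scores (match 2, mismatch -1, gap -2) and the function name are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, segment_a, segment_b) for one best local alignment."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new segment here
                          H[i - 1][j - 1] + s,  # align the two residues
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Only the shared ACDE segment is reported; the flanks are ignored.
    print(smith_waterman("WWACDEYY", "MMACDEPP"))
```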
