prediction of protein function Flashcards

1
Q

sequence similarity

A
  • most common method of function prediction
    • high sequence identity suggests similar function
  • but proteins with similar sequences can have different functions
    • a change at a single position can alter function, e.g. binding
  • predictions still need experimental confirmation
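As a rough illustration of "identity", the sketch below computes percent identity between two already-aligned sequences. The function name `percent_identity` and the two fragments are made up for this example, and ignoring gapped columns is only one of several common conventions.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences have a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Two hypothetical aligned fragments (not real proteins).
print(round(percent_identity("MKT-LLVA", "MKTALIVA"), 1))
```

A high value here only suggests shared function: the single mismatching column could still be the residue that matters for binding.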
2
Q

MSAs

A
  • multiple sequence alignments (MSAs) show where to look for evolutionary constraints
  • highly conserved residues are likely to be functionally or structurally important
  • e.g. PML
3
Q

PML

A
  • human transcription factor (TF) with RING domains
    • zinc finger protein
  • Zn-coordinating residues are needed for zinc binding
    • C..H..C..C
      • these conserved positions are found in the MSA
4
Q

regular expressions

A
  • describe characteristic sequence patterns, as stored in PROSITE
  • more flexible than a single consensus sequence (a position can allow alternative residues or a variable-length spacer)
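To make the link between PROSITE patterns and regular expressions concrete, here is a minimal sketch of a converter. The function name `prosite_to_regex` is hypothetical, it handles only the core PROSITE syntax (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`), and the pattern used below is an illustrative zinc-binding-like motif, not an actual PROSITE entry.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split "core(n)" or "core(n,m)" into the core and its repeat counts.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{"):        # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Illustrative pattern only (assumption, not taken from PROSITE).
regex = prosite_to_regex("C-x(2)-C-x(9,12)-H-x(2)-C")
print(regex)
print(bool(re.search(regex, "AACAAC" + "A" * 10 + "HAAC")))
```

The variable spacer `x(9,12)` is the kind of flexibility a fixed consensus string cannot express.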
5
Q

sequence logos

A
  • more detailed representation than a regular expression
  • total column height (bits) = degree of conservation at that position
  • height of each symbol within a column is proportional to the relative frequency of that residue at that position
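The two height rules can be sketched numerically. This assumes the usual Shannon-entropy definition of information content for a 20-letter protein alphabet and leaves out the small-sample correction that logo tools normally apply; `column_heights` is a name chosen for this example.

```python
import math
from collections import Counter


def column_heights(column: str, alphabet_size: int = 20) -> dict:
    """Return the height in bits of each residue symbol in one MSA column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return {}
    n = len(residues)
    freqs = {r: c / n for r, c in Counter(residues).items()}
    # Shannon entropy of the column: 0 when a single residue is present.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Total column height: maximum possible entropy minus observed entropy.
    information = math.log2(alphabet_size) - entropy
    # Each symbol takes a share of the column proportional to its frequency.
    return {r: f * information for r, f in freqs.items()}


print(column_heights("CCCC"))   # fully conserved column
print(column_heights("AACC"))   # two residues, equally frequent
```

A fully conserved column reaches the maximum of log2(20) bits, while a column split evenly between two residues loses exactly one bit.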
6
Q

profile HMM

A
  • needed to find more remote homologs
  • machine learning approach to function prediction
  • profile = based on an MSA
  • hidden = the states are not directly observed
  • Markov = each state depends only on the state before it
7
Q

Pfam HMMs

A
  • built from an MSA of the full domain
  • columns are characterised by their conservation
    • absolute/high conservation (capital)
    • some/no conservation (lower case)
    • insert (generates new column)
  • conserved columns make up a consensus
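The column classes above can be sketched as a toy consensus line. The thresholds (`max_gap_fraction=0.5`, `strong=0.8`), the `.` symbol for insert columns, and the function name `consensus_line` are assumptions made for this example, not Pfam's actual rules.

```python
from collections import Counter


def consensus_line(msa, max_gap_fraction=0.5, strong=0.8):
    """Summarise an MSA: capital = highly conserved, lower case = weakly
    conserved, '.' = mostly-gap column treated as an insert."""
    n_seqs = len(msa)
    line = []
    for column in zip(*msa):
        residues = [r for r in column if r != "-"]
        n_gaps = len(column) - len(residues)
        if n_gaps > max_gap_fraction * n_seqs:
            line.append(".")              # insert column, not part of consensus
            continue
        residue, count = Counter(residues).most_common(1)[0]
        frequency = count / len(residues)
        line.append(residue.upper() if frequency >= strong else residue.lower())
    return "".join(line)


# Hypothetical four-sequence alignment.
toy_msa = ["CA-KH", "CG-RH", "CATKH", "CS-KH"]
print(consensus_line(toy_msa))
```

Here the first and last columns are invariant, the third is present in only one sequence, and the remaining two are variable.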
8
Q

HMM states

A
  • insert
    • generates a residue for an inserted column
  • delete
    • skips a normally conserved column without generating a residue
  • match
    • generates a residue for a conserved column
    • according to the residue frequencies in the MSA
  • the HMM is a machine for generating sequences belonging to that protein family
    • probabilistic model
    • each path through the model has an associated probability
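The "generating machine" view can be sketched with a heavily simplified sampler. It uses one delete probability and one insert probability for every position, which a real profile HMM does not (transitions there are position- and state-specific); the function name, the probabilities and the three-column model are all invented for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sample_sequence(match_emissions, p_delete=0.1, p_insert=0.1, rng=None):
    """Walk the consensus columns once and emit one family-like sequence."""
    rng = rng or random.Random()
    sequence = []
    for emissions in match_emissions:
        if rng.random() < p_delete:
            pass  # delete state: skip this consensus column, emit nothing
        else:
            # match state: draw a residue using the column's frequencies
            residues, weights = zip(*emissions.items())
            sequence.append(rng.choices(residues, weights=weights)[0])
        # insert state: may emit extra residues (shown in lower case)
        while rng.random() < p_insert:
            sequence.append(rng.choice(AMINO_ACIDS).lower())
    return "".join(sequence)


# Hypothetical three-column model: C, then mostly K, then H.
model = [{"C": 1.0}, {"K": 0.75, "R": 0.25}, {"H": 1.0}]
print(sample_sequence(model, rng=random.Random(0)))
```

Each run follows one path through the states, and the product of the choices made along the way is the probability of that path.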
9
Q

forward algorithm

A
  • dynamic programming algorithm used in Pfam searches
  • finds the probability that the model could generate a given sequence
  • align the HMM with the sequence
    • get a likelihood for the sequence-profile alignment
    • bit score and E-value
    • identities and similar residues
    • posterior probability
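The forward recursion itself is short. The sketch below runs it on a generic two-state HMM over a two-letter alphabet (H = hydrophobic, P = polar) rather than on a full profile HMM with match, insert and delete states; the state names, all probabilities, and the uniform null model used for the bit score are assumptions chosen for illustration.

```python
import math

# Toy model (hypothetical numbers): each row of probabilities sums to 1.
START = {"conserved": 0.9, "variable": 0.1}
TRANS = {
    "conserved": {"conserved": 0.8, "variable": 0.2},
    "variable": {"conserved": 0.5, "variable": 0.5},
}
EMIT = {
    "conserved": {"H": 0.9, "P": 0.1},
    "variable": {"H": 0.5, "P": 0.5},
}


def forward(seq, start=START, trans=TRANS, emit=EMIT):
    """P(seq | model), summed over every possible state path."""
    if not seq:
        return 1.0
    # f[s] = probability of the prefix seen so far, ending in state s.
    f = {s: start[s] * emit[s][seq[0]] for s in start}
    for symbol in seq[1:]:
        f = {
            s: emit[s][symbol] * sum(f[p] * trans[p][s] for p in f)
            for s in start
        }
    return sum(f.values())


def bit_score(seq, null_probability=0.5):
    """Log-odds in bits against a null model emitting each symbol uniformly."""
    return math.log2(forward(seq) / null_probability ** len(seq))


print(forward("H"))
print(bit_score("HHH") > bit_score("PPP"))
```

Because the recursion reuses the table `f` at every step, the cost grows linearly with sequence length instead of with the number of paths. For long sequences the products underflow, so practical implementations work in log space or rescale.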
10
Q

TargetP

A
  • machine learning predictor of N-terminal targeting signals characteristic of specific cellular compartments (the published TargetP versions use neural networks rather than profile HMMs)
  • mitochondrion, chloroplast or secretory pathway
11
Q

TMHMM

A
  • HMM to detect transmembrane (TM) helices
  • predicts the topology (number and orientation of TM helices) of integral membrane proteins
12
Q

STRING

A
  • uses multiple sources of information to predict functional interactions
    • genomic context
    • high throughput experiments
    • coexpression
    • previous knowledge
  • some aspects of function not directly related to sequence
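One simple way to see how several evidence channels can reinforce each other is a noisy-OR combination, sketched below. This is a simplified stand-in, not STRING's actual scoring scheme; the channel names and scores are hypothetical, and the channels are assumed to be independent.

```python
import math


def combine_evidence(scores):
    """Noisy-OR: probability that at least one independent channel is right."""
    for value in scores.values():
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
    # The link is missed only if every channel is wrong.
    return 1.0 - math.prod(1.0 - value for value in scores.values())


# Hypothetical per-channel confidences for one protein pair.
channels = {"genomic_context": 0.5, "coexpression": 0.5}
print(combine_evidence(channels))
```

Two moderately confident channels give a combined score higher than either alone, which is why pooling weak, heterogeneous evidence is useful.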
13
Q

sources used by STRING

A
  • genomic context:
    • conserved proximity of 2 genes in the genome
    • e.g. trpA trpB
      • neighbouring genes can sometimes fuse into one protein-coding gene
    • indicates involvement in the same pathway
  • protein interaction experiments confirm predicted associations
  • gene coexpression experiments
    • correlated expression patterns suggest a functional association
  • text mining
    • search abstracts for co-mentions
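Co-mention counting can be sketched in a few lines. The abstracts below are invented strings, `co_mentions` is a hypothetical helper, and real text mining needs synonym handling and statistical scoring that this omits.

```python
import re
from collections import Counter
from itertools import combinations


def co_mentions(abstracts, gene_names):
    """Count, for every gene pair, how many abstracts mention both genes."""
    lookup = {name.lower(): name for name in gene_names}
    counts = Counter()
    for text in abstracts:
        # Tokenise crudely and keep each gene once per abstract.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found = sorted(lookup[t] for t in tokens if t in lookup)
        counts.update(combinations(found, 2))
    return counts


# Made-up abstracts for illustration only.
abstracts = [
    "trpA and trpB encode the two subunits of tryptophan synthase.",
    "Expression of trpB was measured under starvation.",
    "We show that trpA interacts with trpB in vitro.",
]
print(co_mentions(abstracts, ["trpA", "trpB", "lacZ"]))
```

Pairs that are mentioned together more often than expected by chance are candidates for a functional link.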