What is the protein folding problem?
Over 200 million sequences but only 110k structures
- working hypothesis: the sequence of a protein in a given environment determines its structure
- aim: to develop a theoretical approach to predict structures from sequence
Why predict protein structure?
- the sequence-structure gap
- structure can inform about function
- to guide rational drug design
- to guide mutagenesis studies
- to help solve structures from experimental data
- to further our fundamental understanding of the chemistry of protein structure
How do you calculate the similarity of two protein structures?
- superpose the structures (often just the main chain) and quantify the average separation between equivalent positions
- quantified as the root mean square deviation (RMSD) of equivalent positions
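A minimal Python sketch of the RMSD step, RMSD = sqrt((1/N) * sum of |a_i - b_i|^2) over the N equivalent positions. It assumes the two structures have already been rotated into superposition (finding the optimal rotation, e.g. with the Kabsch algorithm, is not shown) and that equivalent positions (e.g. main-chain C-alpha atoms) are listed in matching order. The function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(coords: Sequence[Coord]) -> Coord:
    """Mean position of a set of 3D points."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def rmsd(a: Sequence[Coord], b: Sequence[Coord], center: bool = True) -> float:
    """Root mean square deviation between equivalent positions a[i] and b[i].

    Assumes any rotation has already been applied. If `center` is True,
    both sets are first translated so their centroids coincide.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equivalent positions")
    ca = centroid(a) if center else (0.0, 0.0, 0.0)
    cb = centroid(b) if center else (0.0, 0.0, 0.0)
    total = 0.0
    for p, q in zip(a, b):
        # Squared separation of this pair after centring
        total += sum(((p[k] - ca[k]) - (q[k] - cb[k])) ** 2 for k in range(3))
    return math.sqrt(total / len(a))
```

For example, a model that is the native structure shifted by 1 Å along x gives an RMSD of 0.0 once centred, but 1.0 without centring, which is why the superposition step matters.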
How do you quantify the accuracy of predicted models?
What is the TM-score?
What TM-score do you need to say that the fold of your model is correct?
- TM-score > 0.5 means the overall fold of the protein is correct
- TM-score > 0.75 means a good predicted structure
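The TM-score is usually defined as the maximum, over superpositions, of (1/L) * sum of 1 / (1 + (d_i/d0)^2), where L is the length of the reference (target) structure, d_i is the distance between the i-th aligned residue pair and d0(L) = 1.24 * (L - 15)^(1/3) - 1.8 Å. A minimal sketch that scores ONE fixed superposition (the search over superpositions is not shown) and applies the thresholds above; the 0.5 Å floor on d0 for short chains and the function names are assumptions of this sketch.

```python
from typing import Sequence


def d0(length: int) -> float:
    """Distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8, in Angstrom."""
    # Floor at 0.5 A so short chains keep a positive scale (assumption here).
    return max(1.24 * max(length - 15, 0) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: separation (Angstrom) of each aligned residue pair.
    target_length: number of residues in the reference structure; dividing
    by it means unaligned target residues contribute nothing.
    """
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


def interpret(tm: float) -> str:
    """Rule-of-thumb thresholds from these notes."""
    if tm > 0.75:
        return "good predicted structure"
    if tm > 0.5:
        return "overall fold correct"
    return "fold not confirmed"
```

Because each pair contributes at most 1 and distant pairs contribute almost nothing, a few badly placed loops lower the TM-score much less than they inflate the RMSD.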
How do we know that predictions work?
- evaluate on known structures
- but if you know the answer you have an advantage, even if you pretend that you don't
What is CASP?
- Critical Assessment of protein Structure Prediction
- a blind trial is required to evaluate the different approaches
- target sequences are sent to predictors before the experimental coordinates are released
- held every two years, with manual evaluation of the results
- separate categories for predictions with manual intervention and server-only predictions; the server-only category lets the community know which servers are good
What are ab initio energy calculations?
- calculate the potential energy of a protein in a particular conformation
- energy = bond lengths + bond angles + dihedral (torsion) rotations + van der Waals contacts + electrostatic interactions
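A minimal sketch of those energy terms, assuming the standard functional forms used by molecular-mechanics force fields: harmonic bond-length and bond-angle terms, a periodic dihedral term, a Lennard-Jones 12-6 van der Waals term and a Coulomb electrostatic term. All parameters are placeholders passed in by the caller, not values from any specific force field, and some force fields write the harmonic terms with an extra factor of 1/2.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), a common molecular-mechanics unit choice.
COULOMB_K = 332.0636


def bond_energy(r: float, r0: float, k: float) -> float:
    """Harmonic bond-length term k * (r - r0)^2."""
    return k * (r - r0) ** 2


def angle_energy(theta: float, theta0: float, k: float) -> float:
    """Harmonic bond-angle term k * (theta - theta0)^2, angles in radians."""
    return k * (theta - theta0) ** 2


def dihedral_energy(phi: float, v_n: float, n: int, delta: float) -> float:
    """Periodic torsion term (V_n / 2) * (1 + cos(n * phi - delta))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - delta))


def vdw_energy(r: float, epsilon: float, sigma: float) -> float:
    """Lennard-Jones 12-6 term 4 * eps * ((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def electrostatic_energy(r: float, q1: float, q2: float, dielectric: float = 1.0) -> float:
    """Coulomb term between two partial charges (in units of e)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)
```

The total energy of a conformation is then the sum of these terms over every bond, angle, dihedral and non-bonded atom pair; ab initio prediction searches for the conformation that minimises it.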
Secondary structure predictions
What information about the structure can you get from the sequence?
What is the current state of secondary structure prediction?
What are the three major approaches to protein structure prediction?
How does template based modelling work?
Describe how Phyre2 works
How do you do loop modelling?
- fragment the PDB to build a loop library
- find fragments with sequences similar to the insertion or deletion
- check end-point distances
- check backbone geometry
- fit the fragment to the core structure
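A toy Python sketch of the library search in the steps above. The `Fragment` record, the 1 Å end-point tolerance and ranking by plain sequence identity are all hypothetical choices for illustration; the backbone-geometry check and the final fit onto the core structure are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """A loop fragment cut from a PDB structure (hypothetical record)."""
    pdb_id: str
    sequence: str
    end_to_end: float  # distance between the fragment's end C-alpha atoms, Angstrom


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length sequences."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_loop(library: List[Fragment], loop_seq: str, gap_distance: float,
              tolerance: float = 1.0) -> Optional[Fragment]:
    """Return the library fragment that best matches an insertion/deletion.

    1. keep fragments of the same length as the loop,
    2. whose end-point distance matches the gap left in the core structure,
    3. then rank the survivors by sequence identity to the loop.
    """
    candidates = [
        f for f in library
        if len(f.sequence) == len(loop_seq)
        and abs(f.end_to_end - gap_distance) <= tolerance
    ]
    if not candidates:
        return None  # nothing fits: the loop cannot be modelled from this library
    return max(candidates, key=lambda f: identity(f.sequence, loop_seq))
```

The end-point filter runs before the sequence comparison because a fragment whose ends cannot span the gap is useless however similar its sequence.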
Loop modelling accuracy
- insertions and deletions relative to the template are modelled by the loop library, up to 15 amino acids in length
- short loops (under 5 residues) are good; longer loops are less trustworthy
- be wary of basing any interpretation of the structural effect of point mutations on modelled loops
Side chain modelling
- fit the most probable rotamer at each position
- according to the given backbone angles
- whilst avoiding clashes
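A deliberately simple, greedy Python sketch of the rule above: at each position, take the most probable rotamer that does not clash with side chains already placed. The rotamer tuples, their probabilities (assumed to come from a backbone-dependent rotamer library) and the grid-cell clash test are all hypothetical; real tools search rotamer combinations far more thoroughly.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int, int]
# Hypothetical rotamer: (name, probability given backbone angles, grid cells it occupies)
Rotamer = Tuple[str, float, Set[Cell]]


def place_side_chains(rotamers: Dict[int, List[Rotamer]]) -> Tuple[Dict[int, str], List[int]]:
    """Greedy rotamer placement.

    Returns the chosen rotamer name per position and the positions where
    every candidate clashed (there the most probable rotamer is kept).
    """
    occupied: Set[Cell] = set()
    chosen: Dict[int, str] = {}
    clashes: List[int] = []
    for pos in sorted(rotamers):
        ranked = sorted(rotamers[pos], key=lambda r: r[1], reverse=True)
        if not ranked:
            continue
        for name, _prob, cells in ranked:
            if not (cells & occupied):  # crude clash test: shared grid cell
                chosen[pos] = name
                occupied |= cells
                break
        else:
            # No clash-free option: keep the most probable rotamer, record the clash
            name, _prob, cells = ranked[0]
            chosen[pos] = name
            occupied |= cells
            clashes.append(pos)
    return chosen, clashes
```

Recording the unavoidable clashes mirrors how such clashes are used as a warning sign about the alignment or template.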
Side chain modelling - accuracy
* Side chains will be modelled with ~80% accuracy (chi 1) IF the backbone is correct.
* Clashes will sometimes occur and, if frequent, probably indicate a wrong alignment or a poor template
* Analyse with Phyre Investigator
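The ~80% chi 1 figure is the fraction of side chains whose first torsion angle is predicted correctly. A small sketch of how such a fraction can be computed, assuming a chi 1 angle counts as correct when it lies within 40° of the experimental value (a cutoff often used in side-chain benchmarks, exposed here as a parameter). Note the wrap-around: 170° and -170° are only 20° apart.

```python
from typing import Sequence


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def chi1_accuracy(predicted: Sequence[float], reference: Sequence[float],
                  cutoff: float = 40.0) -> float:
    """Fraction of side chains whose predicted chi 1 is within `cutoff` degrees
    of the reference (experimental) chi 1."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty, equal-length lists of angles")
    hits = sum(angular_difference(p, r) <= cutoff for p, r in zip(predicted, reference))
    return hits / len(predicted)
```

For example, predicted angles [60, 180, -60, 60, 170] against reference angles [65, -175, 60, 60, -170] give 4 out of 5 correct, i.e. 0.8.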
Interpreting results - sequence identity and model accuracy
* High confidence (>90%) and high sequence identity (>35%): almost always very accurate, TM-score > 0.7, RMSD 1-3 Å.
* High confidence (>90%) and low sequence identity (<30%): almost certainly the correct fold, accurate in the core (2-4 Å) but may show substantial deviations in loops and non-core regions
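The two rules of thumb above can be written as a small lookup. A sketch only: `expected_accuracy` is a hypothetical helper that takes confidence and sequence identity as percentages, and it deliberately reports the cases these notes do not cover (e.g. identity between 30% and 35%, or confidence at or below 90%) instead of guessing.

```python
def expected_accuracy(confidence: float, seq_identity: float) -> str:
    """Map model confidence and sequence identity (both in %) to the
    expected accuracy given in these notes."""
    if confidence > 90 and seq_identity > 35:
        return "very accurate: TM-score > 0.7, RMSD 1-3 Å"
    if confidence > 90 and seq_identity < 30:
        return "correct fold: core accurate (2-4 Å), loops and non-core regions may deviate"
    # Outside both rules of thumb: no claim is made
    return "not covered by these rules of thumb"
```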
What is the structural coverage of the human proteome?
- 53%: 36% from Phyre models and 17% from the PDB