Protein structure prediction Flashcards by Ola Żyto

What is the protein folding problem?

- there are over 200 million known sequences but only 110,000 known structures
- working hypothesis: the sequence of a protein in a given environment determines its structure
- the aim is to develop a theoretical approach to predict structure from sequence

Why predict protein structure?

- to close the sequence-structure gap
- structure can inform about function
- to guide rational drug design
- to guide mutagenesis studies
- to help solve structures from experimental data
- to build a fundamental understanding of the chemistry of protein structure

How do you calculate the similarity of two proteins?

- superpose the structures (often just the main chain) and quantify the average separation between equivalent positions
- quantified as the root mean square deviation (RMSD) of equivalent positions
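
As a minimal sketch of the RMSD calculation, the function below assumes the two structures have already been superposed and that position i in one list is equivalent to position i in the other. The function name and the toy coordinates are illustrative only, and the superposition step itself is not shown.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates in Å.

    Assumes the structures are already superposed and that position i in
    coords_a is equivalent to position i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: three main-chain positions, second structure shifted by 1 Å in x.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (8.6, 0.0, 0.0)]
print(round(rmsd(model, reference), 3))
```

Because every deviation is squared before averaging, a few badly placed residues can dominate the value, which is one reason RMSD is mainly used for close structures.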

How do you quantify the accuracy of predicted models?

- superpose the predicted structure on the X-ray structure
- RMSD is used for close structures
- a typical report: 70 out of 90 residues superpose with an RMSD of 2.6 Å
- this involves arbitrary decisions, such as the choice of the maximum distance allowed between equivalent residues

What is the TM score?

- the template modelling (TM) score removes arbitrary choices
- it is a score between 0 and 1 that includes all equivalent residues and is scaled for the number of residues in the protein
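
A minimal sketch of how such a score can be computed is given below, using the commonly quoted TM-score formula with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. It evaluates one given superposition only, whereas the real score is the maximum over superpositions. The function name and the 0.5 Å floor on d0 for short chains are assumptions made for illustration.

```python
def tm_score(distances, target_length):
    """TM-score for one superposition.

    distances: distance in Å for each pair of equivalent residues.
    target_length: number of residues in the target (reference) protein.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.0
    d0 = max(d0, 0.5)  # assumed floor so that short chains do not give d0 <= 0
    # Every equivalent pair contributes; dividing by the target length means
    # unaligned residues lower the score.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

# Toy example: a 100-residue target with 90 equivalent pairs, each 2 Å apart.
print(round(tm_score([2.0] * 90, 100), 3))
```

Each pair contributes a value between 0 and 1, so no distance cutoff has to be chosen, and the division by the target length is what scales the score for protein size.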

What TM score do you need to say that the fold of your protein is good?

- TM > 0.5 means the overall fold of the protein is good
- TM > 0.75 means a good predicted structure

How do we know that predictions work?

- evaluate on known structures
- but if you know the answer you have an advantage, even if you pretend that you don’t

What is CASP?

- Critical Assessment of protein Structure Prediction
- a blind trial is required to evaluate the different approaches
- sequences are sent to predictors before the experimental coordinates are revealed
- held every two years, with manual evaluation of results
- covers predictions with manual intervention and server-only predictions, which lets the community know which servers are good

What are ab initio energy calculations?

- the original idea: describe the interactions between atoms and search for the conformation of lowest energy
- the methods are energy minimisation and molecular dynamics

What is the potential energy of a protein in a particular conformation?

Bond length + bond angle + bond dihedral rotation + van der Waals contacts + electrostatic interactions
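
One common way to write this sum is the classical molecular-mechanics form below. The functional forms (harmonic bond and angle terms, a cosine term for dihedrals, a 12-6 term for van der Waals contacts and Coulomb's law for electrostatics) and all symbols are assumptions taken from standard force fields; the card itself only names the five contributions.

```latex
E = \sum_{\text{bonds}} k_b (r - r_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]
  + \sum_{i<j} \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right]
  + \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
```

Here r, θ and φ are bond lengths, bond angles and dihedral angles, r_ij is the distance between non-bonded atoms i and j, and q_i are partial charges.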

Secondary structure predictions

- aim to identify local secondary structures
- the theory is that, to a large extent, local sequence determines local structure
- current methods use multiple sequence alignments to provide extra information

What information about the structure can you get from the sequence?

What is the current state of secondary structure prediction?

- nearly every helix is identified
- most beta strands are identified, but short edge strands are still poorly predicted
- errors tend to be in defining the precise ends
- programs such as PSIPRED

What are the three major approaches to protein structure prediction?

- template-based: reliable, because protein fold space is limited; < 50% of a typical proteome is covered
- template-free: sometimes reliable; deep learning with multiple sequence alignments can sometimes, but not always, give good results
- hybrid: deep learning with templates produces excellent models, as in AlphaFold

How does template based modelling work?

- magenta protein: structure unknown
- cyan protein: structure known
- a sequence search finds that the magenta sequence is similar to the cyan sequence
- predict the structure of the magenta protein from the structure of the cyan protein

Describe how Phyre2 works

How do you do loop modelling?

- fragment the PDB
- find fragments whose sequences are similar to the inserted or deleted region
- check end-point distances
- check backbone geometry
- fit the fragment to the core structure

Loop modelling accuracy

- insertions and deletions relative to the template are modelled with a loop library, up to 15 amino acids in length
- short loops (under 5 residues) are good; longer loops are less trustworthy
- be wary of basing any interpretation of the structural effect of point mutations on modelled loops

Side chain modelling

- fit the most probable rotamer at each position
- according to the given backbone angles
- whilst avoiding clashes

Side chain modelling - accuracy

- Side chains will be modelled with ~80% accuracy (chi 1) IF the backbone is correct
- Clashes will sometimes occur and, if frequent, probably indicate a wrong alignment or a poor template
- Analyse with Phyre Investigator

Interpreting results - sequence identity and model accuracy

- High confidence (>90%) and high sequence identity (>35%): almost always very accurate, with TM score > 0.7 and RMSD 1-3 Å
- High confidence (>90%) and low sequence identity (<30%): almost certainly the correct fold, accurate in the core (2-4 Å) but may show substantial deviations in loops and non-core regions

What is the structural coverage of the human proteome?

53% of residues: 36% from Phyre models and 17% from the PDB

What is another template based modelling program?

SWISS-MODEL

Describe the template free approach

1. Take fragments of the sequence.
2. Predict the possible structures for each fragment.
3. Trial structures for each local sequence are taken from a database of segments of known 3D structure.
4. Put the fragments together and check whether the result makes sense.
5. Make changes and check whether the changed solution gives a better structure, then either keep it or discard it.
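
The keep-or-discard loop in the last step can be sketched as a toy fragment-insertion search. Everything here is a hypothetical simplification: the chain is reduced to one torsion angle per residue, `toy_energy` is a stand-in score rather than a real energy function, and the Metropolis acceptance rule is an assumed choice for deciding when to keep a worse trial.

```python
import math
import random

TARGET_ANGLE = -60.0  # hypothetical "ideal" torsion angle for the toy score

def toy_energy(angles):
    """Hypothetical stand-in score (lower is better); real methods score 3D geometry."""
    return sum((angle - TARGET_ANGLE) ** 2 for angle in angles)

def assemble(length, fragments, steps=500, temperature=100.0, seed=0):
    """Fragment-insertion search over a chain of `length` torsion angles."""
    rng = random.Random(seed)
    current = [180.0] * length  # start from an extended chain
    current_energy = toy_energy(current)
    best, best_energy = list(current), current_energy
    for _ in range(steps):
        # Pick a fragment and a position, then splice it into the chain.
        fragment = rng.choice(fragments)
        start = rng.randrange(length - len(fragment) + 1)
        trial = current[:start] + list(fragment) + current[start + len(fragment):]
        trial_energy = toy_energy(trial)
        delta = trial_energy - current_energy
        # Keep improvements; keep worse trials only occasionally (Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_energy = trial, trial_energy
            if current_energy < best_energy:
                best, best_energy = list(current), current_energy
    return best, best_energy

# Toy fragment library standing in for segments of known 3D structure.
fragment_library = [
    (-60.0, -60.0, -60.0),     # helix-like toy fragment
    (-120.0, -120.0, -120.0),  # strand-like toy fragment
    (-60.0, -90.0, 180.0),     # mixed toy fragment
]
best_angles, best_energy = assemble(12, fragment_library)
print(best_energy)
```

Occasionally accepting a worse trial lets the search escape local minima, which is why the current and best conformations are tracked separately.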

Are fragment based methods reliable?

- Fragment-based methods sometimes give reasonable predictions but sometimes fail
- Can be integrated with template methods to fill gaps or uncertain regions
- I-TASSER (Zhang) and Robetta (Baker) are widely used
- Now superseded by deep learning, e.g. AlphaFold

How do you use sequence correlation in multiple sequence alignment to predict contacts?

- residues that interact with each other tend to evolve together (coevolution)
- correlated changes between columns of the alignment therefore give some information about which residues are in contact in the structure
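
A simple way to see this signal is to measure the mutual information between two alignment columns. This is a minimal sketch, not the method used by any particular predictor: real contact predictors apply further corrections or use direct-coupling or deep learning models. The function name and the four-sequence alignment are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    msa: list of equal-length aligned sequences (strings).
    """
    n = len(msa)
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment: columns 1 and 3 change together (K with E, R with D),
# while column 0 is fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
]
print(mutual_information(toy_msa, 1, 3))  # covarying pair
print(mutual_information(toy_msa, 0, 1))  # conserved column carries no signal
```

A fully conserved column gives no covariation signal, which is one reason a deep and diverse alignment matters for this kind of analysis.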

AlphaFold approach - 1

- The input is a multiple sequence alignment (MSA) of the query sequence
- In addition, known PDB structures provide structural data known as “templates”
- Two-track learning, called Evoformer and structure
- The first stage, the Evoformer, learns features including residue-residue contacts at different distances (distograms)

AlphaFold approach - 2

- The second stage of learning is the “structure” network
- Each residue is treated as an independent unit (termed a “gas”); the residues are not linked together
- The position of the main chain of each residue is then predicted
- Then the side chains are fitted
- The learning is termed “end-to-end”: the function optimised (the “loss function”) is the difference between the final model and the true structure, and all steps are learnt together
- The algorithm also predicts the expected accuracy of each part of the model (see the pLDDT and PAE cards)

AlphaFold approach - 3

- Finally the structure is refined by energy relaxation with the Amber force field; this did not improve the model in terms of RMSD but did correct some local stereochemistry
- AlphaFold does not distinguish between template-based and ab initio approaches
- AlphaFold does use the information from homologous structures, but this is within the deep learning
- AlphaFold does not use the Phyre/SWISS-MODEL approach of starting with a known template and using that as the starting point

AlphaFold database: pLDDT accuracy metric

- Per-residue confidence metric, pLDDT (colour coded on EBI models), on a scale of 0-100
- pLDDT stands for predicted Local Distance Difference Test
- LDDT measures local agreement between two protein structures
- pLDDT > 90: expected to be modelled to high accuracy
- pLDDT between 70 and 90: expected to be modelled well (a generally good backbone prediction)
- pLDDT between 50 and 70: low confidence, should be treated with caution
- pLDDT < 50: often shown with a ribbon-like appearance and should not be interpreted; often disordered regions
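
The bands above can be applied to a list of per-residue values with a small helper. This is a sketch only: the function names, the band labels, the handling of values that fall exactly on a boundary, and the example values are all assumptions for illustration.

```python
def plddt_band(plddt):
    """Map a per-residue pLDDT value (0-100) to the confidence bands on the card."""
    if not 0 <= plddt <= 100:
        raise ValueError("pLDDT is defined on a scale of 0 to 100")
    if plddt > 90:
        return "high accuracy"
    if plddt >= 70:  # assumed: boundary values fall in the "between" bands
        return "modelled well"
    if plddt >= 50:
        return "low confidence"
    return "do not interpret"

def confident_fraction(plddt_values, threshold=70.0):
    """Fraction of residues at or above a pLDDT threshold (threshold is an assumption)."""
    if not plddt_values:
        raise ValueError("need at least one pLDDT value")
    return sum(1 for v in plddt_values if v >= threshold) / len(plddt_values)

# Hypothetical per-residue values for a six-residue stretch.
example = [95.2, 88.0, 71.5, 64.3, 42.0, 30.1]
print([plddt_band(v) for v in example])
print(confident_fraction(example))
```

Summarising a model by the fraction of residues above a threshold is the same kind of calculation as the proteome-wide "residues predicted with high confidence" figures.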

AlphaFold database: PAE accuracy metric

- PAE stands for Predicted Aligned Error
- How well predicted is the distance between two residues
- Used to assess the confidence of domain packing
- Colour coded
- Regions with high PAE (low confidence) can be totally misplaced relative to one another
- Example: a model in which the extracellular and intracellular regions pack together, which is biologically impossible

What’s the “but” with AlphaFold?

- Models for > 200M proteins: an amazing resource!
- Models for 98.5% of human proteins
- But only ~58% of residues in the human proteome are predicted with high confidence
- Compare PDB + Phyre, which covers ~53% of residues

What can you use for tertiary predictions?

CASP15

Protein structure prediction Flashcards

(35 cards)