Advancing Protein Structure Analysis

 
RDCs and Databases
 
 
In progress
 
Introduction
 
New database, easily accessible by the user
BMRB has most of the data?
Used to build PDB entries
 
150 proteins, all assigned
Approximately 200 additional proteins
 
Statistics of this set of data
Improve the use of the data by restricting it, e.g., to well-defined regions
 
Databases
 
 
150 NESG proteins
Fully assigned
Many RDC measurements (150 proteins, 100 RDCs each)
Easily accessible RDC database
 
Protein structures deposited in the PDB
Typically 20 structures for each
 
Additional proteins
 
Approximately 200 additional proteins
Not all are in the BMRB or used in the PDB
100 RDCs for each
 
Not examined yet
 
Many FIDs, not yet examined the way these RDCs have been (should mention in paper)
 
Check PDB structures
 
Back calculations of RDCs can be used to validate both the measurements and the PDB structures
Better back calculations can be made by restricting them to well-defined regions, where there is good convergence of the deposited structures
 
Detailed analysis for each protein: comparing measured to back-calculated RDCs
A general summary can be made by examining all the proteins together statistically
 
Example back-calculated RDCs from PDB structures
 
PDB 2KK8: 20 structures
 
Some points are absent due to missing RDCs or missing H atoms
 
Not well-defined region
corrected
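Points drop out of such a plot whenever a residue lacks either an RDC value or an amide H in the model. A minimal sketch of that pairing step, assuming measured RDCs and model coordinates are held in plain dictionaries keyed by residue number (the data layout and the function name are hypothetical):

```python
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, float, float]


def pair_nh_rdcs(
    rdcs: Dict[int, Optional[float]],
    n_coords: Dict[int, Vector],
    h_coords: Dict[int, Vector],
) -> Tuple[List[int], List[Vector], List[float]]:
    """Keep residues that have an RDC and both N and H atoms; return their N->H vectors."""
    residues: List[int] = []
    vectors: List[Vector] = []
    values: List[float] = []
    for res in sorted(rdcs):
        value = rdcs[res]
        if value is None:  # RDC not measured for this residue
            continue
        if res not in n_coords or res not in h_coords:  # e.g. no amide H (proline)
            continue
        n, h = n_coords[res], h_coords[res]
        residues.append(res)
        vectors.append((h[0] - n[0], h[1] - n[1], h[2] - n[2]))
        values.append(value)
    return residues, vectors, values
```

Returning the kept residue numbers alongside the vectors makes it easy to see which points are missing from a plot and why.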
 
RMSD statistics of all protein structures
 
 
1) All models (300,000 RDCs)

2) Best model from each PDB entry

3) NLSQ (nonlinear least-squares) fit to the population of models

4) Expect larger mean, median, and std for larger proteins
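The first two summaries differ only in which RMSDs are pooled. A minimal sketch, assuming one RMSD per model has already been computed for each PDB entry (the dictionary layout and function name are hypothetical, and the slides do not say whether the pooled statistics are over per-model RMSDs or over individual RDC deviations):

```python
import statistics
from typing import Dict, List


def summarize(rmsds_by_entry: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Mean/median/std of per-model RMSDs, pooled over all models and over best models only."""
    all_models = [r for models in rmsds_by_entry.values() for r in models]
    best_models = [min(models) for models in rmsds_by_entry.values() if models]

    def stats(values: List[float]) -> Dict[str, float]:
        return {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # Sample standard deviation; undefined for a single value.
            "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        }

    return {"all_models": stats(all_models), "best_model": stats(best_models)}
```

Grouping the same numbers by protein size instead of pooling them would test the expectation that larger proteins give a larger mean, median, and std.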
 
Can the statistics be improved?
 
Question:
 
Can back-calculated RDCs fit the measurements better if only well-defined regions of proteins are used?
They should, if the structures are more reliable in those regions
Back calculation involves all of the protein's N-H vectors together

Restricting to well-defined regions could improve the individual RMSDs between measured and calculated RDCs

Dihedral angle order parameter restrictions can be used to define these regions (PdbStat-DAOP, Cyrange)
Variance-metric restrictions can also be used (PdbStat-FindCore2)
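A dihedral angle order parameter is commonly defined as the length of the mean unit vector of one backbone angle across all models: close to 1 when the models agree, close to 0 when the angle is scattered. The sketch below uses that definition and calls a residue well defined when both its phi and psi order parameters reach a cutoff. The 0.9 cutoff and the function names are assumptions for illustration, not the settings used by PdbStat-DAOP or Cyrange.

```python
import math
from typing import Dict, List, Sequence, Set


def order_parameter(angles_deg: Sequence[float]) -> float:
    """Length of the mean unit vector of a dihedral angle over models (0 = scattered, 1 = identical)."""
    if not angles_deg:
        raise ValueError("need at least one model")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return math.hypot(c, s)


def well_defined_residues(
    phi: Dict[int, List[float]],
    psi: Dict[int, List[float]],
    cutoff: float = 0.9,  # assumed threshold, not taken from the slides
) -> Set[int]:
    """Residues whose phi and psi order parameters across all models both reach the cutoff."""
    return {
        res
        for res in phi
        if res in psi
        and order_parameter(phi[res]) >= cutoff
        and order_parameter(psi[res]) >= cutoff
    }
```

Working with unit vectors rather than raw angles handles the periodicity: models at 179 and -179 degrees count as nearly identical.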
 
 
Dihedral restrictions
 
Flexible regions not included in the 150 proteins
 
PdbStat-DAOP (blue) and Cyrange (orange)
 
PdbStat-FindCore2 not yet included
 
RDC statistics not done yet
 
Chemical shift predictions
 
Can region restriction improve chemical shift prediction software?
 
How much data is there?
 
 
E.g., each protein has 100 residues

RDCs, H and N chemical shifts

Approximately 45,000 measurements
The PDB has 20 structures for each: 900,000 points in total to match
 
Important to validate the measurements deposited in the PDB
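The totals on this slide can be checked with a short calculation, assuming the 150 fully assigned proteins, the slide's example of 100 residues per protein, one value per residue for each of the three measurement types, and 20 models per entry. The same assumptions also account for the 300,000 RDCs quoted for the all-models RMSD statistics:

```latex
\begin{aligned}
150 \times 100 \times 3 &= 45\,000 \ \text{measurements (RDCs, H and N shifts)} \\
45\,000 \times 20 \ \text{models} &= 900\,000 \ \text{points to match} \\
150 \times 100 \ \text{RDCs} \times 20 \ \text{models} &= 300\,000 \ \text{back-calculated RDCs}
\end{aligned}
```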
 
To do
 
Statistically compare PdbStat-DAOP, PdbStat-FindCore2, and Cyrange by their overall improvement of the match between RDC measurements and PDB files
Present the database in an easy-to-use online MySQL form (MySQL part is done)
150 proteins (additional 200)

Secondary:
Maybe use MD (molecular dynamics) to improve the regions that are not well defined
Provide more documentation of PdbStat
Check whether using PdbStat and Cyrange increases the reliability of chemical shift predictions. These databases use hundreds, not thousands, of proteins in single-frame predictions (i.e., one model)
 
 
 
 
Some slides about using ‘core regions’ from proteins
 
Missing RDC measurements are excluded, and only core residues are used
 
Improving the calculations
 
 
 
Figure of core regions by residue from DAOP and Cyrange (blue is Cyrange, orange is DAOP); generally about 80 percent of residues per protein
 
Number of core residues by different programs
 
PdbStat-DAOP evaluation of RDCs with core residues
 
PdbStat-FindCore2 residues and RDCs
 
Cyrange core residues and RDCs
 
The RMSD can be improved by using the core residues as determined by PdbStat and Cyrange

Perhaps by about 30% in general.

The number of core residues is about 80% of the total on average.
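The "about 30%" and "about 80%" figures are both ratios. A minimal sketch of how they could be computed, assuming per-protein values are averaged (the slides do not say whether the figures are per-protein averages or pooled), with a hypothetical record layout and function name:

```python
import statistics
from typing import Dict, Tuple

# Hypothetical record per PDB entry:
# (RMSD using all residues, RMSD using core residues, number of core residues, total residues)
Record = Tuple[float, float, int, int]


def core_restriction_summary(records: Dict[str, Record]) -> Tuple[float, float]:
    """Average fractional RMSD reduction and average core fraction over all entries."""
    if not records:
        raise ValueError("no entries")
    reductions, fractions = [], []
    for rmsd_all, rmsd_core, n_core, n_total in records.values():
        if rmsd_all <= 0 or n_total <= 0:
            raise ValueError("RMSD and residue count must be positive")
        reductions.append((rmsd_all - rmsd_core) / rmsd_all)
        fractions.append(n_core / n_total)
    return statistics.mean(reductions), statistics.mean(fractions)
```

Running the same summary separately on the PdbStat-DAOP, PdbStat-FindCore2, and Cyrange core definitions would give one way to make the comparison listed under To do.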
Uploaded on Mar 02, 2025


