Sparse Matrix Analysis for Drug Discovery Research

SPARSE-MATRIX ANALYSIS OF SMALL MOLECULES AND PROTEIN TARGETS FOR DRUG DISCOVERY
Gerald J. Wyckoff, UMKC
What drives our research?
The pharmaceutical industry is facing spiraling drug development costs while R&D productivity remains stalled
6 of the 10 highest-grossing branded products will lose or have lost patent exclusivity this year (2014)
Reuters notes that the industry spent $65 billion on drug R&D in the U.S. in 2009, but approval rates have sunk 44% over the past 13 years
Background
Importance of identifying valid targets and therapeutic compounds
Tools currently in use:
Structure-based virtual screening
Receptor-based virtual screening
Other computational tools
Drawbacks to the current implementation of high-throughput virtual screening:
Computationally intensive
Limited access due to high cost of infrastructure
GCP/ICH compliance?
Solution:
Virtual screening in the cloud
Provides computational resources scalably and only when needed
Sparse-Matrix Maps
Don’t lose data after screening
Maslow’s Hammer
Solution
Treat Small Molecule Drug Discovery like a “Big Data” Problem
Sparse matrix maps of clustered small molecules and phylogenetic representations of protein targets.
Maps represent opportunities to find novel targets of existing drugs, and novel drugs for existing targets.
Create representations of data already familiar to most pharmaceutical scientists
Rests on two existing technologies at Zorilla Research and in our lab:
SABLE (Structural Alignment By Likelihood Estimation) integration into an existing cloud-based suite of drug development tools
Developed at UMKC
Performs extremely detailed protein alignments
Allows prediction of interactions, aiding both drug repurposing and off-target effect analysis
Chemical Information Fingerprinting
Developed by the PI in a previous STTR grant
Gives a bitwise score of three-dimensional information
Allows for rapid cluster analysis of small molecules AND protein targets
Clustering algorithms deployed in R
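A bitwise fingerprint makes pairwise comparison cheap, which is what lets clustering scale. The sketch below is an illustration only: it assumes fingerprints are stored as Python integers and compares them with the Tanimoto coefficient, a common choice for bit fingerprints. The slides do not say which similarity measure the Chemical Information Fingerprinting method uses, and the 8-bit fingerprints and molecule names here are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity between two bit fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


# Hypothetical 8-bit fingerprints (real fingerprints are far longer).
fingerprints = {
    "mol_A": 0b10110010,
    "mol_B": 0b10110110,
    "mol_C": 0b01001001,
}


def distance_matrix(fps):
    """Return sorted names and a matrix of 1 - Tanimoto distances."""
    names = sorted(fps)
    matrix = [[1.0 - tanimoto(fps[r], fps[c]) for c in names] for r in names]
    return names, matrix


names, distances = distance_matrix(fingerprints)
for name, row in zip(names, distances):
    print(name, [round(d, 2) for d in row])
```

The resulting distance matrix is the kind of input a hierarchical clustering routine expects.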
The Process
There are approximately 40,000 proteins and approximately 15 million distinct small molecules.
600,000,000,000 (600 billion) combinations.
This is a big data problem.
Gather all known interactions
Cluster all small molecules (fingerprinting)
Fingerprint generates a bitwise score, which is important for proper functioning of the clustering tools.
Cluster all proteins
Known methods
Map all interactions
Treat this exactly like other big data problems in biology.
Map interaction pathways on proteins, ADMET on small molecules
Absorption, Distribution, Metabolism, Excretion and Toxicity
Record interaction strength/rank (from modeling/docking)
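The scale argument above can be made concrete. The sketch below first reproduces the 600 billion figure from the quoted counts, then shows why a sparse representation fits: only pairs that have actually been screened take up memory. The `SparseInteractionMap` class, the indices, and the scores are hypothetical and are not taken from the authors' tooling.

```python
# Figures quoted in the slides (approximate counts).
N_PROTEINS = 40_000
N_MOLECULES = 15_000_000
TOTAL_PAIRS = N_PROTEINS * N_MOLECULES  # 600,000,000,000


class SparseInteractionMap:
    """Dictionary-of-keys sparse matrix: only observed pairs use memory."""

    def __init__(self, n_proteins: int, n_molecules: int) -> None:
        self.shape = (n_proteins, n_molecules)
        self._cells = {}

    def record(self, protein: int, molecule: int, score: float) -> None:
        """Store a docking score or interaction value for one pair."""
        if not (0 <= protein < self.shape[0] and 0 <= molecule < self.shape[1]):
            raise IndexError("pair lies outside the matrix")
        self._cells[(protein, molecule)] = score

    def get(self, protein: int, molecule: int, default=None):
        """Return the stored value, or `default` for an unscreened pair."""
        return self._cells.get((protein, molecule), default)

    def density(self) -> float:
        """Fraction of all possible pairs that have a recorded value."""
        return len(self._cells) / (self.shape[0] * self.shape[1])


# Hypothetical indices and scores, for illustration only.
interactions = SparseInteractionMap(N_PROTEINS, N_MOLECULES)
interactions.record(0, 12, -9.2)
interactions.record(0, 13, -8.5)
interactions.record(39_999, 14_999_999, -7.0)

print(TOTAL_PAIRS)
print(interactions.get(0, 12))
print(interactions.get(5, 5))  # unscreened pair
print(interactions.density())
```

An unscreened pair returns `None` rather than zero, so "not tested" stays distinguishable from "tested, no interaction".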
LOTS of distribution data
Total Values: 25567735
AvgValue: -7.170062456
StDev: 0.722973362
3 SD: -9.338982541
# at ≥3SD: 34299
% at ≥3SD: 0.134149544
4 SD: -10.0619559
# at ≥4SD: 1869
% at ≥4SD: 0.007309994
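The cutoffs in this summary follow directly from the mean and standard deviation. The sketch below recomputes them, assuming (as the reported cutoffs imply) that each threshold is the mean minus k standard deviations, so that "≥3SD" means at least three standard deviations more negative than the average score. The function names are illustrative.

```python
# Summary values as reported on the slide.
TOTAL_VALUES = 25_567_735
MEAN = -7.170062456
STDEV = 0.722973362


def threshold(k: float) -> float:
    """Score lying k standard deviations below the mean."""
    return MEAN - k * STDEV


def percent_of_total(count: int) -> float:
    """Express a count as a percentage of all scored values."""
    return 100.0 * count / TOTAL_VALUES


for k, count in [(3, 34_299), (4, 1_869)]:
    print(f"{k} SD cutoff: {threshold(k):.6f}, "
          f"{count} values ({percent_of_total(count):.6f}%)")
```

Note that the "% at" rows are percentages, not fractions: about 0.13% of values pass the 3 SD cutoff and about 0.007% pass the 4 SD cutoff.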
Problem with Data organization
For targets:
How to build an appropriate distance measure
May be three or four that would work appropriately
Come up with a single distance measure
This distance allows confidence in groups
For small molecules
Same problems
More acute:
Not clear that chirality and such should be dealt with at all
Different measures could mean radically different placement
Ideally we handle this in a similar way to targets
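One simple way to reduce several candidate distance measures to a single one is to rescale each to the range 0 to 1 and take a weighted average. The sketch below shows that idea only; the slides do not specify how the single measure is built, and the two input matrices (labeled sequence and structure distance), the weights, and the function names are all hypothetical.

```python
def normalize(matrix):
    """Min-max rescale a distance matrix to the range [0, 1]."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


def combine(matrices, weights):
    """Weighted average of normalized, equally sized distance matrices."""
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per matrix")
    total_weight = sum(weights)
    normed = [normalize(m) for m in matrices]
    n = len(normed[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, normed)) / total_weight
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical distances among three targets on two different scales.
sequence_distance = [[0.0, 0.2, 0.8], [0.2, 0.0, 0.6], [0.8, 0.6, 0.0]]
structure_distance = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]

combined = combine([sequence_distance, structure_distance], [1.0, 1.0])
for row in combined:
    print([round(v, 3) for v in row])
```

Changing the weights changes the placement of targets in the clustering, which is the sensitivity the slide warns about.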
Organize the data
Predicted to form 9 hydrogen bonds involving 7 different residues: Arg286, Asn318, Ser323, Glu383, Asp397, Arg405, Val446
Each row is an experiment
Sample of data for each ligand docked into the individual protein structures
Rescoring
Not enough to use one view of the data
Rescore all data in order to assure best possible view of data
NNScore 2.0
SABLE
SABLE (Structural Alignment By Likelihood Estimation) integration into an existing cloud-based suite of drug development tools
Developed at UMKC
Protected by a provisional patent
Performs extremely detailed protein structure alignments
Allows prediction of interactions, aiding both drug repurposing and off-target effect analysis
Brings off-target and repurposing screens in silico
Cost-savings to drug developers
Applicable in early and late stages of development
As can be seen above, the SABLE technology allows for a more complete and accurate alignment of proteins, leading to better visualization and modeling of functional sites that are the target of drug discovery.
Large Phylogenies
Enabled by both amino acid and structural data
Organized data in target fields
No loss of data even when a target isn’t screened
Inference across data
Visualization of Clustered Data
Clustered Data sets
Small molecule data is on the top (X-axis).
Protein data is at left (Y-axis).
Data has been clustered using hierarchical methods.
Red/Blue data is interaction/non-interaction data.
Clear patterns for testing potential drug/target pairs exist from this visualization.
Framework allows pathway and ADMET data incorporation early.
Small Molecules
Protein Targets
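To illustrate what "clustered using hierarchical methods" involves, the sketch below runs a small average-linkage agglomerative clustering over the rows of a toy interaction matrix. It is a plain-Python illustration, not the authors' implementation (the slides say their clustering is deployed in R). The protein names, the 0/1 interaction profiles, and the choice of Hamming distance are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two interaction profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(distance, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(distance))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of a and b.
                total = sum(distance[i][j]
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical rows: one 0/1 interaction profile per protein target.
names = ["prot_0", "prot_1", "prot_2", "prot_3"]
profiles = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

distance = [[hamming(p, q) for q in profiles] for p in profiles]
groups = average_linkage(distance, n_clusters=2)
print([[names[i] for i in group] for group in groups])
```

Applying the same routine to the columns would order the small molecules, giving the two-way clustered view described on the slide.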
Clustered data sets
Smooth combined data across data we have versus data that is not available.
Build in smoothing function for all data
Top level data
Smoothing function
In Silico data
Experimental data
Literature data
Combined likelihood score - Bayesian
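A combined Bayesian score over in silico, experimental, and literature evidence can be sketched as a sum of log-odds. The example below assumes each source is summarized as a likelihood ratio and that the sources are independent, a naive-Bayes simplification that the slides do not confirm. The prior and the ratios are hypothetical. A missing source is skipped rather than treated as negative evidence.

```python
import math


def combined_posterior(prior, likelihood_ratios):
    """Posterior probability of interaction from independent evidence.

    Each likelihood ratio is P(evidence | interaction) divided by
    P(evidence | no interaction). None marks a source with no data.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        if ratio is None:
            continue  # no data from this source: odds stay unchanged
        if ratio <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one drug/target pair.
evidence = {
    "in_silico": 5.0,      # docking score favors interaction
    "experimental": None,  # pair has not been screened at the bench
    "literature": 2.0,     # weak supporting report
}

score = combined_posterior(0.01, evidence.values())
print(round(score, 4))
```

Because unscreened sources leave the odds untouched, every pair in the matrix still receives a score.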
What next?
Find nodes within the sparse matrix.
Superpose proteins in a cluster downstream of a node.
Use SABLE
Map interaction domains using SCIPDB
Analyze superposition of alternative small molecules within the cluster.
Dock and model promising leads.
Consider off-target effects, ADMET up front
This is precisely where analysts have said the market needs to go
Send for bench screening of leads.
Process cuts down on mass bench screening
This is faster and cheaper than current processes
Future Goals
Build integrated suite of tools (including Zorilla applications)
Improve ancestral protein prediction in phylogenetic analysis
Answer fundamental evolutionary questions relating to structure/function
For Further Information, contact: 
wyckoffg@umkc.edu
Acknowledgments
The Wyckoff Lab
Lee Likins, Scott Foy, Ming Yang
Ada Solidar (B-tech Consulting)
HaRo Pharmaceuticals
Tomasz Skorski (Temple University)
The Miziorko Lab (UMKC)
John VanNice
Andrew Skaff
Jeff Murphy (Nickel City Software)
Brian Geisbrecht (K-State)
And his lab
John Walker (SLU)
NIH 1 R41 GM 088922-01A1
NIH 2 R44 GM097902-02A1
NIH 1 R21 AI113552-01
VaSSA Informatics, LLC for major funding
Digital Sandbox KC
Missouri Technology Corporation
UMKC SBS, UMRB, UMKC FRG, KCALSI for additional funding

The pharmaceutical industry is grappling with rising costs and stagnant R&D productivity. To address this, research is focused on utilizing sparse matrix analysis of small molecules and protein targets. By applying innovative technology and cloud-based solutions, researchers aim to streamline drug discovery processes, enhance target identification, and optimize therapeutic compound selection.

  • Drug Discovery
  • Sparse Matrix Analysis
  • Pharmaceutical Industry
  • Protein Targets
  • Small Molecules

Uploaded on Feb 23, 2025 | 0 Views



Presentation Transcript


  1. SPARSE-MATRIX ANALYSIS OF SMALL MOLECULES AND PROTEIN TARGETS FOR DRUG DISCOVERY Gerald J. Wyckoff, UMKC

  2. What drives our research? The pharmaceutical industry is facing spiraling drug development costs while R&D productivity remains stalled. 6 of the 10 highest-grossing branded products will lose or have lost patent exclusivity this year (2014). Reuters notes that the industry spent $65 billion on drug R&D in the U.S. in 2009, but approval rates have sunk 44% over the past 13 years. [Figure: drug development pipeline. Phases: Drug Lead Generation (5 years), Drug Lead Optimization (3 years), Product Realization (4.5 years). Stages: Target Identification, Target Validation, Drug Lead Identification, Assays & In Vivo, Formal Preclinical, PhI / IIa, PhII, PhIII, Registration. Fail rates shown: 12%, 28%, 17%, 18% (combined), 34%, 22%, 82%, ~50%.]

  3. Background Importance of identifying valid targets and therapeutic compounds Tools currently in use: Structure-based virtual screening Receptor-based virtual screening Other computational tools Drawbacks to current implementation of high-throughput virtual screening: Computationally intensive Limited access due to high cost of infrastructure GCP/ICH compliance? Solution: Virtual screening in the cloud Provides computational resources scalably and only when needed Sparse-Matrix Maps Don’t lose data after screening

  4. Maslow’s Hammer

  5. Solution Treat Small Molecule Drug Discovery like a Big Data Problem Sparse matrix maps of clustered small molecules and phylogenetic representations of protein targets. Maps represent opportunities to find novel targets of existing drugs, and novel drugs for existing targets. Create representations of data already familiar to most pharmaceutical scientists Rests on two existing technologies at Zorilla Research and in our lab: SABLE (Structural Alignment By Likelihood Estimation) integration to existing cloud-based suite of drug development tools Developed at UMKC Performs extremely detailed protein alignments Allows prediction of interactions, aiding both drug repurposing and off-target effect analysis Chemical Information Fingerprinting Developed by the PI in a previous STTR grant Gives a Bitwise score of three-dimensional information Allows for rapid cluster analysis of small molecules AND protein targets Clustering algorithms deployed in R

  6. The Process There are approximately 40,000 proteins and approximately 15 million distinct small molecules. 600,000,000,000 (600 billion) combinations. This is a big data problem. Gather all known interactions Cluster all small molecules (fingerprinting) Fingerprint generates a bitwise score, which is important for proper functioning of the clustering tools. Cluster all proteins Known methods Map all interactions Treat this exactly like other big data problems in biology. Map interaction pathways on proteins, ADMET on small molecules Absorption, Distribution, Metabolism, Excretion and Toxicity Record interaction strength/rank (from modeling/docking)

  7. LOTS of distribution data Total Values: 25567735; AvgValue: -7.170062456; StDev: 0.722973362; 3 SD: -9.338982541; # at ≥3SD: 34299; % at ≥3SD: 0.134149544; 4 SD: -10.0619559; # at ≥4SD: 1869; % at ≥4SD: 0.007309994

  8. Problem with Data organization For targets: How to build an appropriate distance measure May be three or four that would work appropriately Come up with a single distance measure This distance allows confidence in groups For small molecules Same problems More acute: Not clear that chirality and such should be dealt with at all Different measures could mean radically different placement Ideally we handle this in a similar way to targets

  9. Organize the data
     Pose  VINAValue  NNScore
     1     -9.2       527.32 pM
     2     -9.1       1.26 uM
     3     -8.5       2.33 uM
     4     -8.2       146.81 nM
     5     -7.7       3.86 uM
     6     -7.4       1.71 uM
     7     -7.3       317.74 nM
     8     -7         260.64 nM
     9     -6.8       256.9 uM
     Predicted to form 9 hydrogen bonds involving 7 different residues: Arg286, Asn318, Ser323, Glu383, Asp397, Arg405, Val446
     [Figure: docked pose in the structure labeled 4LEJ, with residues R286, N318, S323, E383, D397, R405 and V446 marked]

  10. Each row is an experiment 1AIV 1AVS 1BLF 1BR1 1BR2 1DS3 1F6R 1F6S 1FXZ 1HLU 1IC2d 1IC2m zinc_85881626 -9.6 -6.5 -9.3 -8.1 -9 -9.5 -6.5 -9 -7.9 -8.7 -9.3 -6.5 -9 -7.8 -8.3 -9.2 -6.3 -8.7 -7.5 -8 -9.2 -6.3 -8.5 -7.3 -7.9 -9.1 -6.2 -8.5 -7.3 -7.8 -8.7 -6.2 -8.4 -6.9 -7.7 -8.5 -6.1 -8.4 -6.9 -7.7 -8.4 -6 -8.4 -6.9 -7.7 -6.3 -6.2 -6.2 -6.2 -6 -6 -6 -5.9 -5.8 -7 -6.9 -6.9 -6.9 -6.9 -6.9 -6.8 -6.8 -6.7 -6.4 -6.4 -6.3 -6.2 -6.1 -6.1 -6.1 -6.1 -6 -9 -9 -8.5 -8.4 -8.1 -8.1 -7.9 -7.8 -7.8 -8 -7.4 -7.4 -7.3 -7.3 -7.2 -7.2 -7.2 -7.2 -7.1 -6.8 -6.7 -6.6 -6.5 -6.5 -6.5 -6.5 -6.5 -6.2 -6 -6 -6 -5.9 -5.9 -5.9 -5.9 -5.8 Sample of data for each ligand docked into the individual protein structures

  11. Rescoring Not enough to use one view of the data Rescore all data in order to assure best possible view of data NNScore 2.0

  12. SABLE SABLE (Structural Alignment By Likelihood Estimation) integration into an existing cloud-based suite of drug development tools Developed at UMKC Protected by a provisional patent Performs extremely detailed protein structure alignments Allows prediction of interactions, aiding both drug repurposing and off-target effect analysis Brings off-target and repurposing screens in silico Cost-savings to drug developers Applicable in early and late stages of development

  13. As can be seen above, the SABLE technology allows for a more complete and accurate alignment of proteins, leading to better visualization and modeling of functional sites that are the target of drug discovery.

  14. Large Phylogenies Enabled by both amino acid and structural data Organized data in target fields No loss of data even when a target isn’t screened Inference across data

  15. Visualization of Clustered Data Clustered Data sets Small molecule data is on the top (X-axis). Protein data is at left (Y-axis). Data has been clustered using hierarchical methods. Red/Blue data is interaction/non-interaction data. Clear patterns for testing potential drug/target pairs exist from this visualization. Framework allows pathway and ADMET data incorporation early.
     Columns: Small Molecules. Rows: Protein Targets.
     Target        3833876  3872141  3872142  3872143  3872144  3984042  4134477  4521332  12402849 12402850 21985599 35270772 35270774 35270775
     NP_000850     1.0      1.0      1.0      1.0      1.0      1.0      0.0      1.0      1.0      1.0      1.0      0.0      0.0      0.0
     BAH12375      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAG61573      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAH13256      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     XP_005265026  0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     EAW64826      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     NP_036367     0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAA12111      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     AAH27207      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     AAH63302      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAG62081      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAG60932      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     AAI21062      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     BAB70816      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     AAI07140      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0      0.0
     NP_001030014  0.0      0.0      0.0      0.0      0.0      0.0      0.5      0.0      0.0      0.0      0.0      0.0      0.0      0.0

  16. Clustered data sets Smooth combined data across data we have versus data that is not available. Build in smoothing function for all data Top level data Smoothing function In Silico data Experimental data Literature data Combined likelihood score - Bayesian

  17. What next? Find nodes within the sparse matrix. Superpose proteins in a cluster downstream of a node. Use SABLE Map interaction domains using SCIPDB Analyze superposition of alternative small molecules within the cluster. Dock and model promising leads. Consider off-target effects, ADMET up front This is precisely where analysts have said the market needs to go Send for bench screening of leads. Process cuts down on mass bench screening This is faster and cheaper than current processes

  18. Future Goals Build integrated suite of tools (including Zorilla applications) Improve ancestral protein prediction in phylogenetic analysis Answer fundamental evolutionary questions relating to structure/function

  19. Acknowledgments The Wyckoff Lab, Lee Likins, Scott Foy, Ming Yang. Ada Solidar (B-tech Consulting). HaRo Pharmaceuticals. Tomasz Skorski (Temple University). The Miziorko Lab (UMKC), John VanNice, Andrew Skaff. Jeff Murphy (Nickel City Software). Brian Geisbrecht (K-State) and his lab. John Walker (SLU). NIH 1 R41 GM 088922-01A1. NIH 2 R44 GM097902-02A1. NIH 1 R21 AI113552-01. VaSSA Informatics, LLC for major funding. Digital Sandbox KC, Missouri Technology Corporation, UMKC SBS, UMRB, UMKC FRG, KCALSI for additional funding. For Further Information, contact: wyckoffg@umkc.edu
