Advancing Multi-Omics Research with Integrated Methods

 
MESA Multi-Omics: Methods Development
 
on behalf of the
MESA Multi-Omics Pilot Project Committee
 
 
Why multi-omics multi-variate methods?
 
Obtaining the “big picture” of the entire “data landscape” requires integration of the multiple “omics” datasets with each other
  e.g., integrate phenotypes, metabolites, expression, methylation, and SNPs into a single analysis
Yet also obtaining the “needle(s) in the haystack” requires feature selection appropriate to high-dimensional data
  e.g., linear regression on high-dimensional data can lead to erroneous conclusions
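The warning about linear regression on high-dimensional data can be illustrated with a small simulation. This is only a sketch under assumed, hypothetical sizes (50 subjects, 200 features) and pure-noise data, not an analysis from the slides: when features outnumber subjects, an unpenalized least-squares fit reproduces the training outcome almost exactly even though no real signal exists.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_features = 50, 200   # hypothetical sizes with p >> n

# Features and outcome are independent noise: there is no true signal.
X = rng.standard_normal((n_subjects, n_features))
y = rng.standard_normal(n_subjects)

# Unpenalized (minimum-norm) least-squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def r_squared(X, y, beta):
    """Fraction of outcome variance explained by the fitted coefficients."""
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


train_r2 = r_squared(X, y, beta)

# Fresh noise drawn the same way: the apparent fit does not carry over.
X_new = rng.standard_normal((n_subjects, n_features))
y_new = rng.standard_normal(n_subjects)
test_r2 = r_squared(X_new, y_new, beta)

print(f"training R^2 = {train_r2:.3f}, new-data R^2 = {test_r2:.3f}")
```

The near-perfect training fit alongside a poor fit on new data is the kind of erroneous conclusion that feature selection and penalization are meant to guard against.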
 
Lange et al. 2014. Ann Rev Stat Appl 1:279-300.
Donoho 2000. High Dimensional Data Analyses: The Curses and Blessings of Dimensionality.
Buhlmann & van de Geer 2011. Statistics for high dimensional data.
James et al. 2013. Statistical Learning.
 
Overview of Methods
 
Matrix-based methods
  Integration of datasets
  Dimension reduction
  Robust to noise
  Examples:
    principal components association
    partial least squares
    canonical correlation analyses
Sparse methods
  Feature selection
  Examples:
    Penalized linear regression
    LASSO or “Ridge” penalty
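To make the two penalties concrete, the sketch below fits both to simulated data. Everything here is an assumption for illustration (the data sizes, the penalty value `lam`, and the helper names `lasso_cd` and `ridge`); the slides do not say which software or settings were used. The LASSO is solved by cyclic coordinate descent and ridge by its closed form.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t, setting small values exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                      # residual for the current b (b = 0)
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b


def ridge(X, y, lam):
    """Closed-form minimizer of (1/2n)||y - Xb||^2 + (lam/2)||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


rng = np.random.default_rng(0)
n, p = 100, 30                            # hypothetical sizes
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:3] = [2.0, -1.5, 1.0]             # only three features carry signal
y = X @ true_b + 0.5 * rng.standard_normal(n)

b_lasso = lasso_cd(X, y, lam=0.2)
b_ridge = ridge(X, y, lam=0.2)
print("nonzero LASSO coefficients:", np.count_nonzero(b_lasso))
print("nonzero ridge coefficients:", np.count_nonzero(b_ridge))
```

The LASSO penalty sets most coefficients exactly to zero and so performs feature selection, whereas the ridge penalty shrinks coefficients without zeroing them.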
 
 
Example 1: Current MESA multiomics data
 
1191 subjects across 3 ethnic groups
Affy 6.0 GWAS data from MESA SHARE
CardioMetabochip data
Expression chip and methylation chip data from Lu ancillary
Further reduced:
  CAD SNPs with MAF > 0.05 for CM chip: 4,655 SNPs
  Expression data with SD in 4th quartile: 14,619 probes
  Methylation data with SD in 4th quartile: 121,441 methylation sites
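The two reduction rules above (a MAF threshold for SNPs and an SD-in-top-quartile rule for expression and methylation features) can be sketched as follows. The matrices are simulated and the function names `maf` and `sd_in_top_quartile` are hypothetical; genotypes are assumed to be coded as 0/1/2 allele counts with subjects in rows.

```python
import numpy as np


def maf(genotypes):
    """Minor allele frequency per SNP from 0/1/2 allele counts (subjects x SNPs)."""
    freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)


def sd_in_top_quartile(values):
    """Mask of features (columns) whose SD exceeds the 75th percentile of all SDs."""
    sd = values.std(axis=0, ddof=1)
    return sd > np.quantile(sd, 0.75)


rng = np.random.default_rng(1)
n_subjects = 60                           # hypothetical sample size

# Hypothetical genotype matrix with varying allele frequencies.
allele_freq = rng.uniform(0.0, 0.5, size=400)
geno = rng.binomial(2, allele_freq, size=(n_subjects, 400))
geno_kept = geno[:, maf(geno) > 0.05]

# Hypothetical expression matrix whose probes differ in spread.
expr = rng.standard_normal((n_subjects, 1000)) * rng.uniform(0.1, 2.0, size=1000)
expr_kept = expr[:, sd_in_top_quartile(expr)]

print(geno_kept.shape, expr_kept.shape)
```

Filtering on variability before integration keeps the features most likely to carry usable signal and shrinks the matrices passed to later steps.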
 
 
Canonical correlation analysis
 
[Figure: Expression and Methylation data sets, partitioned into “Variation in Expression NOT Associated”, “Variation in Methylation NOT Associated”, and “Variation Associated Across BOTH”]
C14orf167, aka DHRS4-AS
  Control of DHRS gene cluster
  Short chain dehydrogenase metabolism
  Mitochondrial protein
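As a rough illustration of how canonical correlation analysis separates shared from unshared variation, here is a minimal sketch of classical CCA computed with QR and SVD. It assumes more subjects than features in each block, which does not hold for the full expression and methylation matrices; the slides do not state which regularized or sparse variant was applied, and the simulated data and the name `cca` are hypothetical.

```python
import numpy as np


def cca(X, Y, n_components=2):
    """Classical CCA via QR + SVD; requires more rows than columns in X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)              # orthonormal basis for each block
    Qy, _ = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)   # singular values = canonical correlations
    x_scores = Qx @ U[:, :n_components]
    y_scores = Qy @ Vt.T[:, :n_components]
    return s[:n_components], x_scores, y_scores


rng = np.random.default_rng(2)
n = 200
shared = rng.standard_normal(n)           # hypothetical signal present in both blocks
expression = shared[:, None] + 0.5 * rng.standard_normal((n, 5))
methylation = shared[:, None] + 0.5 * rng.standard_normal((n, 4))

corrs, expr_scores, meth_scores = cca(expression, methylation)
print(np.round(corrs, 2))
```

The first canonical pair captures the variation associated across both blocks; later pairs, with much lower correlation, reflect variation that is not shared.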
 
Example 2: IRAS Data Landscape
 
Phenotype:
  Sensitivity Index measured by FSIGT
  Residuals after covariate adjustment for age, gender, BMI
SNP:
  773,965 nuclear SNPs with MAF > 0.1
  14 mitochondrial SNPs
Metabolites:
  LC-Mass Spec by Metabolon Inc.
  848 targeted metabolites
 
 
Sparse partial least squares regression applied to data blocks
  5 principal components analyzed
  10 variates per component retained (sparsity)
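The idea of retaining 10 variates on each of 5 components can be sketched for a single predictor block and a single response. This is a simplified, assumed variant (soft-thresholded PLS weights with deflation); the slides do not specify the multi-block implementation, and the name `sparse_pls1` and the simulated data are hypothetical.

```python
import numpy as np


def sparse_pls1(X, y, n_components=5, keep=10):
    """Sparse PLS for one response: each component keeps `keep` nonzero weights."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    weights, scores = [], []
    for _ in range(n_components):
        w = Xk.T @ yk                                  # covariance with the response
        # Soft-threshold at the (keep+1)-th largest |w| so `keep` weights survive.
        cutoff = np.sort(np.abs(w))[-(keep + 1)]
        w = np.sign(w) * np.maximum(np.abs(w) - cutoff, 0.0)
        w /= np.linalg.norm(w)
        t = Xk @ w                                     # component scores
        tt = t @ t
        Xk = Xk - np.outer(t, Xk.T @ t / tt)           # deflate the block
        yk = yk - t * (t @ yk / tt)                    # deflate the response
        weights.append(w)
        scores.append(t)
    return np.column_stack(weights), np.column_stack(scores)


rng = np.random.default_rng(3)
n, p = 200, 100                                        # hypothetical sizes
X = rng.standard_normal((n, p))
y = 2.0 * X[:, :3].sum(axis=1) + rng.standard_normal(n)

W, T = sparse_pls1(X, y)
print("selected on component 1:", np.flatnonzero(W[:, 0]))
```

Each column of `W` has exactly `keep` nonzero entries, so the features with nonzero loadings on a component form the short list that is inspected block by block.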
 
Data Landscape as Multi-Omic Data “Blocks”
Results: Loadings from the first component associated with Insulin Sensitivity for Each Block
[Figure: loadings on component 1 (panel title: Block “nuclear snps”); block labels: Metabolites, MT SNPs, Genomic SNPs, Insulin Sens]
Metabolite block: Plasmalogen pathway metabolites
[Figure: loadings on component 1, Block “metabolites”]
  1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)*
  1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2)*
  1-(1-enyl-stearoyl)-2-oleoyl-GPC (P-18:0/18:1)
  1-(1-enyl-stearoyl)-2-linoleoyl-GPC (P-18:0/18:2)*
  1-(1-enyl-palmitoyl)-2-palmitoleoyl-GPC (P-16:0/16:1)*
  1-(1-enyl-palmitoyl)-2-docosahexaenoyl-GPC (P-16:0/22:6)*
  1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC (P-16:0/20:4)*
  1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0)*
 
 
Genomic SNP block: Smooth muscle myosin (MYH11)
[Figure: genomic SNP loadings; labels: RP11-871F6.3 to REGL; Chr 12; MYH11, Smooth Muscle Myosin; Chr 11]
Mitochondrial Haplogroup Block: Mitochondrial haplogroups C, D are negatively associated with SI Component 1
 
MT haplogroups C, D form one branch of MT haplogroups in Latino/Hispanics
 
Comparison with a linear model
 
Adjusted R² 0.318; F statistic 71.53 on 6 and 905 DF; p-value < 2.2e-16
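The summary statistics above are tied together by standard formulas, which gives a quick sanity check. The sketch below takes the reported adjusted R² and degrees of freedom as inputs, recovers the unadjusted R², and computes the F statistic it implies; the function names are hypothetical, and because the adjusted R² is reported to only three decimals the result should only approximate the reported F.

```python
def r2_from_adjusted(adj_r2, n_predictors, resid_df):
    """Invert adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n_minus_1 = n_predictors + resid_df
    return 1.0 - (1.0 - adj_r2) * resid_df / n_minus_1


def f_statistic(r2, n_predictors, resid_df):
    """Overall F test of the model: (R^2 / p) / ((1 - R^2) / residual df)."""
    return (r2 / n_predictors) / ((1.0 - r2) / resid_df)


# Reported values: adjusted R^2 0.318 on 6 and 905 degrees of freedom.
r2 = r2_from_adjusted(0.318, n_predictors=6, resid_df=905)
f_implied = f_statistic(r2, n_predictors=6, resid_df=905)
print(f"R^2 ~ {r2:.3f}, implied F ~ {f_implied:.1f}")
```

The implied F lands in the neighborhood of the reported 71.53, so the three reported quantities are mutually consistent to within rounding-level differences.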
 
Discussion
 
A “multi-omic” analysis of several data blocks (insulin sensitivity phenotype, metabolite, mt-SNP, and genomic-SNP datasets) using partial least squares suggested:
  Smooth muscle is a location of insulin resistance in Latino/Hispanics
  Changes in plasmalogen metabolites are involved in the physiology
  The observed association follows the “A & B” mitochondrial haplogroup lineage
 
Further work
 
Increase workspace/computer infrastructure
  Develop “data shaping” tools
  Increase ability to perform analyses on ever larger matrices
“BigData” methods
  Increase speed of matrix calculations
    Allows a cross-validation procedure to determine the value of the penalty to use
  “fastPCA” methods perform matrix calculations using approximation shortcuts
  Parallelization and GPUs under exploration
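One well-known family of approximation shortcuts for PCA is randomized SVD, sketched below. This is offered only as an illustration of the general idea; it is not claimed to be the “fastPCA” method named above, and the function name `randomized_pca`, its parameters, and the simulated low-rank data are all assumptions.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversample=10, n_power_iter=2, seed=0):
    """Approximate the leading principal components with a random projection."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    k = n_components + n_oversample
    # A random projection captures the dominant column space of Xc.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))
    # Power iterations sharpen the separation between leading and trailing directions.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD on the much smaller projected matrix.
    Ub, s, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    return (Q @ Ub)[:, :n_components], s[:n_components], Vt[:n_components]


rng = np.random.default_rng(4)
# Hypothetical data: rank-5 structure plus a small amount of noise.
X = (rng.standard_normal((300, 5)) @ rng.standard_normal((5, 500))
     + 0.01 * rng.standard_normal((300, 500)))

left_vectors, s_approx, components = randomized_pca(X, n_components=5)
s_exact = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)[:5]
print(np.round(s_approx / s_exact, 4))
```

The expensive decomposition runs on a matrix with only `n_components + n_oversample` rows, which is what makes repeated fits, for example inside a cross-validation loop over penalty values, more affordable.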

Exploring the importance of multi-variate methods in multi-omics research to integrate diverse datasets such as phenotypes, metabolites, expression, methylation, and SNPs. The overview covers matrix-based methods, sparse methods for feature selection, and an example analysis from the MESA Multi-Omics Pilot Project. Canonical correlation analysis is also discussed in relation to expression and methylation data.

  • Multi-Omics Research
  • Integrated Methods
  • Canonical Correlation Analysis
  • Feature Selection
  • High-Dimensional Data

Uploaded on Oct 10, 2024 | 1 Views



Presentation Transcript


  1. MESA Multi-Omics: Methods Development, on behalf of the MESA Multi-Omics Pilot Project Committee

  2. Why multi-omics multi-variate methods? Obtaining the “big picture” of the entire “data landscape” requires integration of the multiple “omics” datasets with each other, e.g., integrating phenotypes, metabolites, expression, methylation, and SNPs into a single analysis. Yet also obtaining the “needle(s) in the haystack” requires feature selection appropriate to high-dimensional data, e.g., linear regression on high-dimensional data can lead to erroneous conclusions. Lange et al. 2014. Ann Rev Stat Appl 1:279-300. Donoho 2000. High Dimensional Data Analyses: The Curses and Blessings of Dimensionality. Buhlmann & van de Geer 2011. Statistics for high dimensional data. James et al. 2013. Statistical Learning.

  3. Overview of Methods. Matrix-based methods: integration of datasets; dimension reduction; robust to noise. Examples: principal components association, partial least squares, canonical correlation analyses. Sparse methods: feature selection. Examples: penalized linear regression; LASSO or “Ridge” penalty.

  4. Example 1: Current MESA multiomics data. 1191 subjects across 3 ethnic groups. Affy 6.0 GWAS data from MESA SHARE. CardioMetabochip data. Expression chip and methylation chip data from Lu ancillary. Further reduced: CAD SNPs with MAF > 0.05 for CM chip: 4,655 SNPs; expression data with SD in 4th quartile: 14,619 probes; methylation data with SD in 4th quartile: 121,441 methylation sites.

  5. Canonical correlation analysis. [Figure: Expression and Methylation; regions labeled “Variation in Expression NOT Associated”, “Variation Associated Across BOTH”, “Variation in Methylation NOT Associated”]

  6. Expression Methylation

  7. Expression, Methylation. C14orf167, aka DHRS4-AS: control of DHRS gene cluster; short chain dehydrogenase metabolism; mitochondrial protein.

  8. Example 2: IRAS Data Landscape. Phenotype: Sensitivity Index measured by FSIGT; residuals after covariate adjustment for age, gender, BMI. SNP: 773,965 nuclear SNPs with MAF > 0.1; 14 mitochondrial SNPs. Metabolites: LC-Mass Spec by Metabolon Inc.; 848 targeted metabolites.

  9. Data Landscape as Multi-Omic Data “Blocks”

     Block  Description                                             P (features)  N (subjects)
     SI     Sensitivity Index residuals after covariate adjustment  1             906
     M      metabolites                                             848           906
     MT     MT SNPs                                                 14            906
     G      SNPs                                                    773,965       906

     Sparse partial least squares regression applied to data blocks. 5 principal components analyzed. 10 variates per component retained (sparsity).

  10. Results: Loadings from the first component associated with Insulin Sensitivity for Each Block. [Figure: loadings on component 1 (panel title: Block “nuclear snps”); block labels: Metabolites, MT SNPs, Genomic SNPs, Insulin Sens]

  11. Metabolite block: Plasmalogen pathway metabolites. Loadings on component 1, Block “metabolites”: 1-(1-enyl-palmitoyl)-2-arachidonoyl-GPC (P-16:0/20:4)*; 1-(1-enyl-palmitoyl)-2-palmitoyl-GPC (P-16:0/16:0)*; 1-(1-enyl-palmitoyl)-2-docosahexaenoyl-GPC (P-16:0/22:6)*; 1-(1-enyl-palmitoyl)-2-palmitoleoyl-GPC (P-16:0/16:1)*; 1-(1-enyl-stearoyl)-2-oleoyl-GPC (P-18:0/18:1); 1-(1-enyl-stearoyl)-2-linoleoyl-GPC (P-18:0/18:2)*; 1-(1-enyl-palmitoyl)-2-linoleoyl-GPC (P-16:0/18:2)*; 1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)*

  12. Genomic SNP block: Smooth muscle myosin (MYH11). [Figure labels: RP11-871F6.3 to REGL; Chr 12; MYH11, Smooth Muscle Myosin; Chr 11]

  13. Mitochondrial Haplogroup Block: Mitochondrial haplogroups C, D are negatively associated with SI Component 1

  14. MT haplogroups C, D form one branch of MT haplogroups in Latino/Hispanics

  15. Comparison with a linear model

     Coefficient                                        Estimate  p-value
     Intercept                                          5.23      <2E-16
     Age                                                -0.024    6.4E-10
     BMI                                                -0.133    <2E-16
     Gender                                             0.119     ns
     1-(1-enyl-palmitoyl)-2-oleoyl-GPC (P-16:0/18:1)*   1.43      2.9E-14
     rs8045778_T & MitoG15044A_G                        -0.345    0.00064
     rs8045778_T & MitoG15044A_A                        -0.076    ns

     Adjusted R² 0.318; F statistic 71.53 on 6 and 905 DF; p-value < 2.2e-16

  16. Discussion. A “multi-omic” analysis of several data blocks (insulin sensitivity phenotype, metabolite, mt-SNP, and genomic-SNP datasets) using partial least squares suggested: smooth muscle is a location of insulin resistance in Latino/Hispanics; changes in plasmalogen metabolites are involved in the physiology; the observed association follows the “A & B” mitochondrial haplogroup lineage.

  17. Further work. Increase workspace/computer infrastructure: develop “data shaping” tools; increase ability to perform analyses on ever larger matrices. “BigData” methods: increase speed of matrix calculations, which allows a cross-validation procedure to determine the value of the penalty to use; “fastPCA” methods perform matrix calculations using approximation shortcuts; parallelization and GPUs under exploration.
