Ridge Regression in Genomic Selection

 
Statistical Genomics
 
Zhiwu Zhang
Washington State University
 
Lecture 25: Ridge Regression
 
Administration

Homework 6 (last) due April 28, Friday, 3:10PM
Final exam: May 4, 120 minutes (3:10-5:10PM), 50
Course evaluation starts on April 17

Outline
 
Concept development
Ridge Regression
rrBLUP package
 
Development of genomic selection

MAS: works for a few genes; does not work for polygenes
Over-fit; CV; inaccurate
Whole genome: concept in the 1990s, implemented in the 2000s
RR and Bayes
gBLUP = RR
Pedigree + Marker
cBLUP/sBLUP

Concept development
 
Over-fitting
Governed by fewer parameters
Convert fixed effects into random effects
Only regulate their distribution
Random effects = total genetic effects of individuals
Random effects = effects of markers
 
Fixed effect

Specific interest, nothing behind, e.g. a fertilizer
Limited levels, e.g. M and F only for sex
Access to any specific level
No distribution

Random effect

Population behind, e.g. average and variance
Many levels, e.g. individuals' genetic effects
Distribution
No control to access a specific level
 
Pioneers of implementation
 
RR and Bayes
 
Fixed effect model

y = Xb + e
Columns of [y X]: y (observation), 1 (mean), x1 (PC2), x2, x3, ..., x5, x6
b = [b0 b1 ...]'
 
Fixed effect model over-fitting

y = Xb + e
Columns of [y X]: y (observation), 1 (mean), x1 (PC2), x2, x3, ..., x9, x10
b = [b0 b1 ...]'
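
The effect of adding more SNPs as fixed effects can be sketched on simulated data. The data, sample sizes and effect sizes below are hypothetical and not from the lecture. The point is only that the training residual sum of squares can never increase as SNPs are added, whereas the error on held-out individuals has no such guarantee.

```r
# Hypothetical toy data: 30 individuals, 10 SNPs coded 0/1/2,
# only the first two SNPs have a real effect.
set.seed(1)
n <- 30; p <- 10
X <- matrix(sample(0:2, n * p, replace = TRUE), n, p,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- 1 + 0.8 * X[, 1] - 0.6 * X[, 2] + rnorm(n)
dat <- data.frame(y = y, X)

train <- 1:15   # individuals used to fit the model
test <- 16:30   # individuals held out for checking predictions

# Fit y = Xb + e on the training half using the first k SNPs as fixed effects
fit_k <- function(k) {
  form <- reformulate(paste0("x", 1:k), response = "y")
  fit <- lm(form, data = dat[train, ])
  pred <- suppressWarnings(predict(fit, newdata = dat[test, ]))
  c(snps = k,
    train_rss = sum(resid(fit)^2),
    test_mse = mean((dat$y[test] - pred)^2))
}

res <- t(sapply(c(2, 5, 10), fit_k))
print(res)
```

Comparing `train_rss` with `test_mse` across the rows shows whether the extra SNPs are describing signal or noise.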
 
BLUP of individuals

y = Xb + Zu + e
X = [1 x1 x2], with 1 (mean) and x1 (PC2); y is the observation
b = [b0 b1 ...]'
Z: one 0/1 column per individual
u = [u1 u2 ... u19 u20]'
 
Switch individuals to SNPs

y = Xb + Ms + e
X = [1 x1], with 1 (mean) and x1 (PC2); y is the observation
b = [b0 b1]'
M: one column per SNP, genotypes coded 0, 1, 2
s = [S1 S2 ... Sm-1 Sm]'

BLUP on individuals
y = Xb + Zu + e
BLUP on markers
(Z to M, and u to s)
y = Xb + Ms + e
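
One way to see why BLUP on markers is a ridge regression is the following sketch. It assumes the fixed effects have already been removed (for example, y is centred) and that marker effects follow s ~ N(0, Iσ_r²). The symbol λ and the residual variance symbol σ_e² are notational assumptions made here.

```latex
\hat{s} \;=\; \arg\min_{s}\; (y - Ms)'(y - Ms) + \lambda\, s's
\quad\Longrightarrow\quad
(M'M + \lambda I)\,\hat{s} \;=\; M'y,
\qquad
\lambda = \frac{\sigma_e^2}{\sigma_r^2}
```

The coefficient matrix M'M + λI has the same form as the random-effect block of the mixed model equations when the covariance of the random effects is the identity. The added diagonal term λI is the "ridge" that shrinks all marker effects toward zero.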
 
Ridge Regression

Independently invented in many contexts
Different names: e.g. Tikhonov regularization (1963), Phillips–Twomey method, and constrained linear inversion
Tikhonov, A. N. (1963). "О решении некорректно поставленных задач и методе регуляризации". Doklady Akademii Nauk SSSR 151: 501–504. Translated in "Solution of incorrectly formulated problems and the regularization method". Soviet Mathematics 4: 1035–1038.
Phillips, D. L. (1962). "A Technique for the Numerical Solution of Certain Integral Equations of the First Kind". Journal of the ACM 9: 84. doi:10.1145/321105.321114.

rrBLUP vs. gBLUP

rrBLUP: $y = x_1 s_1 + x_2 s_2 + \dots + x_p s_p + e$, with $s \sim N(0, I\sigma_r^2)$
gBLUP: $u \sim N(0, A\sigma_a^2)$
$u = Ms$ if $A = MM'$
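
The relation u = Ms can be checked numerically with base R alone. This is a minimal sketch on hypothetical toy data. It assumes y is centred so that no fixed effects are needed, and it assumes a known variance ratio `lambda` rather than estimating it as rrBLUP does.

```r
# Hypothetical toy data: 8 individuals, 20 markers coded 0/1/2
set.seed(2)
n <- 8; m <- 20
M <- matrix(sample(0:2, n * m, replace = TRUE), n, m)
y <- rnorm(n)
y <- y - mean(y)   # centre y so that no fixed effects are needed
lambda <- 2        # assumed ratio of residual to marker-effect variance

# Ridge regression on markers: s_hat = (M'M + lambda I)^-1 M'y
s_hat <- solve(crossprod(M) + lambda * diag(m), crossprod(M, y))

# gBLUP on individuals with K = MM': u_hat = K (K + lambda I)^-1 y
K <- tcrossprod(M)
u_hat <- K %*% solve(K + lambda * diag(n), y)

# Genomic values built from marker effects agree with the gBLUP solution
max_diff <- max(abs(M %*% s_hat - u_hat))
print(max_diff)
```

The ridge system is m by m while the gBLUP system is n by n, so with many more markers than individuals the gBLUP form is the cheaper route to the same genomic values.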
 
R packages for ridge regression

rrBlupMethod6
ridge
lm.ridge (from MASS): library(MASS)
rrBLUP
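
As an illustration of one of these options, the sketch below calls `lm.ridge` from MASS on hypothetical simulated markers. The data and the two penalty values are assumptions chosen only to show the two extremes: no penalty reproduces ordinary least squares, and a very large penalty shrinks the marker effects toward zero.

```r
library(MASS)  # provides lm.ridge

# Hypothetical toy data: 50 individuals, 5 markers coded 0/1/2
set.seed(3)
n <- 50
dat <- data.frame(matrix(sample(0:2, n * 5, replace = TRUE), n, 5))
names(dat) <- paste0("snp", 1:5)
dat$y <- 2 + 0.5 * dat$snp1 - 0.3 * dat$snp2 + rnorm(n)

ols <- lm(y ~ ., data = dat)                             # ordinary least squares
ridge0 <- lm.ridge(y ~ ., data = dat, lambda = 0)        # ridge with no penalty
ridge_big <- lm.ridge(y ~ ., data = dat, lambda = 1e6)   # ridge with a heavy penalty

# First column is the intercept, remaining columns are marker effects
print(rbind(ols = unname(coef(ols)),
            ridge0 = unname(coef(ridge0)),
            ridge_big = unname(coef(ridge_big))))
```

In practice the penalty is not fixed by hand. rrBLUP obtains it from estimated variance components.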
 
rrBLUP R package

Ridge Regression + BLUP
EMMA to estimate variance components
 
rrBLUP on CRAN
 
rrBLUP: Ridge Regression and Other Kernels for Genomic Selection
Software for genomic prediction with the RR-BLUP mixed model. One application is to estimate marker effects by ridge regression;
alternatively, BLUPs can be calculated based on an additive relationship matrix or a Gaussian kernel.
Version: 4.4
Depends: R (≥ 2.14)
Suggests: parallel
Published: 2015-10-28
Author: Jeffrey Endelman
Maintainer: Jeffrey Endelman <endelman at wisc.edu>
License: GPL-3
URL: http://potatobreeding.cals.wisc.edu/software
NeedsCompilation: no
Citation: rrBLUP citation info
Materials: NEWS
CRAN checks: rrBLUP results
Downloads:
Reference manual: rrBLUP.pdf
Package source: rrBLUP_4.4.tar.gz
Windows binaries: r-devel: rrBLUP_4.4.zip, r-release: rrBLUP_4.4.zip, r-oldrel: rrBLUP_4.4.zip
OS X Snow Leopard binaries: r-release: rrBLUP_4.4.tgz, r-oldrel: rrBLUP_4.3.tgz
OS X Mavericks binaries: r-release: rrBLUP_4.4.tgz
Old sources: rrBLUP archive
Reverse dependencies:
Reverse depends: GeneticSubsetter
Reverse imports: PopVar
 
Setup GAPIT
 
#Import GAPIT
#source("http://www.bioconductor.org/biocLite.R")
#biocLite("multtest")
#install.packages("EMMREML")
#install.packages("gplots")
#install.packages("scatterplot3d")
library('MASS') # required for ginv
library(multtest)
library(gplots)
library(compiler) #required for cmpfun
library("scatterplot3d")
library("EMMREML")
source("http://www.zzlab.net/GAPIT/emma.txt")
source("http://www.zzlab.net/GAPIT/gapit_functions.txt")
 
Import data and simulation
 
 
#Import demo data
myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T)
myGM=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T)
myCV=read.table(file="http://zzlab.net/GAPIT/data/mdp_env.txt",head=T)

#Simulate 20 QTNs on the first half of the chromosomes (chromosomes 1 to 5)
X=myGD[,-1]
index1to5=myGM[,2]<6
X1to5 = X[,index1to5]
taxa=myGD[,1]

set.seed(99164)
GD.candidate=cbind(taxa,X1to5)
mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=20, effectunit =.95,QTNDist="normal",CV=myCV,cveff=c(.01,.01))

Ridge Regression vs. gBLUP
#Import rrBLUP
#install.packages("rrBLUP")
library(rrBLUP)
#prepare data
y <- mySim$Y[,2]
M=as.matrix(X)
#Ridge Regression
ans1 <- mixed.solve(y=y,Z=M)
#gBLUP
K <- tcrossprod(M) #K = MM'
ans2 <- mixed.solve(y=y,K=K)
#Compare GEBV
plot(M%*%ans1$u, ans2$u)
rrBLUP vs GAPIT
myGAPIT <- GAPIT(
Y=mySim$Y,
GD=myGD,
GM=myGM,
group.from=1000,
group.to=1000)
#Align the GAPIT predictions with the order of taxa
order.raw=match(taxa,myGAPIT$Pred[,1])
plot(ans2$u, myGAPIT$Pred[order.raw,5])

#How match() works: position of each element of first within second
first=c("c","a","b","d")
second=c("a","d","c","e","f")
match(first,second)
[1]  3  1 NA  2
Highlight
 
Concept development
Ridge Regression
rrBLUP package

Explore the concept of ridge regression in genomic selection, involving the development of genomic selection methods, pioneers in implementation, fixed and random effects, and the over-fitting phenomenon. Learn how ridge regression addresses issues of over-fitting by introducing regularization parameters and balancing fixed and random effects to improve the accuracy of predicting genetic values in individuals.

  • Ridge Regression
  • Genomic Selection
  • Over-fitting
  • Fixed Effects
  • Random Effects

Uploaded on Sep 16, 2024



Presentation Transcript


  1. Statistical Genomics Lecture 25: Ridge Regression Zhiwu Zhang Washington State University

  2. Administration Homework 6 (last) due April 28, Friday, 3:10PM Final exam: May 4, 120 minutes (3:10-5:10PM), 50 Course evaluation starts on April 17

  3. Outline Concept development Ridge Regression rrBLUP package

  4. Development of genomic selection MAS: works for a few genes; does not work for polygenes Over-fit; CV; inaccurate Whole genome: concept in the 1990s, implemented in the 2000s RR and Bayes gBLUP = RR Pedigree + Marker cBLUP/sBLUP

  5. Concept development Over-fitting Governed by fewer parameters Convert fixed effects into random effects Only regulate their distribution Random effects = total genetic effects of individuals Random effects = effects of markers

  6. Fixed effect Specific interest, nothing behind, e.g. a fertilizer Limited levels, e.g. M and F only for sex Access to any specific level No distribution

  7. Random effect Population behind, e.g. average and variance Many levels, e.g. individuals' genetic effects Distribution No control to access a specific level

  8. Pioneers of implementation RR and Bayes

  9. Fixed effect model: y = Xb + e. Columns of [y X]: y (observation), 1 (mean), x1 (PC2), x2, x3, ..., x5, x6, with SNP columns labeled SNP1, SNP2, ..., SNP4, SNP5 and genotypes coded 0, 1, 2. b = [b0 b1 S1 S2 ... S4 S5]'.

  10. Fixed effect model over-fitting: y = Xb + e. Columns of [y X]: y (observation), 1 (mean), x1 (PC2), x2, x3, ..., x9, x10, with SNP columns labeled SNP1, SNP2, ..., SNP9, SNP10 and genotypes coded 0, 1, 2. b = [b0 b1 S1 S2 ... S9 S10]'.

  11. BLUP of individuals: y = Xb + Zu + e. X = [1 x1 x2], with 1 (mean) and x1 (PC2); y is the observation. b = [b0 b1 ...]'. Z has one 0/1 column per individual (Ind1, Ind2, ..., Ind19, Ind20). u = [u1 u2 ... u19 u20]'.

  12. Switch individuals to SNPs: y = Xb + Ms + e. X = [1 x1], with 1 (mean) and x1 (PC2); y is the observation. b = [b0 b1]'. M has one column per SNP (SNP1, SNP2, ..., SNPm-1, SNPm) with genotypes coded 0, 1, 2. s = [S1 S2 ... Sm-1 Sm]'.

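  13. BLUP on individuals: $y = Xb + Zu + e$, with $u \sim N(0, A\sigma_a^2)$ and $e \sim N(0, I\sigma_e^2)$. Mixed model equations: $\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \frac{\sigma_e^2}{\sigma_a^2}A^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$, so $\begin{bmatrix} \hat{b} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \frac{\sigma_e^2}{\sigma_a^2}A^{-1} \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$.

  14. BLUP on markers (Z to M, and u to s): $y = Xb + Ms + e$, with $s \sim N(0, A\sigma_a^2)$, $A = I$, and $e \sim N(0, I\sigma_e^2)$. Mixed model equations: $\begin{bmatrix} X'X & X'M \\ M'X & M'M + \frac{\sigma_e^2}{\sigma_a^2}A^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{s} \end{bmatrix} = \begin{bmatrix} X'y \\ M'y \end{bmatrix}$, so $\begin{bmatrix} \hat{b} \\ \hat{s} \end{bmatrix} = \begin{bmatrix} X'X & X'M \\ M'X & M'M + \frac{\sigma_e^2}{\sigma_a^2}A^{-1} \end{bmatrix}^{-1} \begin{bmatrix} X'y \\ M'y \end{bmatrix}$.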

  15. Ridge Regression Independently invented in many contexts Different names: e.g. Tikhonov regularization (1963), Phillips–Twomey method, and constrained linear inversion Tikhonov, A. N. (1963). "О решении некорректно поставленных задач и методе регуляризации". Doklady Akademii Nauk SSSR 151: 501–504. Translated in "Solution of incorrectly formulated problems and the regularization method". Soviet Mathematics 4: 1035–1038. Phillips, D. L. (1962). "A Technique for the Numerical Solution of Certain Integral Equations of the First Kind". Journal of the ACM 9: 84. doi:10.1145/321105.321114.

  16. rrBLUP vs. gBLUP rrBLUP: $y = x_1 s_1 + x_2 s_2 + \dots + x_p s_p + e$, with $s \sim N(0, I\sigma_r^2)$. gBLUP: $u \sim N(0, A\sigma_a^2)$.

  17. $u = Ms$ if $A = MM'$

  18. R packages for ridge regression rrBlupMethod6 ridge lm.ridge (from MASS): library(MASS) rrBLUP

  19. rrBLUP R package Ridge Regression + BLUP EMMA to estimate variance components

  20. rrBLUP on CRAN rrBLUP: Ridge Regression and Other Kernels for Genomic Selection Software for genomic prediction with the RR-BLUP mixed model. One application is to estimate marker effects by ridge regression; alternatively, BLUPs can be calculated based on an additive relationship matrix or a Gaussian kernel. Version: 4.4 Depends: R (≥ 2.14) Suggests: parallel Published: 2015-10-28 Author: Jeffrey Endelman Maintainer: Jeffrey Endelman <endelman at wisc.edu> License: GPL-3 URL: http://potatobreeding.cals.wisc.edu/software NeedsCompilation: no Citation: rrBLUP citation info Materials: NEWS CRAN checks: rrBLUP results Downloads: Reference manual: rrBLUP.pdf Package source: rrBLUP_4.4.tar.gz Windows binaries: r-devel: rrBLUP_4.4.zip, r-release: rrBLUP_4.4.zip, r-oldrel: rrBLUP_4.4.zip OS X Snow Leopard binaries: r-release: rrBLUP_4.4.tgz, r-oldrel: rrBLUP_4.3.tgz OS X Mavericks binaries: r-release: rrBLUP_4.4.tgz Old sources: rrBLUP archive Reverse dependencies: Reverse depends: GeneticSubsetter Reverse imports: PopVar

  21. Setup GAPIT #Import GAPIT #source("http://www.bioconductor.org/biocLite.R") #biocLite("multtest") #install.packages("EMMREML") #install.packages("gplots") #install.packages("scatterplot3d") library('MASS') # required for ginv library(multtest) library(gplots) library(compiler) #required for cmpfun library("scatterplot3d") library("EMMREML") source("http://www.zzlab.net/GAPIT/emma.txt") source("http://www.zzlab.net/GAPIT/gapit_functions.txt")

  22. Import data and simulation #Import demo data myGD=read.table(file="http://zzlab.net/GAPIT/data/mdp_numeric.txt",head=T) myGM=read.table(file="http://zzlab.net/GAPIT/data/mdp_SNP_information.txt",head=T) myCV=read.table(file="http://zzlab.net/GAPIT/data/mdp_env.txt",head=T) #Simulate 20 QTNs on the first half of the chromosomes (chromosomes 1 to 5) X=myGD[,-1] index1to5=myGM[,2]<6 X1to5 = X[,index1to5] taxa=myGD[,1] set.seed(99164) GD.candidate=cbind(taxa,X1to5) mySim=GAPIT.Phenotype.Simulation(GD=GD.candidate,GM=myGM[index1to5,],h2=.5,NQTN=20, effectunit =.95,QTNDist="normal",CV=myCV,cveff=c(.01,.01))

  23. Ridge Regression vs. gBLUP #Import rrBLUP #install.packages("rrBLUP") library(rrBLUP) #prepare data y <- mySim$Y[,2] M=as.matrix(X) #Ridge Regression ans1 <- mixed.solve(y=y,Z=M) #gBLUP K <- tcrossprod(M) #K = MM' ans2 <- mixed.solve(y=y,K=K) #Compare GEBV plot(M%*%ans1$u, ans2$u) [Figure: scatter plot of ans2$u against M %*% ans1$u]

  24. rrBLUP vs GAPIT myGAPIT <- GAPIT( Y=mySim$Y, GD=myGD, GM=myGM, group.from=1000, group.to=1000) order.raw=match(taxa,myGAPIT$Pred[,1]) plot(ans2$u, myGAPIT$Pred[order.raw,5]) first=c("c","a","b","d") second=c("a","d","c","e","f") match(first,second) [1] 3 1 NA 2 [Figure: scatter plot of myGAPIT$Pred[thematch, 5] against ans2$u]

  25. Highlight Concept development Ridge Regression rrBLUP package
