Advancements in Statistical Genomics: FarmCPU and Method Development
Exploring the evolution of statistical genomics techniques, this lecture delves into the history of FarmCPU and BLINK, addressing challenges in GWAS and the development of models like PC+SNP+e and PC+Kinship+e. It also covers popular software packages in the field and the importance of moving beyond traditional tools like PLINK.
Presentation Transcript
Statistical Genomics, Lecture 21: FarmCPU. Zhiwu Zhang, Washington State University
Outline: History of method and software development; FarmCPU; BLINK
Models: y = PC + SNP + e; y = PC + Kinship + e; y = PC + QTNs + e; y = PC + Kinship + SNP + e. Diagram labels: QTNs; FarmCPU: -2LL; BLINK: -2LL.
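The -2LL label on this slide can be made concrete for the simplest case. As a sketch only, assume a fixed-effect model such as y = PC + QTNs + e with independent Gaussian errors, fitted by maximum likelihood on n individuals; RSS (residual sum of squares) and n are symbols introduced here, not taken from the slide. Models that include a Kinship term need a mixed-model likelihood instead, which this expression does not cover.

```latex
\hat{\sigma}^2 = \frac{\mathrm{RSS}}{n}, \qquad
-2LL = n \ln\!\left( 2\pi \hat{\sigma}^2 \right) + n
```

Under this assumption a smaller -2LL means a better fit, so candidate sets of QTNs can be compared by the value each one yields.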
Problems in GWAS: (1) Computing difficulties: millions of markers, individuals, and traits. (2) False positives, e.g., Amgen scientists tried to replicate 53 high-profile cancer research findings but could replicate only 6 (Nature, 2012, 483: 531). (3) False negatives.
Q PC PC+K EMMA EMMAX Q+K GWAS Stream SELECT MLMM CMLM P3D GCTA ECMLM FaST-LMM GEMMA FarmCPU GenABEL BLINK
[Figure: methods placed by computing speed and by power given type I error, with arrows marking speed improvement and power improvement. Methods shown: t test, GLM, GenABEL, FaST-LMM, CMLM, ECMLM, SELECT, GEMMA, P3D/EMMAX, SUPER, EMMA, MLMM, MLM.]
Usage of Software Packages
Software | Leading Authors | Corresponding authors | Language | Released | Citation
PUMA | Gabriel E. Hoffman | Jason G. Mezey | C++ | 2013 | 27
TATES | Sophie van der Sluis | Sophie van der Sluis | Fortran | 2013 | 76
GAPIT | Lipka AE | Zhang Z | R | 2012 | 284
MLMM | Vincent S | Nordborg M | R/python | 2012 | 226
GEMMA | Zhou X | Stephens M | C++ | 2012 | 445
FastLMM | Christoph L, Listgarten J, Heckerman D | Christoph L, Listgarten J, Heckerman D | C++ | 2011 | 348
Qxpak | M. Pérez-Enciso | M. Pérez-Enciso | Fortran | 2004 | 141
EMMAX | Kang HM | Sabatti C & Eskin E | C++ | 2010 | 813
GCTA | Jian Y | Jian Y | C++ | 2011 | 1338
GenABEL | Aulchenko YS | Aulchenko YS | R | 2007 | 990
TASSEL | Bradbury, Zhang, and Kroon | Bradbury PJ | Java | 2006 | 1596
PLINK | Purcell S | Purcell S | C++ | 2007 | 12111
65%
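The 65% figure on this slide is not labeled. One reading, offered here as an assumption rather than the lecturer's stated meaning, is that it is PLINK's share of all citations in the table. The short check below uses only the citation counts listed on the slide; the variable names are illustrative.

```python
# Citation counts copied from the slide's table.
citations = {
    "PUMA": 27, "TATES": 76, "GAPIT": 284, "MLMM": 226,
    "GEMMA": 445, "FastLMM": 348, "Qxpak": 141, "EMMAX": 813,
    "GCTA": 1338, "GenABEL": 990, "TASSEL": 1596, "PLINK": 12111,
}

total = sum(citations.values())
# Assumption: the slide's percentage is PLINK's share of the listed citations.
plink_share = citations["PLINK"] / total

print(f"Total citations listed: {total}")
print(f"PLINK share: {plink_share:.1%}")
```

The computed share lands between 65% and 66%, which is consistent with that reading of the slide.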
Why do human geneticists not go beyond PLINK?
MLM was more enriched for flowering time genes
Model Development. Si: testing marker. Q: population structure. K: kinship. S: pseudo QTNs. Diagram labels: adjustment on marker; adjustment on covariates.
SUPER algorithm: y = PC + SNP + e; y = PC + Kinship + e; y = PC + Kinship + SNP + e. Diagram labels: Bins; -2LL; QTNs.
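The "Bins" label can be illustrated with a small sketch. It assumes that binning means splitting each chromosome into fixed-width windows, keeping the most significant marker in each window, and then keeping the top few windows as candidate QTNs. The function name `select_pseudo_qtns`, its parameters, and the toy P values are hypothetical. How the bin size and the number of QTNs are tuned (the slide points to -2LL) is not shown here.

```python
def select_pseudo_qtns(markers, bin_size, max_qtns):
    """Pick candidate QTNs from single-marker test results.

    markers: list of (chromosome, position, p_value) tuples.
    Returns the best marker of each bin, limited to the max_qtns
    smallest P values, ordered from most to least significant.
    """
    best_per_bin = {}
    for chrom, pos, p in markers:
        key = (chrom, pos // bin_size)  # fixed-width window on one chromosome
        if key not in best_per_bin or p < best_per_bin[key][2]:
            best_per_bin[key] = (chrom, pos, p)
    ranked = sorted(best_per_bin.values(), key=lambda m: m[2])
    return ranked[:max_qtns]


# Toy P values (made up for illustration).
markers = [
    (1, 1_000, 0.20),
    (1, 4_000, 0.001),
    (1, 12_000, 0.03),
    (1, 18_000, 0.0005),
    (2, 2_000, 0.04),
    (2, 3_500, 0.50),
]
qtns = select_pseudo_qtns(markers, bin_size=10_000, max_qtns=2)
print(qtns)
```

Keeping one marker per bin avoids picking several markers that tag the same signal, which is the usual motivation for binning before choosing QTNs.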
FarmCPU algorithm: y = PC + SNP + e; y = PC + Kinship + e; y = PC + QTNs + SNP + e. Diagram labels: Bins; -2LL; QTNs.
[Figure: methods placed by computing speed and by power given type I error, with arrows marking speed improvement and power improvement. Methods shown: t test, GLM, GenABEL, BLINK, FarmCPU, FaST-LMM, CMLM, ECMLM, SELECT, GEMMA, P3D/EMMAX, SUPER, EMMA, MLMM, MLM.]
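The last model on this slide, y = PC + QTNs + SNP + e, is an ordinary fixed-effect regression in which PCs and pseudo QTNs are covariates and one SNP is tested at a time. The sketch below is a minimal illustration of that single test, not the FarmCPU implementation: it uses plain least squares and reports an F statistic instead of a P value, and all names (`solve`, `rss`, `marker_f_stat`) and the simulated data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in columns]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[k] * cols[k][i] for k in range(len(cols))) for i in range(n)]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def marker_f_stat(y, covariates, snp):
    """F statistic (1 numerator df) for adding snp to the covariate model."""
    rss0 = rss(covariates, y)
    rss1 = rss(covariates + [snp], y)
    df_resid = len(y) - (len(covariates) + 2)  # intercept + covariates + snp
    return (rss0 - rss1) / (rss1 / df_resid)


# Simulated toy data: one PC, one pseudo QTN, one causal and one null SNP.
random.seed(7)
n = 200
pc = [random.gauss(0, 1) for _ in range(n)]
qtn = [random.choice([0, 1, 2]) for _ in range(n)]
causal = [random.choice([0, 1, 2]) for _ in range(n)]
null = [random.choice([0, 1, 2]) for _ in range(n)]
y = [1.5 * pc[i] + 1.0 * qtn[i] + 2.0 * causal[i] + random.gauss(0, 1)
     for i in range(n)]

f_causal = marker_f_stat(y, [pc, qtn], causal)
f_null = marker_f_stat(y, [pc, qtn], null)
print(f"F for causal SNP: {f_causal:.1f}, F for null SNP: {f_null:.1f}")
```

Because no kinship matrix enters this model, each marker test costs only a small least-squares fit, which is one plausible reason a fixed-effect testing step scales well to many markers.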
FarmCPU (Fixed and random model Circulating Probability Unification). Fixed model: y = M1 + ... + Mt + mi + e. Substitution: [Table: P values of the tested SNPs m1, ..., ml (P1, P2, ..., Pt) against the pseudo QTNs M1, ..., Mt, with NA entries.] Optimization. Random model: y = u + e, with Var(u) and SVD(M).
Re-analysis of Arabidopsis data (Xiaolei Liu)
It is time for human geneticists to move forward
FarmCPU is computationally efficient: testing 60K SNPs
Half a million individuals and half a million SNPs: three days. But the new version of PLINK is faster.
Summary: History of method and software development; FarmCPU