Understanding Y Chromosomal Haplogroups in Genetic Studies
Exploring the utility of non-recombining paternal ancestry information in Genome-Wide Association Studies (GWAS) through the analysis of Y chromosomal haplogroups. This review delves into the implications of using Y chromosome and mitochondrial DNA data in tracing human migrations, ancestry, bottlenecks, and disease patterns. The study also highlights the importance of considering Y-DNA haplogroups in association studies to uncover hidden stratifications and differential phenotypic effects. The research involves investigating the relationship between Y-DNA haplogroups and traits such as BMI and cardio-metabolic diseases using large sample sizes from established studies.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Using Y chromosomal haplogroups in genetic association studies and suggested implications Erzurumluoglu et al., Apr 2016 Translation: Can (non-recombining) paternal ancestry information be of any use in GWAS? Nice review: Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Underhill et al., 2007, Ann Rev Genet Fantastic website: http://www.eupedia.com/europe/origins_haplogroups_europe.shtml Journal club: 15/06/16 Mesut Erzurumluoglu
Introduction to Population Genetics Father Mother Children Example of transmission of autosomes (and X chromosome) to offspring Recombination event during meiosis
Inheriting Y-DNA and mtDNA National Geographic
Y-DNA phylogenetic tree Underhill et al, Dec 2007
Haplogroups Group of haplotypes which share a common ancestor with a SNP mutation Exercise next slide Can give information about ancestry, bottlenecks (e.g. war, catastrophe), migrations and disease Where anthropology, history and genetics meets
Haplogroups Exercise SNP 1 SNP 1 SNP 4 SNP 1 SNP 3 SNP 5 SNP 8 SNP 6 SNP 2 SNP 6 SNP 2 SNP 2 SNP 7 6 individuals (males) genotyped at 8 Y-DNA SNP loci How many haplogroups (clusters) are there?
Haplogroups SNP 1 SNP 4 SNP 6 SNP 2 SNP 2 SNP 1 SNP 1 SNP 7 SNP 3 SNP 8 SNP 5 SNP 6 SNP 2 Haplogroup A Ancestral SNP: SNP1 Haplogroup B1 Ancestral SNP: SNP6 Haplogroup B Ancestral SNP: SNP2
Introduction to study In apparently homogeneous populations (as defined by PCA), there is still Y-DNA haplogroup variation which will result from population history. Therefore, hidden stratification and/or differential phenotypic effects by Y-DNA haplogroups could exist To test this, we hypothesised that stratifying individuals according to their Y-DNA haplogroups before testing associations between autosomal SNPs and phenotypes will yield difference in association Chose to study BMI as a few Y-DNA haplogroups have been directly associated with cardio-metabolic diseases Larger sample sizes in ALSPAC and 1958BC BMI studies were well-established at the time of study (2012)
Theory Y-DNA haplogroup R and I? PC1 PC2 Is there Y-DNA variability and sub-clustering within a PCA defined homogenous cluster (e.g. in CEU)? A real life example would be the North/South sub-clustering that one observes in some populations (e.g. Italian)
Aims and objectives Derive Y-DNA haplogroups of 6,537 males from two cohorts ALSPAC (N=5,080, 816 Y-DNA SNPs) 1958 Birth Cohort (N=1,457, 1,849 Y-DNA SNPs) Carry out association study within different strata using 32 (published) BMI SNPs Explore (sub)stratification within PCA defined groups Does it exist? What effect? If any
Methods Genome-wide SNP chip genotyping of 9,912 individuals QC PLINK Individuals Mismatch of sex Minimal or excessive heterozygosity <0.320 and >0.345 for the Sanger data <0.310 and >0.330 for the LabCorp data Individual missingness >3% Cryptic relatedness >10% IBD Non-European (and White) ancestry SNPs MAF <1% Call rate <95% HWE P<5e-7 Pseudo-autosomal region Imputation MaCH
Methods (continued) Y-DNA haplogroup determination Yfitter Haplogroups clustered under Clade name Association study between BMI and the Y-DNA haplogroups in ALSPAC - Stata Individuals in haplogroup R used as baseline (coded as 0 ), and individuals in haplogroup I coded as 1 Covariates: Age, age^2 and top 10 PCs Analysis of the effects of Y-DNA haplogroups on SNPs associated with BMI - Stata 32 SNPs associated with BMI (Speliotes et al, 2010) Covariates: Age, age^2 and top 10 PCs Likelihood ratio test Compare the two regression models: With and without an interaction term (genotype dosage x Y-DNA haplogroup) Replication in 1958BC - Stata
Results ALSPAC 9,912 participants with 500,527 genotyped SNPs QC 8,365 unrelated individuals Y-DNA Haplogroup 5,080 males with Y-DNA haplogroup data Y-DNA Clades Y-DNA haplogroup clades R (72%) and I (19%)
Results ALSPAC Y-DNA profile Y-DNA haplogroups in ALSPAC ALSPAC has within it individuals belonging to 12 of the major Y-DNA haplogroups (C, E, G, H, I, J, L, N, O, Q, R, T), albeit only 5 of the groups have 50 (>1%) or more individuals in them. These five clades are E, G, I, J and R and have 153 (3%), 94 (1.9%), 960 (19%), 142 (2.8%) and 3564 (72%) individuals in them respectively.
Results 1958BC Y-DNA profile Y-DNA haplogroups in 1958 Birth Cohort (Dataset: EGAD00000000022) 1958BC has within it individuals belonging to five major Y-DNA haplogroups (E, I, J, C, R), albeit 2 of the groups have less than 50 individuals in them. The clades E, I, J, C and R have 44 (3%), 296 (20%), 34 (2%), 1 and 1078 (74%) individuals in them respectively.
Results Direct Association Linear regression between BMI and Y-DNA haplogroup I in ALSPAC and 1958BC The z-test for heterogeneity shows that the effect size of Y-DNA haplogroup I on BMI is differential depending on the cohort (albeit not to be taken seriously as result has not been replicated somewhere else see discussion slide)
Results Interaction ALSPAC 1958BC ALSPAC + 1958BC Figures a-c: Subgroup analysis comparing effect size of rs8050136 on BMI in two Y-DNA haplogroups The statistics above represent p values from the likelihood ratio test for interaction between Y-DNA haplogroup I and rs8050136 (FTO). Heterogeneity tests (z test) comparing Y-DNA haplogroups I and R yielded p values of 0.005, 0.4169 and 0.014 for a, b and c respectively. ggplot2 package in R was used to create the plot.
Results Sub-clustering PC1 PC2 Y-DNA haplogroup vs top two principal components in ALSPAC individuals Plotting the Y-DNA haplogroup clades on a PCA plot reveals that there is no apparent sub-clustering within the ALSPAC individuals. Thus adding Y-DNA haplogroup information as covariates to control for additional population stratification in ALSPAC is not needed
Conclusions Haplogroup R is the most frequent (72%) and I the second most common Y-DNA haplogroup (19%) in ALSPAC (and 1958BC) There was no strong evidence of association between Y-DNA haplogroups and BMI P=0.066 in ALSPAC P=0.107 in 1958 cohort One instance of heterogeneity (p= 0.008) between the two haplogroups was observed Beta=0.266 (SE=0.066) and 0.079 (SE=0.032) in Y- DNA haplogroup I and R respectively But result not replicated in 1958BC No sub-clustering observed in ALSPAC
Discussion Population stratification is a potential confounder in genetic association studies. Haplotypic variation and sub-clustering can still be present even after accounting for principal components Largest Y-DNA haplogroup analysis Largest Y-DNA haplogroup profile for the (South-West of the) UK First analysis of its kind i.e. a stratified approach using Y-DNA haplogroups No sub-clustering observed in ALSPAC Structure in common variant analysis does not seem to be a problem after controlling for principal components If result was replicated, Y-DNA haplogroup I could potential have been a marker for an (unknown) epigenetic (GxE) or epistatic (GxG) interaction following slide
Discussion Stratified analysis of Y-DNA haplogroups could be used to: (i) Inform genetic association studies by identifying which haplogroup(s) account for the association (ii) Assess the strength of known associations and observing whether it still holds in all Y-DNA haplogroups (iii) Observe an effect modification (iv) In relation to point (iii), determine whether the effect modification observed gives a hint about epigenetic (GxE) interactions or epistasis (GxG) occurring between the autosomal loci being analysed and the Y- DNA loci associated with the haplogroup(s) in which one observes that effect modification (i) Y-DNA (or mtDNA) haplogroup info could be a way of identifying sub-groups of individuals for different traits/diseases. Real-life example: BiDil (a drug for the treatment of heart failure in self-identified black patients)
Limitations of Y-DNA haplogroup studies Y-DNA haplogroup analyses excludes females Sample size, especially in deeper branches of the Y-DNA phylogenetic tree Very hard to find independent cohorts to replicate studies ALSPAC and 1958BC studies kids and adults Even Europe-based (non-UK) cohorts have remarkably different Y-DNA haplogroup profiles
My haplogroups Y-DNA haplogroup = R1b1b2a R1b1b2 is the most common haplogroup in western Europe (>50% of men). Ancient representatives of the haplogroup were among the first people to repopulate the western part of Europe after the Ice Age ended about 12,000 years ago. Region: Europe Example Populations: Irish, Basques, British, French 23andme
My haplogroups mtDNA haplogroup = H1 Haplogroup H1 is widespread in Europe, especially the western part of the continent. It originated about 13,000 years ago (likely in the Iberian peninsula), not long after the Ice Age ended Region: Europe, Near East, Central Asia, Northwestern Africa Example Populations: Norway (30%), Spanish (25%), Berbers, Lebanese Highlight: H1 appears to have been common in Doggerland, an ancient land now flooded by the North Sea. 23andme