Y Chromosomal Haplogroups in Genetic Studies

Using Y chromosomal haplogroups in

genetic association studies and

suggested implications

Translation:

 “Can (non-recombining) paternal ancestry

         information be of any use in GWAS?”

Erzurumluoglu

 et al

., Apr 2016

Journal club: 15/06/16

Mesut Erzurumluoglu

Nice review

Use of Y chromosome and mitochondrial DNA population structure in tracing

human migrations. Underhill

et al.

, 2007, Ann Rev Genet

Fantastic website

http://www.eupedia.com/europe/origins_haplogroups_europe.shtml

Introduction to Population Genetics

Example of transmission of

autosomes (and X

chromosome) to offspring

Father

Mother

Children

Recombination event

during meiosis

Inheriting Y-DNA and mtDNA

National Geographic

Y-DNA phylogenetic tree

Underhill

et al

, Dec 2007

Haplogroups



“Group of haplotypes which share a

common ancestor with a SNP mutation”

◦

Exercise next slide



Can give information about ancestry,

bottlenecks (e.g. war, catastrophe),

migrations and disease

◦

Where anthropology, history and genetics

meets

Haplogroups – Exercise

SNP 1

SNP 1

SNP 1

SNP 2

SNP 2

SNP 2

SNP 3

SNP 4

SNP 5

SNP 6

SNP 7

SNP 8

SNP 6

6 individuals (males) genotyped at 8 Y-DNA SNP loci

How many haplogroups (clusters) are there?

Haplogroups

SNP 1

SNP 1

SNP 1

SNP 2

SNP 3

SNP 4

SNP 5

SNP 7

Haplogroup A

Ancestral SNP: SNP1

Haplogroup B

Ancestral SNP: SNP2

Haplogroup B1

Ancestral SNP: SNP6

Introduction to study



In apparently homogeneous populations (as defined by

PCA), there is still Y-DNA haplogroup variation which

will result from population history.

◦

Therefore, hidden stratification and/or differential

phenotypic effects by Y-DNA haplogroups could exist



To test this, we hypothesised that stratifying individuals

according to their Y-DNA haplogroups before testing

associations between autosomal SNPs and phenotypes

will yield difference in association



Chose to study BMI as a few Y-DNA haplogroups have

been directly associated with cardio-metabolic

diseases

◦

Larger sample sizes in ALSPAC and 1958BC

◦

BMI studies were well-established at the time of study

(2012)

Theory

Is there Y-DNA variability and sub-clustering within a PCA defined

‘homogenous’ cluster (e.g. in CEU)?

A real life example would be the North/South sub-clustering that one observes

in some populations (e.g. Italian)

PC1

PC2

Y-DNA haplogroup

R and I?

Aims and objectives



Derive Y-DNA haplogroups of 6,537 males

from two cohorts

◦

ALSPAC (N=5,080, 816 Y-DNA SNPs)

◦

1958 Birth Cohort (N=1,457, 1,849 Y-DNA

SNPs)



Carry out association study within different

strata using 32 (published) BMI SNPs



Explore (sub)stratification within PCA

defined groups

◦

Does it exist?

◦

What effect? If any…

Methods



Genome-wide SNP chip genotyping of 9,912 individuals



QC – PLINK

◦

Individuals



Mismatch of sex



Minimal or excessive heterozygosity



<0.320 and >0.345 for the Sanger data



<0.310 and >0.330 for the LabCorp data



Individual missingness >3%



Cryptic relatedness >10% IBD



Non-European (and White) ancestry

◦

SNPs



MAF <1%



Call rate <95%



HWE P<5e-7



Pseudo-autosomal region



Imputation – MaCH

Methods (continued)



Y-DNA haplogroup determination – Yfitter

◦

Haplogroups clustered under ‘Clade’ name



Association study between BMI and the Y-DNA

haplogroups in ALSPAC - Stata

◦

Individuals in haplogroup R used as baseline (coded as ‘0’),

and individuals in haplogroup I coded as ‘

’

◦

Covariates: Age, age^

 and top 10 PCs



Analysis of the effects of Y-DNA haplogroups on SNPs

associated with BMI - Stata

◦

32 SNPs associated with BMI (Speliotes

et al

, 2010)

◦

Covariates: Age, age^

 and top 10 PCs

◦

Likelihood ratio test



Compare the two regression models: With and without an

interaction term (genotype dosage x Y-DNA haplogroup)



Replication in 1958BC - Stata

Results – ALSPAC

9,912 participants with

500,527 genotyped SNPs

Y-DNA haplogroup clades R

(72%) and I (19%)

8,365 unrelated

individuals

QC

Y-DNA

5,080 males with Y-DNA

haplogroup data

Y-DNA Clades

Haplogroup

Results – ALSPAC Y-DNA profile

Results – ALSPAC Y-DNA profile

Y-DNA haplogroups in ALSPAC

ALSPAC has within it individuals belonging to 12 of the major Y-DNA haplogroups (C, E, G, H, I,

J, L, N, O, Q, R, T), albeit only 5 of the groups have 50 (>1%) or more individuals in them. These

five clades are E, G, I, J and R and have 153 (3%), 94 (1.9%), 960 (19%), 142 (2.8%) and 3564

(72%) individuals in them respectively.

Results – 1958BC Y-DNA profile

Results – 1958BC Y-DNA profile

Y-DNA haplogroups in 1958 Birth Cohort (Dataset: EGAD00000000022)

1958BC has within it individuals belonging to five major Y-DNA haplogroups (E, I, J, C, R), albeit

2 of the groups have less than 50 individuals in them. The clades E, I, J, C and R have 44 (3%),

296 (20%), 34 (2%), 1 and 1078 (74%) individuals in them respectively.

Results – Direct Association

Results – Direct Association

Linear regression between BMI and Y-DNA haplogroup I in ALSPAC and 1958BC

The z-test for heterogeneity shows that the effect size of Y-DNA haplogroup I on BMI is

differential depending on the cohort (albeit not to be taken seriously as result has not been

replicated somewhere else – see discussion slide)

Figures a-c: Subgroup analysis comparing effect size of rs8050136 on BMI in two Y-DNA

haplogroups

The statistics above represent p values from the likelihood ratio test for interaction between Y-DNA

haplogroup I and rs8050136 (

FTO

). Heterogeneity tests (z test) comparing Y-DNA haplogroups I

and R yielded p values of 0.005, 0.4169 and 0.014 for a, b and c respectively. ggplot2 package in R

was used to create the plot.

Results – Interaction

Results – Interaction

ALSPAC

1958BC

ALSPAC + 1958BC

Results – Sub-clustering

Results – Sub-clustering

PC1

PC2

Y-DNA haplogroup vs top two principal components in ALSPAC individuals

Plotting the Y-DNA haplogroup clades on a PCA plot reveals that there is

no

apparent sub-clustering

within the ALSPAC individuals. Thus adding Y-DNA haplogroup information as covariates to control

for additional population stratification in ALSPAC is not needed

Conclusions



Haplogroup R is the most frequent (72%) and I

the second most common Y-DNA haplogroup

(19%) in ALSPAC (and 1958BC)



There was no strong evidence of association

between Y-DNA haplogroups and BMI

◦

P=0.066 in ALSPAC

◦

P=0.107 in 1958 cohort



One instance of heterogeneity (p= 0.008)

between the two haplogroups was observed

◦

Beta=0.266 (SE=0.066) and 0.079 (SE=0.032) in Y-

DNA haplogroup I and R respectively

◦

But result not replicated in 1958BC



No sub-clustering observed in ALSPAC

Discussion



Population stratification is a potential confounder in genetic

association studies. Haplotypic variation and sub-clustering

can still be present even after accounting for principal

components



Largest Y-DNA haplogroup analysis



Largest Y-DNA haplogroup profile for the (South-West of

the) UK



First analysis of its kind

◦

i.e. a stratified approach using Y-DNA haplogroups



No sub-clustering observed in ALSPAC

◦

Structure in common variant analysis does not seem to be a

problem after controlling for principal components



If result was replicated, Y-DNA haplogroup I could potential

have been a marker for an (unknown) epigenetic (GxE) or

epistatic (GxG) interaction – following slide

Discussion



Stratified analysis of Y-DNA haplogroups could be

used to:

(i)

Inform genetic association studies by identifying which

haplogroup(s) account for the association

(ii)

Assess the strength of known associations and

observing whether it still holds in all Y-DNA

haplogroups

(iii)

Observe an effect modification

(iv)

In relation to point (iii), determine whether the effect

modification observed gives a hint about epigenetic

(GxE) interactions or epistasis (GxG) occurring

between the autosomal loci being analysed and the Y-

DNA loci associated with the haplogroup(s) in which

one observes that effect modification

(i)

Y-DNA (or mtDNA) haplogroup info could be a way of

identifying sub-groups of individuals for different

traits/diseases. Real-life example: BiDil (a drug for the

treatment of heart failure in self-identified black patients)

Limitations of Y-DNA haplogroup

studies



Y-DNA haplogroup analyses excludes

females



Sample size, especially in deeper branches

of the Y-DNA phylogenetic tree



Very hard to find independent cohorts to

replicate studies

◦

ALSPAC and 1958BC studies kids and adults

◦

Even Europe-based (non-UK) cohorts have

remarkably different Y-DNA haplogroup

profiles

Appendices

European Y-DNA haplogroups

European mtDNA haplogroups

My haplogroups



Y-DNA haplogroup = R1b1b2a

R1b1b2 is the most common haplogroup in western Europe (>50% of

men). Ancient representatives of the haplogroup were among the first

people to repopulate the western part of Europe after the Ice Age

ended about 12,000 years ago.

Region: Europe

Example Populations: Irish, Basques, British, French

23andme

My haplogroups



mtDNA haplogroup = H1

Haplogroup H1 is widespread in Europe, especially the western part of

the continent. It originated about 13,000 years ago (likely in the Iberian

peninsula), not long after the Ice Age ended

Region: Europe, Near East, Central Asia, Northwestern Africa

Example Populations: Norway (30%), Spanish (25%), Berbers, Lebanese

Highlight: H1 appears to have been common in Doggerland, an ancient land now flooded by

the North Sea.

23andme

Slide Note

Embed Share

Download

Exploring the utility of non-recombining paternal ancestry information in Genome-Wide Association Studies (GWAS) through the analysis of Y chromosomal haplogroups. This review delves into the implications of using Y chromosome and mitochondrial DNA data in tracing human migrations, ancestry, bottlenecks, and disease patterns. The study also highlights the importance of considering Y-DNA haplogroups in association studies to uncover hidden stratifications and differential phenotypic effects. The research involves investigating the relationship between Y-DNA haplogroups and traits such as BMI and cardio-metabolic diseases using large sample sizes from established studies.

anos133 Follow

Uploaded on Oct 01, 2024 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Using Y chromosomal haplogroups in genetic association studies and suggested implications Erzurumluoglu et al., Apr 2016 Translation: Can (non-recombining) paternal ancestry information be of any use in GWAS? Nice review: Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Underhill et al., 2007, Ann Rev Genet Fantastic website: http://www.eupedia.com/europe/origins_haplogroups_europe.shtml Journal club: 15/06/16 Mesut Erzurumluoglu

Introduction to Population Genetics Father Mother Children Example of transmission of autosomes (and X chromosome) to offspring Recombination event during meiosis

Inheriting Y-DNA and mtDNA National Geographic

Y-DNA phylogenetic tree Underhill et al, Dec 2007

Haplogroups Group of haplotypes which share a common ancestor with a SNP mutation Exercise next slide Can give information about ancestry, bottlenecks (e.g. war, catastrophe), migrations and disease Where anthropology, history and genetics meets

Haplogroups Exercise SNP 1 SNP 1 SNP 4 SNP 1 SNP 3 SNP 5 SNP 8 SNP 6 SNP 2 SNP 6 SNP 2 SNP 2 SNP 7 6 individuals (males) genotyped at 8 Y-DNA SNP loci How many haplogroups (clusters) are there?

Haplogroups SNP 1 SNP 4 SNP 6 SNP 2 SNP 2 SNP 1 SNP 1 SNP 7 SNP 3 SNP 8 SNP 5 SNP 6 SNP 2 Haplogroup A Ancestral SNP: SNP1 Haplogroup B1 Ancestral SNP: SNP6 Haplogroup B Ancestral SNP: SNP2

Introduction to study In apparently homogeneous populations (as defined by PCA), there is still Y-DNA haplogroup variation which will result from population history. Therefore, hidden stratification and/or differential phenotypic effects by Y-DNA haplogroups could exist To test this, we hypothesised that stratifying individuals according to their Y-DNA haplogroups before testing associations between autosomal SNPs and phenotypes will yield difference in association Chose to study BMI as a few Y-DNA haplogroups have been directly associated with cardio-metabolic diseases Larger sample sizes in ALSPAC and 1958BC BMI studies were well-established at the time of study (2012)

Theory Y-DNA haplogroup R and I? PC1 PC2 Is there Y-DNA variability and sub-clustering within a PCA defined homogenous cluster (e.g. in CEU)? A real life example would be the North/South sub-clustering that one observes in some populations (e.g. Italian)

Aims and objectives Derive Y-DNA haplogroups of 6,537 males from two cohorts ALSPAC (N=5,080, 816 Y-DNA SNPs) 1958 Birth Cohort (N=1,457, 1,849 Y-DNA SNPs) Carry out association study within different strata using 32 (published) BMI SNPs Explore (sub)stratification within PCA defined groups Does it exist? What effect? If any

Methods Genome-wide SNP chip genotyping of 9,912 individuals QC PLINK Individuals Mismatch of sex Minimal or excessive heterozygosity <0.320 and >0.345 for the Sanger data <0.310 and >0.330 for the LabCorp data Individual missingness >3% Cryptic relatedness >10% IBD Non-European (and White) ancestry SNPs MAF <1% Call rate <95% HWE P<5e-7 Pseudo-autosomal region Imputation MaCH

Methods (continued) Y-DNA haplogroup determination Yfitter Haplogroups clustered under Clade name Association study between BMI and the Y-DNA haplogroups in ALSPAC - Stata Individuals in haplogroup R used as baseline (coded as 0 ), and individuals in haplogroup I coded as 1 Covariates: Age, age^2 and top 10 PCs Analysis of the effects of Y-DNA haplogroups on SNPs associated with BMI - Stata 32 SNPs associated with BMI (Speliotes et al, 2010) Covariates: Age, age^2 and top 10 PCs Likelihood ratio test Compare the two regression models: With and without an interaction term (genotype dosage x Y-DNA haplogroup) Replication in 1958BC - Stata

Results ALSPAC 9,912 participants with 500,527 genotyped SNPs QC 8,365 unrelated individuals Y-DNA Haplogroup 5,080 males with Y-DNA haplogroup data Y-DNA Clades Y-DNA haplogroup clades R (72%) and I (19%)

Results ALSPAC Y-DNA profile Y-DNA haplogroups in ALSPAC ALSPAC has within it individuals belonging to 12 of the major Y-DNA haplogroups (C, E, G, H, I, J, L, N, O, Q, R, T), albeit only 5 of the groups have 50 (>1%) or more individuals in them. These five clades are E, G, I, J and R and have 153 (3%), 94 (1.9%), 960 (19%), 142 (2.8%) and 3564 (72%) individuals in them respectively.

Results 1958BC Y-DNA profile Y-DNA haplogroups in 1958 Birth Cohort (Dataset: EGAD00000000022) 1958BC has within it individuals belonging to five major Y-DNA haplogroups (E, I, J, C, R), albeit 2 of the groups have less than 50 individuals in them. The clades E, I, J, C and R have 44 (3%), 296 (20%), 34 (2%), 1 and 1078 (74%) individuals in them respectively.

Results Direct Association Linear regression between BMI and Y-DNA haplogroup I in ALSPAC and 1958BC The z-test for heterogeneity shows that the effect size of Y-DNA haplogroup I on BMI is differential depending on the cohort (albeit not to be taken seriously as result has not been replicated somewhere else see discussion slide)

Results Interaction ALSPAC 1958BC ALSPAC + 1958BC Figures a-c: Subgroup analysis comparing effect size of rs8050136 on BMI in two Y-DNA haplogroups The statistics above represent p values from the likelihood ratio test for interaction between Y-DNA haplogroup I and rs8050136 (FTO). Heterogeneity tests (z test) comparing Y-DNA haplogroups I and R yielded p values of 0.005, 0.4169 and 0.014 for a, b and c respectively. ggplot2 package in R was used to create the plot.

Results Sub-clustering PC1 PC2 Y-DNA haplogroup vs top two principal components in ALSPAC individuals Plotting the Y-DNA haplogroup clades on a PCA plot reveals that there is no apparent sub-clustering within the ALSPAC individuals. Thus adding Y-DNA haplogroup information as covariates to control for additional population stratification in ALSPAC is not needed

Conclusions Haplogroup R is the most frequent (72%) and I the second most common Y-DNA haplogroup (19%) in ALSPAC (and 1958BC) There was no strong evidence of association between Y-DNA haplogroups and BMI P=0.066 in ALSPAC P=0.107 in 1958 cohort One instance of heterogeneity (p= 0.008) between the two haplogroups was observed Beta=0.266 (SE=0.066) and 0.079 (SE=0.032) in Y- DNA haplogroup I and R respectively But result not replicated in 1958BC No sub-clustering observed in ALSPAC

Discussion Population stratification is a potential confounder in genetic association studies. Haplotypic variation and sub-clustering can still be present even after accounting for principal components Largest Y-DNA haplogroup analysis Largest Y-DNA haplogroup profile for the (South-West of the) UK First analysis of its kind i.e. a stratified approach using Y-DNA haplogroups No sub-clustering observed in ALSPAC Structure in common variant analysis does not seem to be a problem after controlling for principal components If result was replicated, Y-DNA haplogroup I could potential have been a marker for an (unknown) epigenetic (GxE) or epistatic (GxG) interaction following slide

Discussion Stratified analysis of Y-DNA haplogroups could be used to: (i) Inform genetic association studies by identifying which haplogroup(s) account for the association (ii) Assess the strength of known associations and observing whether it still holds in all Y-DNA haplogroups (iii) Observe an effect modification (iv) In relation to point (iii), determine whether the effect modification observed gives a hint about epigenetic (GxE) interactions or epistasis (GxG) occurring between the autosomal loci being analysed and the Y- DNA loci associated with the haplogroup(s) in which one observes that effect modification (i) Y-DNA (or mtDNA) haplogroup info could be a way of identifying sub-groups of individuals for different traits/diseases. Real-life example: BiDil (a drug for the treatment of heart failure in self-identified black patients)

Limitations of Y-DNA haplogroup studies Y-DNA haplogroup analyses excludes females Sample size, especially in deeper branches of the Y-DNA phylogenetic tree Very hard to find independent cohorts to replicate studies ALSPAC and 1958BC studies kids and adults Even Europe-based (non-UK) cohorts have remarkably different Y-DNA haplogroup profiles

Appendices

European Y-DNA haplogroups

European mtDNA haplogroups

My haplogroups Y-DNA haplogroup = R1b1b2a R1b1b2 is the most common haplogroup in western Europe (>50% of men). Ancient representatives of the haplogroup were among the first people to repopulate the western part of Europe after the Ice Age ended about 12,000 years ago. Region: Europe Example Populations: Irish, Basques, British, French 23andme

My haplogroups mtDNA haplogroup = H1 Haplogroup H1 is widespread in Europe, especially the western part of the continent. It originated about 13,000 years ago (likely in the Iberian peninsula), not long after the Ice Age ended Region: Europe, Near East, Central Asia, Northwestern Africa Example Populations: Norway (30%), Spanish (25%), Berbers, Lebanese Highlight: H1 appears to have been common in Doggerland, an ancient land now flooded by the North Sea. 23andme

Y Chromosomal Haplogroups in Genetic Studies

Download Presentation

Presentation Transcript

Related

More Related Content