Understanding Randomization and Permutation Tests in Statistical Analysis
Compare treatment effects or means using randomization and permutation tests for independent and paired samples. Learn how these tests work (procedures, models, and algorithms), with applications to NBA and WNBA players' body mass indices and English Premier League matches.
Randomization/Permutation Tests
Body Mass Indices Among NBA & WNBA Players
Home Field Advantage in the English Premier League
Background
Goal: Compare 2 (or more) treatment effects or means based on sample measurements.
Independent Samples: Units in different treatment conditions are independent of one another. In controlled experiments they have been randomized to treatments. Observed data are: $Y_{11},\ldots,Y_{1n_1}$ and $Y_{21},\ldots,Y_{2n_2}$.
Paired Samples: Units are observed under each condition (treatment), and the difference has been obtained: $d_j = Y_{1j} - Y_{2j}$, $j = 1,\ldots,n$.
Procedure: Work under the null hypothesis of no difference in treatment effects. Ask how extreme the observed treatment difference is relative to many (in theory, all) possible randomizations/permutations of the observed data to the treatment labels.
Independent Samples: 2 Treatments
Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1,2$; $j = 1,\ldots,n_i$; $E(\varepsilon_{ij}) = 0$
where $\mu$ is the overall population mean and $\tau_i$ is the effect of treatment $i$, subject to $\tau_1 + \tau_2 = 0$. Each observation is its mean plus its random error: $Y_{ij} = \mu_i + \varepsilon_{ij}$, with $\mu_i = \mu + \tau_i$.
No treatment effect (no difference in population means): $H_0: \tau_1 = \tau_2 = 0$, so $\mu_1 = \mu_2 = \mu$. All observed data come from the same population, and the labels are "random".
Test statistic used to compare 2 treatments (one of many): $T = \bar{Y}_1 - \bar{Y}_2$, where $\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$
Algorithm:
o Compute the test statistic for the observed data and save it
o Obtain a large number (N) of permutations of the observed values to the treatment labels
o For each permutation, compute the test statistic and save it
o P-value = (1 + # permuted TS at least as extreme as the observed TS)/(N + 1)
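The algorithm above samples N of the possible relabellings. With tiny samples, every relabelling can be enumerated instead, which gives the exact permutation distribution that the Monte Carlo version approximates. The Python sketch below does this for hypothetical integer data. The values and the function name `exact_permutation_test` are illustrative and do not come from the slides. Because the observed split is one of the enumerated splits, the exact p-value is simply the fraction of splits at least as extreme as it.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def exact_permutation_test(y1, y2):
    """Exact two-sided permutation test of T = mean(y1) - mean(y2).

    Enumerates every split of the pooled values into groups of sizes
    len(y1) and len(y2). Values must be ints (or Fractions) so that the
    comparison |T| >= |T_obs| is exact.
    """
    pooled = list(y1) + list(y2)
    n1, n2 = len(y1), len(y2)
    total = sum(pooled)

    def stat(s1):
        # T for a split whose group-1 values add up to s1
        return Fraction(s1, n1) - Fraction(total - s1, n2)

    t_obs = stat(sum(y1))
    n_extreme = sum(
        1
        for idx in combinations(range(n1 + n2), n1)
        if abs(stat(sum(pooled[i] for i in idx))) >= abs(t_obs)
    )
    n_splits = comb(n1 + n2, n1)
    return t_obs, Fraction(n_extreme, n_splits), n_splits


# Hypothetical toy data, not from the slides
t_obs, p, n_splits = exact_permutation_test([12, 15, 14], [9, 10, 11])
print(t_obs, p, n_splits)
```

Here $T_{obs} = 11/3$, and only 2 of the $\binom{6}{3} = 20$ splits reach $|T| \geq 11/3$, so p = 1/10. For the BMI example there are $\binom{40}{20} = 137{,}846{,}528{,}820$ splits, far too many to list, which is why a random sample of N permutations is used.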
Example: NBA and WNBA Players' BMI
Groups: Male: NBA (i = 1) and Female: WNBA (i = 2)
Samples: Random samples of $n_1 = n_2 = 20$ from the 2013 seasons (2013/2014 for the NBA)
$BMI = \frac{703 \times \text{weight (lbs)}}{\text{height (inches)}^2} = \frac{\text{weight (kg)}}{\text{height (metres)}^2}$
Males: $\bar{Y}_1 = 24.95$; Females: $\bar{Y}_2 = 23.35$
Test Statistic: $T = \bar{Y}_1 - \bar{Y}_2 = 24.95 - 23.35 = 1.60$

NBA players (Gender = 1)
id  Player                   Height (in)  Weight (lbs)  BMI
 1  Giannis Antetokounmpo    81           205           21.97
 2  Joel Anthony             81           245           26.25
 3  Alex Len                 85           255           24.81
 4  Erik Murphy              82           230           24.05
 5  Ersan Ilyasova           82           235           24.57
 6  Kevin Garnett            83           253           25.82
 7  Chauncey Billups         75           202           25.25
 8  Juwan Howard             81           250           26.79
 9  Vladimir Radmanovic      82           235           24.57
10  Tiago Splitter           83           240           24.49
11  Jarvis Varnado           81           230           24.64
12  Alexey Shved             78           190           21.95
13  Jermaine O'Neal          83           255           26.02
14  Michael Kidd-Gilchrist   79           232           26.13
15  Metta World Peace        79           260           29.29
16  Tim Hardaway Jr.         78           205           23.69
17  Greivis Vasquez          78           211           24.38
18  Daniel Gibson            74           200           25.68
19  Terrence Ross            79           197           22.19
20  Chris Kaman              84           265           26.40

WNBA players (Gender = 2)
id  Player                   Height (in)  Weight (lbs)  BMI
21  Tamika Catchings         73           167           22.03059
22  Courtney Clements        72           155           21.01948
23  Allie Quigley            70           140           20.08571
24  Quanitra Hollingsworth   77           203           24.06966
25  Katie Smith              71           175           24.40488
26  Tayler Hill              70           145           20.80306
27  Allison Hightower        70           139           19.94224
28  Kara Braxton             78           225           25.99852
29  Eshaya Murphy            71           164           22.87086
30  Michelle Campbell        74           183           23.49324
31  Briann January           68           144           21.89273
32  Jasmine James            69           175           25.84016
33  Kelsey Bone              76           200           24.34211
34  Jia Perkins              68           155           23.5651
35  Ebony Hoffman            74           215           27.60135
36  Shavonte Zellous         70           155           22.23776
37  Matee Ajavon             68           160           24.32526
38  Karima Christmas         72           180           24.40972
39  Erika de Souza           77           190           22.52825
40  Jayne Appel              76           210           25.55921
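As a quick check of the BMI formula, the Python sketch below derives the 703 factor from the exact unit definitions 1 lb = 0.45359237 kg and 1 in = 0.0254 m. It then recomputes the BMI of the first NBA player in the table. The helper names `bmi_imperial` and `bmi_metric` are illustrative. The exact factor is about 703.07, so the 703 version and the fully metric version differ slightly, in the third decimal.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound
IN_TO_M = 0.0254       # exact definition of the international inch

# kg/m^2 per lb/in^2; the slide rounds this to 703
factor = LB_TO_KG / IN_TO_M ** 2


def bmi_imperial(height_in, weight_lb):
    """BMI with the slide's rounded factor: 703 * lbs / inches^2."""
    return 703 * weight_lb / height_in ** 2


def bmi_metric(height_in, weight_lb):
    """Same BMI after converting to kilograms and metres."""
    return (weight_lb * LB_TO_KG) / (height_in * IN_TO_M) ** 2


# First NBA player in the table: 81 inches, 205 lbs
print(round(factor, 2), round(bmi_imperial(81, 205), 4), round(bmi_metric(81, 205), 4))
```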
Permutation Samples
o Generate permutations of the 40 integers using a random number generator (like pulling 1:40 from a hat, one at a time without replacement)
o Assign the first 20 players selected (based on id) to Treatment 1 and the last 20 to Treatment 2
o Compute and save the test statistic: $T = \bar{Y}_1 - \bar{Y}_2$
o Continue for many (N total) samples
o Count the number of sampled statistics as large as or larger than the observed test statistic (in absolute value, for a 2-sided test)
o P-value obtained as (Count + 1)/(N + 1)
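A minimal Python sketch of these steps follows. It uses the heights and weights transcribed from the NBA/WNBA table and computes BMI with the 703 formula. `random.Random.shuffle` plays the role of pulling ids from a hat. The seed 97531 is borrowed from the R program, but Python's generator produces a different random stream. The count and p-value will therefore be close to, but not identical to, the R output. The function names are illustrative.

```python
import random

# Heights (inches) and weights (lbs) from the table: ids 1-20 are NBA
# players (group 1), ids 21-40 are WNBA players (group 2).
HEIGHT = [81, 81, 85, 82, 82, 83, 75, 81, 82, 83, 81, 78, 83, 79, 79, 78, 78, 74, 79, 84,
          73, 72, 70, 77, 71, 70, 70, 78, 71, 74, 68, 69, 76, 68, 74, 70, 68, 72, 77, 76]
WEIGHT = [205, 245, 255, 230, 235, 253, 202, 250, 235, 240,
          230, 190, 255, 232, 260, 205, 211, 200, 197, 265,
          167, 155, 140, 203, 175, 145, 139, 225, 164, 183,
          144, 175, 200, 155, 215, 155, 160, 180, 190, 210]
BMI = [703 * w / h ** 2 for h, w in zip(HEIGHT, WEIGHT)]
N1 = 20  # group-1 size; the remaining 20 values form group 2


def mean(xs):
    return sum(xs) / len(xs)


def mean_diff(values, n1):
    """T = mean of the first n1 values minus mean of the rest."""
    return mean(values[:n1]) - mean(values[n1:])


def permutation_test(values, n1, n_perm=9999, seed=97531):
    """Two-sided Monte Carlo permutation test of T = Ybar1 - Ybar2."""
    rng = random.Random(seed)  # Python's RNG: does not reproduce R's set.seed stream
    t_obs = mean_diff(values, n1)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random relabelling: first n1 go to treatment 1
        if abs(mean_diff(shuffled, n1)) >= abs(t_obs):
            count += 1
    return t_obs, count, (count + 1) / (n_perm + 1)


t_obs, count, p_value = permutation_test(BMI, N1)
print(f"T_obs = {t_obs:.4f}, count = {count}, two-sided p = {p_value:.4f}")
```

$T_{obs}$ should reproduce the R value of about 1.5957. The p-value should land near the R result of 0.0122, differing only by Monte Carlo noise.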
Permutation Samples (EXCEL)
Group  id  BMI      Ran1     sort_ran1  sort_id  sort_BMI  Group
1       1  21.9654  0.12415  0.01077    11       24.6441   1
1       2  26.2513  0.23551  0.06690    34       23.5651   1
1       3  24.8118  0.60869  0.08209    38       24.4097   1
1       4  24.0467  0.41569  0.08289    24       24.0697   1
1       5  24.5695  0.53313  0.08631     7       25.2455   1
1       6  25.8178  0.49959  0.09680    10       24.4912   1
1       7  25.2455  0.08631  0.11182    33       24.3421   1
1       8  26.7871  0.77255  0.12415     1       21.9654   1
1       9  24.5695  0.66982  0.12967    15       29.2870   1
1      10  24.4912  0.09680  0.19153    16       23.6875   1
1      11  24.6441  0.01077  0.23389    26       20.8031   1
1      12  21.9543  0.92364  0.23551     2       26.2513   1
1      13  26.0219  0.66018  0.25141    40       25.5592   1
1      14  26.1330  0.52730  0.25471    28       25.9985   1
1      15  29.2870  0.12967  0.37605    35       27.6014   1
1      16  23.6875  0.19153  0.38224    18       25.6757   1
1      17  24.3808  0.82900  0.41569     4       24.0467   1
1      18  25.6757  0.38224  0.45036    19       22.1905   1
1      19  22.1905  0.45036  0.45735    31       21.8927   1
1      20  26.4024  0.76531  0.49959     6       25.8178   1
2      21  22.0306  0.63860  0.51106    39       22.5283   2
2      22  21.0195  0.88937  0.52730    14       26.1330   2
2      23  20.0857  0.80044  0.53313     5       24.5695   2
2      24  24.0697  0.08289  0.54241    36       22.2378   2
2      25  24.4049  0.87924  0.60869     3       24.8118   2
2      26  20.8031  0.23389  0.63860    21       22.0306   2
2      27  19.9422  0.96878  0.66018    13       26.0219   2
2      28  25.9985  0.25471  0.66982     9       24.5695   2
2      29  22.8709  0.89297  0.69176    37       24.3253   2
2      30  23.4932  0.81115  0.76531    20       26.4024   2
2      31  21.8927  0.45735  0.77255     8       26.7871   2
2      32  25.8402  0.93426  0.80044    23       20.0857   2
2      33  24.3421  0.11182  0.81115    30       23.4932   2
2      34  23.5651  0.06690  0.82900    17       24.3808   2
2      35  27.6014  0.37605  0.87924    25       24.4049   2
2      36  22.2378  0.54241  0.88937    22       21.0195   2
2      37  24.3253  0.69176  0.89297    29       22.8709   2
2      38  24.4097  0.08209  0.92364    12       21.9543   2
2      39  22.5283  0.51106  0.93426    32       25.8402   2
2      40  25.5592  0.25141  0.96878    27       19.9422   2

Original Sample (Column C): Group 1 mean = 24.9466; Group 2 mean = 23.3510; Difference = 1.5957
Permutation Sample (Column G): Group 1 mean = 24.5772; Group 2 mean = 23.7204; Difference = 0.8568
Comments: Column 4 (Ran1) has its smallest number (0.01077) at id = 11. Thus player 11 is the first player in group 1 of the permutation sample. The next smallest is 0.06690 (id = 34). The sort columns (5-8) give the first permutation sample for the 2 groups.
The difference in mean BMI between groups 1 and 2 is 1.5957 in the original sample and 0.8568 in the permutation sample.
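The spreadsheet trick can be mirrored in a few lines of Python. Sorting the ids by independent U(0,1) keys yields a uniformly random permutation of 1..40. The sketch below reuses the BMI (column C) and Ran1 (column D) values from the spreadsheet and recomputes both differences. Because those BMI values are rounded to 4 decimals, the results can differ from the spreadsheet in the last digit.

```python
# Columns C (BMI) and D (Ran1) of the spreadsheet, listed in id order 1..40.
BMI = [21.9654, 26.2513, 24.8118, 24.0467, 24.5695, 25.8178, 25.2455, 26.7871, 24.5695, 24.4912,
       24.6441, 21.9543, 26.0219, 26.1330, 29.2870, 23.6875, 24.3808, 25.6757, 22.1905, 26.4024,
       22.0306, 21.0195, 20.0857, 24.0697, 24.4049, 20.8031, 19.9422, 25.9985, 22.8709, 23.4932,
       21.8927, 25.8402, 24.3421, 23.5651, 27.6014, 22.2378, 24.3253, 24.4097, 22.5283, 25.5592]
RAN1 = [0.12415, 0.23551, 0.60869, 0.41569, 0.53313, 0.49959, 0.08631, 0.77255, 0.66982, 0.09680,
        0.01077, 0.92364, 0.66018, 0.52730, 0.12967, 0.19153, 0.82900, 0.38224, 0.45036, 0.76531,
        0.63860, 0.88937, 0.80044, 0.08289, 0.87924, 0.23389, 0.96878, 0.25471, 0.89297, 0.81115,
        0.45735, 0.93426, 0.11182, 0.06690, 0.37605, 0.54241, 0.69176, 0.08209, 0.51106, 0.25141]


def mean(xs):
    return sum(xs) / len(xs)


# Sorting ids by their Ran1 keys (Excel's sort) gives a random permutation of 1..40
sort_id = sorted(range(1, 41), key=lambda i: RAN1[i - 1])  # column F
sort_bmi = [BMI[i - 1] for i in sort_id]                   # column G

# In both columns, the first 20 rows form group 1 and the last 20 form group 2
t_original = mean(BMI[:20]) - mean(BMI[20:])
t_permuted = mean(sort_bmi[:20]) - mean(sort_bmi[20:])
print(sort_id[:5], round(t_original, 4), round(t_permuted, 4))
```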
R Program
```r
### Download dataset
nba.bmi <- read.csv("http://www.stat.ufl.edu/~winner/data/wnba_nba_bmi.csv", header=TRUE)
attach(nba.bmi); names(nba.bmi)

### Obtain sample sizes, sample means, and observed Test Statistic
(n1 <- length(BMI[Gender==1])); (n2 <- length(BMI[Gender==2]))
(ybar1.obs <- mean(BMI[Gender==1])); (ybar2.obs <- mean(BMI[Gender==2]))
(TS.obs <- ybar1.obs-ybar2.obs); (n.tot <- n1+n2)

### Choose number of permutations and initialize TS vector to save Test Statistics
### set seed to be able to reproduce permutation samples
N <- 9999; TS <- rep(0,N); set.seed(97531)

### Loop through N samples, generating Test Stat each time
for (i in 1:N) {
  perm <- sample(1:n.tot, size=n.tot, replace=FALSE)
  if (i == 1) print(perm)
  ybar1 <- mean(BMI[perm[1:n1]])             ### mean BMI of first n1 elements of perm
  ybar2 <- mean(BMI[perm[(n1+1):(n1+n2)]])   ### mean BMI of next n2 elements of perm
  TS[i] <- ybar1-ybar2
}

### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
(num.exceed <- sum(abs(TS)>=abs(TS.obs)))
(p.val.2sided <- (num.exceed+1)/(N+1))

### Draw histogram of distribution of TS, with vertical line at TS.obs
hist(TS, xlab="Mean1 - Mean2", breaks=seq(-2.5,2.5,0.25), main="Randomization Distribution for BMI")
abline(v=TS.obs)
```
R Output
```
> ### Obtain sample sizes, sample means, and observed Test Statistic
> (n1 <- length(BMI[Gender==1]))
[1] 20
> (n2 <- length(BMI[Gender==2]))
[1] 20
> (ybar1.obs <- mean(BMI[Gender==1]))
[1] 24.94665
> (ybar2.obs <- mean(BMI[Gender==2]))
[1] 23.35099
> (TS.obs <- ybar1.obs-ybar2.obs)
[1] 1.595653
> (n.tot <- n1+n2)
[1] 40
### First permutation of 1:40
 [1] 26 31 12 20  4 28 23 13  2 19  9 35 34  5 16 14 29 11 32 24 39 10  7  3 36
[26] 30 21 27  1 38 17 22 15 25  8 18  6 40 33 37
> ### Count # of cases where abs(TS) >= abs(TS.obs) for 2-sided test and obtain p-value
> (num.exceed <- sum(abs(TS)>=abs(TS.obs)))
[1] 121
> (p.val.2sided <- (num.exceed+1)/(N+1))
[1] 0.0122
```
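The permutation P-value is itself a Monte Carlo estimate, so it would shift slightly with a different seed. The short Python calculation below applies a rough binomial standard error, $\sqrt{\hat p(1-\hat p)/N}$, to the count of 121 reported above. This is a normal-approximation sketch, not part of the original analysis.

```python
import math

N = 9999        # permutation samples used in the R run
count = 121     # permuted |TS| >= |TS.obs|, from the output above
p_hat = (count + 1) / (N + 1)

# Binomial standard error of a Monte Carlo p-value (normal approximation)
se = math.sqrt(p_hat * (1 - p_hat) / N)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"p = {p_hat:.4f}, MC s.e. = {se:.4f}, approx 95% range ({lo:.4f}, {hi:.4f})")
```

The standard error is about 0.0011. Any p-value between roughly 0.010 and 0.014 is therefore consistent with this run.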
Normal t-test (Equal Variances Assumed)
Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1,2$; $j = 1,\ldots,n_i$; $\varepsilon_{ij} \sim NID(0, \sigma^2)$
Sample mean and variance: $\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}$, $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_i)^2$
$\bar{Y}_i \sim N\left(\mu_i, \frac{\sigma^2}{n_i}\right)$ and $\frac{(n_i - 1)s_i^2}{\sigma^2} \sim \chi^2_{n_i - 1}$
$\bar{Y}_1 - \bar{Y}_2 \sim N\left(\mu_1 - \mu_2, \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \Rightarrow Z = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1)$
$W = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2_{n_1 + n_2 - 2}$
Pooled Sample Variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Since $Z$ and $W$ are independent: $T = \frac{Z}{\sqrt{W/(n_1 + n_2 - 2)}} = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim$ Student's $t_{n_1 + n_2 - 2}$
t-test for NBA vs WNBA BMI
$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
Test statistic: $t_{obs} = \frac{(\bar{Y}_1 - \bar{Y}_2) - 0}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, which under $H_0$ follows $t_{n_1 + n_2 - 2}$
Data (from EXCEL spreadsheet): $n_1 = n_2 = 20$, $\bar{Y}_1 = 24.9466$, $\bar{Y}_2 = 23.3510$, $s_1^2 = 3.0919$, $s_2^2 = 4.2694$
$s_p^2 = \frac{(20-1)3.0919 + (20-1)4.2694}{20 + 20 - 2} = 3.6806$
$t_{obs} = \frac{24.9466 - 23.3510}{\sqrt{3.6806\left(\frac{1}{20} + \frac{1}{20}\right)}} = 2.6301$
P-value $= 2P(t_{38} \geq 2.6301) = 2(.0061) = .0122$
Note: the permutation test and the t-test give the same P-value to 4 decimal places (Normal data).
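To reproduce this arithmetic, the Python sketch below computes the pooled variance and t statistic from the summary statistics on this slide. The Python standard library has no t distribution. The two-sided P-value is therefore obtained by integrating the Student t density with Simpson's rule; a library routine such as SciPy's `scipy.stats.t.sf` would normally be used instead. The function names are illustrative.

```python
import math


def pooled_t(ybar1, ybar2, s1_sq, s2_sq, n1, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance
    t = (ybar1 - ybar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df, sp_sq


def t_pdf(x, df):
    """Student t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """2 * P(T_df >= |t|) = 1 - 2 * P(0 <= T_df <= |t|), via Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * total * h / 3


t, df, sp_sq = pooled_t(24.9466, 23.3510, 3.0919, 4.2694, 20, 20)
p = two_sided_p(t, df)
print(f"s_p^2 = {sp_sq:.4f}, t = {t:.4f} on {df} df, two-sided p = {p:.4f}")
```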
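Paired Samples
Data consist of n pairs of observations $(Y_{1j}, Y_{2j})$, $j = 1,\ldots,n$.
Data are on the same subject (or on individuals matched on external criteria) under 2 conditions (often Before/After).
Construct the differences: $d_j = Y_{1j} - Y_{2j}$.
The true population mean difference is $\mu_d = \mu_1 - \mu_2$.
We wish to test $H_0: \mu_d = 0$ against a 1-sided or 2-sided alternative.
Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1,2$; $j = 1,\ldots,n$; $E(\varepsilon_{ij}) = 0$
$d_j = Y_{1j} - Y_{2j} = (\mu + \tau_1 + \varepsilon_{1j}) - (\mu + \tau_2 + \varepsilon_{2j}) = (\tau_1 - \tau_2) + (\varepsilon_{1j} - \varepsilon_{2j}) = \mu_d + \varepsilon_j$, where $\varepsilon_j = \varepsilon_{1j} - \varepsilon_{2j}$, so $E(d_j) = \mu_1 - \mu_2 = \mu_d$
Under $H_0: \mu_d = 0$, we have $\tau_1 = \tau_2$ and $E(d_j) = 0$.
Thus, under $H_0$, once a difference is observed, it could just as easily have been + or -.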
Procedure
Compute an observed test statistic that measures the treatment effect in some manner (such as the sample mean of the differences).
For many randomization samples:
o Generate n U(0,1) random variables: $U_1,\ldots,U_n$
o If (say) $U_j < 0.5$, set $d_j^* = -d_j$, where $d_j^*$ is the difference for case j in this sample; otherwise, set $d_j^* = d_j$
o Compute the test statistic for this sample and save it
Compare the observed test statistic with the sampled test statistics as in the independent-samples case: compute the proportion of sampled test statistics as extreme as or more extreme than the observed test statistic, as (Count + 1)/(N + 1).
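The steps above can be sketched in Python as follows, with TS = mean of the differences and hypothetical, all-positive differences as input. The function name `sign_flip_test` is illustrative. The seed is borrowed from the EPL R program, although Python's generator gives a different random stream than R's.

```python
import random


def sign_flip_test(d, n_samples=9999, seed=86420):
    """Monte Carlo paired randomization (sign-flip) test with TS = mean(d).

    Returns (TS_obs, upper-tail p-value, two-sided p-value), each p-value
    computed as (count + 1) / (n_samples + 1).
    """
    rng = random.Random(seed)  # Python's generator, not R's set.seed stream
    n = len(d)
    ts_obs = sum(d) / n
    upper = two_sided = 0
    for _ in range(n_samples):
        # U_j < 0.5: use -d_j; otherwise keep d_j
        ts = sum(-dj if rng.random() < 0.5 else dj for dj in d) / n
        upper += ts >= ts_obs
        two_sided += abs(ts) >= abs(ts_obs)
    return ts_obs, (upper + 1) / (n_samples + 1), (two_sided + 1) / (n_samples + 1)


# Hypothetical differences (not from the slides)
ts_obs, p_upper, p_two = sign_flip_test([3, 4, 5, 2, 6])
print(ts_obs, p_upper, p_two)
```

With all five differences positive, only the "no flips" pattern reaches the observed mean in the upper tail. The exact one-sided p-value is therefore $1/2^5 = 1/32 \approx 0.031$, and the two-sided one is $2/32$. The Monte Carlo estimates should land close to these.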
Example: English Premier League Football, 2012
Interested in determining whether there is a home field effect.
The league has 20 teams; each plays all 19 opponents home and away (190 pairs of teams, each pair playing once on each team's home field). No overtime.
Label teams in alphabetical order: 1 = Arsenal, ..., 20 = Wigan.
Let $Y_{1jk} = H_j - A_k$ ($j < k$): goal differential when j is at home and k is away.
Let $Y_{2jk} = A_j - H_k$ ($j < k$): goal differential when j is away and k is at home.
$d_{jk} = Y_{1jk} - Y_{2jk} = (H_j + H_k) - (A_j + A_k)$, $j < k$
Note: $d_{jk}$ is the combined home goals minus the combined away goals for the pair of teams.
No home effect should mean $\mu_d = 0$.
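A tiny Python check of the identity $d_{jk} = Y_{1jk} - Y_{2jk} = (H_j + H_k) - (A_j + A_k)$ follows. It uses hypothetical scores: j beats k 2-1 at home, then loses 0-1 away. The function name is illustrative.

```python
def pair_differentials(h_j, a_k, h_k, a_j):
    """Return (Y1jk, Y2jk, djk) for teams j < k.

    h_j, a_k: goals of j (home) and k (away) in the game at j's ground
    h_k, a_j: goals of k (home) and j (away) in the game at k's ground
    """
    y1 = h_j - a_k  # j's goal differential at home
    y2 = a_j - h_k  # j's goal differential away
    d = y1 - y2     # equals (h_j + h_k) - (a_j + a_k)
    return y1, y2, d


# Hypothetical scores: j wins 2-1 at home, then loses 0-1 away
y1, y2, d = pair_differentials(h_j=2, a_k=1, h_k=1, a_j=0)
print(y1, y2, d)
```

Both home teams won by one goal, so the combined home-minus-away margin for the pair is d = 2.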
Representative Games from the Sample
Team.j            Team.k                H.j  A.k  Y.1jk  H.k  A.j  Y.2jk  d.jk  Ran1    d.jk*(1)  Ran2    d.jk*(2)
Arsenal           Aston Villa           2    1     1     0    0     0     1    0.3686  -1        0.4514  -1
Chelsea           Everton               2    1     1     1    2     1     0    0.6741   0        0.0780   0
Fulham            Liverpool             1    3    -2     4    0    -4     2    0.5002   2        0.1319  -2
Manchester City   Manchester United     2    3    -1     1    2     1    -2    0.0414   2        0.9600  -2
Newcastle United  Queens Park Rangers   1    0     1     1    2     1     0    0.8097   0        0.0184   0
Norwich City      Wigan Athletic        2    1     1     1    0    -1     2    0.6642   2        0.4300  -2
Southampton       Stoke City            1    1     0     3    3     0     0    0.9612   0        0.9095   0
Sunderland        West Bromwich Albion  2    4    -2     2    1    -1    -1    0.1422   1        0.0997   1
West Ham United   Wigan Athletic        2    0     2     2    1    -1     3    0.9499   3        0.5974   3
Average: d.jk = 0.556; d.jk*(1) = 1.000; d.jk*(2) = -0.333
Comments (regarding these 9 pairs and these 2 samples; the full analysis is on the next slide):
For the original sample, the test statistic is the average difference: 0.556.
For the first random sample, games 1, 4, and 8 had Ran1 < 0.5, so their djk switched sign. The new sampled test statistic was 1.000.
For the second random sample, games 1, 2, 3, 5, 6, and 8 had Ran2 < 0.5, so their djk switched sign. The new sampled test statistic was -0.333.
The p-value for a 1-tailed test ($H_A: \mu_d > 0$) would be p = (1+1)/(2+1) = 2/3, since both the original sample and the Ran1 sample have test statistics $\geq$ 0.556. The 2-sided p-value is also 2/3.
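The worked comments can be reproduced in Python from the d.jk, Ran1 and Ran2 columns of this slide. The sketch applies the rule "flip the sign when U < 0.5" and the (count + 1)/(samples + 1) p-value. The helper names are illustrative.

```python
# d.jk and the two columns of uniforms for the 9 representative pairs
D = [1, 0, 2, -2, 0, 2, 0, -1, 3]
RAN1 = [0.3686, 0.6741, 0.5002, 0.0414, 0.8097, 0.6642, 0.9612, 0.1422, 0.9499]
RAN2 = [0.4514, 0.0780, 0.1319, 0.9600, 0.0184, 0.4300, 0.9095, 0.0997, 0.5974]


def mean(xs):
    return sum(xs) / len(xs)


def sign_flip(d, u):
    """d*_j = -d_j when U_j < 0.5, otherwise d*_j = d_j."""
    return [-dj if uj < 0.5 else dj for dj, uj in zip(d, u)]


ts_obs = mean(D)
ts_samples = [mean(sign_flip(D, u)) for u in (RAN1, RAN2)]

n_samples = len(ts_samples)
p_upper = (sum(ts >= ts_obs for ts in ts_samples) + 1) / (n_samples + 1)
p_two = (sum(abs(ts) >= abs(ts_obs) for ts in ts_samples) + 1) / (n_samples + 1)
print(round(ts_obs, 3), [round(ts, 3) for ts in ts_samples], p_upper, p_two)
```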
R Program
```r
epl2012 <- read.csv("http://www.stat.ufl.edu/~winner/data/epl_2012_home_perm.csv", header=TRUE)
attach(epl2012); names(epl2012)

### Obtain Sample Size and Test Statistic (Average of d.jk)
(n <- length(d.jk))
(TS.obs <- mean(d.jk))

### Choose the number of samples and initialize TS, and set seed
N <- 9999; TS <- rep(0,N); set.seed(86420)

### Loop through samples and compute each TS
for (i in 1:N) {
  ds.jk <- d.jk           # Initialize d*.jk = d.jk
  u <- runif(n)-0.5       # Generate n U(-0.5,0.5)'s
  u.s <- sign(u)          # -1 if u < 0, +1 if u > 0
  ds.jk <- u.s * ds.jk    # Flip the sign of d.jk wherever u < 0
  TS[i] <- mean(ds.jk)    # Compute Test Statistic for this sample
}
summary(TS)
(num.exceed1 <- sum(TS >= TS.obs))              # Count for 1-sided (Upper Tail) P-value
(num.exceed2 <- sum(abs(TS) >= abs(TS.obs)))    # Count for 2-sided P-value
(p.val.1sided <- (num.exceed1 + 1)/(N+1))       # 1-sided p-value
(p.val.2sided <- (num.exceed2 + 1)/(N+1))       # 2-sided p-value

### Draw histogram of distribution of TS, with vertical line at TS.obs
hist(TS, xlab="Mean Home-Away", main="Randomization Distribution for EPL 2012 Home Field Advantage")
abline(v=TS.obs)
```
R Output
```
> ### Obtain Sample Size and Test Statistic (Average of d.jk)
> (n <- length(d.jk))
[1] 190
> (TS.obs <- mean(d.jk))
[1] 0.6368421
> summary(TS)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.573700 -0.110500 -0.005263 -0.002513  0.100000  0.542100
> (num.exceed1 <- sum(TS >= TS.obs))              # Count for 1-sided (Upper Tail) P-value
[1] 0
> (num.exceed2 <- sum(abs(TS) >= abs(TS.obs)))    # Count for 2-sided P-value
[1] 0
> (p.val.1sided <- (num.exceed1 + 1)/(N+1))       # 1-sided p-value
[1] 1e-04
> (p.val.2sided <- (num.exceed2 + 1)/(N+1))       # 2-sided p-value
[1] 1e-04
```
The observed mean difference (0.6368) exceeded all 9999 sampled values (min = -0.5737, max = 0.5421). Thus, both P-values = (0+1)/(9999+1) = .0001.
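Normal Paired t-test
Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1,2$; $j = 1,\ldots,n$; $E(\varepsilon_{ij}) = 0$, $V(\varepsilon_{ij}) = \sigma^2$, $COV(\varepsilon_{1j}, \varepsilon_{2j}) = \rho\sigma^2$
$d_j = Y_{1j} - Y_{2j} = (\mu + \tau_1 + \varepsilon_{1j}) - (\mu + \tau_2 + \varepsilon_{2j}) = \mu_1 - \mu_2 + \varepsilon_j$, where $\varepsilon_j = \varepsilon_{1j} - \varepsilon_{2j}$
$E(\varepsilon_j) = E(\varepsilon_{1j}) - E(\varepsilon_{2j}) = 0 - 0 = 0$
$V(\varepsilon_j) = V(\varepsilon_{1j}) + V(\varepsilon_{2j}) - 2COV(\varepsilon_{1j}, \varepsilon_{2j}) = \sigma^2 + \sigma^2 - 2\rho\sigma^2 = 2\sigma^2(1 - \rho) = \sigma_d^2$
$E(d_j) = \mu_1 - \mu_2$ and $V(d_j) = \sigma_d^2$, so $E(\bar{d}) = \mu_1 - \mu_2$ and $V(\bar{d}) = \frac{\sigma_d^2}{n}$
Under normality of $d_j$ (unlikely here): $Z = \frac{\bar{d} - (\mu_1 - \mu_2)}{\sqrt{\sigma_d^2/n}} \sim N(0,1)$ and $W = \frac{(n-1)s_d^2}{\sigma_d^2} \sim \chi^2_{n-1}$, where $s_d^2 = \frac{1}{n-1}\sum_{j=1}^{n}(d_j - \bar{d})^2$
By the same argument as for the independent-samples t-test: $T = \frac{Z}{\sqrt{W/(n-1)}} = \frac{\bar{d} - (\mu_1 - \mu_2)}{\sqrt{s_d^2/n}} \sim t_{n-1}$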
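The variance step $V(\varepsilon_{1j} - \varepsilon_{2j}) = 2\sigma^2(1 - \rho)$ can be checked by simulation. The Python sketch below builds correlated error pairs from two independent N(0,1) draws, assuming equal error variance $\sigma^2$ and within-pair correlation $\rho$. It then compares the sample variance of the differences with the formula. The parameter values and the function name are hypothetical.

```python
import random


def var_of_difference(sigma=1.0, rho=0.6, n=100_000, seed=2012):
    """Monte Carlo estimate of V(eps_1j - eps_2j).

    Assumes (for illustration) V(eps_ij) = sigma**2 and
    COV(eps_1j, eps_2j) = rho * sigma**2.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = sigma * z1
        e2 = sigma * (rho * z1 + (1 - rho ** 2) ** 0.5 * z2)  # corr(e1, e2) = rho
        diffs.append(e1 - e2)
    m = sum(diffs) / n
    return sum((x - m) ** 2 for x in diffs) / (n - 1)


sigma, rho = 1.0, 0.6
simulated = var_of_difference(sigma, rho)
theory = 2 * sigma ** 2 * (1 - rho)  # sigma_d^2 from the derivation
print(f"simulated {simulated:.3f} vs 2*sigma^2*(1-rho) = {theory:.3f}")
```

A positive within-pair correlation makes $\sigma_d^2$ smaller than the $2\sigma^2$ that independent errors would give. This is why pairing can sharpen the comparison.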
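Paired t-test for EPL 2012 Home vs Away Goals
$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
Test statistic: $t_{obs} = \frac{\bar{d} - 0}{\sqrt{s_d^2/n}}$, which under $H_0$ follows $t_{n-1}$
Data (from EXCEL spreadsheet): $n = 190$, $\bar{d} = 0.6368$, $s_d^2 = 4.3912$
$t_{obs} = \frac{0.6368}{\sqrt{4.3912/190}} = 4.1888$
P-value $= 2P(t_{189} \geq 4.1888) = 2(.00002) = .00004$
Note: the t-test gives a smaller P-value, but the permutation test was limited by its number of samples. With N = 9999, the smallest P-value it can report is 1/(N + 1) = .0001.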
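The Python sketch below reproduces the paired t statistic from the summary statistics on this slide. It also tabulates the smallest P-value that the (count + 1)/(N + 1) rule can report for a few hypothetical choices of N. A floor below the t-test's .00004 would need N of at least 25,000.

```python
import math

# Summary statistics for the 190 EPL pairs, as given on the slide
n, dbar, s_d_sq = 190, 0.6368, 4.3912

t_obs = dbar / math.sqrt(s_d_sq / n)  # paired t statistic
df = n - 1                            # degrees of freedom

# With N Monte Carlo sign-flip samples, the smallest p-value the
# (count + 1)/(N + 1) rule can report is 1/(N + 1).
floors = {N: 1 / (N + 1) for N in (9999, 99999, 999999)}

print(f"t = {t_obs:.4f} on {df} df")
for N, floor in floors.items():
    print(f"N = {N:>7}: smallest attainable p = {floor:g}")
```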