Statistical Analysis of Sugarcane Juice Weight Under 11 Pest Conditions
This study investigates the impact of various pest conditions on the weight of sugarcane juice through a 1-way ANOVA analysis. Experimental units consisted of grouped canes with different treatments including healthy control and various infestations. The analysis includes model diagnostics, population variance estimation, sample variance calculation, and estimated weighted least squares. Basic calculations and F-tests are conducted to assess the treatment effects.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Model Diagnostics and Remedies 1-Way ANOVA Juice Weight of Sugarcane Under 11 Pest Conditions P.C. Mahalanobis and S.S. Bose (1934). "A Statistical Note on the Effect of Pests on the Yield of Sugarcane and the Quality of Cane-Juice," Sankhya: The Indian Journal of Statistics, Vol.1, #4, pp.399-406. J. Volaufova (2009). Heteroscedastic ANOVA: Old p Values, New Views, Statistical Papers, Vol. 50, pp. 943-962.
Data Description Experimental units=25 grouped canes, 6 reps per treatment Responses: Weight of Canes, Weight of Juice (lbs) Treatments: 1=Healthy (Control) 2=Top-shoot Borer 3=Stem Borer 4=Top-shoot & Stem Borer 5=Root Borer 6=Termites 7=Stem & Root Borer 8=Root Borer & Termites 9=Top-Shoot & Root Borer 10=Top-Shoot & Stem & Root Borer 11=Top-Shoot Borer & Termites
Introduction Population Variance: Measure of average squared deviation of individual measurements around the mean N 2 ( ) Y i ( ) ( ) f y dy 2 = = = = 2 2 = ( ) [( ) ] 1 i V Y E Y y N Sample Variance: Measure of average squared deviation of a sample of measurements around their sample mean. Unbiased estimator of 2 ( 2 1 i S n ) n 2 Y Y i = = 1
Estimated Weighted Least Squares 0 Y 1 i ( ) Y Y = + = = = = = 2 i 2 i Y 1 I 1,..., ; 1,..., ~ 0, independent Y i g j n N E V ij i ij i ij i i i n i n i i Y in i 2 1 I 0 0 1 0 1 n n I n n n n n n n 1 1 1 2 1 g 1 1 1 1 Y 0 1 0 1 2 2 1 0 0 Y n n n n 2 n n n n n = = = = = = = Y V X V X 2 2 2 2 2 1 2 2 g Y 0 0 Y g 0 0 1 1 2 g 0 0 I g n n n n n n n n n g g g g 1 2 g g g ( ) ( ) 1 1 1/2 Y 1/2 Y 1/2 Y = = = = = * * * 0 * * * * * * 0 * 0 * 0 * 0 * 0 Y V Y X V X X V X P X X X X P X X X X Define: ' ' ' ' 0 W SS g SS N Trt 1 H ( ) ( ) 0 = = = = = = * * * 0 * * * * W W Y P P Y Y I P Y To test : ... : ' ' ~ H SS SS F F N g 0 1 Tr t Err 1, g W g W Err g 2 1 I 0 0 s EW SS g SS N n n n n n 1 1 I 2 1 g Trt 2 2 0 0 s 1 ^ n n n n n = = V V Estimated Weighted Least Squares: Replace with F 2 1 2 2 g Y Y EW EW Err g 2 g 0 0 I s n n n n n 1 2 g g g
Basic Calculations F-Test and WLS F-Stat trt n.trt ybar.trt 6 42.91667 0.641667 6 28.88333 1.805667 8.230384 9.028334 3.322872 2772.095 95.97562 6 29.73333 4.450667 24.51172 22.25333 1.348113 1191.827 40.08388 6 17.35 6.787 644.2413 33.935 0.884043 266.1168 15.33815 6 27.56667 13.55067 0.126936 67.75333 0.442783 6 30.25 2.975 38.64498 14.875 2.016807 1845.504 6 26.81667 2.305667 4.810994 11.52833 2.602284 6 29.48333 8.145667 18.82309 40.72833 0.736588 640.2914 21.71707 6 22.08333 11.70167 190.0997 58.50833 0.512747 250.0533 11.32317 6 25.71667 15.44567 23.89095 77.22833 0.388458 256.9059 9.989858 6 24.03333 20.10667 81.20102 100.5333 0.298408 66 2421.651 439.58 21.90375 26825.43 745.8972 27.71212 7.992364 s2.trt SS_TRT 1387.07 3.208334 9.350649 SS_ERR w=n/s2 w*ybar^2 w*ybar 17222.4 401.2987 1 2 3 4 5 6 7 8 9 336.48 12.20604 61.0084 69.7846 1871.39 10 11 172.361 7.17175 Sum Mean ANOVA Source Trts Error Total g=11 Trts df SS MS F F(.95) P(>F) 10 2421.6510 242.1651 55 439.5800 65 2861.2310 30.2996 2.0078 0.0000 7.9924 F_EWLS (1/(11-1))*(H13-I13^2/G13) 142.5093
Tests Among g 2 Population Variances Hartley s Fmax Test Must have equal sample sizes (n1= = ng) Test based on assumption of normally distributed data Uses special table for critical values Levene s Test No assumptions regarding sample sizes/distributions Uses F-distribution for the test Bartlett s Test Can be used in general situations with grouped data Test based on assumption of normally distributed data Uses Chi-square distribution for the test
Hartleys Fmax Test H0: 12= = g2 (homogeneous variances) Ha: Population Variances are not all equal Data: smax2 is largest sample variance, smin2 is smallest Test Statistic: Fmax = smax2/smin2 Rejection Region: Fmax F* (Values from class website, indexed by (.05, .01), g (number of populations) and df2 (n-1, where n is the individual sample sizes) Pest Data: Fmax = smax2/smin2 = 20.10667/0.641667=31.34 F*(.05,11,6-1=5) = 28.2 Reject H0
Levenes Test H0: 12= = g2 (homogeneous variances) Ha: Population Variances are not all equal Data: For each group, obtain the following quantities: th the measurement from group ij y j = = ( 1,..., 1,..., ) i i g j n i ~ y = sample median for group ( 1,..., ) i i g i ~ y = = = ( 1,..., 1,..., ) z y i g j n ij ij i i n n g i i z z ij ij = = = 1 n 1 1 j i j = = = + + ... z z N n n . .. i 1 g N i g ( ( ) ) 2 ( ) 1 n z z g . .. i i = = Test Statistic: 1 i L n g 2 i ( ) z z N g . i ij = = 1 1 i L j Rejection Region: F N g , 1, g
Levenes Test Pest data pest <- read.table("http://users.stat.ufl.edu/~winner/data/pests_cane.dat", header=F,col.names=c("pest.trt","wt.cane","wt.juice")); attach(pest) g <- length(unique(pest.trt)) N <- length(pest.trt) mean.trt <- as.vector(tapply(wt.juice, pest.trt, mean)) n.trt <- as.vector(tapply(wt.juice, pest.trt, length)) median.trt <- as.vector(tapply(wt.juice, pest.trt, median)) median.trt.long <- rep(median.trt, n.trt) z <- abs(wt.juice - median.trt.long) mean.trt.z <- as.vector(tapply(z, pest.trt, mean)) mean.trt.z.long <- rep(mean.trt.z, n.trt) mean.all.z <- mean(z) SS.Levene.1 <- sum(n.trt * (mean.trt.z - mean.all.z)^2) SS.Levene.2 <- sum((z-mean.trt.z.long)^2) F.Levene <- (SS.Levene.1 / (g-1)) / (SS.Levene.2 / (N-g)) p.F.Levene <- 1 - pf(F.Levene, g-1, N-g) > Levene.out <- cbind(SS.Levene.1, SS.Levene.2, g-1, N-g, F.Levene, qf(.95,g- 1,N-g), p.F.Levene) > colnames(Levene.out) <- c("SSB", "SSW", "dfB", "dfW", "F", "F(.95)","P(>F)") > round(Levene.out, 4) SSB SSW dfB dfW F F(.95) P(>F) [1,] 44.5115 212.2083 10 55 1.1536 2.0078 0.3414
Bartletts Test General Test that can be used in many settings with groups H0: 12= = g2 (homogeneous variances) Ha: Population Variances are not all equal MSE Pooled Variance g 1 C ) ( ) 1 ln ( ) ( ln ) ( = 2 B 2 i : TS X N g MS n s ERR i = 1 i g 1 1 1 = + 1 C ( ) 3 1 1 g n N g = 1 i i ( ) 2 B 2 2 g 2 B : P-value: RR X P X ; 1 1 g
Bartletts Test Pest Data ## Bartlett's Test g <- length(unique(pest.trt)) N <- length(pest.trt) pest.trt <- factor(pest.trt) sd.trt <- as.vector(tapply(wt.juice, pest.trt, sd)) n.trt <- as.vector(tapply(wt.juice, pest.trt, length)) MS_ERR <- sum((n.trt-1) * sd.trt^2) / (N-g) C.B <- 1 + (1/(3*(g-1))) * (sum(1/(n.trt-1)) - 1/(N-g)) X2.B <- (1/C.B) * ((N-g)*log(MS_ERR) - sum((n.trt-1) * log(sd.trt^2))) p.X2.B <- 1 - pchisq(X2.B, g-1) bart.out <- cbind(X2.B, g-1, qchisq(.95,g-1), p.X2.B) colnames(bart.out) <- c("Bartlett X2", "df", "X2(.95)", "P(>X2)") round(bart.out, 4) > round(bart.out, 4) Bartlett X2 df X2(.95) P(>X2) [1,] 20.8832 10 18.307 0.0219
CRD with Non-Normal Data Kruskal-Wallis Test Extension of Wilcoxon Rank-Sum Test to g > 2 Groups Procedure: Rank the observations across groups from smallest (1) to largest (N = n1+...+ng), adjusting for ties Compute the rank sums for each group: T1,...,Tg . Note that T1+...+Tg = N(N+1)/2
Kruskal-Wallis Test H0: The g population distributions are identical HA: Not all g distributions are identical 12 . .: ( N N 2 T n g = + 3( 1) i T S H N + = 1) 1 i i 2 . .: R R H , 1 g 2 : ( ) P val P H An adjustment to H is suggested when there are many ties in the data. H H t t t = th ' number of observations in group of tied ranks j ( 3 ) j 3 j j j 1 N N
Kruskal-Wallis Test Pest Data ## Kruskal-Wallis Test g <- length(unique(pest.trt)) N <- length(pest.trt) pest.trt <- factor(pest.trt) n.trt <- as.vector(tapply(wt.juice, pest.trt, length)) rank.wj <- rank(wt.juice) ranksum.trt <- as.vector(tapply(rank.wj, pest.trt, sum)) X2.KW <- (12/(N*(N+1))) * sum(ranksum.trt^2/n.trt) - 3*(N+1) p.X2.KW <- 1 - pchisq(X2.KW, g-1) kw.out <- cbind(X2.KW, g-1, qchisq(.95,g-1), p.X2.KW) colnames(kw.out) <- c("Kruskal-Wallis X2", "df", "X2(.95)", "P(>X2)") round(kw.out, 4) > round(kw.out, 4) Kruskal-Wallis X2 df X2(.95) P(>X2) [1,] 46.8797 10 18.307 0
Welchs Test Unequal Variances g n s = = i w w w i i 2 i = 1 i 2 g w y i i g 1 2 = 1 i = (same as last slide) F w y EW i i 1 g w = 1 i ( g ) 1 2 2 2 1 g g 1 w w = = + 1 1 i C m C W W W 2 1 n = 1 i i 1 3 approx ~ = = C F m F F 1, W W W W EW g 2 1 g W
Kenward-Roger Method 2 g w y 2 i i g g g 1 1 n s w w 2 = 1 i = = = = 2 1 i i w w w F w y a i i EW i i 2 i 1 1 g w n = = = 1 1 1 i i i i ( ) )( ( 2 ) + + + + 2 2 7 + 2 7 2 + 4 g g + g 21 g = = = c c c ( )( ( ( )( ) ) ) 1 2 3 + + 2 2 2 3 2 5 1 2 3 2 5 1 2 3 2 5 1 g g g g g g g g 1 + 1 2 c a V E a = = = 1 E V 1 KR ( ) ( ) ( ) KR KR 2 2 1 1 g g 1 1 2 c a c a 2 3 KR + 1 g approx ~ = + = = 4 m F m F F KR ( ) ( ) 1, KR KR KR KR EW g 1 1 2 g E KR KR KR
Satterthwaite Method 1 1 1 0 0 1 0 0 ( ) = = L L 1 I 1) Determine a full row rank 1 Contrast matrix ' e.g.: ' g g 1 1 1 g g 1 0 0 1 y 1 1 2 i 1 s n ^ ^ = = = = * C Y Y L L CL L Y 2) Compute: diag ' ' ' F F EW 1 g i y g L ^ ^ ( ) ( 1 ) = Y L CL 3) ' ' 1 with full rank. V g g 1 ^ ^ ^ = = 1 L CL P P 4) Spectral Decomposition: ' ' where eigenvectors and diagonal matrix of eigenvalues of ' ' ' P L CL L CL P P 2 1 1 P L Y g g ( ) ' ' 1 1 1 1 2 = = = = = = * 1 PP P P I Y LP P L Y P L Y P P P ' ' ' ' ' ' ' i F 1 1 i g 1 1i 1 g g g = = 1 1 i i i ^ ^ = = = = th P L Y P L CLP P P P P 5) ' ' ' ' ' ' where ' ' ' where is the column of V i P P P P P e e I 1 1 1 i i i i i i i g i i g P L Y ' ' ^ = = P L Y e e ' ' ' ~ Recall that spectral decomposition taken on the Estimated variance-covariance matrix i V t i i i i i i ^ ^ ^ ^ 2 2 i 2 i 2 2 ^ i i i i 2 Consider: ~ 1 2 i i i i E V V i i i i i 2 i ^ i V i i i i i
Satterthwaite Method ( ) ( ) ( ( ) ) 2 i 2 i 4 i 1 1 2 1 1 n S n S n 4 i 2 n S V S V S ^ ( ) = = = i i i 2 n 2 i 2 i 6) ~ 2 1 V n 1 i 2 2 i 2 i 1 n i i i 2 ^ P L CLP ' ' g i i V S ^ ^ ^ ^ ^ ^ ^ = 2 j P L CLP P L CLP approx approx ' ' which by delta method: approx ' ' V V V i i i i i 2 j S = 1 j 2 ^ C ^ C 4 j S 2 i g 1 n S n = = = P L LP e e 2 ' ' where diag ' i i j j 2 j 2 j 2 j 1 S n S S = 1 j j i j 2 2 ^ C ( ) 2 4 j 4 j 4 j P L e S S S ' ' n g g g 1 n = = i i P L LP P L e 2 ' ' 2 ' ' 2 i i i i 2 j 2 j 1 1 1 S n n n = = = 1 1 1 j j j j j j j 2 ^ 2 i 2 2 i 7) ( ) i 2 ^ 4 j P L e S ' ' n g V i i 2 i 2 j 1 n = 1 j j 2 2 P L Y P L Y 1 g ' ' ' ' 1 E t E F = = * * 2 8) ~ 2 i S i i F F E 1, g S 1 2 2 g S i = 1 i S i i i 1 g 2 i 1 g 2 1 E F = 1 i * where if any 2, they are replaced with 0 in computation i S i S i 1 g 1 2 2 g ( ) = 1 i 1 g i i S 2 = 1 i i