Understanding Sample Size Parameters in Pet Product Studies
This set of additional slides delves into the main parameters used in simulations for determining sample sizes in pet product studies. It covers the methods of simulations, statistical support, new simulations, resulting sample sizes, variations between animals and days, and the rationale behind specific parameter values like SD_Sub and SD_Day. The importance of accurate estimation of variations and the impact of different arthropod proportions are also discussed.
Presentation Transcript
Additional Slides for Sample Size of Pet Product Studies
Outline
- Main parameters used in the simulations
- A brief description of the simulation methods in the main document
- Supplemental statistical support from the US EPA contract statistician (ICF)
- New simulations and resulting sample sizes
Parameters used in the simulations
- The true numbers of blood-fed arthropods were calculated from the required minimum retention rate or the proportion of blood-fed arthropods.
- The required minimum retention rates or proportions of blood-fed arthropods for untreated animals were provided by the subject matter experts.
- The number of arthropods per animal on each testing day was provided by the subject matter experts:
  - 50 arthropods/animal
  - 100 arthropods/animal
- A higher number of arthropods per animal leads to a smaller sample size. For example, in the tick study design, a sample size of 14 animals is proposed with 50 ticks/animal, but a sample size of 10 animals is proposed with 100 ticks/animal.
- The parameters (SD_Sub, SD_Day, SubVar, DayVar) that characterize the variation between animals and between days in the simulations were proposed by statisticians:
  - SD_Sub: parameter of animal variation when random effects are generated from normal distributions
  - SD_Day: parameter of overdispersion in the data when random effects are generated from normal distributions
  - SubVar: parameter of animal variation when random effects are generated from Weibull distributions
  - DayVar: parameter of overdispersion in the data when random effects are generated from Weibull distributions
Variations between animals and days
- Sources of variation in the data:
  - Random chance of the binomial distribution (different values even under the same testing conditions on the same animal)
  - Variation between animals
  - Variation between days
- Part of the day-to-day variation is already accounted for by the random chance of the binomial distribution. Here, we address the variation that cannot be accounted for by the random chance of binomial distributions (i.e. overdispersion).
- We want to estimate the variation between animals or days (i.e. the variation in the data that is not accounted for by the random chance of the binomial distribution).
- Reliable estimates of the variation between animals or days require historical datasets with sufficient sample sizes.
- Limited information was obtained from historical data:
  - Few studies: different animals, different arthropod species
  - Small sample sizes
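To make the idea of overdispersion concrete, the sketch below compares the variance of blood-fed counts under pure binomial sampling with the variance when an extra normal random effect is added on the logit scale. It is only an illustration: the function name simulate_counts, the seed, the number of draws, and the choice of 50 arthropods with a true proportion of 0.45 are assumptions made for the example, not the settings of the EPA SAS code.

```python
import math
import random
import statistics


def expit(x):
    """Inverse logit: map a logit value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_counts(n_arthropods, p, sd_logit, n_draws, rng):
    """Blood-fed counts for n_draws animal-days.

    sd_logit = 0 gives pure binomial data; sd_logit > 0 adds a normal
    random effect on the logit scale (hypothetical illustration).
    """
    base = math.log(p / (1.0 - p))
    counts = []
    for _ in range(n_draws):
        p_day = expit(base + rng.gauss(0.0, sd_logit))
        # One binomial draw, built from n_arthropods Bernoulli trials.
        counts.append(sum(rng.random() < p_day for _ in range(n_arthropods)))
    return counts


rng = random.Random(1)  # fixed seed so the illustration is repeatable
binomial_only = simulate_counts(50, 0.45, 0.0, 20000, rng)
overdispersed = simulate_counts(50, 0.45, 0.166, 20000, rng)

var_binomial = statistics.variance(binomial_only)
var_over = statistics.variance(overdispersed)
print(f"variance, binomial only:       {var_binomial:.2f}")
print(f"variance, with logit-scale SD: {var_over:.2f}")
```

The binomial-only variance should sit near the theoretical value 50 x 0.45 x 0.55 = 12.375, while the second variance is larger; that excess is the variation the slides call overdispersion.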
Why SD_Sub = 0.33 and SD_Day = 0.166
- Assume that different animals have different proportions of blood-fed arthropods. SD_Sub = 0.33 corresponds to a reasonable variation in the proportions of blood-fed arthropods among animals. For example, if the true proportion of blood-fed arthropods in the population is 0.45, the 95% coverage of the proportions of blood-fed arthropods of the animals ranges from 0.297 to 0.613.
- Assume that an animal has different proportions of blood-fed arthropods on different days. SD_Day = 0.166 corresponds to a reasonable variation in the proportions of blood-fed arthropods among days. For example, if the true proportion of blood-fed arthropods of an animal is 0.45, the 95% coverage of the proportions of blood-fed arthropods of this animal ranges from 0.370 to 0.533. Note that this variation indicates the degree of overdispersion in the binomial data.

The variations of proportions of blood-fed arthropods associated with SD_Sub = 0.33 and SD_Day = 0.166 in the simulations when random effects are randomly generated from normal distributions

Source of variation | True proportion of population | logit   | Additional variation (SD) | logit - 2SD | logit + 2SD | 95% coverage of true proportions of the animals (lower) | (upper)
Animals             | 0.45                          | -0.2007 | 0.33                      | -0.8607     | 0.4593      | 0.297                                                   | 0.613
Animals             | 0.6                           | 0.4055  | 0.33                      | -0.2545     | 1.0655      | 0.437                                                   | 0.744
Animals             | 0.75                          | 1.0986  | 0.33                      | 0.4386      | 1.7586      | 0.608                                                   | 0.853
Animals             | 0.8                           | 1.3863  | 0.33                      | 0.7263      | 2.0463      | 0.674                                                   | 0.886
Days                | 0.45                          | -0.2007 | 0.166                     | -0.5327     | 0.1313      | 0.370                                                   | 0.533
Days                | 0.6                           | 0.4055  | 0.166                     | 0.0735      | 0.7375      | 0.518                                                   | 0.676
Days                | 0.75                          | 1.0986  | 0.166                     | 0.7666      | 1.4306      | 0.683                                                   | 0.807
Days                | 0.8                           | 1.3863  | 0.166                     | 1.0543      | 1.7183      | 0.742                                                   | 0.848
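The coverage ranges quoted on this slide can be reproduced by taking the logit of the true proportion, moving 2 SD in either direction, and transforming back to a proportion. The short sketch below does exactly that; the helper name coverage_95 is made up for this illustration and is not part of the original SAS code.

```python
import math


def logit(p):
    """Log-odds of a proportion."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse logit: map a logit value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def coverage_95(p, sd):
    """Approximate 95% coverage of proportions: expit(logit(p) -/+ 2*sd)."""
    centre = logit(p)
    return expit(centre - 2.0 * sd), expit(centre + 2.0 * sd)


# SD_Sub = 0.33 (animals) and SD_Day = 0.166 (days), as on the slide.
for source, sd in (("Animals", 0.33), ("Days", 0.166)):
    for p in (0.45, 0.6, 0.75, 0.8):
        lower, upper = coverage_95(p, sd)
        print(f"{source:8s} p = {p:.2f}: {lower:.3f} to {upper:.3f}")
```

For a true proportion of 0.45 this gives 0.297 to 0.613 for animals and 0.370 to 0.533 for days, matching the slide.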
Simulations in the main document
- Random animal effects and random day effects in the data (expressed as variation of the logit values) were generated from either normal distributions or Weibull distributions.
- The number of blood-fed arthropods, out of a given number of arthropods infested on each animal, was randomly generated from binomial distributions.
- The Generalized Linear Mixed Model (GLMM) for Poisson distributions was selected to conduct the simulations because the GLMM with log link for binomial distributions produced a high proportion of non-converged datasets.
- Estimates of efficacy (with 95% CI) were obtained for each dataset.
- The power of the study design is the proportion of datasets with estimated efficacy ≥ 0.90 and a distance from the lower bound of the 95% CI to the estimated efficacy ≤ 0.05.
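The data-generation step and the power criterion described on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the EPA SAS code: the function names, the three testing days, and the treatment of the day effect as an independent draw for every animal-day are all hypothetical; only normal random effects are shown; the GLMM fit that produces the efficacy estimates is not included; and the comparison directions (estimate at least 0.90, distance to the lower bound at most 0.05) are assumed.

```python
import math
import random


def expit(x):
    """Inverse logit: map a logit value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_dataset(n_animals, n_days, n_arthropods, p_true,
                     sd_sub, sd_day, rng):
    """Blood-fed counts per animal and day (rows = animals, columns = days).

    Random effects are normal on the logit scale: one effect per animal
    (sd_sub) plus one per animal-day (sd_day, the overdispersion term).
    """
    base = math.log(p_true / (1.0 - p_true))
    data = []
    for _ in range(n_animals):
        animal_effect = rng.gauss(0.0, sd_sub)
        row = []
        for _ in range(n_days):
            p = expit(base + animal_effect + rng.gauss(0.0, sd_day))
            # Binomial draw built from n_arthropods Bernoulli trials.
            row.append(sum(rng.random() < p for _ in range(n_arthropods)))
        data.append(row)
    return data


def power(results, min_efficacy=0.90, max_distance=0.05):
    """Share of datasets meeting both criteria.

    results: list of (estimated_efficacy, lower_95_bound) pairs, one per
    simulated dataset, as produced by whatever model was fitted.
    """
    passed = sum(1 for estimate, lower in results
                 if estimate >= min_efficacy
                 and estimate - lower <= max_distance)
    return passed / len(results)


rng = random.Random(0)  # fixed seed so the illustration is repeatable
dataset = simulate_dataset(14, 3, 50, 0.45, 0.33, 0.166, rng)
print(dataset[0])  # counts for the first animal on each of the three days
```

In a full simulation, each generated dataset would be passed to the fitted model, and the resulting pairs of efficacy estimate and lower 95% bound would be collected and handed to power.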
Supplemental statistical support from US EPA contract statistician
- The EPA contractor statistician from ICF provided a mathematical derivation for using estimates from the GLMM with logit link function for binomial distributions to calculate the efficacy and its 95% CI.
- The contractor statistician modified the EPA SAS code (to include the GLMM for binomial distributions with logit link function) and conducted some simulations using different computational methods in the GLMM models (methods = RSPL, MSPL, RMPL, MMPL) to explore options/models that might increase the number of converged datasets:
  - The proportions of non-converged datasets were similar for all the explored methods.
New Simulations from Supplemental Document
- EPA conducted simulations (1000 random datasets for each assumed type of random-effect distribution) for 5 different models:
  - GLMM for Poisson distributions
  - GLMM with log link function for binomial distributions
  - GLMM with logit link function for binomial distributions (ICF)
  - GLMM with NOBOUND for Poisson distributions
  - GLMM with NOBOUND and logit link function for binomial distributions
- Criteria to evaluate the models:
  - Proportion of converged datasets
  - Proportion of datasets with a positive-definite G matrix
  - Power to correctly accept a good product (i.e. true efficacy 0.90)
Results of New Simulations from Supplemental Document
- The GLMM for Poisson distributions has the highest proportion (~80%) of converged datasets and the highest power.
- The GLMM with logit link for binomial distributions has the highest proportion of datasets with a positive-definite G matrix, although all models have low proportions of such datasets.
- The GLMM with log link for binomial distributions, the GLMM with NOBOUND option for Poisson distributions, and the GLMM with NOBOUND option and logit link for binomial distributions have low proportions of converged datasets and low power.
- New simulations (2000 datasets for each assumed type of random-effect distribution) were run to compare the power and required sample sizes of the GLMM for Poisson distributions and the GLMM with logit link for binomial distributions.
Results of New Simulations from Supplemental Document
[Results shown on the slide, labeled "ICF model"; not included in the transcript]
Conclusion from New Simulations in Supplemental Document
- The power/sample size of the GLMM for Poisson distributions is robust to the type of random-effect distribution:
  - The model has similar power/sample size whether the random effects are generated from normal distributions or from Weibull distributions.
- The power/sample size of the GLMM with logit link function for binomial distributions is not robust to the type of random-effect distribution:
  - GLMM in SAS PROC GLIMMIX assumes that random effects are normally distributed.
  - The power/sample size of the GLMM for binomial distributions is affected by the assumed distribution (normal or Weibull) of the random effects.
- The power/sample sizes of the GLMM with logit link function for binomial distributions, when the random effects (in logit values) are generated from normal distributions, are similar to those of the GLMM for Poisson distributions. These are the sample sizes recommended for use.