Experimentation and Results in Statistics: Lessons from Past Studies
The presentation discusses historical experiments in statistics, focusing on the mistakes and outcomes of studies such as the Lanarkshire Milk Study. It shows how experiments can go astray through surrogate measurements, human interference with random assignment, and nonrandom designs, and it reviews how randomization has been carried out, from printed tables of random numbers to pseudo-random number generators. It highlights key results, such as the negligible difference in weight gain between children given raw and pasteurized milk, and closes with a low-dose aspirin trial in women and the interpretation of its confidence interval. It serves as a cautionary tale about the pitfalls of flawed experimentation.
Uploaded on Sep 28, 2024
Presentation Transcript
Statistics and Experimentation. David Salsburg, AP Statistics Reading, Daytona Beach, Florida, June 16, 2011.
Harvey was wrong. William Harvey, circulation of the blood, 1628. Bishop of Chichester: Harvey was wrong because he used experimentation, and "it is well known that Nature abhors experimentation and will purposely do things wrong if you attempt to experiment."
An experiment that went wrong. The Lanarkshire Milk Study (1929). Question: Does pasteurization take the good out of the milk? How do you measure the good in milk? Measure weight gain in children as a surrogate. Yule: "In our lust for measurement, we frequently measure that which we can, rather than that which we wish to measure, and forget that there is a difference." Another example: measures of intelligence.
An experiment that went wrong. If children were to be used, which children? Children in school. Where? London or Manchester were easily available, but their populations were too heterogeneous, with too much variability in socioeconomic factors. Lanarkshire County, Scotland: population 300,000, evenly divided between small factory towns and rural communities. How many children? The Neyman-Pearson concept of power had not yet been published.
Final Design: 20,000 children, 200-400 per school, several grades. 5,000 were randomly assigned an extra daily ration of raw milk; 5,000 were randomly assigned an extra daily ration of pasteurized milk; 10,000 were randomly assigned as controls with no extra milk. The study ran from February to June 1930; the children were weighed at the beginning and at the end of the study.
Results: Average weight gain for children on raw milk was almost exactly the same as average weight gain for children on pasteurized milk. Average weight gain for the children kept as controls (no extra milk) was three times the average weight gain of the other two groups. Conclusions: 1) No good is lost from milk (as measured by weight gain) when it is pasteurized. 2) Best not to give children any extra milk, raw or pasteurized!
An experiment that went wrong. A Royal Commission was sent to investigate, chaired by William Sealy Gosset ("Student"). Conclusion: the teachers had been told to assign the children at random, but many of them took pity on the sickly and poor students and gave them the extra milk.
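A small simulation makes the Royal Commission's point concrete. Every number in it is hypothetical (a made-up "health" score, effect size, and noise level), not data from Lanarkshire. The assumption is that sickly children gain less weight; if teachers hand the milk to the sickliest half, the naive comparison can make extra milk look harmful even though it helps, while coin-flip assignment recovers the assumed effect.

```python
import random
import statistics


def mean_difference(gains, treated):
    """Average gain of children given milk minus average gain of controls."""
    t = [g for g, m in zip(gains, treated) if m]
    c = [g for g, m in zip(gains, treated) if not m]
    return statistics.mean(t) - statistics.mean(c)


def simulate(n=20_000, true_effect=0.2, seed=1930):
    rng = random.Random(seed)
    # Hypothetical health score per child; healthier children gain more weight.
    health = [rng.gauss(0, 1) for _ in range(n)]
    # Assignment 1: teachers give milk to the sickliest half (health below 0).
    teacher = [h < 0 for h in health]
    # Assignment 2: a fair coin decides, independently of health.
    coin = [rng.random() < 0.5 for _ in range(n)]

    def gains(treated):
        # Gain = baseline + health effect + milk effect + noise (hypothetical units).
        return [1.0 + 0.5 * h + true_effect * m + rng.gauss(0, 0.5)
                for h, m in zip(health, treated)]

    return mean_difference(gains(teacher), teacher), mean_difference(gains(coin), coin)


teacher_est, random_est = simulate()
print(f"teacher-assigned estimate: {teacher_est:+.3f}")
print(f"randomized estimate:       {random_est:+.3f}")
```

With these assumed settings the teacher-assigned estimate comes out negative, while the randomized estimate lands near the assumed true effect of 0.2.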
HOW DO YOU RANDOMIZE? Can you do it with haphazard choice by humans? Problem: digit preference. Can you let nature do it? Example: toxicological studies of mice.
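One common way to detect digit preference is a Pearson chi-square goodness-of-fit test of the digit counts against a uniform distribution. The sketch below assumes the digits arrive as integers 0 to 9 and uses a made-up sequence in which 7 is chosen far too often; 16.919 is the 5% critical value of the chi-square distribution with 9 degrees of freedom.

```python
from collections import Counter

# 95th percentile of the chi-square distribution with 9 degrees of freedom.
CHI2_CRIT_DF9 = 16.919


def digit_preference_statistic(digits):
    """Pearson chi-square statistic for digits 0-9 against a uniform distribution."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


# Hypothetical "human" sequence: the digit 7 is heavily favored.
human = [7] * 30 + list(range(10)) * 7
stat = digit_preference_statistic(human)
print(round(stat, 2), stat > CHI2_CRIT_DF9)  # 81.0 True
```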
How did R.A. Fisher randomize? Last two digits of populations of English towns in the 1921 census: a table of 7,500 two-digit numbers arranged in blocks of 25. First block of 25: 03 47 43 73 86 36 96 47 36 61 46 98 63 71 62 33 26 16 80 45 60 11 14 10 95
RAND Corporation book of 1 million random digits. Martin Gardner (Scientific American): "This is the quintessential book of the Twentieth Century. Not only was no book produced like this in previous centuries, no one would have ever conceived of a book like this in previous centuries."
How do you use a table of random numbers? 1) You do not start at the beginning; otherwise, all randomizations would be the same. 2) You do not begin haphazardly (at random?): books tend to have broken bindings, so haphazard openings often fall on the same page.
RAND book preface: 1. You open the book haphazardly and pick a starting point haphazardly. From there you read off three digits, then two digits, then two more digits, and then one digit. 2. You go to the page indicated by the three digits, the line indicated by the first pair of digits, and the column indicated by the second pair. Then you proceed up and to the left (toward the top of the page) if the final single digit is odd, or down and to the right if it is even.
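The reading rule on this slide can be written as a small function. This is a sketch of the rule as stated here, not of the RAND preface's exact wording; the function name, the eight-character input format, and the direction labels are assumptions, and it does not check whether the page, line, or column actually exists in a given table.

```python
from typing import NamedTuple


class StartPoint(NamedTuple):
    page: int
    line: int
    column: int
    direction: str


def start_point(digits: str) -> StartPoint:
    """Turn eight haphazardly chosen digits into a starting point.

    digits[0:3] -> page, digits[3:5] -> line, digits[5:7] -> column,
    digits[7]   -> direction (odd: up and to the left, even: down and to the right).
    """
    if len(digits) != 8 or not digits.isdigit():
        raise ValueError("expected exactly eight digits")
    final = int(digits[7])
    direction = "up-left" if final % 2 == 1 else "down-right"
    return StartPoint(int(digits[0:3]), int(digits[3:5]), int(digits[5:7]), direction)


print(start_point("12345678"))
# StartPoint(page=123, line=45, column=67, direction='down-right')
```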
Applying this method to Fisher's 6-page table: I open it haphazardly (to page 2) and pick a spot haphazardly, yielding the sequence 2, 12, 23, 6. I go to page 2, line 12, column 23, and go left and up from there. This yields the sequence 67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96, ...
Comparing Three Treatments. Suppose I have three treatments, A, B, and C, to be applied in blocks of three: A, B, C / A, B, C / A, B, C / ... I attach the sequence of random numbers to this sequence of symbols: A-67, B-96, C-57 / A-88, B-30, C-22 / A-23, B-51, C-14 / A-40, B-24, C-96. I then reorder the symbols A, B, C within each block in increasing order of their random numbers: CAB / CBA / CAB / BAC.
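The reordering step can be sketched in Python using the twelve numbers read from the table above. The function name is hypothetical, and the tie rule (break ties by treatment label) is an assumption: the slide does not say what to do when two numbers in a block are equal, and drawing a fresh number would be another reasonable choice.

```python
def randomize_blocks(treatments, random_numbers):
    """Order the treatments within each block by their attached random numbers.

    Each block uses the next len(treatments) numbers; the treatment paired with
    the smallest number comes first. Ties are broken by treatment label.
    """
    k = len(treatments)
    orders = []
    for start in range(0, len(random_numbers) - k + 1, k):
        block = random_numbers[start:start + k]
        ranked = sorted(zip(block, treatments))
        orders.append("".join(label for _, label in ranked))
    return orders


numbers = [67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96]
print(randomize_blocks("ABC", numbers))  # ['CAB', 'CBA', 'CAB', 'BAC']
```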
Modern Methods. Use a computer algorithm to generate a pseudo-random sequence. The most popular method is the congruence generator: X(i+1) = (A·X(i) + B) mod C, that is, the residue of A·X(i) + B after division by C. A, B, and C are mutually prime. The congruence generator cycles after K values, but K is a function of X(1), A, B, and C and can be calculated.
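A toy version of the congruence generator shows both the recurrence and the cycle. The constants are tiny, chosen only so the cycle can be checked by hand (practical generators use a far larger modulus C). With A = 5, B = 3, C = 16 (pairwise coprime) the generator runs through all 16 residues before repeating; with B = 0, which deliberately breaks the mutual-primality condition, the cycle length depends on the starting value X(1).

```python
from itertools import islice


def congruential(seed, a, b, c):
    """Yield X(1), X(2), ... with X(i+1) = (a*X(i) + b) mod c."""
    x = seed
    while True:
        yield x
        x = (a * x + b) % c


def cycle_length(seed, a, b, c):
    """Length K of the cycle the generator eventually repeats."""
    first_seen = {}
    for i, x in enumerate(congruential(seed, a, b, c)):
        if x in first_seen:
            return i - first_seen[x]
        first_seen[x] = i


print(list(islice(congruential(7, 5, 3, 16), 6)))            # [7, 6, 1, 8, 11, 10]
print(cycle_length(7, 5, 3, 16))                             # 16
print(cycle_length(1, 5, 0, 16), cycle_length(2, 5, 0, 16))  # 4 2
```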
Philosophical question: Can a pseudo-random number generator produce truly random numbers? Fisher: Foolish question. All that is needed is that all possible treatment assignments be equally probable.
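Fisher's criterion can be checked empirically: shuffle three treatments many times with Python's seeded pseudo-random generator (`random.Random`, a Mersenne Twister) and count how often each of the 3! = 6 possible orders appears. The seed and number of trials are arbitrary assumptions, and a simulation like this illustrates the criterion rather than proving it.

```python
import random
from collections import Counter
from itertools import permutations


def assignment_frequencies(treatments="ABC", trials=60_000, seed=2011):
    """Shuffle the treatments repeatedly and count how often each order appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        order = list(treatments)
        rng.shuffle(order)
        counts["".join(order)] += 1
    return counts


counts = assignment_frequencies()
for order in ("".join(p) for p in permutations("ABC")):
    print(order, counts[order])  # each close to 60,000 / 6 = 10,000
```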
Can we do better than random? NAACP and jury lists in Texas counties (1960s). Knut-Vik designs: Student (1932) showed that Knut-Vik designs produce biased (downward) estimates of the residual variance. Fisher (1935): random assignment produces the least variance of all unbiased designs.
A study that did work. Women's Health Study of aspirin versus placebo to prevent heart attacks or cardiovascular death in women (New England Journal of Medicine, March 2005). Question: Does low-dose aspirin prevent cardiovascular problems in women as it does in men? All but one of the prior studies had enrolled only men. Consistent finding: 81 mg of aspirin a day reduces the incidence of non-fatal heart attacks by approximately 30% and the incidence of cardiovascular-related death by approximately 20%. The one study that did include women as well as men enrolled 214 women and found a 9% reduction in the incidence of cardiovascular-related death (not statistically significant).
New Study. 1) A large number of women (39,876), because the incidence of cardiovascular events is lower in women than in men. 2) A higher dose of aspirin (100 mg, taken on alternate days). 3) Longer follow-up (10 years versus 5 in the men's studies). 4) A single predefined end-point: stroke, MI, or cardiovascular-related death. Problems with the end-point: 1. equivocal symptoms when patients arrive in emergency rooms; 2. unreliable death certificates; 3. what happens if a patient has multiple events over the 10-year period? Solution: set up an elaborate checklist to define the events of interest, and count only the first such event in a patient's record.
Results: 477 women on aspirin had a cardiovascular event; 522 women on placebo had a cardiovascular event. The p-value of the comparison was 0.13.
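A crude version of this comparison is a pooled two-proportion z-test. The sketch assumes the 39,876 women were split evenly between the two arms (the slide does not give the arm sizes) and uses a plain normal approximation, so it is not the study's own analysis.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of p1 = p2 using the pooled proportion (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


N_PER_ARM = 39_876 // 2  # assumption: equal arms of 19,938 women
z, p = two_proportion_z_test(477, N_PER_ARM, 522, N_PER_ARM)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

This unadjusted test gives p of about 0.15, in the same range as the published 0.13; the published analysis accounts for details this sketch ignores.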
Confidence Bounds. Neyman's original definition (1934): "On the two different aspects of the representative method," J. Royal Statistical Society, vol. 97, pp. 558-625. 1. The paper establishes the fundamental ideas of survey sampling; statisticians at the U.S. Bureau of Labor Statistics used it to establish the first surveys of unemployment. 2. An appendix establishes the fundamental ideas of confidence intervals: a confidence interval on a parameter θ is the set of hypotheses about the value of θ that cannot be rejected by the data.
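Neyman's definition can be made concrete for a single binomial proportion: test every candidate value θ0 on a fine grid and keep those the data cannot reject at the 5% level. The choice of test (a two-sided score test with a normal approximation), the grid step, and the example counts (30 successes in 100 trials) are assumptions for illustration; with this particular test, the surviving set is the Wilson score interval.

```python
import math


def score_test_rejects(x, n, theta0, z_crit=1.96):
    """Two-sided score test of H0: theta = theta0 for a binomial proportion."""
    p_hat = x / n
    se = math.sqrt(theta0 * (1 - theta0) / n)
    return abs(p_hat - theta0) / se > z_crit


def interval_by_inversion(x, n, z_crit=1.96, step=1e-4):
    """Keep every hypothesized theta0 on a grid that the test fails to reject."""
    grid = (i * step for i in range(1, round(1 / step)))
    kept = [t for t in grid if not score_test_rejects(x, n, t, z_crit)]
    return min(kept), max(kept)


low, high = interval_by_inversion(30, 100)
print(f"[{low:.3f}, {high:.3f}]")  # [0.219, 0.396]
```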
Other definitions of a confidence interval. Bayesian: the expected coverage of the computed confidence interval is 0.95 regardless of the prior distribution on θ. Frequentist (derived by Neyman to meet Harold Hotelling's criticism of the Bayesian definition): 95% of all confidence intervals computed this way will contain the true value of θ. Anscombe: "What has the statistician's long run probability of error to do with whether this patient should be given this treatment?"
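The frequentist statement can be checked by simulation: draw many samples, compute an interval from each, and count how often the intervals contain the true value. The settings below (normal data, known standard deviation, and the particular mean, sample size, and seed) are arbitrary assumptions chosen so that the 95% interval is exact.

```python
import random
import statistics


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=10_000, seed=1934):
    """Fraction of intervals xbar +/- 1.96*sigma/sqrt(n) that contain the true mean.

    Assumes normally distributed data with known sigma.
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps


cov = coverage()
print(cov)  # close to 0.95
```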
Women's Health Study. They computed 95% confidence bounds on the ratio Prob{event | aspirin} / Prob{event | placebo}: 95% C.I. = [0.80, 1.03]. Interpretation: use of low-dose aspirin in women might reduce the incidence of cardiovascular events by as much as 20% (or increase it by as much as 3%).
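A simple way to get bounds on this ratio is a Wald interval on the log of the risk ratio. The sketch again assumes equal arms of 19,938 women each (the slide does not give the split) and ignores follow-up time, so it approximates rather than reproduces the published analysis.

```python
import math


def risk_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Risk ratio (x1/n1)/(x2/n2) with a Wald 95% interval computed on the log scale."""
    rr = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    return rr, math.exp(math.log(rr) - z * se_log), math.exp(math.log(rr) + z * se_log)


N_PER_ARM = 39_876 // 2  # assumption: equal arms
rr, low, high = risk_ratio_ci(477, N_PER_ARM, 522, N_PER_ARM)
print(f"RR = {rr:.2f}, 95% C.I. = [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval comes out at about [0.81, 1.03], close to the published [0.80, 1.03].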
Can the study be repeated with more subjects and greater statistical power? Modern clinical studies cost more than $10,000 per patient, so a 100,000-subject study would cost more than $1 billion.
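A standard normal-approximation sample-size formula gives a sense of scale. The sketch treats the observed event proportions (with equal arms of 19,938 assumed) as the true rates, and assumes a two-sided 5% test with 80% power; it ignores dropout and the time-to-event structure of a real trial.

```python
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm to detect p1 versus p2 (two-sided)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2


N_PER_ARM = 39_876 // 2                       # assumption: equal arms
p_aspirin, p_placebo = 477 / N_PER_ARM, 522 / N_PER_ARM
n = n_per_arm(p_aspirin, p_placebo)
total = 2 * n
print(f"about {n:,.0f} per arm, {total:,.0f} in total")
print(f"at $10,000 per patient: about ${total * 10_000 / 1e9:.1f} billion")
```

Under these assumptions the answer is roughly 75,000 women per arm, about 150,000 in total, or around $1.5 billion at the slide's cost per patient.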
Conclusion? L.J. Cohen, philosopher at Oxford University, was a critic of the use of statistical models in science. One can never come to a certain conclusion with statistical models alone. To reach a scientific conclusion, it is necessary to bring in information external to the experimental study. (Cohen's solution is to replace hypothesis testing with modal valued logic, a system of symbolic logic that denies the law of the excluded middle.)
Ignoring Cohen's solution: what information exists from outside this trial? 1. The pharmacological mechanism of low-dose aspirin is firmly established and is not gender-related in experimental animals. 2. The cost of a false positive is small: aspirin is cheap, and low doses of aspirin are very safe for most people. 3. The cost of a false negative, if low-dose aspirin decreases CV events by 20%, is immense. Conclusion: women should be given daily low doses of aspirin to prevent cardiovascular events.
Was it worth doing the study? Side note: all the male studies and this women's study of low-dose aspirin have shown a consistent 8-fold increase in the incidence of hemorrhagic stroke for patients on aspirin, with the comparison sometimes reaching statistical significance.