Experimentation and Results in Statistics: Lessons from Past Studies

David Salsburg
AP Statistics Reading
Daytona Beach, Florida
June 16, 2011
 
William Harvey, circulation of the blood, 1628
 
 
 
 
 
Bishop of Chichester:
o Harvey was wrong because he used experimentation and, “It is well known that Nature abhors experimentation and will purposely do things wrong if you attempt to experiment.”
 
The Lanarkshire Milk Study (1929)
o Question: Does pasteurization take the “good” out of the milk?
How do you measure the “good” in milk?
o Measure weight gain in children as a surrogate.
Yule: “In our lust for measurement, we frequently measure that which we can, rather than that which we wish to measure, and forget that there is a difference.”
o Measures of “intelligence”
 
 
 
If children were to be used, which children?
o In school
Where?
o Easily available in London or Manchester, but too heterogeneous a population, with too much variability in socioeconomic factors.
o Lanarkshire County, Scotland, population 300,000, evenly divided into small factory towns and rural communities.
How many children?
o Neyman-Pearson concept of power not yet published.
 
20,000 children, 200-400 per school, several grades
o 5,000 randomly assigned an extra daily ration of raw milk
o 5,000 randomly assigned an extra daily ration of pasteurized milk
o 10,000 randomly assigned to no extra milk (controls)
Study ran from February to June 1930; the children were weighed at the beginning of the study and at the end.
 
1) Average weight gain for children on raw milk was almost exactly the same as average weight gain for children on pasteurized milk.
2) Average weight gain for children kept as controls (no extra milk) was three times the average weight gain of the two other groups.
No loss of “good” in milk (as measured by weight gain) when pasteurized.
Best not to give children any extra milk, raw or pasteurized!
 
Royal Commission sent to investigate
o William Sealy Gosset (“Student”), chairman
 
 
 
 
Conclusion: The teachers had been told to “randomly assign,” but many of them took pity on the sickly and poor students and assigned them the extra milk.
 
 
Can you do it with haphazard choice by humans?
o Problem of digit preference
 
 
 
 
Can you let “nature” do it?
o Toxicological studies of mice.
 
 
 
 
 
 
Last two digits of populations of English towns in the 1921
census.
o A table of 7,500 two-digit numbers arranged in blocks of 25. First block of 25:

03 47 43 73 86
36 96 47 36 61
46 98 63 71 62
33 26 16 80 45
60 11 14 10 95
Martin Gardner (Scientific American): “This is the quintessential book of the Twentieth Century. Not only was no book produced like this in previous centuries, no one would have ever conceived of a book like this in previous centuries.”
1) You do not start at the beginning. Otherwise, all randomizations would be the same.
2) You do not begin haphazardly (at random?). Books tend to have broken bindings, so haphazard openings often are at the same page.
1. You open the book haphazardly and pick a point to start haphazardly.
2. You pick out three digits, two digits, two more digits, and one digit.
You go to the page indicated by the three digits, the line indicated by the first of the two digits, the column indicated by the second of the two digits. Then you proceed up and to the left (at the top of the page) if the final single digit is odd, or down and to the right if it is even.
I open it haphazardly (to page 2) and pick a spot haphazardly, yielding the following sequence:

2, 12, 23, 6

I go to page 2, line 12, column 23, and go left and up from there. This yields the sequence:

67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96, …
Suppose I have three treatments, A, B, and C, to be applied to blocks of three:

A, B, C / A, B, C / A, …

I append the sequence of numbers to this sequence of symbols:

A-67, B-96, C-57 / A-88, B-30, C-22 / A-23, B-51, C-14 / …

I reorder the symbols A, B, C within each block in ascending order of the random numbers attached to them:

CAB/CBA/CAB/BAC…
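The reordering step can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not code from the talk; the function name `randomize_blocks` is made up for the example, and ties between random numbers (which do not occur in the twelve numbers shown) are broken alphabetically as an arbitrary assumption.

```python
def randomize_blocks(treatments, numbers):
    """Order the treatments within each block by the random numbers attached to them.

    `treatments` is the fixed starting order (e.g. "ABC"); `numbers` is the
    sequence read from the random number table. Each consecutive group of
    len(treatments) numbers defines one block.
    """
    k = len(treatments)
    blocks = []
    for start in range(0, len(numbers) - k + 1, k):
        chunk = numbers[start:start + k]
        # Pair each number with its treatment, then sort by the number
        # (ascending). Ties fall back to alphabetical order: an assumption.
        paired = sorted(zip(chunk, treatments))
        blocks.append("".join(t for _, t in paired))
    return blocks


# The twelve numbers read from the table in the example above.
table_numbers = [67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96]
print("/".join(randomize_blocks("ABC", table_numbers)))
```

Run on the twelve numbers from the table, this reproduces the four blocks listed above.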
Use a computer algorithm to generate a pseudo-random sequence.
Most popular method, the congruence generator:

X(i+1) = res( AX(i) + B | C), that is, the remainder when AX(i) + B is divided by C

o A, B, C are mutually prime.
o The congruence generator cycles after K values, but K is a function of X(1), A, B, and C and can be calculated.
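A minimal sketch of such a congruence generator, with the cycle length K found by brute force. The constants A = 5, B = 3, C = 16 and the starting value 7 are toy values chosen only so the whole cycle is easy to see; they are assumptions for this example, not values from the talk, and far too small for real use.

```python
from itertools import islice


def congruence_generator(x1, a, b, c):
    """Yield X(2), X(3), ... where X(i+1) is the remainder of a*X(i) + b on division by c."""
    x = x1
    while True:
        x = (a * x + b) % c
        yield x


def cycle_length(x1, a, b, c):
    """Return K, the number of values in the cycle the generator eventually repeats."""
    first_seen = {}
    x, step = x1, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + b) % c
        step += 1
    # The sequence returned to a value first seen at step first_seen[x].
    return step - first_seen[x]


# Toy parameters (hypothetical): 5, 3 and 16 share no common factors.
print(list(islice(congruence_generator(7, 5, 3, 16), 4)))
print(cycle_length(7, 5, 3, 16))
```

With these toy constants the generator visits all 16 possible remainders before repeating, which is the longest cycle a modulus of 16 allows.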
 
Can a pseudo-random number generator produce truly
“random” numbers?
 
 
Fisher: Foolish question. All that is needed is that all possible treatment assignments be equally probable.
 
 
NAACP and jury lists in Texas counties (1960s)
 
 
 
Knut-Vik designs
o “Student” (1932) showed that Knut-Vik designs produce biased (downwards) estimates of the residual variance.
o Fisher (1935): random assignment produces the least variance of all unbiased designs.
 
Women’s Health Study of aspirin vs. placebo to prevent heart attacks or cardiovascular death in women. (March 2005, New England J. of Medicine)
Question: Does low dose aspirin prevent cardiovascular problems for women as it does for men?
All but one of the prior studies had used only men.
Consistent finding: 81 mg aspirin a day reduces the incidence of non-fatal heart attacks by approximately 30% and the incidence of cardiovascular related death by approximately 20%.
The one study that did use women as well as men enrolled 214 women and reduced the incidence of cardiovascular related death by 9% (not statistically significant).
1) Large number of women (39,876) because incidence of cardiovascular events is lower in women than in men.
2) Higher daily dose of aspirin (100 mg)
3) Longer follow-up (10 years vs. 5 in men’s studies)
4) Single predefined end-point: stroke, MI, or cardiovascular related death.
Problems with the end-point:
1. Equivocal symptoms when patients arrive in emergency rooms
2. Death certificates unreliable
3. What happens if a patient has multiple events over the 10 year period?
Solution: Set up an elaborate check-list to “define” the events of interest. Count only the first such event in a patient’s record.
477 women on aspirin had a cardiovascular event
522 women on placebo had a cardiovascular event
p-value of the comparison: 0.13
 
Neyman’s original definition (1934)
o “On the two different aspects of the representative method,” J. Royal Statistical Society, vol. 97, pp. 558-625.
1. The paper establishes the fundamental ideas of survey sampling. It was used by the statisticians in the U.S. Bureau of Labor Statistics to establish the first surveys of unemployment.
2. An appendix establishes the fundamental ideas of confidence intervals.
A confidence interval on a parameter θ is a set of hypotheses about the value of θ that cannot be rejected by the data.
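The "set of hypotheses that cannot be rejected" reading can be made concrete by testing a grid of candidate values of θ and keeping those that survive. The sketch below assumes the simplest textbook setting, a normal mean with known standard deviation and a two-sided z-test at the 5% level; the numbers (sample mean 10, standard deviation 2, n = 25) are invented for illustration and do not come from the talk.

```python
import math


def not_rejected(theta0, xbar, sigma, n, z_crit=1.96):
    """True if the hypothesis 'theta = theta0' survives a two-sided z-test."""
    z = (xbar - theta0) / (sigma / math.sqrt(n))
    return abs(z) <= z_crit


def interval_by_inversion(xbar, sigma, n, step=0.001):
    """Scan candidate values of theta and return the range that is not rejected."""
    reach = 5 * sigma / math.sqrt(n)           # scan well beyond the plausible range
    count = int(round(2 * reach / step))
    kept = [xbar - reach + i * step
            for i in range(count + 1)
            if not_rejected(xbar - reach + i * step, xbar, sigma, n)]
    return min(kept), max(kept)


# Hypothetical data: sample mean 10, known sigma 2, n = 25.
low, high = interval_by_inversion(10.0, 2.0, 25)
print(round(low, 2), round(high, 2))
```

Up to the grid spacing, the surviving values match the familiar formula, sample mean plus or minus 1.96 standard errors.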
 
 
Bayesian
o The expected coverage of the computed confidence interval is 0.95 regardless of the prior distribution on θ.
Frequentist (derived by Neyman to meet Harold Hotelling’s criticism of the Bayesian definition)
o 95% of all confidence intervals computed this way will contain the true value of θ.
o Anscombe: “What has the statistician’s long run probability of error to do with whether this patient should be given this treatment?”
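The frequentist statement is a claim about repeated use of the procedure, which a small simulation can illustrate. This sketch assumes normally distributed data with a known standard deviation; the true value θ = 10, the sample size, and the number of repetitions are all arbitrary choices made for the example.

```python
import math
import random


def coverage(theta, sigma, n, reps, seed=0):
    """Fraction of simulated 95% intervals that contain the true theta."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / reps


# Hypothetical setting: true theta 10, sigma 2, samples of 25, 4000 repetitions.
print(coverage(10.0, 2.0, 25, 4000))
```

The printed fraction should land close to 0.95. It says nothing about whether any one particular interval contains θ, which is the point of Anscombe's question.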
 
They computed 95% confidence bounds on the ratio

Prob{event|aspirin}/Prob{event|placebo}

95% C.I. = [0.80, 1.03]

Interpretation: Use of low dose aspirin in women might reduce incidence of cardiovascular events by as much as 20% (or increase it by as much as 3%).
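A rough check of these figures from the event counts alone. The slides give 477 and 522 events and 39,876 women in total but not the size of each arm, so the sketch assumes the women were split equally between aspirin and placebo. It uses the standard large-sample interval for a ratio of proportions on the log scale. The published analysis may well have used a different method, so agreement is expected to be approximate only.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_ratio_summary(events_a, n_a, events_b, n_b, z_crit=1.96):
    """Return (ratio, lower bound, upper bound, two-sided p-value)."""
    ratio = (events_a / n_a) / (events_b / n_b)
    # Large-sample standard error of log(ratio).
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(ratio) - z_crit * se_log)
    upper = math.exp(math.log(ratio) + z_crit * se_log)
    z = math.log(ratio) / se_log
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return ratio, lower, upper, p_value


N_PER_ARM = 39876 // 2    # assumption: equal allocation to aspirin and placebo
ratio, lower, upper, p_value = risk_ratio_summary(477, N_PER_ARM, 522, N_PER_ARM)
print(f"ratio={ratio:.2f}  95% CI=[{lower:.2f}, {upper:.2f}]  p={p_value:.2f}")
```

Under these assumptions the crude interval comes out close to the reported bounds and the p-value is of the same order as the reported 0.13, though not identical to it.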
Modern clinical studies cost more than $10,000 per
patient.
A 100,000 subject study would cost more than $1 billion.
L.J. Cohen, philosopher at Oxford University
o Critic of the use of statistical models in science.
o One can never come to a certain conclusion with statistical models alone.
o To reach a scientific conclusion, it is necessary to bring in information external to the experimental study.
o (Cohen’s solution is to replace hypothesis testing with modal valued logic, a system of symbolic logic that denies the law of the excluded middle.)
1. The pharmacological mechanism of low dose aspirin is firmly established and is not gender related in experimental animals.
2. The cost of a false positive is small. Aspirin is cheap. Low doses of aspirin are very safe for most people.
3. The cost of a false negative, if the use of low dose aspirin decreases CV events by 20%, is immense.
Conclusion: Women should be given daily low doses of aspirin to prevent cardiovascular events.
 
 
 
 
 
 
Side note: All the male studies and this women’s study of low dose aspirin have shown a consistent 8-fold increase in the incidence of hemorrhagic stroke for patients on aspirin, with the comparison sometimes reaching statistical significance.




Presentation Transcript


  1. Statistics and Experimentation. David Salsburg, AP Statistics Reading, Daytona Beach, Florida, June 16, 2011

  2. Harvey was wrong. William Harvey, circulation of the blood, 1628. Bishop of Chichester: o Harvey was wrong because he used experimentation and, “It is well known that Nature abhors experimentation and will purposely do things wrong if you attempt to experiment.”

  3. An experiment that went wrong. The Lanarkshire Milk Study (1929) o Question: Does pasteurization take the “good” out of the milk? How do you measure the “good” in milk? o Measure weight gain in children as a surrogate. Yule: “In our lust for measurement, we frequently measure that which we can, rather than that which we wish to measure, and forget that there is a difference.” o Measures of “intelligence”

  4. An experiment that went wrong. If children were to be used, which children? o In school. Where? o Easily available in London or Manchester, but too heterogeneous a population, with too much variability in socioeconomic factors. o Lanarkshire County, Scotland, population 300,000, evenly divided into small factory towns and rural communities. How many children? o Neyman-Pearson concept of power not yet published.

  5. Final Design. 20,000 children, 200-400 per school, several grades o 5,000 randomly assigned an extra daily ration of raw milk o 5,000 randomly assigned an extra daily ration of pasteurized milk o 10,000 randomly assigned to no extra milk (controls). Study ran from February to June 1930; the children were weighed at the beginning of the study and at the end.

  6. Results. 1) Average weight gain for children on raw milk was almost exactly the same as average weight gain for children on pasteurized milk. 2) Average weight gain for children kept as controls (no extra milk) was three times the average weight gain of the two other groups. No loss of “good” in milk (as measured by weight gain) when pasteurized. Best not to give children any extra milk, raw or pasteurized!

  7. An experiment that went wrong. Royal Commission sent to investigate o William Sealy Gosset (“Student”), chairman. Conclusion: The teachers had been told to “randomly assign,” but many of them took pity on the sickly and poor students and assigned them the extra milk.

  8. HOW DO YOU RANDOMIZE? Can you do it with haphazard choice by humans? o Problem of digit preference. Can you let “nature” do it? o Toxicological studies of mice.

  9. How did R.A. Fisher randomize? Last two digits of populations of English towns in the 1921 census. o A table of 7500 two digit numbers arranged in blocks of 25. First block of 25: 03 47 43 73 86 36 96 47 36 61 46 98 63 71 62 33 26 16 80 45 60 11 14 10 95

  10. Rand Corporation book of 1 million random digits. Martin Gardner (Scientific American): “This is the quintessential book of the Twentieth Century. Not only was no book produced like this in previous centuries, no one would have ever conceived of a book like this in previous centuries.”

  11. How do you use a table of random numbers? 1) You do not start at the beginning. Otherwise, all randomizations would be the same. 2) You do not begin haphazardly (at random?). Books tend to have broken bindings, so haphazard openings often are at the same page.

  12. RAND book preface. 1. You open the book haphazardly and pick a point to start haphazardly. 2. You pick out three digits, two digits, two more digits, and one digit. You go to the page indicated by the three digits, the line indicated by the first of the two digits, the column indicated by the second of the two digits. Then you proceed up and to the left (at the top of the page) if the final single digit is odd, or down and to the right if it is even.

  13. Applying this method to Fisher’s 6-page table. I open it haphazardly (to page 2) and pick a spot haphazardly, yielding the following sequence: 2, 12, 23, 6. I go to page 2, line 12, column 23, and go left and up from there. This yields the sequence: 67, 96, 57, 88, 30, 22, 23, 51, 14, 40, 24, 96, …

  14. Comparing Three Treatments. Suppose I have three treatments, A, B, and C, to be applied to blocks of three: A, B, C / A, B, C / A, … I append the sequence of numbers to this sequence of symbols: A-67, B-96, C-57 / A-88, B-30, C-22 / A-23, B-51, C-14 / … I reorder the symbols A, B, C within each block in ascending order of the random numbers attached to them: CAB/CBA/CAB/BAC…

  15. Modern Methods Use computer algorithm to generate a pseudo-random sequence. Most popular method, congruence generator: X(i+1) = res( AX(i) + B | C) o A,B,C are mutually prime. o The congruence generator cycles after K values, but K is a function of X(1), A, B, and C and can be calculated.

  16. Philosophical question Can a pseudo-random number generator produce truly random numbers? Fisher: Foolish question. All that is needed is that all possible treatment assignments be equally probable.

  17. Can we do better than random? NAACP and jury lists in Texas counties (1960s). Knut-Vik designs o “Student” (1932) showed that Knut-Vik designs produce biased (downwards) estimates of the residual variance. o Fisher (1935): random assignment produces the least variance of all unbiased designs.

  18. A study that did work. Women’s Health Study of aspirin vs. placebo to prevent heart attacks or cardiovascular death in women. (March 2005, New England J. of Medicine) Question: Does low dose aspirin prevent cardiovascular problems for women as it does for men? All but one of the prior studies had used only men. Consistent finding: 81 mg aspirin a day reduces the incidence of non-fatal heart attacks by approximately 30% and the incidence of cardiovascular related death by approximately 20%. The one study that did use women as well as men enrolled 214 women and reduced the incidence of cardiovascular related death by 9% (not statistically significant).

  19. New Study. 1) Large number of women (39,876) because incidence of cardiovascular events is lower in women than in men. 2) Higher daily dose of aspirin (100 mg). 3) Longer follow-up (10 years vs. 5 in men’s studies). 4) Single predefined end-point: stroke, MI, or cardiovascular related death. Problems with the end-point: 1. Equivocal symptoms when patients arrive in emergency rooms 2. Death certificates unreliable 3. What happens if a patient has multiple events over the 10 year period? Solution: Set up an elaborate check-list to “define” the events of interest. Count only the first such event in a patient’s record.

  20. Results. 477 women on aspirin had a cardiovascular event. 522 women on placebo had a cardiovascular event. p-value of the comparison: 0.13

  21. Confidence Bounds. Neyman’s original definition (1934) o “On the two different aspects of the representative method,” J. Royal Statistical Society, vol. 97, pp. 558-625. 1. The paper establishes the fundamental ideas of survey sampling. It was used by the statisticians in the U.S. Bureau of Labor Statistics to establish the first surveys of unemployment. 2. An appendix establishes the fundamental ideas of confidence intervals. A confidence interval on a parameter θ is a set of hypotheses about the value of θ that cannot be rejected by the data.

  22. Other definitions of a confidence interval. Bayesian o The expected coverage of the computed confidence interval is 0.95 regardless of the prior distribution on θ. Frequentist (derived by Neyman to meet Harold Hotelling’s criticism of the Bayesian definition) o 95% of all confidence intervals computed this way will contain the true value of θ. o Anscombe: “What has the statistician’s long run probability of error to do with whether this patient should be given this treatment?”

  23. Women’s Health Study. They computed 95% confidence bounds on the ratio Prob{event|aspirin}/Prob{event|placebo}: 95% C.I. = [0.80, 1.03]. Interpretation: Use of low dose aspirin in women might reduce incidence of cardiovascular events by as much as 20% (or increase it by as much as 3%).

  24. Can the study be repeated with more subjects and greater statistical power? Modern clinical studies cost more than $10,000 per patient. 100,000 subject study would cost > $1 billion.

  25. Conclusion? L.J. Cohen, philosopher at Oxford University o Critic of the use of statistical models in science. o One can never come to a certain conclusion with statistical models alone. o To reach a scientific conclusion, it is necessary to bring in information external to the experimental study. o (Cohen’s solution is to replace hypothesis testing with modal valued logic, a system of symbolic logic that denies the law of the excluded middle.)

  26. Ignoring Cohen’s solution: What information exists from outside this trial? 1. The pharmacological mechanism of low dose aspirin is firmly established and is not gender related in experimental animals. 2. The cost of a false positive is small. Aspirin is cheap. Low doses of aspirin are very safe for most people. 3. The cost of a false negative, if the use of low dose aspirin decreases CV events by 20%, is immense. Conclusion: Women should be given daily low doses of aspirin to prevent cardiovascular events.

  27. Was it worth doing the study? Side note: All the male studies and this women’s study of low dose aspirin have shown a consistent 8-fold increase in the incidence of hemorrhagic stroke for patients on aspirin, with the comparison sometimes reaching statistical significance.
