Cognitive Biases in Sustaining Bad Science

 
The role of cognitive biases in sustaining bad science
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee
2
Academy of Medical Sciences, 2015: Report on Reproducibility and Reliability of Biomedical Research
Problems identified: data dredging, omitting null results, weak experimental design, underspecified methods, errors (e.g. faulty equipment), underpowered studies
Possible solutions (emphasis on both bottom-up and top-down changes): need better training in methods; need to change incentives
3-4
Cognitive constraints that can make it hard to do science well:
Seeing patterns in noise
Systematic misunderstanding of probability
Schemata: need for narrative
Asymmetric moral judgements
Confirmation bias: selective attention/memory
5
Data analysis: Why is p-hacking so common?
6
Large population database used to explore link between ADHD and handedness
1 contrast: probability of a 'significant' p-value < .05 = .05
https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
 
7
Large population database used to explore link between ADHD and handedness
Focus just on Young subgroup: 2 contrasts at this level
Probability of at least one 'significant' p-value < .05 under the null hypothesis is computed as 1 minus the probability of NO significant result, i.e. 1 - .95^2 = .10
NB. If I had predicted this specific association, then probability = .05.
The problem arises if I am happy with ANY significant association!
 
8
Large population database used to explore link between ADHD and handedness
Focus just on Young on measure of hand skill: 4 contrasts at this level
Probability of at least one 'significant' p-value < .05 = .19
 
9
Large population database used to explore link between ADHD and handedness
Focus just on Young, Females on measure of hand skill: 8 contrasts at this level
Probability of at least one 'significant' p-value < .05 = .34
10
Large population database used to explore link between ADHD and handedness
Focus just on Young, Urban, Females on measure of hand skill: 16 contrasts at this level
If there is no a priori prediction, then the relevant value to compute is the probability of at least one 'significant' p-value < .05 = .56
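To make the arithmetic on these slides concrete, here is a minimal Python sketch (my illustration, not part of the original deck) that reproduces the probability of obtaining at least one 'significant' result at alpha = .05 as the number of independent contrasts doubles at each step.

```python
# Probability of at least one 'significant' p-value under the null hypothesis
# when k independent contrasts are tested: 1 - (1 - alpha)^k
alpha = 0.05

for k in [1, 2, 4, 8, 16]:
    p_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:2d} contrasts: P(at least one p < .05) = {p_at_least_one:.2f}")

# Output: .05, .10, .19, .34, .56 -- matching slides 6 to 10.
```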
 
11
Falsification and selective reporting (p-hacking) in science
% of moral judgements endorsed by members of the public (N = 406)
Pickett, J. T., & Roche, S. P. (2018). Questionable, objectionable or criminal? Public opinion on data fraud and selective reporting in science. Science and Engineering Ethics, 24(1). doi:10.1007/s11948-017-9886-2
 
12
Errors of omission vs commission in studies of negotiation by Rogers et al (2017)
Omission of information is seen as dishonest, but more acceptable than lying.
Honesty judgements: lying (untrue statement) 5%; omission of relevant information 23%; stating something that is true, but in a misleading way (paltering) 32%
Rogers, T., Zeckhauser, R., Gino, F., Norton, M. I., & Schweitzer, M. E. (2017). Artful paltering: The risks and rewards of using truthful statements to mislead others. Journal of Personality and Social Psychology, 112(3), 456-473.
 
13
P-hacking has features of paltering. It doesn't involve changing data: you report what SPSS gives you!
If you don't understand probability, it may seem innocuous; more like jaywalking than burglary!
 
14
Confirmation bias

15
Cherry-picking as confirmation bias
We find it easier to process and remember information that agrees with our viewpoint.
 
16
A personal example: suppressed memory of relevant research when it does not fit
Twin studies of Developmental Language Disorder
Twin concordance points to genetic influence when MZ > DZ
Probandwise concordance, same-sex twins:
  Lewis & Thompson, 1992:       MZ .86, DZ .48
  Bishop et al, 1995:           MZ .70, DZ .46
  Tomblin & Buckwalter, 1998:   MZ .96, DZ .69
  Hayiou-Thomas et al, 2005:    MZ .36, DZ .33
 
17
Twin studies of DLD (same table as above)
I failed to mention this in talks for several years: I literally forgot about it!
(The Hayiou-Thomas et al, 2005 result, where MZ and DZ concordances are almost identical, did not fit the genetic account.)
 
18
Example of paltering in a literature review (study published in 2013):
"Regardless of etiology, cerebellar neuropathology commonly occurs in autistic individuals. Cerebellar hypoplasia and reduced cerebellar Purkinje cell numbers are the most consistent neuropathologies linked to autism [8, 9, 10, 11, 12, 13]. MRI studies report that autistic children have smaller cerebellar vermal volume in comparison to typically developing children [14]."

19
Meta-analysis: Traut et al (2018) https://doi.org/10.1016/j.biopsych.2017.09.029
(Figure: standardized mean difference is positive when cerebellar volume is greater in ASD)
Webb et al did find the area of the vermis smaller in ASD after covarying cerebellum size.
 
20-21
Just as with reporting of results, omission/paltering in reporting the literature is not seen as serious:
Can be unintentional (hard to establish blame)
Need to tell 'a good story' in limited space
Social pressures ('everyone is doing it')
Impossible to read everything
No obvious victims
Impact assumed to be small
 
22
Biased reporting: How big is the effect?
 
23
Consider a series of experiments testing the effectiveness of a treatment.
Y = significant difference in favour of treatment (T)
N = nonsignificant difference between T and control
Alpha = .05: probability of Y when T has no effect = .05; probability of N when T has no effect = .95
Power = .8: probability of Y when T is effective = .80; probability of N when T is effective = .20
At the outset, you think T has a 50:50 chance of working.
What would you conclude from this series of results: Y N Y N N Y
24
Vote: when the sequence of 'significant' results is Y N Y N N Y
A: Treatment very likely to be ineffective
B: Treatment may be effective, but need more experiments to be sure
C: Treatment very likely to be effective

25
Sequence: Y N Y N N Y
A log odds of 3 means the treatment is about 20 times more likely to be effective than ineffective.
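As a rough reconstruction of the calculation behind this slide (a sketch under my own assumptions, not code from the talk), the posterior odds that the treatment works are obtained by starting from the 50:50 prior and multiplying by a likelihood ratio for each result: 0.80/0.05 for a Y, 0.20/0.95 for an N.

```python
import math

# Assumptions from the slides: alpha = .05, power = .80, prior odds 1:1.
LR_Y = 0.80 / 0.05   # a 'significant' result is 16x more likely if T is effective
LR_N = 0.20 / 0.95   # a null result is ~0.21x as likely if T is effective

def posterior_odds(sequence, prior_odds=1.0):
    """Multiply the prior odds by the likelihood ratio of each Y/N result."""
    odds = prior_odds
    for result in sequence:
        odds *= LR_Y if result == "Y" else LR_N
    return odds

odds = posterior_odds("YNYNNY")
print(f"posterior odds = {odds:.1f}, log odds = {math.log(odds):.2f}")
# Prints odds of roughly 38 (log odds about 3.6): the treatment is far more
# likely to be effective than ineffective, consistent with the slide's note
# that a log odds of 3 already means about 20:1 in favour.
```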
26
Vote: when the sequence of 'significant' results is Y N N N Y N N N Y N
A: Treatment very likely to be ineffective
B: Treatment may be effective, but need more experiments to be sure
C: Treatment very likely to be effective
 
27-28
Sequence: Y N N N Y N N N Y N, with trials shown in red not reported/not cited
A: Treatment very likely to be ineffective
B: Treatment may be effective, but need more experiments to be sure
C: Treatment very likely to be effective

29
Sequence: Y N N N Y N N N Y N
Black line shows the situation when p-hacking is used, so that the effective alpha is .2 rather than .05.
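Continuing the same reconstructed calculation (an illustrative sketch, not the deck's own code), we can compare what happens when all ten trials are visible, when the non-significant trials are left unreported, and when p-hacking inflates the effective alpha from .05 to .2, which weakens each Y as evidence.

```python
import math

def posterior_odds(sequence, alpha=0.05, power=0.80, prior_odds=1.0):
    """Posterior odds that the treatment works, given a series of Y/N results."""
    lr_y = power / alpha              # evidential weight of a 'significant' result
    lr_n = (1 - power) / (1 - alpha)  # evidential weight of a null result
    odds = prior_odds
    for result in sequence:
        odds *= lr_y if result == "Y" else lr_n
    return odds

all_trials = "YNNNYNNNYN"   # the full sequence of ten experiments
reported   = "YYY"          # only the 'significant' trials get reported/cited

print("all trials, alpha=.05:", round(math.log(posterior_odds(all_trials)), 1))
# about -2.6: the treatment looks likely to be ineffective
print("nulls omitted, alpha=.05:", round(math.log(posterior_odds(reported)), 1))
# about +8.3: the treatment now looks overwhelmingly effective
print("all trials, alpha=.2 (p-hacked):",
      round(math.log(posterior_odds(all_trials, alpha=0.2)), 1))
# with p-hacking, each Y carries a likelihood ratio of 4 rather than 16
```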
 
 
30-31
Silas Boye Nissen, Tali Magidson, Kevin Gross, Carl T Bergstrom: Publication bias and the canonization of false facts (eLife, 2016)
Or citation!
 
32
Inheritance of bias

When we read a peer-reviewed paper, we tend to trust the citations
that back up a point
 
When we come to write our own paper, we cite the same materials
 
A good scientist won’t cite papers without reading them, but even
this won’t save you from bias – you inherit it from prior papers
 
If prior papers only cite studies agreeing with a viewpoint, that
viewpoint gets entrenched
 
You won’t know – unless you explicitly search – that there are other
studies that give a different picture
 
33
So errors of omission/paltering in reviews can have serious cumulative effects: false 'canonization' of facts.
Overlooked victims:
General public, esp. potential users of research (patients, etc.)
Researchers trying to build on results
Funders
 
34
The (partial) solution from clinical trials
Always start work in a new area with a systematic review.
Systematic review: collecting and summarising all empirical evidence that fits pre-specified eligibility criteria to address a specific question.
But relevant studies are found by searching titles and abstracts, and these tend to mention only positive results!
 
35
Classic p-hacking
A study looked at associations with autism for many "occupational exposures" in both parents, and found that none survived Bonferroni correction.
The abstract just reported the one result that was "significant".
A systematic review of other substances (e.g. pesticides) would not find the null results from this study when screening abstracts.
36
Academy of Medical Sciences, 2015: Report on Reproducibility and Reliability of Biomedical Research (revisited)
Problems: data dredging, omitting null results, weak experimental design, underspecified methods, errors (e.g. faulty equipment), underpowered studies
Solutions: need better training in methods; need to change incentives

37
What's missing? How humans think & reason
Find ways to counteract cognitive biases
 
38
Thank you for listening!
Longer written version: https://psyarxiv.com/hnbex/
Other slideshows: https://www.slideshare.net/deevybishop
Blogposts: http://deevybee.blogspot.com/2012/11/bishopblog-catalogue-updated-24th-nov.html
Professor Dorothy Bishop
Department of Experimental Psychology, Anna Watts Building, Woodstock Road, Oxford, OX2 6GG
@deevybee