Cognitive Biases in Sustaining Bad Science
Cognitive biases help to perpetuate bad science by shaping how research is conducted, analysed, and reported. Confirmation bias, selective attention and memory, and misinterpretation of data can all lead to flawed conclusions. The Academy of Medical Sciences has highlighted the need for better training in research methods to improve the reproducibility and reliability of biomedical research, but cognitive constraints, such as seeing patterns in noise and misunderstanding probability, also complicate scientific work. P-hacking is a common problem in data analysis that undermines the validity of research findings; the talk illustrates it with the example of using a large population database to explore associations between ADHD and handedness.
Presentation Transcript
The role of cognitive biases in sustaining bad science
Dorothy V. M. Bishop, Professor of Developmental Neuropsychology, University of Oxford. @deevybee
Academy of Medical Sciences (2015), report on Reproducibility and Reliability of Biomedical Research. Problems identified: data dredging, omitting null results, weak experimental design, underspecified methods, errors (e.g. faulty equipment), underpowered studies. Possible solutions, with emphasis on both bottom-up and top-down changes: need to change incentives; need better training in methods.
Cognitive constraints that can make it hard to do science well: seeing patterns in noise; systematic misunderstanding of probability; schemata and the need for narrative; asymmetric moral judgements; confirmation bias (selective attention/memory).
Data analysis: why is p-hacking so common?
Large population database used to explore the link between ADHD and handedness. With 1 contrast, the probability of a significant p-value < .05 is .05. https://figshare.com/articles/The_Garden_of_Forking_Paths/2100379
Large population database used to explore the link between ADHD and handedness. Focus just on the Young subgroup: 2 contrasts at this level. The probability of at least one significant p-value < .05 under the null hypothesis is computed as 1 minus the probability of NO significant result, i.e. 1 - .95^2 = .10. NB: if I had predicted this specific association, then the probability would be .05. The problem arises if I am happy with ANY significant association!
Large population database used to explore the link between ADHD and handedness. Focus just on Young on a measure of hand skill: 4 contrasts at this level. Probability of at least one significant p-value < .05 = .19.
Large population database used to explore the link between ADHD and handedness. Focus just on Young Females on a measure of hand skill: 8 contrasts at this level. Probability of at least one significant p-value < .05 = .34.
Large population database used to explore the link between ADHD and handedness. Focus just on Young, Urban Females on a measure of hand skill: 16 contrasts at this level. If there is no a priori prediction, the relevant value to compute is the probability of at least one significant p-value < .05 = .56.
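To make the arithmetic behind this sequence of slides concrete, here is a minimal Python sketch (mine, not code from the talk) that reproduces the quoted probabilities, on the simplifying assumption that the contrasts are independent:

```python
# Familywise error rate: with k independent contrasts each tested at
# alpha = .05, the chance of at least one "significant" result under
# the null is 1 - (1 - alpha)^k.

alpha = 0.05

for k in [1, 2, 4, 8, 16]:  # contrasts at each level of the forking paths
    p_any = 1 - (1 - alpha) ** k
    print(f"{k:2d} contrasts: P(at least one p < .05 | null) = {p_any:.2f}")

# Prints .05, .10, .19, .34 and .56, matching the slides.
```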
Falsification and selective reporting (p-hacking) in science. Percentage of moral judgements endorsed by members of the public (N = 406), falsification vs selective reporting: morally unacceptable, 96 vs 71; should be fired, 93 vs 63; should receive funding ban, 96 vs 66; should be a crime, 73 vs 37. Pickett, J. T., & Roche, S. P. (2018). Questionable, objectionable or criminal? Public opinion on data fraud and selective reporting in science. Science and Engineering Ethics, 24(1). doi:10.1007/s11948-017-9886-2
Errors of omission vs commission in studies of negotiation by Rogers et al (2017). Omission of information is seen as dishonest, but more acceptable than lying. Honesty judgement: lying (untrue statement), 5%; omission of relevant information, 23%; stating something that is true, but in a misleading way (paltering), 32%. Rogers, T., Zeckhauser, R., Gino, F., Norton, M. I., & Schweitzer, M. E. (2017). Artful paltering: The risks and rewards of using truthful statements to mislead others. Journal of Personality and Social Psychology, 112(3), 456-473.
P-hacking has features of paltering. It doesn't involve changing data: you report what SPSS gives you! If you don't understand probability, it may seem innocuous, more like jaywalking than burglary!
Cherry-picking as confirmation bias: we find it easier to process and remember information that agrees with our viewpoint.
A personal example: suppressed memory of relevant research when it does not fit. In twin studies of Developmental Language Disorder (DLD), MZ > DZ concordance points to genetic influence. Probandwise concordance in same-sex twins (MZ vs DZ): Lewis & Thompson, 1992: .86 vs .48; Bishop et al, 1995: .70 vs .46; Tomblin & Buckwalter, 1998: .96 vs .69; Hayiou-Thomas et al, 2005: .36 vs .33. I failed to mention this last, discrepant study in talks for several years: I literally forgot about it!
Example of paltering in a literature review, from a study published in 2013: "Regardless of etiology, cerebellar neuropathology commonly occurs in autistic individuals. Cerebellar hypoplasia and reduced cerebellar Purkinje cell numbers are the most consistent neuropathologies linked to autism [8, 9, 10, 11, 12, 13]. MRI studies report that autistic children have smaller cerebellar vermal volume in comparison to typically developing children [14]."
Meta-analysis: Traut et al (2018), https://doi.org/10.1016/j.biopsych.2017.09.029. [Figure: standardized mean differences across studies; the SMD is positive when cerebellar volume is greater in ASD.] Webb et al did find the area of the vermis smaller in ASD after covarying cerebellum size.
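For readers unfamiliar with the metric: a standardized mean difference is the difference between group means scaled by the pooled standard deviation, so its sign tracks which group has the larger volume. A toy Python sketch with illustrative numbers of my own (not Traut et al's data):

```python
import math

def smd(mean_asd, sd_asd, n_asd, mean_td, sd_td, n_td):
    """Cohen's-d-style standardized mean difference, ASD minus controls."""
    pooled_sd = math.sqrt(((n_asd - 1) * sd_asd ** 2 + (n_td - 1) * sd_td ** 2)
                          / (n_asd + n_td - 2))
    return (mean_asd - mean_td) / pooled_sd

# Positive when cerebellar volume is greater in the ASD group:
print(round(smd(148.0, 12.0, 40, 144.0, 11.0, 40), 2))  # hypothetical volumes (ml): 0.35
```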
Just as with the reporting of results, omission/paltering in reporting the literature is not seen as serious: it can be unintentional, so it is hard to establish blame; there is a need to tell a good story in limited space; there are social pressures ("everyone is doing it"); it is impossible to read everything; there are no obvious victims; and the impact is assumed to be small.
Biased reporting: how big is the effect?
Consider a series of experiments testing the effectiveness of a treatment. Y = significant difference in favour of treatment (T); N = nonsignificant difference between T and control. With alpha = .05: probability of Y when T has no effect = .05; probability of N when T has no effect = .95. With power = .8: probability of Y when T is effective = .80; probability of N when T is effective = .20. At the outset, you think T has a 50:50 chance of working. What would you conclude from this series of results: Y N Y N N Y?
When the sequence of results is Y N Y N N Y, which conclusion follows? 1. Treatment very likely to be ineffective. 2. Treatment may be effective, but need more experiments to be sure. 3. Treatment very likely to be effective.
For the sequence Y N Y N N Y, the cumulative log odds come out around 3, and a log odds of 3 means the treatment is about 20 times more likely to be effective than ineffective (e^3 ≈ 20). So the answer is C: treatment very likely to be effective. (A: very likely ineffective; B: may be effective, but need more experiments to be sure; C: very likely effective.)
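The slide's chart can be reproduced with a simple likelihood-ratio calculation. Below is a minimal Python sketch (my reconstruction, not code from the talk); the function name log_odds is mine, while alpha, power, and the 50:50 prior come from the slide's setup. On these assumptions the final log odds for Y N Y N N Y come out at about +3.6, i.e. odds of roughly 38:1 in favour of effectiveness:

```python
import math

# Each result updates the odds that the treatment works: a significant
# result (Y) multiplies the odds by power/alpha, and a nonsignificant
# result (N) multiplies them by (1 - power)/(1 - alpha).

def log_odds(sequence, alpha=0.05, power=0.80, prior_odds=1.0):
    """Cumulative natural-log odds of effectiveness after a Y/N sequence."""
    odds = prior_odds  # 50:50 prior corresponds to odds of 1
    for result in sequence:
        if result == "Y":
            odds *= power / alpha                # likelihood ratio = 16
        else:
            odds *= (1 - power) / (1 - alpha)    # likelihood ratio ~ 0.21
    return math.log(odds)

print(round(log_odds("YNYNNY"), 2))  # 3.64: strong evidence of effectiveness
```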
When the sequence of results is Y N N N Y N N N Y N, which conclusion follows? 1. Treatment very likely to be ineffective. 2. Treatment may be effective, but need more experiments to be sure. 3. Treatment very likely to be effective.
For the sequence Y N N N Y N N N Y N, the full record points to answer A: treatment very likely to be ineffective. But if the nonsignificant trials (shown in red on the slide) are not reported or not cited, the visible record is Y Y Y, which looks like strong evidence of effectiveness. (A: very likely ineffective; B: may be effective, but need more experiments to be sure; C: very likely effective.)
Same sequence, Y N N N Y N N N Y N: the black line on the slide shows the situation when p-hacking is used, so that the effective alpha is .2 rather than .05. Each Y then carries much weaker evidence. (A: very likely ineffective; B: may be effective, but need more experiments to be sure; C: very likely effective.)
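Applying the same likelihood-ratio sketch to these slides (again my reconstruction, under the same assumed alpha and power):

```python
import math

# Same arithmetic as the earlier sketch, applied to the ten-study sequence.
def log_odds(sequence, alpha=0.05, power=0.80):
    odds = 1.0  # 50:50 prior
    for result in sequence:
        odds *= power / alpha if result == "Y" else (1 - power) / (1 - alpha)
    return math.log(odds)

print(round(log_odds("YNNNYNNNYN"), 2))      # -2.59: very likely ineffective
print(round(log_odds("YYY"), 2))             # +8.32: the same trials after the
                                             #   seven N results go unreported
print(round(log_odds("YYY", alpha=0.2), 2))  # +4.16: as above, but each Y was
                                             #   p-hacked (effective alpha = .2)
```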
This dynamic has been modelled formally: Nissen, S. B., Magidson, T., Gross, K., & Bergstrom, C. T. (2016). Publication bias and the canonization of false facts. eLife. The same process operates through selective citation as well as selective publication.
Inheritance of bias. When we read a peer-reviewed paper, we tend to trust the citations that back up a point, and when we come to write our own paper, we cite the same materials. A good scientist won't cite papers without reading them, but even this won't save you from bias: you inherit it from prior papers. If prior papers only cite studies agreeing with a viewpoint, that viewpoint gets entrenched. You won't know, unless you explicitly search, that there are other studies that give a different picture.
So errors of omission/paltering in reviews can have serious cumulative effects: the false canonization of facts. Overlooked victims: the general public, especially potential users of research (patients, etc); researchers trying to build on results; funders.
The (partial) solution from clinical trials: always start work in a new area with a systematic review, i.e. collecting and summarising all empirical evidence that fits pre-specified eligibility criteria to address a specific question. But relevant studies are found by searching titles and abstracts, and these tend to mention only positive results!
Classic p-hacking: a study looked at the association with autism for many occupational exposures, for both parents, and found that none survived Bonferroni correction. The Abstract just reported the one result that was nominally significant. A systematic review of other substances (e.g. pesticides) would not find the null results from this study when screening Abstracts.
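To illustrate why such an Abstract misleads, here is a small simulation (my construction, with a hypothetical number of exposures, not the study's data): when many null associations are tested at alpha = .05, a few "significant" results are expected by chance, and they do not survive the Bonferroni threshold of alpha divided by the number of tests.

```python
import random

random.seed(1)  # reproducible illustration

n_exposures = 50                  # hypothetical number of exposures tested
alpha = 0.05
bonferroni = alpha / n_exposures  # corrected per-test threshold = .001

# Under the null, every p-value is uniform on (0, 1)
p_values = [random.random() for _ in range(n_exposures)]

print("Nominally significant at .05:", sum(p < alpha for p in p_values))
print("Surviving Bonferroni at .001:", sum(p < bonferroni for p in p_values))
# Typically two or three chance hits at .05 and none at the corrected threshold.
```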
Returning to the Academy of Medical Sciences (2015) report on Reproducibility and Reliability of Biomedical Research, with its list of problems (data dredging, omitting null results, weak experimental design, underspecified methods, errors such as faulty equipment, underpowered studies) and remedies (change incentives, better training in methods): what's missing? How humans think and reason. We also need to find ways to counteract cognitive biases.
Thank you for listening! Longer written version: https://psyarxiv.com/hnbex/. Other slideshows: https://www.slideshare.net/deevybishop. Blogposts: http://deevybee.blogspot.com/2012/11/bishopblog-catalogue-updated-24th-nov.html
Professor Dorothy Bishop, Department of Experimental Psychology, Anna Watts Building, Woodstock Road, Oxford, OX2 6GG. @deevybee