Evolving Trends in Social Science Data Usage

Changing Patterns of Social Science

Data Usage

Patrick Sturgis

Context

•

Rapid increase in new types of data for social

science research:

–

Social media

–

Online surveys

–

Administrative data

–

Mobile digital devices

–

Textual archives

–

Transactional data (Uber, bike shares, Airbnb)

‘the coming crisis of empirical sociology’

•

“the sample survey is not a tool that stands ‘outside history’. Its glory years, we contend, are in the past”

•

“It is unlikely, we suggest, that in the future the sample survey will be a particularly important research tool, and those sociologists who stake the expertise of their discipline to this method might want to reflect on whether this might leave them exposed to marginalization or even redundancy.” (Savage & Burrows, 2007)

Motivation

1)

What kinds of data do social scientists use?

2)

Patterns across disciplines & over time?

3)

Decline of surveys & increase in ‘big data’?

4)

(transparency and quality of methods)

Survey research in crisis?

Low and declining response rates

•

Face-to-face surveys now routinely struggle to reach

50% response rates

•

RDD (random digit dialling) even worse, in the US routinely < 10%

(increasing mobile-only + do-not-call legislation)

•

Survey sponsors ask ‘what are we getting for our

money?’

•

Is a low response rate survey better than a well

designed quota?

Increasing costs

•

Per achieved interview costs are high and increasing

•

Simon Jackman estimates $2000 per complete

interview in 2012 American National Election Study

•

My estimate = ~£180 per achieved interview for PAF sample,

45 min CAPI, n = ~1500, RR = ~50%

•

Compare ~£5 for opt-in panels
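As a rough illustration of what these per-interview figures imply, the sketch below multiplies the slide's approximate unit costs (~£180 for the PAF/CAPI design, ~£5 for an opt-in panel) by the stated sample size of about 1,500. The function name and the assumption that cost scales linearly with the number of achieved interviews are illustrative only and are not part of the original estimates.

```python
def fieldwork_cost(cost_per_interview: float, achieved_interviews: int) -> float:
    """Total cost, assuming cost scales linearly with achieved interviews (an assumption)."""
    return cost_per_interview * achieved_interviews


N_ACHIEVED = 1500    # n = ~1500, from the slide
CAPI_COST = 180.0    # ~£180 per achieved interview (PAF sample, 45 min CAPI)
OPT_IN_COST = 5.0    # ~£5 per achieved interview (opt-in panel)

capi_total = fieldwork_cost(CAPI_COST, N_ACHIEVED)
opt_in_total = fieldwork_cost(OPT_IN_COST, N_ACHIEVED)
cost_ratio = capi_total / opt_in_total

print(f"CAPI: £{capi_total:,.0f}  opt-in: £{opt_in_total:,.0f}  ratio: {cost_ratio:.0f}x")
```

On these approximate inputs the probability-sample design costs roughly 36 times as much as the opt-in panel for the same number of completes, which is the gap survey sponsors are asking about.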

Cost drivers

•

Average number of calls increasing

•

More refusal conversion

•

More incentives (UKHLS, £30)

•

30%-40% of fieldwork costs can be deployed on

the 20% ‘hardest to get’ respondents
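To make the last bullet concrete, the sketch below converts a cost share and a respondent share into a relative per-respondent cost. It assumes that the "20%" refers to a share of achieved respondents and that the remaining cost is spread evenly over everyone else; the function name is hypothetical.

```python
def relative_unit_cost(cost_share: float, respondent_share: float) -> float:
    """Per-respondent cost of a subgroup relative to all other respondents.

    cost_share: fraction of fieldwork cost spent on the subgroup
    respondent_share: fraction of achieved respondents in the subgroup
    """
    subgroup_unit = cost_share / respondent_share
    rest_unit = (1 - cost_share) / (1 - respondent_share)
    return subgroup_unit / rest_unit


# Slide figures: 30%-40% of fieldwork cost on the 20% 'hardest to get'.
low = relative_unit_cost(0.30, 0.20)
high = relative_unit_cost(0.40, 0.20)

print(f"Hardest-to-get respondents cost about {low:.1f}x to {high:.1f}x as much each")
```

Under these assumptions each hard-to-get respondent costs somewhere between roughly 1.7 and 2.7 times as much as each of the others.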

US Survey of Consumer Attitudes

1979-1996 (Curtin et al 2000)

Mean contact attempts

% refusal conversions

Response Rate = 70% (1979) -> 68% (1996)

Externalities of ‘survey pressure’

•

Poor data quality of ‘hard to get’ respondents

•

Fabrication pressure on respondents

•

Fabrication pressure on interviewers

•

Ethical research practice?

Content analysis of journal articles

(joint work with Rebekah Luff)

Content analysis of all papers: 1949-50, 1964-65, 1979-80, 1994-95

Presser (1983) and Saris & Gallhofer (2007)

Metzler et al (2016)

•

Online survey of Sage social science ‘contacts’

•

9412 respondents

•

33% reported having undertaken big data research

•

But response rate < 2%

•

Self-definition of ‘big data’

Findings of Presser, Saris & Gallhofer

Percentages of articles using survey data by discipline and year

*Presser included studies performed by organisations for official statistics (statistical bureaus) under the category ‘surveys’. Saris and Gallhofer repeated this method but also used their own classification; these results are shown in the last column in italics.

Updating the Analysis: 2014-15

•

1453 research papers

•

7 coders, papers randomly assigned to coders

•

24 data-information codes:

–

Theoretical, review, quant/qual/mixed,

primary/secondary, survey/administrative, big data,

experimental, observation, interview, textual, visual,

social media

Inter-rater reliability

•

8% of 1453 papers were ‘flagged’, that is,

coders were unsure of some aspect of coding

–

wide variation in papers flagged by coders

•

Coder reliability (based on random subset of

papers coded by all coders):

–

Average pairwise agreement = 87%

–

Coder average agreement range = 85 - 89%

–

Variation in reliability for code types:

Survey/administrative  = 76% vs Qual codes = 94%
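The agreement figures above can be illustrated with a minimal sketch of simple percent agreement averaged over all coder pairs. Whether the study computed agreement in exactly this way is an assumption, and the coder names and ratings below are hypothetical toy data, not the study's.

```python
from itertools import combinations


def pairwise_agreement(codes_a, codes_b):
    """Share of papers on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("coders must rate the same papers")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def average_pairwise_agreement(ratings):
    """Mean agreement over all coder pairs; ratings maps coder -> list of codes."""
    pairs = list(combinations(ratings, 2))
    total = sum(pairwise_agreement(ratings[a], ratings[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy data: 3 coders, 4 papers, one binary code (1 = uses survey data).
toy_ratings = {
    "coder_1": [1, 0, 1, 1],
    "coder_2": [1, 0, 0, 1],
    "coder_3": [1, 1, 0, 1],
}

print(f"Average pairwise agreement: {average_pairwise_agreement(toy_ratings):.0%}")
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would generally give lower values on the same ratings.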

Empirical v Theory/review papers by discipline 2014/15

Quant/Qual/Mixed by Discipline 2014/15

N=1251

Mainly Quantitative Data by Discipline 2014/15

N=1251     *Exclusively quantitative

Mainly Qualitative Data by Discipline 2014/15

N=1251       *Exclusively qualitative

Surveys 94/95 > 2014/15

Experiments 94/95 > 2014/15

Observation 94/95 > 2014/15

Text analysis 94/95 > 2014/15

Transparency and Quality of Methods reporting

•

Many of Presser’s initial criticisms still stand:

–

basic reporting is frequently absent or unclear

–

Inter-rater reliability and time taken to code papers

show how challenging this task can be

–

A third of papers using surveys lacked some basic

information e.g. sampling method

–

Some journals have essential details in online

appendices or refer to other documents/articles (do

reviewers look at these?)

Next steps

•

Use the human coding data as training sample

for machine learning

•

Automated sampling and retrieval of online

journal articles

•

Apply Natural Language Processing to code

articles for methodological content
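One way these steps could fit together is sketched below: human-coded text is used to train a classifier that then assigns a methods code to unseen text. This is a toy multinomial Naive Bayes written with the standard library only; the choice of technique, the function names, and the four training snippets are all assumptions made for illustration and do not describe the project's actual pipeline.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def train(examples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical human-coded snippets standing in for the training sample.
training = [
    ("respondents were sampled and interviewed using a questionnaire", "survey"),
    ("a random sample of households completed the questionnaire", "survey"),
    ("participants were randomly assigned to treatment and control conditions", "experiment"),
    ("subjects in the treatment condition received the manipulation", "experiment"),
]
model = train(training)

print(predict(model, "households completed a questionnaire"))
print(predict(model, "participants assigned to the control condition"))
```

A real application would need far more training text, multi-label output (a paper can use several data types), and validation against held-out human codes.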

Reports of the death of surveys

greatly exaggerated?

Frequency of GB Polls 1940-2015

[Chart: number of polls, by quarter, 1940-2015]

N of election polls 1945-2010 = 3,500

N of election polls 2010-2015 = 1,942
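The two counts are easier to compare as annual rates. The sketch below divides each count by the length of its period; treating each period's length as end year minus start year is an assumption, since the two periods as stated share the year 2010.

```python
def polls_per_year(n_polls: int, start_year: int, end_year: int) -> float:
    """Average polls per year, taking the span as end_year - start_year (an assumption)."""
    return n_polls / (end_year - start_year)


early_rate = polls_per_year(3500, 1945, 2010)   # 3,500 polls over 65 years
recent_rate = polls_per_year(1942, 2010, 2015)  # 1,942 polls over 5 years

print(f"1945-2010: {early_rate:.0f} per year")
print(f"2010-2015: {recent_rate:.0f} per year")
print(f"Increase: about {recent_rate / early_rate:.1f}x")
```

On these figures polling ran at roughly 54 polls a year before 2010 and roughly 388 a year afterwards, an increase of about sevenfold.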

Global spend on online market research

Chart: Mario Callegaro, source Inside Research

Survey Futures

•

Lower cost of online surveys means we are likely to see more, not fewer, surveys in future

•

Population inference still key to social science

•

Big data failing to live up to hype for social

science applications

Survey Futures

•

Shorter questionnaires administered at more

frequent intervals

•

Device agnostic questionnaires

•

Data linkage & ‘passive’ data collection

Example: Wellcome Trust Science

Education Tracker (SET)

Science Education Tracker waves 1 & 2

•

Conducted as part of survey of adults

•

Stratified, multi-stage PAF, CAPI

•

Interview all children aged 14-18 years in

sampled households

•

+ additional screener on adjacent houses

•

Achieved sample ~450

•

Response rate ~ 50%

Science Education Tracker wave 3

•

Sample drawn from National Pupil Database

•

Invitation with login details to named individual

sent by post, short online interview (25 mins)

•

£10 conditional incentive

•

4000 achieved interviews, response rate 50%

•

25% of interviews completed on mobile devices

Concluding Remarks

•

Evidence of changing data use in content of

social science journals

•

Big differences by discipline

•

Growth in ‘big data’ including admin & text

•

Big increase in experiments

•

But no evidence of decline in survey research

•

Reasons to be cheerful about the future


Uploaded on Oct 09, 2024


Presentation Transcript

Presser (1983) and Saris & Gallhofer (2007)

Content analysis of all papers: 1949-50, 1964-65, 1979-80, 1994-95

Field                     Journal
Economics                 American Economic Review
                          Journal of Political Economy
                          Review of Economics and Statistics
Sociology                 American Sociological Review
                          American Journal of Sociology
                          Social Forces
Political Sciences        American Journal of Political Science
                          American Political Science Review
                          Journal of Politics
Social Psychology         Journal of Personality and Social Psychology
Public Opinion Research   Public Opinion Quarterly

Findings of Presser, Saris & Gallhofer

Percentages of articles using survey data by discipline and year

                     Presser                               Saris & Gallhofer*
Field                1949-50     1964-65     1979-80     1994-95     1994-95 (own classification)
Sociology            24% (282)   54% (259)   56% (285)   70% (287)   47%
Political Science    3% (114)    19% (160)   35% (203)   42% (303)   27%
Economics            6% (141)    33% (155)   29% (317)   42% (461)   20%
Social Psychology    2% (59)     15% (233)   21% (377)   50% (347)   49%
Public Opinion       43% (86)    56% (61)    91% (53)    90% (43)    90%

*Presser included studies performed by organisations for official statistics (statistical bureaus) under the category ‘surveys’. Saris and Gallhofer repeated this method but also used their own classification; these results are shown in the last column (in italics on the original slide).

Quant/Qual/Mixed by Discipline 2014/15

N=1251

Rows (Field): Economics, Sociology, Political Sciences, Social Psychology, Public Opinion, TOTAL

Columns: Quantitative, Qualitative, Mixed

Cell values in extraction order (row and column positions not recoverable): 98 80 87 0.3 11 1 10 8 5 72 11 18 97 87 3 6 0 8

Mainly Quantitative Data by Discipline 2014/15

Field                Survey/Poll   Administrative   Census   Digital/Big data*   Experimental
Economics            31            73               19       3                   14
Sociology            52            42               17       4                   5
Political Sciences   41            58               9        4                   17
Social Psychology    69            5                0        1                   72
Public Opinion       89            17               3        5                   33
TOTAL                48            47               12       3                   24

N=1251     *Exclusively quantitative

Mainly Qualitative Data by Discipline 2014/15

N=1251       *Exclusively qualitative

Rows (Field): Economics, Sociology, Political Sciences, Social Psychology, Public Opinion, TOTAL

Columns: Observational, Interview/focus grp, Textual, Visual, Social media/online*

Cell values in extraction order (row and column positions not recoverable): 0.5 15 4 0 2 1 0.5 12 2 2 11 12 0.3 2 1 8 5 24 14 2 0 1 4 1 0 5 5 10 3 1

Surveys 94/95 > 2014/15

[Bar chart: 1994-1995 vs 2014-2015 by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion). Bar labels as extracted, series and discipline not recoverable: 95, 89, 69, 60, 52, 49, 41, 39, 31, 29]

Experiments 94/95 > 2014/15

[Bar chart: 1994-1995 vs 2014-2015 by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion). Bar labels as extracted, series and discipline not recoverable: 72, 46, 33, 17, 14, 6, 5, 5, 5, 2]

Observation 94/95 > 2014/15

[Bar chart: 1994-1995 vs 2014-2015 by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion). Bar labels as extracted, series and discipline not recoverable: 32, 12, 8, 4, 3, 2, 0.6, 0.5, 0, 0]

Text analysis 94/95 > 2014/15

[Bar chart: 1994-1995 vs 2014-2015 by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion). Bar labels as extracted, series and discipline not recoverable: 24, 12, 11, 7, 6, 5, 4, 2, 0.6, 0]
