Evolving Trends in Social Science Data Usage
Rapid changes in social science research have shifted the mix of data sources, with emerging forms such as social media data and mobile device information gaining prominence alongside traditional surveys. This shift raises questions about the future relevance of traditional survey methodology and challenges researchers to adapt to new data collection methods while maintaining transparency and methodological quality.
Presentation Transcript
Changing Patterns of Social Science Data Usage
Patrick Sturgis
Context
Rapid increase in new types of data for social science research:
- Social media
- Online surveys
- Administrative data
- Mobile digital devices
- Textual archives
- Transactional data (Uber, bike shares, Airbnb)
'The coming crisis of empirical sociology'
"[T]he sample survey is not a tool that stands outside history... Its glory years, we contend, are in the past... It is unlikely, we suggest, that in the future the sample survey will be a particularly important research tool, and those sociologists who stake the expertise of their discipline to this method might want to reflect on whether this might leave them exposed to marginalization or even redundancy." (Savage & Burrows, 2007)
Motivation
1) What kinds of data do social scientists use?
2) How do patterns vary across disciplines and over time?
3) Is there a decline in surveys and an increase in 'big data'?
4) How transparent, and of what quality, is the reporting of methods?
Low and declining response rates
- Face-to-face surveys now routinely struggle to reach 50% response rates
- RDD is even worse: in the US, response rates are routinely < 10% (increasing mobile-only households + do-not-call legislation)
- Survey sponsors ask: what are we getting for our money? Is a low response rate survey better than a well-designed quota sample?
Increasing costs
- Costs per achieved interview are high and increasing
- Simon Jackman estimates $2,000 per complete interview for the 2012 American National Election Study
- My estimate: ~£180 per achieved interview for a PAF sample, 45-minute CAPI, n ≈ 1,500, RR ≈ 50%
- Compare ~£5 for opt-in panels
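To put the gap in concrete terms, a back-of-envelope calculation using the figures above (my own illustration, not part of the talk; the currency of the PAF and opt-in estimates is assumed to be GBP):

```python
# Rough fieldwork-cost comparison implied by the slide's estimates.
achieved = 1500          # achieved interviews (PAF sample, 45-min CAPI)
cost_per_achieved = 180  # ~GBP per achieved face-to-face interview (slide's estimate)
opt_in_cost = 5          # ~GBP per complete in an opt-in online panel

face_to_face_total = achieved * cost_per_achieved  # 270,000
opt_in_total = achieved * opt_in_cost              # 7,500

print(f"Face-to-face fieldwork: ~GBP {face_to_face_total:,}")
print(f"Opt-in panel, same n:   ~GBP {opt_in_total:,}")
print(f"Cost ratio: {cost_per_achieved / opt_in_cost:.0f}x")  # 36x
```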
Cost drivers
- Average number of calls is increasing
- More refusal conversion
- More incentives (UKHLS: £30)
- 30-40% of fieldwork costs can be spent on the hardest-to-reach 20% of respondents
US Survey of Consumer Attitudes 1979-1996 (Curtin et al 2000)
Response rate: 70% (1979) -> 68% (1996)
[Chart: mean contact attempts and % refusal conversions, by year]
Externalities of survey pressure
- Poor data quality from hard-to-reach respondents
- Fabrication pressure on respondents
- Fabrication pressure on interviewers
- Ethical research practice?
Content analysis of journal articles (joint work with Rebekah Luff)
Presser (1983) and Saris & Gallhofer (2007)
Content analysis of all papers in: 1949-50, 1964-65, 1979-80, 1994-95

Field                     Journals
Economics                 American Economic Review; Journal of Political Economy; Review of Economics and Statistics
Sociology                 American Sociological Review; American Journal of Sociology; Social Forces
Political Sciences        American Journal of Political Science; American Political Science Review; Journal of Politics
Social Psychology         Journal of Personality and Social Psychology
Public Opinion Research   Public Opinion Quarterly
Metzler et al (2016)
- Online survey of Sage social science contacts; 9,412 respondents
- 33% reported having undertaken big data research
- But response rate < 2%
- 'Big data' was self-defined by respondents
Findings of Presser, Saris & Gallhofer
Percentage of articles using survey data, by discipline and year (number of articles in parentheses; 1949-80 columns from Presser, 1994-95 columns from Saris & Gallhofer)

Field               1949-50     1964-65     1979-80     1994-95     1994-95*
Sociology           24% (282)   54% (259)   56% (285)   70% (287)   47%
Political Science    3% (114)   19% (160)   35% (203)   42% (303)   27%
Economics            6% (141)   33% (155)   29% (317)   42% (461)   20%
Social Psychology    2% (59)    15% (233)   21% (377)   50% (347)   49%
Public Opinion      43% (86)    56% (61)    91% (53)    90% (43)    90%

*Presser included studies performed by organisations for official statistics (statistical bureaus) under the category 'surveys'. Saris and Gallhofer repeated this method (fourth column) but also applied their own classification, shown in the final column.
Updating the Analysis: 2014-15
- 1,453 research papers
- 7 coders; papers randomly assigned to coders
- 24 data-information codes: theoretical, review, quant/qual/mixed, primary/secondary, survey/administrative, big data, experimental, observation, interview, textual, visual, social media
Inter-rater reliability
- 8% of the 1,453 papers were flagged, i.e. coders were unsure of some aspect of the coding; wide variation across coders in the papers they flagged
- Coder reliability (based on a random subset of papers coded by all coders): average pairwise agreement = 87%; coder average agreement range = 85-89%
- Reliability varied by code type: survey/administrative = 76% vs qualitative codes = 94%
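For concreteness, the 'average pairwise agreement' statistic can be computed as in the following minimal sketch (the function and toy data are mine, not the project's code):

```python
from itertools import combinations

def average_pairwise_agreement(codes_by_coder):
    """Mean percent agreement across all pairs of coders.

    codes_by_coder: equal-length lists, one per coder, each holding
    that coder's code for the same ordered set of papers.
    """
    pair_scores = []
    for a, b in combinations(codes_by_coder, 2):
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / len(a))
    return sum(pair_scores) / len(pair_scores)

# Toy data: 3 coders, 5 papers, one code per paper
coders = [
    ["survey", "admin", "survey", "qual", "survey"],
    ["survey", "survey", "survey", "qual", "survey"],
    ["survey", "admin", "survey", "qual", "admin"],
]
print(f"{average_pairwise_agreement(coders):.0%}")  # prints 73%
```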
Empirical v Theory/review papers by discipline 2014/15
Quant/Qual/Mixed by Discipline 2014/15 (column %; N=1,251)

              Economics  Sociology  Political Sciences  Social Psychology  Public Opinion  TOTAL
Quantitative  98         80         87                  72                 97              87
Qualitative   0.3        11         5                   11                 3               6
Mixed         1          10         8                   18                 0               8
Mainly Quantitative Data by Discipline 2014/15
[Table: % of papers using survey/poll, administrative, census, digital/big data (exclusively quantitative) and experimental data, by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion) and overall; N=1,251]
Mainly Qualitative Data by Discipline 2014/15
[Table: % of papers using observational, interview/focus group, textual, visual and social media/online (exclusively qualitative) data, by discipline (Economics, Sociology, Political Sciences, Social Psychology, Public Opinion) and overall; N=1,251]
Surveys 94/95 > 2014/15
[Bar chart: % of articles using survey data, 1994-1995 vs 2014-2015, for Economics, Sociology, Political Sciences, Social Psychology and Public Opinion]
Experiments 94/95 > 2014/15
[Bar chart: % of articles using experiments, 1994-1995 vs 2014-2015, for Economics, Sociology, Political Sciences, Social Psychology and Public Opinion]
Observation 94/95 > 2014/15
[Bar chart: % of articles using observational data, 1994-1995 vs 2014-2015, for Economics, Sociology, Political Sciences, Social Psychology and Public Opinion]
Text analysis 94/95 > 2014/15
[Bar chart: % of articles using textual data, 1994-1995 vs 2014-2015, for Economics, Sociology, Political Sciences, Social Psychology and Public Opinion]
Transparency and Quality of Methods Reporting
- Many of Presser's initial criticisms still stand: basic reporting is frequently absent or unclear
- Inter-rater reliability, and the time taken to code papers, show how challenging this task can be
- A third of papers using surveys lacked some basic information, e.g. the sampling method
- Some journals place essential details in online appendices or refer to other documents/articles (do reviewers look at these?)
Next steps
- Use the human coding data as a training sample for machine learning
- Automated sampling and retrieval of online journal articles
- Apply natural language processing to code articles for methodological content (a sketch follows below)
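A minimal sketch of what this pipeline might look like, assuming the human-coded papers are available as plain text with binary method codes; the corpus, labels and model choice here are illustrative assumptions, not the project's actual design:

```python
# Train a text classifier on the human-coded papers to predict one
# methodological code (here, a binary "uses survey data" label).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in corpus: in practice, the full text of the 1,453 coded papers.
texts = [
    "we fielded a nationally representative survey of adults",
    "respondents completed the questionnaire by telephone",
    "a panel survey with three waves of interviews",
    "poll data were weighted to population benchmarks",
    "we prove the theorem using a fixed point argument",
    "an ethnographic account based on participant observation",
    "administrative tax records were linked at the firm level",
    "a formal model of legislative bargaining is developed",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = human coders marked "survey"

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # word and bigram features
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```

The same set-up could extend to all 24 data-information codes by fitting one classifier per code, with the human-coded 2014-15 papers as the training sample.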
Reports of the death of surveys greatly exaggerated?
Frequency of GB Polls 1940-2015
[Chart: number of polls per quarter, 1940-2015]
- N of election polls 1945-2010 = 3,500
- N of election polls 2010-2015 = 1,942
Global spend on online market research
[Chart by Mario Callegaro; source: Inside Research]
Survey Futures
- The lower cost of online surveys means we are likely to see more, not fewer, surveys in future
- Population inference is still key to social science
- Big data is failing to live up to the hype for social science applications
Survey Futures
- Shorter questionnaires administered at more frequent intervals
- Device-agnostic questionnaires
- Data linkage & passive data collection
Example: Wellcome Trust Science Education Tracker (SET)
Science Education Tracker waves 1 & 2
- Conducted as part of a survey of adults
- Stratified, multi-stage PAF sample; CAPI
- Interviewed all children aged 14-18 in sampled households, plus an additional screener at adjacent addresses
- Achieved sample ~450; response rate ~50%
Science Education Tracker wave 3
- Sample drawn from the National Pupil Database
- Invitation with login details sent by post to the named individual; short online interview (25 mins)
- £10 conditional incentive
- 4,000 achieved interviews; response rate 50%
- 25% of interviews completed on mobile devices
Concluding Remarks
- Evidence of changing data use in the content of social science journals
- Big differences by discipline
- Growth in big data, including admin & text
- Big increase in experiments
- But no evidence of a decline in survey research
- Reasons to be cheerful about the future