Understanding Quantitative Research in Social Studies: A Comprehensive Overview
An in-depth exploration of quantitative research methods in social studies, covering concepts, data analysis, survey techniques, causality identification, and setting up research projects. Emphasizes the importance of grasping strengths, pitfalls, and key points in quantitative research and understanding social indicators. Designed for gaining foundational knowledge and direction in research projects.
Presentation Transcript
Methods in Social Research: Quantitative research. Ph.D. Programme in Global Studies, Università degli studi di Urbino Carlo Bo. Tim Goedemé, PhD (tim.goedeme@spi.ox.ac.uk). Lecture 1, 24/11/2020
Overview of the course. Main aims: a basic understanding of the strengths and pitfalls of various types of quantitative research; being able to identify and reflect upon the quality of social indicators and social research; being familiar with survey research and the total survey error paradigm; being able to critically reflect upon the identification of causality; having an understanding of some often-used quantitative research techniques in social policy research; being aware of key points of attention when setting up your own quantitative research project.
Overview of the course: 1. Introduction to quantitative research and social indicators; 2. Survey data and total survey error, including sampling variance; 3. Causality; 4. Quantitative research techniques to identify drivers; 5. Setting up your own research project. Perspective: (social) policy, poverty and inequality. This is an introduction meant to give you some handles to understand quantitative research, to better grasp the main issues and points of attention, and to give some direction for your own research; it is not a statistics course.
Introduction to quantitative research: 1. Quantitative research: concepts and definitions; 2. Summarising quantitative data; 3. Social indicators: an introduction.
Main questions: What is quantitative research? What are key metrics, tables and graphs to summarise quantitative data? What are the most important characteristics of good indicators? How can one define comparability?
Quantitative research. What is quantitative research? Quantitative research: "Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations. Quantitative research is the opposite of qualitative research, which involves collecting and analyzing non-numerical data (e.g. text, video, or audio)." (Source: https://www.scribbr.com/methodology/quantitative-research/) <<Discussion>>
Quantitative research. "Quantitative research is empirical research where the data are in the form of numbers. Qualitative research is empirical research where the data are not in the form of numbers" (Punch, 2014, p. 3). However, a lot of data in quantitative research are (initially) not in the form of numbers: gender, occupational status, economic sector of activity. And sometimes numerical data are very important in qualitative studies (e.g. the dynamics of household debt, or reference budgets research on how much people need to participate in society). Also relevant: mixed-methods designs, the number of observations, the data generation process and the degree of generalisability.
Quantitative research. Types: descriptive, correlational, causal, methodological.
Quantitative research. Basis: start from data and try to turn them into a number that can be meaningfully interpreted. Key issue: replicability (if others followed the same procedure, they should be able to arrive at the same result). So the questions are: Are the data of sufficient quality? Is the procedure appropriate and adequate? Do the authors interpret the result correctly?
Quantitative research. Basic ingredients: variables and indicators. Quantitative database: variables and observations.
Quantitative research. Structure of a database: variables are the columns; respondents / units of observation are the rows (each row is a record); an observation / data point is a single cell.
Quantitative research. Types of units of observation: social entities (individuals, households, municipalities, companies, countries); social phenomena (transactions, court cases, crimes, purchases); other types (durables, animals, etc.). Types of variables: categorical: dichotomous (only two values), ordinal (logical order, but no clear (numerical) distance between categories), nominal (no logical order); numerical: ordinal, continuous (interval vs. ratio); discrete variables (only integers) vs. fully continuous variables. This refers to the measurement and how it is recorded in the database, not to how the characteristic is in reality (e.g. weight in reality, versus weight recorded in classes of varying width, 5-10 kg). There are cases in between (read Heeringa et al., 2010, section 5.2.3, pp. 119-120).
Quantitative research Structure of a database
Quantitative research. Variables vs. indicators. Indicator: a summary statistic which tries to measure a (social) phenomenon or a (past, current or future) state of affairs. It can be based on one variable or a combination of variables, or on a combination of (many) indicators. It does not necessarily have to be based on survey data (e.g. the minimum level of social benefits can be derived from a database with programmed legislation). Examples: average or median income, GDP per capita, CPI, Gini coefficient, Human Development Index (HDI). => A database can be a collection of (e.g. country-level) indicators, which themselves are generated on the basis of other databases.
Quantitative research. Types of analysis: univariate (distribution of one variable); bivariate (joint distribution of two variables, correlation); multivariate (joint distribution of more than two variables, correlation).
Quantitative research. Types of variables (bis): dependent variable (response variable); independent variable (treatment variable); control variables.
Quantitative research. Types of databases: cross-sectional (one moment in time); longitudinal: repeated cross-sections (multiple moments, different units of observation) or panel data (multiple moments, same units of observation). Survey data vs. administrative data (register data).
Quantitative research. Different types of data: population data => information on all elements of the target population => some elements of the total survey error paradigm are still relevant (missing data, coverage errors, etc.). Information on a sample (i.e. a selection of the population): non-random samples => cannot generalise with confidence to the target population, no good indicators of statistical reliability; random samples => can generalise with more confidence, with indicators of statistical reliability (including confidence intervals, but also non-response rates).
Quantitative research. Non-random samples: convenience samples, self-selected samples, purposefully selected samples (e.g. quota samples). There is no direct theoretical support to generalise their findings to the population. Random samples: human influence (both known and unknown) is removed from the selection process; all elements in the population have a non-zero probability of selection; for all elements in the sample, this probability of selection is known. Simple random sampling: no stratification, no clustering, equal probability of selection for all population members. Samples can be wrong by chance => they require an estimate of reliability! See also: Groves et al., 2009, p. 97ff.
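To make the idea of a known, non-zero selection probability concrete, here is a minimal Python sketch of simple random sampling without replacement. The sampling frame of 1,000 household IDs, the sample size of 50, the seed and the helper name `simple_random_sample` are all hypothetical; the point is only that under simple random sampling every unit has the same inclusion probability n/N, so each sampled unit carries a design weight of N/n.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement (hypothetical helper).

    Under simple random sampling every population element has the same,
    known inclusion probability n / N; the design weight is its inverse, N / n.
    """
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    rng = random.Random(seed)  # a fixed seed only makes the draw reproducible
    chosen = rng.sample(population, n)  # no stratification, no clustering
    inclusion_prob = n / N
    design_weight = N / n  # number of population units each sampled unit represents
    return [(unit, inclusion_prob, design_weight) for unit in chosen]


# Hypothetical sampling frame: 1,000 household IDs
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 50, seed=2020)

print(sample[:3])
# The design weights add up to the population size N
print(sum(weight for _, _, weight in sample))
```

Real survey designs often add stratification and clustering, which can make inclusion probabilities unequal and also affect the estimated reliability; this sketch deliberately covers only the simplest case.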
Quantitative research. Strengths: (testable) potential for a high degree of generalisability; replicability; can support or reject quantitative and causal claims; helps to simplify complex phenomena and changes to an understandable degree. Weaknesses: risk of over-simplification ("superficial"); causal mechanisms are sometimes hard to identify and test; limited possibilities to take on board variables and considerations that were not thought of in the design phase; over-confidence and misrepresentation. Some of these also apply to qualitative research.
Quantitative research. Essential for: knowledge of the incidence and distribution of phenomena / characteristics, and of correlations in a target population; proving causal relationships in specific and broader populations. But quantitative research does not automatically lead to these things: it requires careful sample selection, data treatment, analysis and interpretation.
Part 2: Summarising data. Databases are too big and not very telling to publish / present as such. We need meaningful summary measures that tell us what the data look like. They are estimated from the data. Very often an additional step is required to estimate what the estimated value tells us about the target population (see the afternoon session). Two ways of summarising data: tables (i.e. numbers) and graphs.
Summarising data. The appropriate type of metric depends on the type of variable, e.g. there are more limited possibilities with nominal variables than with continuous variables. Hierarchy: nominal < ordinal < interval < ratio. As continuous variables can be made ordinal, what is applicable to ordinal variables is also applicable to continuous variables (with loss of information), but usually not the other way around. The same holds for nominal vs. ordinal variables. Exception: dichotomous ("dummy") variables => in analysis they are often treated as if they were continuous.
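The exception for dichotomous variables can be seen in a small Python sketch: if a dummy is coded 0/1, its arithmetic mean equals the proportion of 1s, and its population variance equals p(1 - p). The variable and its ten values below are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical dummy variable for ten persons: 1 = unemployed, 0 = not unemployed
unemployed = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

# Treating the dummy as if it were continuous: the mean is the proportion of 1s
p = mean(unemployed)
share = unemployed.count(1) / len(unemployed)
print(p, share)

# Its population variance equals p * (1 - p)
print(pvariance(unemployed), p * (1 - p))
```

This is why a 0/1 variable can enter averages and regressions directly: its "average" is simply a relative frequency.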
Summarising data. 1. Typical values (univariate analysis): totals (e.g. how many unemployed people are there in Italy?); mode (most common value of a variable); average; median (the observation in the middle of the distribution when units of observation are ranked from lowest to highest; with an even number of observations: the arithmetic average of the two observations in the middle of the distribution).
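A minimal Python sketch of these typical values, using the standard-library `statistics` module on a hypothetical list of nine monthly incomes (the numbers are made up for illustration). The single very high income shows why the average and the median can tell different stories.

```python
from statistics import mean, median, mode

# Hypothetical monthly incomes (EUR) of nine persons, one of them very high
incomes = [900, 1200, 1200, 1500, 1700, 1800, 2100, 2600, 9000]

total = sum(incomes)          # a total
most_common = mode(incomes)   # the mode: most common value
average = mean(incomes)       # arithmetic average, pulled up by the 9000
middle = median(incomes)      # middle observation after ranking (odd number of observations)

# With an even number of observations the median is the average of the two middle ones
middle_even = median(incomes[:8])

print(total, most_common, round(average, 2), middle, middle_even)
```

Here the average (about 2,444) lies well above the median (1,700) because of one extreme value, which is a common reason to prefer the median for skewed variables such as income.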
Summarising data. Illustration: geometric average
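The illustration itself is not preserved in this transcript, so as a hedged stand-in: the geometric average of n positive values is the n-th root of their product, (x1 × x2 × ... × xn)^(1/n), and it is typically used to average growth factors. The growth rates and the helper name `geo_mean` below are hypothetical; the result is cross-checked against the standard library's `statistics.geometric_mean` (Python 3.8+).

```python
import math
from statistics import geometric_mean


def geo_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical yearly growth factors of an income: +10%, -5%, +20%
factors = [1.10, 0.95, 1.20]
g = geo_mean(factors)
print(round(g, 4))

# Applying the average factor three times reproduces the overall change (1.254)
print(math.isclose(g ** 3, 1.10 * 0.95 * 1.20))
print(math.isclose(g, geometric_mean(factors)))
```

The arithmetic average of the same three factors (about 1.083) would overstate typical yearly growth: applying 1.083 three times gives more than the actual overall change of 1.254.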
Summarising data. A simple line graph (annotated example). Figure: "Median disposable income in Imaginland between 2005 and 2020 (EUR, in prices of 2020)"; y-axis (dependent variable): median income in EUR, prices of 2020, from 0 to 3000; x-axis (independent variable): year, 2005-2020; source: my imagination. Annotations: the title also indicates the unit of measurement; axis titles are often very helpful to read the graph; gridlines help to read the graph; the origin includes zero in the case of the dependent variable (as here); the source is essential.
Summarising data. 2. Summarising the distribution: relative and cumulative frequencies. Proportion = relative frequency = number / total number. Percentage = relative frequency x 100. Percentage point difference (p.p.) = Percentage(A) - Percentage(B). Percentage change = 100 x (Percentage(A) - Percentage(B)) / Percentage(B). Quantiles: rank the observations from low to high, then divide them into groups with equal numbers of observations; quantiles are the cut-off points (in %). Median = 50% cut-off. Percentiles = cut-offs when subdividing into 100 groups; deciles = 10 groups; quintiles = 5 groups; quartiles = 4 groups.
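The difference between a percentage point difference and a percentage change, and how quantile cut-offs are obtained, can be sketched in a few lines of Python. All values are hypothetical. Note that `statistics.quantiles` uses an interpolation rule (`method="exclusive"` by default); other software may use a different rule, so cut-offs can differ slightly, especially in small samples.

```python
from statistics import quantiles

# Hypothetical shares (%) of persons at risk of poverty in two years
pct_b = 20.0  # Percentage(B), e.g. an earlier year
pct_a = 25.0  # Percentage(A), e.g. a later year

pp_difference = pct_a - pct_b                 # difference in percentage points
pct_change = 100 * (pct_a - pct_b) / pct_b    # relative change in percent
print(f"{pp_difference} p.p. difference, {pct_change}% change")

# Relative frequency and percentage for a hypothetical income variable
incomes = [900, 1200, 1200, 1500, 1700, 1800, 2100, 2600, 9000]
below_1500 = sum(1 for y in incomes if y < 1500)
proportion = below_1500 / len(incomes)
percentage = 100 * proportion

# Quartile cut-offs (25%, 50%, 75%); the middle one is the median
q1, q2, q3 = quantiles(incomes, n=4)
print(q1, q2, q3)
```

A rise from 20% to 25% is thus an increase of 5 percentage points but a 25% increase in relative terms; reporting one as the other is a frequent source of misrepresentation.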
Summarising data. Figure: the frequency distribution of equivalised disposable income in Italy in 2017. Source: EU-SILC 2018 UDB, own computations.
Summarising data. Table: number of children per household in Italy, EU-SILC 2018. Note: children defined as those aged below 18 years; private households only. Source: EU-SILC 2018 UDB, own computations. What do these numbers mean?
Summarising data. Boxplot. Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
Summarising data. 3. Summarising the distribution: dispersion. Absolute measures: range (maximum - minimum); interquartile range = P75 - P25; standard deviation (s); variance (s²); standard deviation of the mean (in a sample; also called the standard error of the mean). Relative measures: decile ratio (D9/D1); coefficient of variation (= s / mean); Gini coefficient and many other indicators of inequality.
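The dispersion measures listed above can be sketched with the Python standard library on the same kind of hypothetical income data. Assumptions to note: `stdev` and `variance` use the sample (n - 1) denominator; the standard error s / sqrt(n) assumes a simple random sample; and the hypothetical `gini` helper uses the mean-absolute-difference formula G = sum_i sum_j |x_i - x_j| / (2 n² mean), one of several equivalent ways to compute the Gini coefficient.

```python
import math
from statistics import mean, quantiles, stdev, variance

# Hypothetical monthly incomes (EUR); real analyses use far more observations
incomes = [900, 1200, 1200, 1500, 1700, 1800, 2100, 2600, 9000]
n = len(incomes)

# Absolute measures
value_range = max(incomes) - min(incomes)   # maximum - minimum
p25, _, p75 = quantiles(incomes, n=4)
iqr = p75 - p25                             # interquartile range
s = stdev(incomes)                          # sample standard deviation (n - 1 denominator)
s2 = variance(incomes)                      # sample variance = s squared
se_mean = s / math.sqrt(n)                  # standard deviation of the mean, simple random sample assumed

# Relative measures
deciles = quantiles(incomes, n=10)          # nine cut-offs D1 ... D9
decile_ratio = deciles[8] / deciles[0]      # D9 / D1 (with only nine observations these are the extremes)
cv = s / mean(incomes)                      # coefficient of variation


def gini(values):
    """Gini coefficient from the mean absolute difference between all pairs of values."""
    k = len(values)
    mu = sum(values) / k
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * k * k * mu)


print(value_range, iqr, round(s, 1), round(cv, 2), decile_ratio, round(gini(incomes), 3))
```

A Gini of 0 means everyone has the same income; with one person holding everything among n persons this formula gives (n - 1) / n, which approaches 1 for large n.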
Summarising data. 4. Measures of association between variables: risk ratio (relative risk) = relative frequency(A) / relative frequency(B); Pearson's correlation coefficient (ranging from -1 through 0 to +1; positive vs. negative correlation); Spearman's rank-order coefficient = Pearson's correlation of the ranks; with multiple variables: regression coefficients (regression is a technique to analyse how one or more variables (simultaneously) correlate with a dependent variable); differences in mean values / medians of Y across groups of X.
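A minimal Python sketch of three of these measures, written out by hand so the formulas stay visible. The helper names and the data are hypothetical; ties in Spearman's coefficient are handled by giving tied values the average of their ranks, which is the usual convention.

```python
import math


def risk_ratio(cases_a, total_a, cases_b, total_b):
    """Relative frequency in group A divided by relative frequency in group B."""
    return (cases_a / total_a) / (cases_b / total_b)


def pearson(x, y):
    """Pearson's correlation: covariance scaled by both standard deviations (-1 ... +1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks from 1 (lowest) to n; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank-order coefficient = Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: a monotonic but strongly non-linear relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(pearson(x, y), 3), spearman(x, y))
print(risk_ratio(20, 200, 30, 150))  # 10% in group A vs 20% in group B: 0.5
```

For these made-up data Pearson's coefficient is clearly below 1 while Spearman's is (up to rounding) exactly 1: the relationship is perfectly monotonic but not linear.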
Summarising data. Table: cross-tabulation of health status by immigration status for persons between 18 and 65 years old in Italy, EU-SILC 2018. Risk ratio = 10.9 / 12.8 = 0.86: immigrants are 14% less likely to have health problems. Note: only persons living in private households. Source: EU-SILC 2018 UDB, own computations.
Summarising data. Source: https://en.wikipedia.org/wiki/Correlation_and_dependence#/media/File:Pearson_Correlation_Coefficient_and_associated_scatterplots.png
Summarising data https://en.wikipedia.org/wiki/Correlation_and_dependence
Summarising data Scatterplots Source: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient (last accessed 23/11/2020)
Summarising data Source: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient (last accessed 23/11/2020)
Summarising data. Linear regression. Source: https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module9-Correlation-Regression/PH717-Module9-Correlation-Regression7.html (last accessed 23/11/2020)
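For the simplest case of one independent variable, the least-squares line y = a + b·x has a closed form: b = sum((x - mean x)(y - mean y)) / sum((x - mean x)²) and a = mean y - b·mean x. The Python sketch below implements exactly that; the helper name `simple_ols` and the age/wage data are hypothetical, and regressions with several variables need matrix algebra or a statistics package instead.

```python
def simple_ols(x, y):
    """Least-squares intercept a and slope b for the line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data: age (independent) and monthly wage in EUR (dependent)
age = [25, 30, 35, 40, 45, 50]
wage = [1400, 1650, 1700, 2000, 2150, 2300]
a, b = simple_ols(age, wage)
print(f"wage = {a:.1f} + {b:.1f} * age")
```

In these made-up data the slope is 36: each additional year of age is associated with about 36 EUR more in monthly wage. That describes a correlation, not necessarily a causal effect.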
Summarising data. What would be an appropriate way to summarise each of the variables (typical values, distribution, dispersion)? What is the proportion of males? What is the mode of professional status? What is the average wage? What is the median wage? Would the median or the average be the best way to summarise wage? How would you find out whether there is an association between age and wage; age and agreement; and gender and agreement?
Summarising data. There are many more relevant metrics; this was really just scratching the surface. By now there is also a sizeable literature on the best way to present data graphically (e.g. do not use pseudo-3D graphs in Excel). Many resources are available online. My favourite handbook (somewhat more advanced, but takes better account of the data generation process): Heeringa, S. G., West, B. T. and Berglund, P. A. (2010), Applied Survey Data Analysis, Boca Raton: Chapman & Hall/CRC, 467p.
Part 3: (Social) indicators An indicator = a summary statistic which tries to measure a phenomenon or a (past, current or future) state of affairs
Indicators. Concept → definition → metric → indicator. Concept: description of the phenomenon in relation to related phenomena. Definition: exact description which allows one to (theoretically) identify the phenomenon of interest to the exclusion of others. Metric: typically a mathematical formula which expresses how a single variable or multiple variables will be used and combined to measure the phenomenon. Indicator: an implementation of the metric with observed variables / real data (i.e. operationalisation). *Multiple concepts are sometimes measured with the same indicator; often multiple indicators are necessary / possible to fully capture a single concept.
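As a hedged illustration of this chain (an example chosen here, not taken from the slides): the concept could be poverty; the definition, having an income too low compared with the typical standard of living in one's country; the metric, the share of persons whose equivalised disposable income lies below 60% of the national median (the threshold used for the EU at-risk-of-poverty rate); and the indicator, that metric computed on real data such as EU-SILC. The Python sketch implements only the metric, on hypothetical, unweighted incomes; a real estimate would also apply survey weights. The helper name `at_risk_of_poverty_rate` is hypothetical.

```python
from statistics import median


def at_risk_of_poverty_rate(equivalised_incomes, threshold_share=0.6):
    """Metric: share (%) of persons with equivalised income below
    threshold_share times the median. Unweighted, for illustration only."""
    threshold = threshold_share * median(equivalised_incomes)
    below = sum(1 for y in equivalised_incomes if y < threshold)
    return 100 * below / len(equivalised_incomes), threshold


# Hypothetical equivalised disposable incomes (EUR per month) of nine persons
incomes = [600, 800, 1000, 1200, 1500, 1700, 2000, 2500, 4000]
rate, threshold = at_risk_of_poverty_rate(incomes)
print(f"threshold = {threshold:.0f} EUR, rate = {rate:.1f}%")
```

Changing `threshold_share` (e.g. to 0.5 or 0.7) yields a different indicator of the same concept, which is one reason why several indicators are often needed to capture a single concept.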
Indicators. Quality criteria of individual indicators (Atkinson et al., 2002). Validity: an indicator should identify the essence of the problem and have a clear and accepted normative interpretation (face validity, transparency and acceptability). Internal vs. external validity => should always be evaluated in light of the objective of the exercise! Reliability: an indicator should be robust and statistically validated. Responsiveness: an indicator should be responsive to effective policy interventions but not subject to manipulation.
Quality criteria. Comparability (Goedemé et al., 2015). Place and time: an indicator should be measurable in a sufficiently comparable way across member states. Procedural comparability: the same procedures are implemented for measuring a phenomenon or characteristic on different occasions (different times or different places). Substantive comparability (i.e. functional equivalence): the same phenomenon is captured similarly in different (social) contexts. Operational feasibility, timeliness and potential for revision.
Quality criteria For the portfolio of indicators (Atkinson et al., 2002): Balance across different dimensions (and be comprehensive but selective rather than exhaustive) Mutual consistency of indicators & proportionate weight of indicators Transparency and accessibility
Some points to remember: Quantitative data are often helpful, sometimes essential. They have to be treated and interpreted carefully. A definition is not an indicator; multiple indicators are often necessary to measure a concept. Validity, reliability and comparability are key quality characteristics of indicators.