Understanding Quantitative Research in Social Studies: A Comprehensive Overview

 
Methods in Social Research – Quantitative research
Ph.D. Programme in Global studies
Università degli studi di Urbino Carlo Bo
 
Tim Goedemé, PhD
tim.goedeme@spi.ox.ac.uk
 
Lecture 1 – 24/11/2020
 
Overview of the course
 
Main aims
A basic understanding of the strengths and pitfalls of various types of quantitative research
Being able to identify and reflect upon the quality of social indicators and social research
Being familiar with survey research and the total survey error paradigm
Being able to critically reflect upon the identification of causality
Understanding some often-used quantitative research techniques in social policy research
Being aware of key points of attention when setting up your own quantitative research project
 
Overview of the course
 
1. Introduction to quantitative research and social indicators
2. Survey data and total survey error, including sampling variance
3. Causality
4. Quantitative research techniques to identify drivers
5. Setting up your own research project
Perspective: (social) policy, poverty and inequality
This is not a statistics course, but an introduction that gives you some handles to understand quantitative research and to better grasp the main issues and points of attention, as well as some direction for your own research
 
Introduction to quantitative research
 
 
1. Quantitative research: concepts and definitions
2. Summarising quantitative data
3. Social indicators: an introduction
 
Main questions
 
What is quantitative research?
What are key metrics, tables and graphs to summarize quantitative
data?
What are the most important characteristics of good indicators?
How can one define comparability?
 
 
Part 1: Quantitative research
 
 
Quantitative research
 
What is quantitative research?
 
Quantitative research:
“Quantitative research is the process of collecting and analyzing numerical data. It can be used to find patterns and averages, make predictions, test causal relationships, and generalize results to wider populations.

Quantitative research is the opposite of qualitative research, which involves collecting and analyzing non-numerical data (e.g. text, video, or audio).” (Source: https://www.scribbr.com/methodology/quantitative-research/)
 
<<Discussion>>
 
Quantitative research
 
“Quantitative research is empirical research where the data are in the form of
numbers”
“Qualitative research is empirical research where the data are not in the form
of numbers” (Punch, 2014, p. 3)
 
However, lots of data in quantitative research are (initially) not in the form of
numbers: gender, occupational status, economic sector of activity.
 
And sometimes, in qualitative studies, numerical data are very important
(e.g. dynamics of household debt, reference budgets research on how much
people need to participate in society)
 
Mixed-methods designs, number of observations, data generation process
and degree of ‘generalisability’
 
Quantitative research
 
Types:
 
Descriptive
 
Correlational
 
Causal
 
Methodological
 
Quantitative research
 
Basis: start from data and try to turn them into a number that can be
meaningfully interpreted

Key issue: replicability
if others were to follow the same procedure, they should (be able to)
arrive at the same result
 
So the question is:
are the data of sufficient quality?
is the procedure appropriate and adequate?
do the authors interpret the result correctly?
 
Quantitative research
 
Basic ingredients: variables and indicators
 
 
Quantitative database: variables and observations
Quantitative research

Structure of a database

Variables (columns)

Respondents / units of observation (rows)

A record

An observation / data point (cell)
 
Quantitative research
 
Types of units of observation
Social entities: Individuals, households, municipalities, companies, countries
Social phenomena: Transactions, court cases, crimes, purchases
Other types: durables, animals, …
 
Types of variables
Categorical:
Dichotomous (only two values)
Ordinal (logical order, but no clear (numerical) distance between categories)
Nominal (no logical order)
Numerical: ordinal, continuous (interval vs. ratio); discrete variables (only integers) vs. fully continuous variables
The type refers to how a variable is measured and recorded in the database, not to how it is in reality (e.g. weight is continuous in reality, but may be recorded in classes of varying width, such as 5-10 kg)
 
Cases in between (read Heeringa et al., 2010, section 5.2.3, p. 119-120)
 
Quantitative research
 
Structure of a database
 
Quantitative research
 
Variables vs. indicators
 
Indicator
a summary statistic which tries to measure a (social) phenomenon or a (past, current or future)
state of affairs
 
Can be based on one or a combination of variables
 
Can be based on a combination of (many) indicators
 
Need not be based on survey data (e.g. the minimum level of social
benefits can be derived from a database with programmed legislation)

Examples: average or median income, GDP/capita, CPI, Gini coefficient, Human
Development Index (HDI)
 
=> a database can be a collection of (e.g. country-level) indicators, which
themselves are generated on the basis of other databases
 
Quantitative research
 
Types of analysis:
Univariate (distribution of one variable)
Bivariate (joint distribution of two variables, correlation)
Multivariate (joint distribution of more than two variables,
correlation)
 
 
Quantitative research
 
Types of variables (bis)
 
Dependent variable (response variable)
 
Independent variable (treatment variable)
 
Control variables
 
Quantitative research
 
Types of databases
 
Cross-sectional
One moment in time
 
Longitudinal
Repeated cross-sections (multiple moments, different units of observation)
 
Panel data (multiple moments, same units of observation)
 
Survey data, Administrative data (register data)
 
Quantitative research
 
Different types of data:
Population data
=> information on all elements of the target population
=> Some elements of ‘total survey error paradigm’ still relevant (missing data,
coverage errors, etc.)
 
Information on a sample (i.e. a selection of the population)
Non-random samples => cannot generalise with confidence to target
population, no good indicators of statistical reliability
 
Random samples => can generalise with more confidence, with indicators of
statistical reliability (including confidence intervals, but also non-response
rates)
 
 
Quantitative research
 
Non-random samples:
Convenience samples, self-selected samples, purposefully selected samples (e.g.
quota samples)
No direct theoretical support to generalise findings to the population
 
Random samples:
Human influence (both known and unknown) is ‘removed’ from the selection
process
All elements in the population have a non-zero probability of selection
For all elements in the sample, this probability of selection is known
Simple random sampling: no stratification, no clustering, equal probability of
selection for all population members
Samples can be randomly wrong => estimates require a measure of their reliability!
 
See also: Groves et al., 2009, p. 97ff
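The defining features of simple random sampling listed above (no stratification, no clustering, equal and known selection probabilities) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the population is a hypothetical list of 1,000 numbered units, and the seed value is arbitrary; fixing it simply makes the draw replicable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement, each with the same probability.

    Returns the sample and the (known) selection probability n / N,
    which is identical for every population member under SRS.
    """
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and N")
    rng = random.Random(seed)           # fixed seed => replicable draw
    sample = rng.sample(population, n)  # sampling without replacement
    return sample, n / N

# Hypothetical population of 1,000 numbered units
population = list(range(1, 1001))
sample, prob = simple_random_sample(population, n=50, seed=2020)
print(len(sample), prob)  # 50 0.05
```

Stratified or clustered designs would instead give units different (but still known) selection probabilities, which then have to be taken into account in the analysis.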
 
Quantitative research
 
Strengths:
(testable) potential for high degree of generalisability
Replicability
Support or reject quantitative and causal claims
Helps to simplify complex phenomena and changes to an understandable degree

Weaknesses:
Risk of over-simplification (‘superficial’)
Sometimes hard to identify and test causal mechanisms
Limited possibilities to take on board variables and considerations that were not
thought of in the design phase
Over-confidence & misrepresentation
 
Some of these also apply to qualitative research
 
Quantitative research
 
Essential for:
 
Knowledge of incidence and distribution of phenomena /
characteristics, correlations in a target population
 
‘Proving’ causal relationships in specific and broader populations
 
But quantitative research does not automatically lead to these things
Requires careful sample selection, data treatment, analysis and
interpretation
 
Part 2: Summarising data
 
Databases are too big and not very telling to publish / present as such

We need meaningful summary measures that tell us what the data
look like

They are estimated from the data. Very often an additional step is
required to assess what the estimated value tells us about the target
population (see afternoon)
 
Two ways of summarising data: tables (i.e. numbers) and graphs
 
Summarising data
 
The appropriate type of metric depends on the type of variable
E.g. fewer possibilities with nominal variables than with continuous variables
 
Hierarchy: nominal – ordinal – interval – ratio
 
As continuous variables can be made ordinal, what is applicable to
ordinal is also applicable to continuous (with loss of information)
 
Usually not the other way around
Same holds for nominal vs. ordinal
Exception: dichotomous (‘dummy’) variables => in analysis often treated as if
they are continuous
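A short illustration of why dummy variables can be treated as continuous, using a hypothetical 0/1 variable: the mean of a dummy equals the proportion of observations coded 1, so averaging it gives a directly interpretable number.

```python
from statistics import mean

# Hypothetical dummy variable: 1 = unemployed, 0 = not unemployed
unemployed = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

share = mean(unemployed)                        # mean of a 0/1 variable ...
proportion = sum(unemployed) / len(unemployed)  # ... equals the share of 1s
print(share, proportion)  # 0.3 0.3
```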
 
Summarising data
 
1. ‘Typical values’ (univariate analysis)
 
Totals (e.g. how many unemployed are there in Italy?)
 
Mode (most common value of a variable)
 
Average (arithmetic mean: the sum of all values divided by the number of observations)

Median (the observation in the middle of the distribution, when units of
observation are ranked from lowest to highest; with an even number of observations:
the arithmetic average of the two observations in the middle of the distribution)
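The typical values above map directly onto functions in Python's standard `statistics` module. A minimal sketch with hypothetical household sizes; the eight values are chosen so that the median falls between two different middle observations.

```python
from statistics import mean, median, mode

# Hypothetical data: number of persons in 8 households
household_size = [1, 2, 2, 3, 4, 2, 5, 6]

total = sum(household_size)            # total number of persons
typical_mode = mode(household_size)    # most common value
average = mean(household_size)         # arithmetic mean: total / n
middle = median(household_size)        # even n: average of the 2 middle values

# Ranked: 1, 2, 2, 2, 3, 4, 5, 6 -> middle values 2 and 3
print(total, typical_mode, average, middle)  # 25 2 3.125 2.5
```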
 
Summarising data
 
Illustration geometric average
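The illustration on this slide is not reproduced here. As one common use of the geometric average (an example chosen for this sketch, not necessarily the one on the slide): averaging growth factors over several periods. The geometric average is the n-th root of the product of the values; with hypothetical growth of +10% followed by -10%, the arithmetic mean of the factors suggests no change, while the geometric mean reproduces the actual two-year change.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical yearly growth: +10% in year 1, -10% in year 2
growth_factors = [1.10, 0.90]

arithmetic = mean(growth_factors)           # 1.0: suggests zero growth
geometric = geometric_mean(growth_factors)  # (1.10 * 0.90) ** (1 / 2)

# Applying the geometric average twice reproduces the actual change
start = 100.0
end = start * math.prod(growth_factors)     # about 99.0
assert math.isclose(start * geometric ** 2, end)

print(round(arithmetic, 4), round(geometric, 4))  # 1.0 0.995
```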
 
Summarising data
 
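A simple line graph

[Figure: Median disposable income in Imaginland between 2005 and 2020 (EUR, in prices of 2020). Source: my imagination]

Annotated elements of the graph:
Title, which also indicates the unit of measurement
X-axis (independent variable)
Y-axis (dependent variable)
Origin: includes zero for the dependent variable (in this case)
Axis titles are often very helpful to read the graph
Gridlines help to read the graph
Source: essential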
 
Summarising data
 
Summarising data
 
2. Summarising the distribution
 
Relative and cumulative frequencies
 
Proportion = relative frequency = number / total number
Percentage = relative frequency x 100
Percentage point difference (p.p.) = Percentage(A) – Percentage(B)
Percentage change = 100 x (Percentage(A) – Percentage(B)) / Percentage(B)
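The difference between a percentage-point difference and a percentage change is easily confused, so here is a small sketch with hypothetical unemployment counts. The percentage change is computed relative to B as 100 x (A – B) / B; by contrast, 100 x A / B on its own would be an index with B = 100.

```python
def percentage(count, total):
    """Relative frequency expressed as a percentage."""
    return 100 * count / total

def pp_difference(pct_a, pct_b):
    """Percentage point difference between two percentages."""
    return pct_a - pct_b

def percentage_change(pct_a, pct_b):
    """Relative change from B (the reference) to A, in percent."""
    return 100 * (pct_a - pct_b) / pct_b

# Hypothetical: 50 and 60 unemployed persons out of 500 in two years
rate_2019 = percentage(50, 500)   # 10.0 %
rate_2020 = percentage(60, 500)   # 12.0 %
print(pp_difference(rate_2020, rate_2019))      # 2.0 (percentage points)
print(percentage_change(rate_2020, rate_2019))  # 20.0 (% increase)
```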
 
Quantiles = rank observations from low to high, then divide them into groups with equal numbers of
observations; quantiles are the cut-off points between these groups.
Median = 50% cut-off
Percentiles = cut-offs when subdividing into 100 groups
Deciles = cut-offs when subdividing into 10 groups
Quintiles = cut-offs when subdividing into 5 groups
Quartiles = cut-offs when subdividing into 4 groups
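Quantiles can be computed with `statistics.quantiles`, which returns the n – 1 cut-off points that split the ranked data into n groups. A sketch on hypothetical, evenly spaced values. Software packages use different interpolation conventions (Python's default is `method="exclusive"`; this sketch uses `"inclusive"`), so results may differ slightly in small samples, and survey data would additionally require weights.

```python
from statistics import median, quantiles

# Hypothetical values: 0, 10, 20, ..., 100 (11 observations)
values = list(range(0, 101, 10))

# n - 1 cut-off points; "inclusive" interpolates linearly between values
quartiles = quantiles(values, n=4, method="inclusive")
quintiles = quantiles(values, n=5, method="inclusive")
deciles = quantiles(values, n=10, method="inclusive")

print(quartiles)                # [25.0, 50.0, 75.0]
print(median(values))           # 50: the second quartile is the median
print(deciles[0], deciles[-1])  # 10.0 90.0  (D1 and D9)
```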
 
 
Summarising data
 
The frequency distribution of equivalised disposable income in Italy in
2017
 
Source: EU-SILC 2018 UDB, own computations
 
Summarising data
 
Number of children per household in Italy, EU-SILC 2018
 
Note: children defined as those aged below 18 years, private households only.
Source: EU-SILC 2018 UDB, own computations.
 
What do these numbers mean?
 
Summarising data
 
Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
 
Summarising data
Summarising data
 
3. Summarising the distribution: dispersion
 
Absolute measures:
Range = maximum – minimum
Interquartile range = P75 – P25
Standard deviation (= s)
Variance (= s²)
Standard deviation of the mean (in a sample), i.e. the standard error

Relative measures:
Decile ratio (D9/D1)
Coefficient of variation (= s / mean)
Gini coefficient and many other indicators of inequality
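The dispersion measures listed above can be sketched with Python's standard `statistics` module. Assumptions: the incomes are hypothetical, all observations have equal weight, quantiles use linear interpolation (`method="inclusive"`), and the Gini coefficient is computed directly from its definition (mean absolute difference between all pairs, divided by twice the mean) without any small-sample correction. The standard error s / sqrt(n) assumes a simple random sample; complex survey designs need other variance estimators.

```python
import math
from statistics import mean, quantiles, stdev, variance

def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all pairs (i, j),
    divided by 2 * n^2 * mean (0 = perfect equality)."""
    n = len(values)
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * n * n * mean(values))

# Hypothetical monthly incomes (EUR) of ten persons
incomes = [800, 950, 1100, 1200, 1300, 1450, 1600, 1800, 2200, 3600]

# Absolute measures
value_range = max(incomes) - min(incomes)
q1, _, q3 = quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1
s = stdev(incomes)                      # sample standard deviation (n - 1)
s2 = variance(incomes)                  # sample variance = s ** 2
se_mean = s / math.sqrt(len(incomes))   # standard deviation of the mean

# Relative measures
deciles = quantiles(incomes, n=10, method="inclusive")
decile_ratio = deciles[-1] / deciles[0]  # D9 / D1
cv = s / mean(incomes)                   # coefficient of variation

print(value_range, iqr, round(s, 1), round(decile_ratio, 2),
      round(cv, 2), round(gini(incomes), 3))
```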
Summarising data
 
4. Measures of association between variables
 
Risk ratio (relative risk) = relative frequency(A) / relative frequency(B)
 
Pearson’s correlation coefficient: ranges from -1 (perfect negative linear association) through 0 (no linear association) to +1 (perfect positive linear association)
 
Spearman’s rank-order coefficient
 = Pearson’s correlation of the ranks
 
Multiple variables: regression coefficients
 
Regression is a technique to analyse how one or more variables
(simultaneously) correlate with a dependent variable
 
Differences in mean values / medians in Y by groups of X
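Pearson's and Spearman's coefficients can be written out by hand to show how they relate: Spearman's coefficient is simply Pearson's correlation computed on the ranks, with tied values receiving their average rank. The data below are hypothetical (y = x³), chosen so that the association is perfectly monotonic but not linear. Python 3.10+ also offers `statistics.correlation`; the explicit version here is only meant to expose the definitions, and it does not guard against a constant variable (division by zero).

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations; ranges from -1 to +1."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Rank from 1 (lowest) to n; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank-order coefficient = Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always rises with x, but not along a straight line
x = [1, 2, 3, 4, 5, 6]
y = [1, 8, 27, 64, 125, 216]  # y = x ** 3
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Pearson's coefficient stays below 1 because the relationship is not linear, whereas Spearman's coefficient equals 1 because the ranking of y follows the ranking of x exactly.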
 
Summarising data
 
Cross-tabulation of health status by immigration status for persons
between 18 and 65 years old in Italy, EU-SILC 2018
 
Note: only persons living in private households
Source: EU-SILC 2018 UDB, own computations
 
Risk ratio = 10.9 / 12.8 = 0.86

Immigrants are 14% less likely to have health problems
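The risk ratio and the related measures from the list of relative frequencies can be recomputed from the two percentages quoted above. This is a sketch: it assumes, as the interpretation on the slide implies, that 10.9% refers to immigrants and 12.8% to non-immigrants. With these rounded percentages the ratio comes out at about 0.85 (about 14.8% lower); the 0.86 on the slide presumably reflects the unrounded shares in the original cross-tabulation, which is not reproduced here. The sketch also shows how a gap of only 1.9 percentage points corresponds to a sizeable relative difference.

```python
# Shares (%) with health problems, as used in the slide's calculation
# (assumption: 10.9 = immigrants, 12.8 = non-immigrants, aged 18-65)
pct_immigrants = 10.9
pct_non_immigrants = 12.8

risk_ratio = pct_immigrants / pct_non_immigrants
pp_gap = pct_immigrants - pct_non_immigrants   # percentage points
relative_change = 100 * (risk_ratio - 1)       # percent

print(f"risk ratio: {risk_ratio:.2f}")
print(f"gap: {pp_gap:.1f} p.p.")
print(f"relative difference: {relative_change:.1f}%")
```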
 
Summarising data
 
https://en.wikipedia.org/wiki/Correlation_and_dependence#/media/File:Pearson_Correlation_Coefficient_and_associated_scatterplots.png
 
Summarising data
 
https://en.wikipedia.org/wiki/Correlation_and_dependence
 
Summarising data
 
Source: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient (last accessed 23/11/2020)
 
Scatterplots
 
Summarising data
 
Source: https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient (last accessed 23/11/2020)
 
Summarising data
 
Linear regression
 
Source: https://sphweb.bumc.bu.edu/otlt/MPH-Modules/PH717-QuantCore/PH717-Module9-Correlation-Regression/PH717-Module9-Correlation-Regression7.html (last accessed 23/11/2020)
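For a single independent variable, the least-squares regression line can be computed directly from its definition: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the line passes through the two means. The age and wage values below are hypothetical. The slope describes how much y differs, on average, per one-unit difference in x; on its own it describes an association, not a causal effect.

```python
from statistics import mean

def ols_fit(x, y):
    """Least-squares line y = a + b * x for one independent variable."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx   # the line passes through (mean x, mean y)
    return intercept, slope

# Hypothetical data: age (years) and hourly wage (EUR)
age = [25, 30, 35, 40, 45, 50, 55]
wage = [11.0, 12.5, 13.0, 15.5, 16.0, 17.5, 17.0]

a, b = ols_fit(age, wage)
print(f"wage = {a:.2f} + {b:.3f} * age")
```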
 
Summarising data
Summarising data
 
What would be an appropriate way to summarise each of the variables? (typical values,
distribution, dispersion)

What is the proportion of males? What is the mode of professional status? What is the
average wage? What is the median wage?

Would the median or the average be the better way to summarise wage?

How would you find out whether there is an association between age and wage, between age and
agreement, and between gender and agreement?
 
Summarising data
 
There are many more relevant metrics; this was really only scratching the
surface

By now there is also a sizeable literature on the best way to present data graphically
(e.g. do not use pseudo-3D graphs in Excel)

Many resources online.

My favourite handbook (somewhat more advanced, but takes better
account of the data generation process):

Heeringa, S. G., West, B. T. and Berglund, P. A. (2010), Applied Survey Data Analysis, Boca Raton: Chapman & Hall/CRC, 467p.
 
 
 
Part 3: (Social) indicators
 
 
 
 
 
An indicator = a summary statistic which tries to measure a
phenomenon or a (past, current or future) state of affairs
 
 
Indicators
 
Concept – definition – metric – indicator
 
Concept: description of the phenomenon in relation to related phenomena
 
Definition: exact description which allows one to (theoretically) identify the
phenomenon of interest to the exclusion of others

Metric: typically a mathematical formula which expresses how one or
multiple variables will be used and combined to measure the phenomenon

Indicator: an implementation of the metric with observed variables / real data
(i.e. operationalisation)


*Multiple concepts are sometimes measured with the same indicator; conversely,
multiple indicators are often necessary (or possible) to fully capture a single concept
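To make the chain concept, definition, metric, indicator concrete, here is a sketch based on a simplified version of the EU at-risk-of-poverty rate: the concept is (income) poverty, the definition is having an income below a relative threshold, the metric is the share of persons with equivalised disposable income below 60% of the median, and the indicator is that metric computed on actual data. Simplifying assumptions: the incomes are hypothetical and already equivalised, and every observation has equal weight (real survey data such as EU-SILC would require survey weights).

```python
from statistics import median

def at_risk_of_poverty_rate(incomes, threshold_share=0.6):
    """Metric: share (%) of persons with income below a fraction of the median.

    Simplified sketch: assumes incomes are already equivalised and that
    each observation has equal weight (no survey weights).
    """
    threshold = threshold_share * median(incomes)
    below = sum(1 for x in incomes if x < threshold)
    return 100 * below / len(incomes)

# Indicator: the metric applied to (hypothetical) observed data
incomes = [6000, 9000, 14000, 17000, 20000, 23000, 26000, 31000, 40000, 55000]
print(at_risk_of_poverty_rate(incomes))  # 20.0
```

Changing `threshold_share` (a parameter introduced for this sketch) changes the metric and hence the indicator, while the underlying concept stays the same; this is one reason why several indicators are often used for a single concept.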
 
Indicators
 
Quality criteria of individual indicators (Atkinson et al., 2002):
 
Validity:
an indicator should identify the essence of the problem and have a clear and
accepted normative interpretation (face validity, transparency & acceptability)
Internal vs. external validity
=> should always be evaluated in light of the objective of the exercise!
 
Reliability:
an indicator should be robust and statistically validated
 
Responsiveness:
an indicator should be responsive to effective policy interventions but not
subject to manipulation
 
Quality criteria
 
 
Comparability (Goedemé et al., 2015)
Place & time:
an indicator should be measurable in a sufficiently comparable way across member states
Procedural comparability:
the same procedures are implemented for measuring a phenomenon or characteristic on
different occasions (different times or different places)
Substantive comparability (i.e. functional equivalence):
the same phenomenon is captured similarly in different (social) contexts
Operational feasibility, timeliness and potential for revision
 
Quality criteria
 
For the portfolio of indicators (Atkinson et al., 2002):
Balance across different dimensions (and be comprehensive but
selective rather than exhaustive)
Mutual consistency of indicators & proportionate weight of indicators
Transparency and accessibility
 
Some points to remember
 
 
Quantitative data are often helpful, sometimes essential
 
They have to be treated and interpreted carefully
 
A definition is not an indicator; multiple indicators are often necessary to
measure a concept

Validity, reliability and comparability are key quality characteristics of
indicators
 
 
References
 
Atkinson, A. B., Cantillon, B., Marlier, E. and Nolan, B. (2002), Social Indicators: The EU and Social Inclusion, Oxford: Oxford University Press, 240p. (Chapter 2)

Heeringa, S. G., West, B. T. and Berglund, P. A. (2010), Applied Survey Data Analysis, Boca Raton: Chapman & Hall/CRC, 467p.

Groves, R. M., Fowler, F. J. J., Couper, M. P., et al. (2009), Survey Methodology (Second edition), New Jersey: John Wiley & Sons.

Punch, K. F. (2014), Introduction to Social Research, London: Sage.
 
 
 
 
 
Slide Note

Initially get to know students and the extent to which they are familiar with quantitative research
