Quantitative Data Analysis in Research

Exploring Data:
Frequencies, Central Tendency,
Dispersion and Standard Deviation
SIT094
The Collection and Analysis of Quantitative Data
Week 3
 
Luke Sloan
About Me
 
Name: Dr Luke Sloan
Office: 0.56 Glamorgan
 
To see me: 
please email first
Introduction
 
Collecting Quantitative Data
 
Levels of Measurement
 
Frequencies & Fidelity
 
Central Tendency
 
Dispersion
 
Summary
Collecting Quantitative Data I
“Research involving the collection of data in
numerical form… the defining factor is that
numbers result from the process, whether the
initial data collection produced numerical
values, or whether non-numerical values were
subsequently converted to numbers as part of
the analysis process…”
 
Source: Jupp 2006:250
Collecting Quantitative Data II
 
Operationalising of social concepts
 
Quantifying ‘fuzzy’ data into VARIABLES
 
How to measure feelings, attitudes, behaviours, beliefs and
attributes?
 
Numbers allow statistical tests
 
Statistical tests allow generalisations to be made
 
Characterisation from samples to populations
Collecting Quantitative Data III
 
Capture data using instruments
 
Surveys (paper, online, telephone, in person)
 
Secondary data analysis
 
Experiments – difficult outside of the natural sciences
 
But social scientists try to emulate the natural science
model (remember Popper’s Falsification Principle?)
 
But not all data is equal (some are more equal than others!)
Levels of Measurement I
 
*NOTE: Interval = no true zero point (e.g. temperature in °C), Ratio = true zero point (e.g. income)
 
Source: David & Sutton (2004)
Levels of Measurement II
 
Level of measurement for certain variables is not predefined:
 
AGE (in years e.g. 22, 34, 54)
AGE (pre-set bands e.g. 18-30, 31-50)
AGE (group membership e.g. mature student)
 
There is a hierarchy of data – always try to collect the
highest level possible to maximise usefulness!
 
Are you bored? (Yes/No)
On a scale of 1-10, how bored are you? [where 1=‘practically in
tears of boredom’ and 10=‘riveted’]
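The hierarchy point above can be illustrated with a minimal sketch: an exact age in years can always be collapsed into bands afterwards, but a band can never be turned back into an exact age. The function name `age_band` and the bands "under 18" and "51+" are assumptions added for illustration; only the 18-30 and 31-50 bands and the example ages come from the slide.

```python
def age_band(age_years: int) -> str:
    """Collapse an exact age (interval level) into a pre-set band (ordinal level)."""
    if age_years < 18:
        return "under 18"  # hypothetical band, not on the slide
    if age_years <= 30:
        return "18-30"     # band from the slide
    if age_years <= 50:
        return "31-50"     # band from the slide
    return "51+"           # hypothetical band, not on the slide


ages = [22, 34, 54]  # example ages from the slide
bands = [age_band(a) for a in ages]
print(bands)  # ['18-30', '31-50', '51+']
```

Going the other way is not possible: knowing only "31-50" does not tell you whether the respondent is 34 or 49, which is why collecting the highest level first keeps more options open.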
Frequencies & Fidelity I
 
Not as interesting as it sounds – sorry!
 
Frequency tables display the number of times that a value
appears in your dataset (per variable across all cases)
 
They are always the first thing you do once your data is in
electronic form
 
Highlights data errors
 
Indicative of potential analysis
Frequencies & Fidelity II
What can we say about this table?
A simple frequency table can tell you quite a bit!
Error?
What we would
expect?
Look at %s
More than UKIP
Really? Only 1?
What’s this?
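As a rough sketch of how such a table is built and screened, the code below recomputes the valid and cumulative percentages from the party counts shown on this slide and flags any code that is not in an expected list. The name `expected_labels` is a hypothetical stand-in for a codebook; the counts are those in the slide's table.

```python
# Valid counts per category, as shown in the slide's frequency table.
counts = {
    "-9": 1, "Conservative": 1331, "Labour": 1103, "Lib Dem": 1044,
    "Green": 368, "UKIP": 171, "BNP": 78, "Independent": 216, "Others": 135,
}

# Hypothetical codebook: the labels we would expect to see in this variable.
expected_labels = {
    "Conservative", "Labour", "Lib Dem", "Green",
    "UKIP", "BNP", "Independent", "Others",
}

valid_total = sum(counts.values())

rows = []
cumulative = 0
for label, n in counts.items():
    cumulative += n
    valid_pct = round(100 * n / valid_total, 1)
    cumulative_pct = round(100 * cumulative / valid_total, 1)
    rows.append((label, n, valid_pct, cumulative_pct))

# Any value outside the codebook is a candidate data-entry or coding error.
unexpected = [label for label in counts if label not in expected_labels]

for row in rows:
    print(row)
print("Unexpected codes:", unexpected)
```

Here the single case coded -9 is flagged, which is the kind of error a first look at the frequencies is meant to catch.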
Central Tendency I
You have all done quantitative research and you all use
measures of central tendency in your normal lives – the
average, middle and most common values
Central Tendency II
MODE
the value that occurs the
most frequently in the
data
MODE = 32
Central Tendency III
What is the most frequent (MODAL) response?
The mode is useful for thinking about NOMINAL data
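For nominal data the mode is simply the category with the highest count. A minimal sketch using the gym frequencies from this slide (the variable names are illustrative assumptions):

```python
# Frequencies for "Main reason for going to gym", from the slide's table.
gym_counts = {"Relaxation": 9, "Fitness": 31, "Lose weight": 33, "Build strength": 17}

# The modal category is the key with the largest count.
modal_reason = max(gym_counts, key=gym_counts.get)
modal_share = round(100 * gym_counts[modal_reason] / sum(gym_counts.values()), 1)

print(modal_reason, modal_share)  # Lose weight 36.7
```

A mean or median would be meaningless here because the categories have no order.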
Central Tendency IV
NOMINAL data can be displayed using a bar chart
Central Tendency V
MEDIAN
the middle value of the ordered
sample data
When the sample size is odd,
the median is the middle
value
When the sample size is even,
the median is the midpoint
(mean) of the two middle
values
MEDIAN = 42.5
Central Tendency VI
The mode and median are useful for thinking about ORDINAL
data
What is the most frequent
(MODAL) response?
What is the middle
(MEDIAN) response?
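With ordinal data the median can be read off a frequency table by walking through the ordered categories until the running count reaches the middle case. The sketch below assumes the valid counts from this slide's local government table; `ordinal_median` is a hypothetical helper, and for an even number of cases it returns the category of the lower of the two middle cases.

```python
# Valid responses in rank order, with frequencies from the slide's table.
ordered_counts = [
    ("Strongly Agree", 1911),
    ("Agree", 2281),
    ("Neutral", 255),
    ("Disagree", 111),
    ("Strongly Disagree", 17),
]


def ordinal_median(ordered_counts):
    """Return the category containing the middle case (lower middle if n is even)."""
    total = sum(n for _, n in ordered_counts)
    middle_position = (total + 1) // 2
    running = 0
    for label, n in ordered_counts:
        running += n
        if running >= middle_position:
            return label
    raise ValueError("no cases supplied")


median_response = ordinal_median(ordered_counts)
modal_response = max(ordered_counts, key=lambda pair: pair[1])[0]
print(median_response, modal_response)  # Agree Agree
```

This is the same logic as looking for the first row where the cumulative percent passes 50.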
Central Tendency VII
ORDINAL data can also be displayed using a bar chart
Central Tendency VIII
MEAN
sum of the values divided by the
number of cases
MEAN = 44.2
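The three measures introduced so far can be checked against the ten daily high temperatures used on the mode, median and mean slides. This is a small sketch using Python's standard `statistics` module; the list name `high_temps` is an assumption.

```python
import statistics

# Daily high temperatures for 2-Jan to 11-Jan, as listed on the slides.
high_temps = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

mode_temp = statistics.mode(high_temps)      # most frequent value
median_temp = statistics.median(high_temps)  # midpoint of the two middle values (n is even)
mean_temp = statistics.mean(high_temps)      # sum of the values / number of cases

print(mode_temp, median_temp, mean_temp)  # 32 42.5 44.2
```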
Central Tendency IX
The mean, mode and median are useful for thinking about
INTERVAL data
What is the average
(MEAN) age?
What is the middle
(MEDIAN) age?
What is the most
common (MODAL) age?
Central Tendency X
INTERVAL data can be displayed using a histogram
Dispersion I
Measures of central tendency are heuristics
They can hide important details in the data
Dataset 1: 1 2 3 4 5 6 7 8 9
MEAN = 5
MEDIAN = 5
Dataset 2: 1 2 3 4 5 6 7 8 90
MEAN = 14
MEDIAN = 5
Need to consider RANGE and STANDARD DEVIATION
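A quick sketch of why a single summary figure can mislead: the two datasets on this slide differ only in their last value, yet the mean moves from 5 to 14 while the median stays put. The variable names are illustrative.

```python
import statistics

dataset_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dataset_2 = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # same values, but with one outlier

for name, data in (("Dataset 1", dataset_1), ("Dataset 2", dataset_2)):
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The mean is pulled towards the outlier because every value contributes to the sum; the median only depends on the middle position.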
Dispersion II
 
RANGE measures the difference between the
lowest and highest values
Large range may reveal outliers (dataset 2!)
Small range suggests tight grouping of data
 
STANDARD DEVIATION (SD) measures the typical
distance (deviation) of values from the mean
Large SDs occur when data points are a long way from
the mean (wide range of different values)
Small SDs occur when data points are close to the
mean (values do not differ very much)
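For reference, the two measures can be written as follows. The sample form of the standard deviation (dividing by n - 1) is assumed here; the population form divides by n instead. The symbols are notation introduced for this note, not taken from the slides.

```latex
\text{Range} = x_{\max} - x_{\min}
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Each squared term is large when a value sits far from the mean, which is why widely spread data produce a large SD.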
Dispersion III
For example:
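The example on this slide compares two samples of ten ages with the same mean but very different spread. The sketch below recomputes the slide's descriptive statistics; `describe` is a hypothetical helper, and `statistics.stdev` uses the sample (n - 1) formula, which appears to match the SD figures in the slide's output.

```python
import statistics

# Ages from the slide's two example samples.
sample_1 = [18, 30, 23, 31, 21, 19, 20, 19, 28, 21]
sample_2 = [8, 55, 53, 13, 12, 52, 7, 9, 11, 10]


def describe(values):
    """Return N, range, minimum, maximum, mean and sample SD for a list of numbers."""
    return {
        "n": len(values),
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


stats_1 = describe(sample_1)
stats_2 = describe(sample_2)
print(stats_1)
print(stats_2)
```

Both samples have a mean of 23, so only the range (13 versus 48) and the SD (about 4.85 versus about 21.02) reveal how differently the ages are distributed.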
Summary
 
Levels of measurement determine how data can be analysed
 
Vital to understand what your data represents and into which level of
measurement it falls
 
Frequency tables help us to screen data for errors
 
Frequency tables also help us to identify the median and mode
 
Central tendency is a heuristic, and very common precisely because of this
 
Dispersion plays a vital role in critically evaluating central tendency
 
These modes of analysis are often referred to as DESCRIPTIVE STATISTICS
or UNIVARIATE ANALYSIS (literally ‘one variable’!)
Lies, Damn Lies and Statistics?
90% of Sun readers want a cap on immigration
The average Yale graduate earns $30,000 within six months of graduating
The Green Party is not well supported as it received less than 5% of the
national vote in the 2010 General Election
House prices drop by 10% in the UK
90% of students at Cardiff University are binge drinkers

Dive into the world of quantitative data analysis with a focus on frequencies, central tendency, dispersion, and standard deviation. Explore the collection and analysis of numerical data, levels of measurement, and methods for quantifying social concepts. Learn about the importance of capturing data using various instruments and understanding different levels of measurement in quantitative research. Gain insights into converting non-numerical values to numerical form for statistical analysis purposes. Join Dr. Luke Sloan on a journey through the fundamental aspects of quantitative data collection and analysis.

  • Quantitative Data Analysis
  • Research Methods
  • Data Collection
  • Statistical Analysis
  • Levels of Measurement

Uploaded on Nov 27, 2024 | 0 Views



Presentation Transcript


  1. Exploring Data: Frequencies, Central Tendency, Dispersion and Standard Deviation SIT094 The Collection and Analysis of Quantitative Data Week 3 Luke Sloan

  2. About Me Name: Dr Luke Sloan Office: 0.56 Glamorgan Email: SloanLS@cardiff.ac.uk To see me: please email first

  3. Introduction
     Collecting Quantitative Data
     Levels of Measurement
     Frequencies & Fidelity
     Central Tendency
     Dispersion
     Summary

  4. Collecting Quantitative Data I
     “Research involving the collection of data in numerical form… the defining factor is that numbers result from the process, whether the initial data collection produced numerical values, or whether non-numerical values were subsequently converted to numbers as part of the analysis process…”
     Source: Jupp 2006:250

  5. Collecting Quantitative Data II
     Operationalising of social concepts
     Quantifying ‘fuzzy’ data into VARIABLES
     How to measure feelings, attitudes, behaviours, beliefs and attributes?
     Numbers allow statistical tests
     Statistical tests allow generalisations to be made
     Characterisation from samples to populations

  6. Collecting Quantitative Data III
     Capture data using instruments
     Surveys (paper, online, telephone, in person)
     Secondary data analysis
     Experiments – difficult outside of the natural sciences
     But social scientists try to emulate the natural science model (remember Popper’s Falsification Principle?)
     But not all data is equal (some are more equal than others!)

  7. Levels of Measurement I
     Nominal (categorical)
       Description: response categories cannot be placed in a specific order; impossible to judge distance between categories
       Examples: Sex (Male/Female), Ethnicity (White/Black…), Party (Lab/Con/LD…)
     Ordinal (categorical)
       Description: response categories can be placed in rank order; distance between categories cannot be measured mathematically
       Examples: Likert (Agree/Neutral/Disagree), Rank Preference (Coke/Pepsi…), Education (GCSE/A-Level…)
     Interval (or continuous)*
       Description: responses measured on a continuous scale with rank order; uniform distance between responses allows mathematical measurement
       Examples: Age (in years), Income (in £)
     Source: David & Sutton (2004)
     *NOTE: Interval = no true zero point (e.g. temperature in °C), Ratio = true zero point (e.g. income)

  8. Levels of Measurement II
     Level of measurement for certain variables is not predefined:
       AGE (in years e.g. 22, 34, 54)
       AGE (pre-set bands e.g. 18-30, 31-50)
       AGE (group membership e.g. mature student)
     There is a hierarchy of data – always try to collect the highest level possible to maximise usefulness!
       Are you bored? (Yes/No)
       On a scale of 1-10, how bored are you? [where 1=‘practically in tears of boredom’ and 10=‘riveted’]

  9. Frequencies & Fidelity I
     Not as interesting as it sounds – sorry!
     Frequency tables display the number of times that a value appears in your dataset (per variable across all cases)
     They are always the first thing you do once your data is in electronic form
     Highlights data errors
     Indicative of potential analysis

  10. Frequencies & Fidelity II
      What can we say about this table?

      Parties coded           Frequency  Percent  Valid Percent  Cumulative Percent
      Valid    -9                     1       .0             .0                  .0
               Conservative        1331     29.9           29.9                30.0
               Labour              1103     24.8           24.8                54.8
               Lib Dem             1044     23.5           23.5                78.2
               Green                368      8.3            8.3                86.5
               UKIP                 171      3.8            3.8                90.4
               BNP                   78      1.8            1.8                92.1
               Independent          216      4.9            4.9                97.0
               Others               135      3.0            3.0               100.0
               Total               4447    100.0          100.0
      Missing  System                 1       .0
      Total                        4448    100.0

      Callouts on the slide: Error? / What we would expect? / Look at %s / More than UKIP / What’s this? / Really? Only 1?
      A simple frequency table can tell you quite a bit!

  11. Central Tendency I
      You have all done quantitative research and you all use measures of central tendency in your normal lives – the average, middle and most common values
      Most Common (MODE): What to watch on TV with housemates. Decide based on the most popular choice
      Middle (MEDIAN): How long do you cook a chicken? Cookbook says 2 hours but internet says 3
      Average (MEAN): Maintenance grant allowance per week. Divide total grant by number of weeks at uni

  12. Central Tendency II
      MODE: the value that occurs the most frequently in the data

      Date     High Temperature
      2-Jan    59
      3-Jan    60
      4-Jan    43
      5-Jan    42
      6-Jan    35
      7-Jan    32  <=== Mode
      8-Jan    32  <=== Mode
      9-Jan    46
      10-Jan   41
      11-Jan   52

      MODE = 32

  13. Central Tendency III
      The mode is useful for thinking about NOMINAL data

      Main reason for going to gym
                              Frequency  Percent  Valid Percent  Cumulative Percent
      Valid  Relaxation               9     10.0           10.0                10.0
             Fitness                 31     34.4           34.4                44.4
             Lose weight             33     36.7           36.7                81.1
             Build strength          17     18.9           18.9               100.0
             Total                   90    100.0          100.0

      What is the most frequent (MODAL) response?

  14. Central Tendency IV
      NOMINAL data can be displayed using a bar chart
      [Bar chart: Count (y-axis, 0 to 40) by Main reason for going to gym (Relaxation, Fitness, Lose weight, Build strength)]

  15. Central Tendency V
      MEDIAN: the middle value of the ordered sample data
      When the sample size is odd, the median is the middle value
      When the sample size is even, the median is the midpoint (mean) of the two middle values

      Date     High Temperature
      7-Jan    32
      8-Jan    32
      6-Jan    35
      10-Jan   41
      5-Jan    42  <=== Middle values
      4-Jan    43  <=== Middle values
      9-Jan    46
      11-Jan   52
      2-Jan    59
      3-Jan    60

      MEDIAN = 42.5

  16. Central Tendency VI
      The mode and median are useful for thinking about ORDINAL data

      There is a general lack of public knowledge about local government
                                   Frequency  Percent  Valid Percent  Cumulative Percent
      Valid    Strongly Agree           1911     41.1           41.8                41.8
               Agree                    2281     49.1           49.9                91.6
               Neutral                   255      5.5            5.6                97.2
               Disagree                  111      2.4            2.4                99.6
               Strongly Disagree          17       .4             .4               100.0
               Total                    4575     98.5          100.0
      Missing  System                     71      1.5
      Total                             4646    100.0

      What is the middle (MEDIAN) response?
      What is the most frequent (MODAL) response?

  17. Central Tendency VII ORDINAL data can also be displayed using a bar chart

  18. Central Tendency VIII
      MEAN: sum of the values divided by the number of cases

      Date     High Temperature
      2-Jan    59
      3-Jan    60
      4-Jan    43
      5-Jan    42
      6-Jan    35
      7-Jan    32
      8-Jan    32
      9-Jan    46
      10-Jan   41
      11-Jan   52
      Sum      442

      MEAN = 44.2

  19. Central Tendency IX
      The mean, mode and median are useful for thinking about INTERVAL data

      Statistics: What was your age last birthday
      N Valid     4290
      N Missing    158
      Mean       54.74
      Median     57.00
      Mode          62

      What is the average (MEAN) age?
      What is the middle (MEDIAN) age?
      What is the most common (MODAL) age?

  20. Central Tendency X INTERVAL data can be displayed using a histogram

  21. Dispersion I
      Measures of central tendency are heuristics
      They can hide important details in the data
      Dataset 1: 1 2 3 4 5 6 7 8 9     MEAN = 5    MEDIAN = 5
      Dataset 2: 1 2 3 4 5 6 7 8 90    MEAN = 14   MEDIAN = 5
      Need to consider RANGE and STANDARD DEVIATION

  22. Dispersion II
      RANGE measures the difference between the lowest and highest values
        Large range may reveal outliers (dataset 2!)
        Small range suggests tight grouping of data
      STANDARD DEVIATION (SD) measures the typical distance (deviation) of values from the mean
        Large SDs occur when data points are a long way from the mean (wide range of different values)
        Small SDs occur when data points are close to the mean (values do not differ very much)

  23. Dispersion III
      For example:

      Age (Sample 1): 18 30 23 31 21 19 20 19 28 21
      Age (Sample 2): 8 55 53 13 12 52 7 9 11 10

      Descriptive Statistics (Sample 1)
                            N   Range  Minimum  Maximum     Mean  Std. Deviation
      Age                  10   13.00    18.00    31.00  23.0000         4.85341
      Valid N (listwise)   10

      Descriptive Statistics (Sample 2)
                            N   Range  Minimum  Maximum     Mean  Std. Deviation
      Age                  10   48.00     7.00    55.00  23.0000        21.01851
      Valid N (listwise)   10

  24. Summary
      Levels of measurement determine how data can be analysed
      Vital to understand what your data represents and into which level of measurement it falls
      Frequency tables help us to screen data for errors
      Frequency tables also help us to identify the median and mode
      Central tendency is a heuristic, and very common precisely because of this
      Dispersion plays a vital role in critically evaluating central tendency
      These modes of analysis are often referred to as DESCRIPTIVE STATISTICS or UNIVARIATE ANALYSIS (literally ‘one variable’!)

  25. Lies, Damn Lies and Statistics?
      90% of Sun readers want a cap on immigration
      The average Yale graduate earns $30,000 within six months of graduating
      The Green Party is not well supported as it received less than 5% of the national vote in the 2010 General Election
      House prices drop by 10% in the UK
      90% of students at Cardiff University are binge drinkers
