Quantitative Data Analysis in Research

Exploring Data:

Frequencies, Central Tendency,

Dispersion and Standard Deviation

SIT094

The Collection and Analysis of Quantitative Data

Week 3

Luke Sloan

About Me

•

Name: Dr Luke Sloan

•

Office: 0.56 Glamorgan

•

Email:

SloanLS@cardiff.ac.uk

•

To see me:

please email first

Introduction

•

Collecting Quantitative Data

•

Levels of Measurement

•

Frequencies & Fidelity

•

Central Tendency

•

Dispersion

•

Summary

Collecting Quantitative Data I

“Research involving the collection of data in

numerical form… the defining factor is that

numbers result from the process, whether the

initial data collection produced numerical

values, or whether

non-numerical values were

subsequently converted to numbers

 as part of

the analysis process…”

Source: Jupp 2006:250

Collecting Quantitative Data II

•

Operationalising of social concepts

•

Quantifying ‘fuzzy’ data into VARIABLES

•

How to measure feelings, attitudes, behaviours, beliefs and

attributes?

•

Numbers allow statistical tests

•

Statistical tests allow generalisations to be made

•

Characterisation from samples to populations

Collecting Quantitative Data III

•

Capture data using instruments

•

Surveys (paper, online, telephone, in person)

•

Secondary data analysis

•

Experiments – difficult outside of the natural sciences

•

But social scientists try to emulate the natural science model (remember Popper’s Falsification Principle?)

•

But not all data is equal (some are more equal than others!)

Levels of Measurement I

*NOTE: Interval = no true zero point (e.g. temperature in °C), Ratio = true zero point (e.g. height, income)

Source: David & Sutton (2004)

Levels of Measurement II

•

Level of measurement for certain variables is not pre-defined:

–

AGE (in years e.g. 22, 34, 54)

–

AGE (pre-set bands e.g. 18-30, 31-50)

–

AGE (group membership e.g. mature student)

•

There is a hierarchy of data – always try to collect the

highest level possible to maximise usefulness!

–

Are you bored? (Yes/No)

–

On a scale of 1-10, how bored are you? [where 1=‘practically in tears of boredom’ and 10=‘riveted’]

Frequencies & Fidelity I

•

Not as interesting as it sounds – sorry!

•

Frequency tables display the number of times that a value

appears in your dataset (per variable across all cases)

•

They are always the first thing you do once your data is in

electronic form

•

They highlight data errors

•

They indicate which analyses may be worthwhile

Frequencies & Fidelity II

What can we say about this table?

A simple frequency table can tell you quite a bit!

Error?

What we would

expect?

Look at %s

More than UKIP

Really? Only 1?

What’s this?

Central Tendency I

You have all done quantitative research and you all use

measures of central tendency in your normal lives – the

average, middle and most common values

Central Tendency II

MODE

the value that occurs the

most frequently in the

data

MODE = 32

Central Tendency III

What is the most frequent (MODAL) response?

The mode is useful for thinking about NOMINAL data

Central Tendency IV

NOMINAL data can be displayed using a bar chart

Central Tendency V

MEDIAN

the middle value of the ordered

sample data

When the sample size is odd,

the median is the middle

value

When the sample size is even,

the median is the midpoint

(mean) of the two middle

values

MEDIAN = 42.5

Central Tendency VI

The mode

and median

 are useful for thinking about ORDINAL

data

What is the most frequent

(MODAL) response?

What is the middle

(MEDIAN) response?

Central Tendency VII

ORDINAL data can also be displayed using a bar chart

Central Tendency VIII

MEAN

sum of the values divided by the

number of cases

MEAN = 44.2

Central Tendency IX

The mean,

mode and median

are useful for thinking about

INTERVAL data

What is the average

(MEAN) age?

What is the middle

(MEDIAN) age?

What is the most

common (MODAL) age?

Central Tendency X

INTERVAL data can be displayed using a histogram

Dispersion I

•

Measures of central tendency are heuristics

•

They can hide important details in the data

MEAN = 5

MEDIAN = 5

MEAN = 14

MEDIAN = 5

Need to consider RANGE and STANDARD DEVIATION

Dispersion II

•

RANGE

 measures the difference between the

lowest and highest values

–

Large range may reveal outliers (dataset 2!)

–

Small range suggests tight grouping of data

•

STANDARD DEVIATION (SD)

 summarises the typical distance (deviation) of the values from the mean

–

Large SDs occur when data points are a long way from

the mean (wide range of different values)

–

Small SDs occur when data points are close to the

mean (values do not differ very much)

Dispersion III

•

For example:

Summary

•

Levels of measurement determine how data can be analysed

•

Vital to understand what your data represents and into which level of

measurement it falls

•

Frequency tables help us to screen data for errors

•

Frequency tables also help us to identify the median and mode

•

Central tendency is a heuristic, and it is so widely used precisely because it simplifies

•

Dispersion plays a vital role in critically evaluating central tendency

•

These modes of analysis are often referred to as DESCRIPTIVE STATISTICS

or UNIVARIATE ANALYSIS (literally ‘one variable’!)

Lies, Damn Lies and Statistics?

90% of Sun readers want a cap on immigration

The average Yale graduate earns $30,000 within six months of graduating

The Green Party is not well supported as it received less than 5% of the

national vote in the 2010 General Election

House prices drop by 10% in the UK

90% of students at Cardiff University are binge drinkers


Dive into the world of quantitative data analysis with a focus on frequencies, central tendency, dispersion, and standard deviation. Explore the collection and analysis of numerical data, levels of measurement, and methods for quantifying social concepts. Learn about the importance of capturing data using various instruments and understanding different levels of measurement in quantitative research. Gain insights into converting non-numerical values to numerical form for statistical analysis purposes. Join Dr. Luke Sloan on a journey through the fundamental aspects of quantitative data collection and analysis.


Uploaded on Nov 27, 2024


Exploring Data: Frequencies, Central Tendency, Dispersion and Standard Deviation SIT094 The Collection and Analysis of Quantitative Data Week 3 Luke Sloan

About Me Name: Dr Luke Sloan Office: 0.56 Glamorgan Email: SloanLS@cardiff.ac.uk To see me: please email first

Introduction Collecting Quantitative Data Levels of Measurement Frequencies & Fidelity Central Tendency Dispersion Summary

Collecting Quantitative Data I
“Research involving the collection of data in numerical form… the defining factor is that numbers result from the process, whether the initial data collection produced numerical values, or whether non-numerical values were subsequently converted to numbers as part of the analysis process…”
Source: Jupp 2006:250

Collecting Quantitative Data II
• Operationalising of social concepts
• Quantifying ‘fuzzy’ data into VARIABLES
• How to measure feelings, attitudes, behaviours, beliefs and attributes?
• Numbers allow statistical tests
• Statistical tests allow generalisations to be made
• Characterisation from samples to populations

Collecting Quantitative Data III
• Capture data using instruments
• Surveys (paper, online, telephone, in person)
• Secondary data analysis
• Experiments – difficult outside of the natural sciences
• But social scientists try to emulate the natural science model (remember Popper’s Falsification Principle?)
• But not all data is equal (some are more equal than others!)

Levels of Measurement I

Data Level | Description | Examples
Nominal (categorical) | Response categories cannot be placed in a specific order – impossible to judge distance between categories | Sex (Male/Female); Ethnicity (White/Black…); Party (Lab/Con/LD…)
Ordinal (categorical) | Response categories can be placed in rank order – distance between categories cannot be measured mathematically | Likert (Agree/Neutral/Disagree); Rank Preference (Coke/Pepsi…); Education (GCSE/A-Level…)
Interval (or continuous)* | Responses measured on a continuous scale with rank order – uniform distance between responses allows mathematical measurement | Age (in years); Income (in £)

Source: David & Sutton (2004)
*NOTE: Interval = no true zero point (e.g. temperature in °C), Ratio = true zero point (e.g. height, income)

Levels of Measurement II
• Level of measurement for certain variables is not pre-defined:
– AGE (in years e.g. 22, 34, 54)
– AGE (pre-set bands e.g. 18-30, 31-50)
– AGE (group membership e.g. mature student)
• There is a hierarchy of data – always try to collect the highest level possible to maximise usefulness!
– Are you bored? (Yes/No)
– On a scale of 1-10, how bored are you? [where 1=‘practically in tears of boredom’ and 10=‘riveted’]
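The hierarchy can be illustrated with a short Python sketch. The function name `to_band` and the bands outside 18-50 are illustrative assumptions; only the 18-30 and 31-50 bands and the example ages come from the slide. The point is that exact ages can always be collapsed into bands afterwards, whereas banded answers can never be turned back into exact ages.

```python
def to_band(age_years: int) -> str:
    """Collapse an exact age in years (highest level) into a pre-set band (ordinal)."""
    if age_years < 18:
        return "under 18"   # assumed band, not on the slide
    if age_years <= 30:
        return "18-30"
    if age_years <= 50:
        return "31-50"
    return "51+"            # assumed band, not on the slide


ages = [22, 34, 54]                   # example ages from the slide
bands = [to_band(a) for a in ages]    # moving down the hierarchy is always possible
print(bands)
```

Collecting age in years therefore keeps every option open: the mean, median and mode are all available, and the bands can still be derived later if a report needs them.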

Frequencies & Fidelity I
• Not as interesting as it sounds – sorry!
• Frequency tables display the number of times that a value appears in your dataset (per variable across all cases)
• They are always the first thing you do once your data is in electronic form
• They highlight data errors
• They indicate which analyses may be worthwhile

Frequencies & Fidelity II
What can we say about this table?

Parties coded
        | Value        | Frequency | Percent | Valid Percent | Cumulative Percent
Valid   | -9           |         1 |      .0 |            .0 |                 .0
        | Conservative |      1331 |    29.9 |          29.9 |               30.0
        | Labour       |      1103 |    24.8 |          24.8 |               54.8
        | Lib Dem      |      1044 |    23.5 |          23.5 |               78.2
        | Green        |       368 |     8.3 |           8.3 |               86.5
        | UKIP         |       171 |     3.8 |           3.8 |               90.4
        | BNP          |        78 |     1.8 |           1.8 |               92.1
        | Independent  |       216 |     4.9 |           4.9 |               97.0
        | Others       |       135 |     3.0 |           3.0 |              100.0
        | Total        |      4447 |   100.0 |         100.0 |
Missing | System       |         1 |      .0 |               |
Total   |              |      4448 |   100.0 |               |

Callouts on the slide: Error? · What we would expect? · Look at %s · More than UKIP · What’s this? · Really? Only 1?

A simple frequency table can tell you quite a bit!
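A minimal Python sketch of building such a table and screening it for stray codes follows. The response list, the set of valid codes and the helper name `frequency_table` are all hypothetical and chosen for illustration; they are not the data behind the slide.

```python
from collections import Counter

# Hypothetical coded responses; -9 stands in for a stray code like the one on the slide.
responses = ["Con", "Lab", "Con", "LD", "Green", "Lab", "Con", -9, "LD", "Con"]
valid_codes = {"Con", "Lab", "LD", "Green"}   # assumed codebook


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for value, freq in counts.most_common():
        cumulative += freq
        rows.append((value, freq,
                     round(100 * freq / total, 1),
                     round(100 * cumulative / total, 1)))
    return rows


table = frequency_table(responses)
# Any value that is not in the codebook deserves a second look.
unexpected = [value for value, *_ in table if value not in valid_codes]

for row in table:
    print(row)
print("Unexpected codes:", unexpected)
```

Checking the table against a codebook is one way to make the "Error?" question on the slide routine: a value that appears once and matches no known category is usually a data-entry or missing-value code.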

Central Tendency I
You have all done quantitative research and you all use measures of central tendency in your normal lives – the average, middle and most common values
• Most Common (MODE): What to watch on TV with housemates – decide based on the most popular choice
• Middle (MEDIAN): How long do you cook a chicken? Cookbook says 2 hours but internet says 3
• Average (MEAN): Maintenance grant allowance per week – divide total grant by number of weeks at uni

Central Tendency II
MODE: the value that occurs the most frequently in the data

Date   | High Temperature
2-Jan  | 59
3-Jan  | 60
4-Jan  | 43
5-Jan  | 42
6-Jan  | 35
7-Jan  | 32  <=== Mode
8-Jan  | 32  <=== Mode
9-Jan  | 46
10-Jan | 41
11-Jan | 52

MODE = 32
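As a quick check, the mode of the ten temperatures can be found with Python's standard `statistics` module. The variable names are illustrative; `multimode` is used here because it returns every most-frequent value, which matters when a dataset has more than one mode.

```python
from statistics import multimode

# High temperatures for 2-Jan to 11-Jan, taken from the slide
high_temps = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

modes = multimode(high_temps)   # list of all values tied for most frequent
print(modes)
```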

Central Tendency III
The mode is useful for thinking about NOMINAL data

Main reason for going to gym
      | Response       | Frequency | Percent | Valid Percent | Cumulative Percent
Valid | Relaxation     |         9 |    10.0 |          10.0 |               10.0
      | Fitness        |        31 |    34.4 |          34.4 |               44.4
      | Lose weight    |        33 |    36.7 |          36.7 |               81.1
      | Build strength |        17 |    18.9 |          18.9 |              100.0
      | Total          |        90 |   100.0 |         100.0 |

What is the most frequent (MODAL) response?

Central Tendency IV
NOMINAL data can be displayed using a bar chart
[Bar chart: Count (0 to 40) by main reason for going to gym: Relaxation, Fitness, Lose weight, Build strength]

Central Tendency V
MEDIAN: the middle value of the ordered sample data
When the sample size is odd, the median is the middle value
When the sample size is even, the median is the midpoint (mean) of the two middle values

Date   | High Temperature
7-Jan  | 32
8-Jan  | 32
6-Jan  | 35
10-Jan | 41
5-Jan  | 42  <=== Middle values
4-Jan  | 43  <=== Middle values
9-Jan  | 46
11-Jan | 52
2-Jan  | 59
3-Jan  | 60

MEDIAN = 42.5
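The odd/even rule can be written out directly. This is a hand-rolled sketch for illustration (the standard library's `statistics.median` follows the same rule); the function and variable names are assumptions.

```python
def median(values):
    """Middle value of the ordered data; midpoint of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd sample size: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even: mean of the two middle values


# High temperatures for 2-Jan to 11-Jan, taken from the slide
high_temps = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

median_all = median(high_temps)          # ten values, so the even rule applies
median_first_nine = median(high_temps[:9])   # nine values, so the odd rule applies
print(median_all, median_first_nine)
```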

Central Tendency VI
The mode and median are useful for thinking about ORDINAL data

There is a general lack of public knowledge about local government
        | Response          | Frequency | Percent | Valid Percent | Cumulative Percent
Valid   | Strongly Agree    |      1911 |    41.1 |          41.8 |               41.8
        | Agree             |      2281 |    49.1 |          49.9 |               91.6
        | Neutral           |       255 |     5.5 |           5.6 |               97.2
        | Disagree          |       111 |     2.4 |           2.4 |               99.6
        | Strongly Disagree |        17 |      .4 |            .4 |              100.0
        | Total             |      4575 |    98.5 |         100.0 |
Missing | System            |        71 |     1.5 |               |
Total   |                   |      4646 |   100.0 |               |

What is the most frequent (MODAL) response?
What is the middle (MEDIAN) response?
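Both questions can be answered from the frequencies alone. The sketch below is a simplified illustration: it takes the median category to be the first one whose cumulative count reaches half of the valid cases, which is equivalent to reading down the Cumulative Percent column until it passes 50. Variable names are assumptions; the counts are the valid responses from the slide.

```python
# Valid responses from the slide, listed in rank order
frequencies = [
    ("Strongly Agree", 1911),
    ("Agree", 2281),
    ("Neutral", 255),
    ("Disagree", 111),
    ("Strongly Disagree", 17),
]

total = sum(freq for _, freq in frequencies)

# Mode: the category with the largest frequency
modal_category = max(frequencies, key=lambda row: row[1])[0]

# Median: first category whose cumulative count reaches the halfway point
running = 0
median_category = None
for category, freq in frequencies:
    running += freq
    if running >= total / 2:
        median_category = category
        break

print("Mode:", modal_category, "| Median:", median_category)
```

Because the categories are ordinal, the median is reported as a category rather than a number; no midpoint between two categories is computed.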

Central Tendency VII ORDINAL data can also be displayed using a bar chart

Central Tendency VIII
MEAN: sum of the values divided by the number of cases

Date   | High Temperature
2-Jan  | 59
3-Jan  | 60
4-Jan  | 43
5-Jan  | 42
6-Jan  | 35
7-Jan  | 32
8-Jan  | 32
9-Jan  | 46
10-Jan | 41
11-Jan | 52
Sum    | 442

MEAN = 44.2
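The same arithmetic in Python, written out step by step rather than calling a library function so that the sum and the number of cases are both visible (variable names are illustrative):

```python
# High temperatures for 2-Jan to 11-Jan, taken from the slide
high_temps = [59, 60, 43, 42, 35, 32, 32, 46, 41, 52]

total = sum(high_temps)        # sum of the values
n_cases = len(high_temps)      # number of cases
mean_temp = total / n_cases    # mean = sum / number of cases

print(total, n_cases, mean_temp)
```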

Central Tendency IX
The mean, mode and median are useful for thinking about INTERVAL data

Statistics: What was your age last birthday
N Valid   | 4290
N Missing | 158
Mean      | 54.74   What is the average (MEAN) age?
Median    | 57.00   What is the middle (MEDIAN) age?
Mode      | 62      What is the most common (MODAL) age?

Central Tendency X INTERVAL data can be displayed using a histogram

Dispersion I
• Measures of central tendency are heuristics
• They can hide important details in the data
Dataset 1: 1 2 3 4 5 6 7 8 9 (MEAN = 5, MEDIAN = 5)
Dataset 2: 1 2 3 4 5 6 7 8 90 (MEAN = 14, MEDIAN = 5)
Need to consider RANGE and STANDARD DEVIATION
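The two datasets can be compared side by side in a few lines of Python. The helper name `summarise` is an assumption; the data are the two datasets from the slide. Adding the range next to the mean and median makes the single extreme value in Dataset 2 visible.

```python
from statistics import mean, median

dataset_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dataset_2 = [1, 2, 3, 4, 5, 6, 7, 8, 90]


def summarise(values):
    """Central tendency plus the simplest measure of dispersion."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),   # highest value minus lowest value
    }


summary_1 = summarise(dataset_1)
summary_2 = summarise(dataset_2)
print(summary_1)
print(summary_2)
```

The medians are identical, the mean shifts from 5 to 14, and the range jumps from 8 to 89, which is the kind of detail a single "average" would hide.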

Dispersion II
• RANGE measures the difference between the lowest and highest values
– Large range may reveal outliers (dataset 2!)
– Small range suggests tight grouping of data
• STANDARD DEVIATION (SD) summarises the typical distance (deviation) of the values from the mean
– Large SDs occur when data points are a long way from the mean (wide range of different values)
– Small SDs occur when data points are close to the mean (values do not differ very much)
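For reference, the two measures can be written as formulas. The slides do not state a formula, so the notation below is an assumption: it uses the common sample definition of the standard deviation (dividing by n − 1), with x_i the individual values, x̄ the mean and n the number of cases.

```latex
\text{Range} = x_{\max} - x_{\min}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}
```

Each deviation is squared so that values above and below the mean do not cancel out, and the square root returns the result to the original units.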

Dispersion III
For example:

Age (Sample 1): 18 30 23 31 21 19 20 19 28 21
Age (Sample 2): 8 55 53 13 12 52 7 9 11 10

Descriptive Statistics
               |  N | Range | Minimum | Maximum |    Mean | Std. Deviation
Age (Sample 1) | 10 | 13.00 |   18.00 |   31.00 | 23.0000 |        4.85341
Age (Sample 2) | 10 | 48.00 |    7.00 |   55.00 | 23.0000 |       21.01851
Valid N (listwise) = 10 for each sample
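These figures can be reproduced with Python's standard `statistics` module, as a sketch. `stdev` computes the sample standard deviation (n − 1 denominator); treating that as the definition behind the table is an assumption, although it does agree with the reported values. Variable names are illustrative.

```python
from statistics import mean, stdev

# Ages from the two samples on the slide
sample_1 = [18, 30, 23, 31, 21, 19, 20, 19, 28, 21]
sample_2 = [8, 55, 53, 13, 12, 52, 7, 9, 11, 10]

mean_1, mean_2 = mean(sample_1), mean(sample_2)
range_1 = max(sample_1) - min(sample_1)
range_2 = max(sample_2) - min(sample_2)
sd_1, sd_2 = stdev(sample_1), stdev(sample_2)   # sample SD (n - 1 denominator)

print("Sample 1:", mean_1, range_1, round(sd_1, 5))
print("Sample 2:", mean_2, range_2, round(sd_2, 5))
```

Both samples have the same mean age, so only the range and the standard deviation reveal that the second sample mixes children with people in their fifties.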

Summary
• Levels of measurement determine how data can be analysed
• Vital to understand what your data represents and into which level of measurement it falls
• Frequency tables help us to screen data for errors
• Frequency tables also help us to identify the median and mode
• Central tendency is a heuristic, and it is so widely used precisely because it simplifies
• Dispersion plays a vital role in critically evaluating central tendency
• These modes of analysis are often referred to as DESCRIPTIVE STATISTICS or UNIVARIATE ANALYSIS (literally ‘one variable’!)

Lies, Damn Lies and Statistics?
• 90% of Sun readers want a cap on immigration
• The average Yale graduate earns $30,000 within six months of graduating
• The Green Party is not well supported as it received less than 5% of the national vote in the 2010 General Election
• House prices drop by 10% in the UK
• 90% of students at Cardiff University are binge drinkers
