Understanding Sampling in Survey Research

EMR 6500:

Survey Research

Dr. Chris L. S. Coryn

Lyssa N. Wilson

Spring 2015

Agenda

•

Elements of the sampling problem

•

Some basic concepts of statistics

•

Case Study #1

Elements of the Sampling

Problem

Technical Terms (Again)

•

An

element

is an object on which a

measurement is taken

•

population

is a collection of elements

to which an inference is made

•

sample

 is a collection of sampling

units drawn from a frame or frames

•

Sampling units

 are nonoverlapping

collections of elements from the

population that cover the entire

population

•

frame

 is a list of sampling units

How to Select the Sample: The

Design of the Survey Sample

How to Select the Sample

•

The objective of sampling is to

estimate population parameters such

as the mean, proportion, or total

•

The quantity of information is

controlled by the number of units

included in a sample and the method

used to select a sample

How to Select the Sample

•

The primary questions addressed by

sampling theory are:

–

What sampling procedure should be

used?

–

What number of sampling units should

be included in a sample?

•

The answer to both depends on how

much information one is willing to

buy

How to Select the Sample

•

If $\theta$ is the population parameter of

interest and $\hat{\theta}$ is an estimator of $\theta$,

then a bound on the error of

estimation, $B$, should be specified

that represents the difference in

absolute value between $\theta$ and $\hat{\theta}$:

$|\theta - \hat{\theta}| < B$

How to Select the Sample

•

A probability, $(1-\alpha)$, specifies the

fraction of times in repeated samples

that the error of estimation is less

than $B$: $P[\text{error of estimation} < B] = 1-\alpha$

How to Select a Sample

•

Typically $B$ is set to $2\sigma_{\hat{\theta}}$ and,

therefore, $(1-\alpha)$ will be approximately

.95

•

Once a bound, $B$, has been specified,

along with its associated probability,

$(1-\alpha)$, different sampling designs can

be compared to determine which is

most efficient for a particular

purpose

Probability Sampling

•

Statistical estimation requires

randomness in sampling designs so

that properties of statistical

estimators can be assessed

probabilistically

•

Sampling designs based on planned

randomness are probability samples

Simple Random Sampling

•

The basic probability sampling

design, simple random sampling,

consists of selecting a group of $n$

sampling units in such a way that all

samples of size $n$ have the same

probability of selection

Stratified Random Sampling

•

A stratified random sample is one

obtained by separating the

population elements into discrete,

nonoverlapping groups, called strata,

and then selecting a simple random

sample from each stratum

Stratified Random Sampling

•

The principal reasons for using stratified

random sampling rather than simple random

sampling are:

1.

Stratification may produce a smaller bound on the

error of estimation than would be produced by a

simple random sample of the same size (this is

particularly true if measurements within strata

are homogeneous)

2.

The cost per observation may be reduced by

stratification of the population elements into

convenient groupings

3.

Estimates of population parameters may be

desired for subgroups of the population (these

subgroups should then be identifiable strata)

Cluster Sampling

•

Cluster sampling is a less costly

alternative to simple or stratified

random sampling if the cost of

obtaining a frame that lists all

population elements is very high or if

the cost of obtaining observations

increases as the distance separating

elements increases

Cluster Sampling

•

Cluster sampling is an effective

design for obtaining a specified

amount of information under the

following conditions:

1.

A good frame listing all population

elements is not available or is very

costly to obtain, but a frame listing

clusters is easily obtained

2.

The cost of obtaining observations

increases as the distance separating

the elements increases

Cluster Sampling

•

Clusters typically consist of herds,

households, or other units of clustering

(e.g., an orange tree forms a cluster of

oranges for investigating insect infestations)

•

A farm herd contains a cluster of livestock

for estimating proportions of diseased

animals

•

Elements within a cluster are often

physically close together and hence tend to

have similar characteristics and the

measurement on one element within a

cluster may be correlated with the

measurement on another

Systematic Sampling

•

Systematic sampling involves

random selection of one element

from the first $k$ elements and then

selecting every $k$th element

thereafter

Systematic Sampling

•

Systematic sampling is a useful alternative

to simple random sampling for the following

reasons:

1.

Systematic sampling is easier to perform in

the field and hence is less subject to selection

errors by field-workers than are either simple

random samples or stratified random

samples, especially if a good frame is not

available

2.

Systematic sampling can provide greater

information per unit cost than simple random

sampling can provide for certain populations

with certain patterns in the arrangement of

elements

Multi-Stage Sampling

•

Sampling conducted in stages, often

taking into account the hierarchical

(nested) structure of a population

–

Primary sampling units (PSUs) are sampled

first (e.g., cities)

–

Secondary sampling units (SSUs) are

sampled next (e.g., city blocks)

–

Ultimate sampling units (actual elements)

are sampled last (e.g., households)

•

Especially useful when no frame can be

established for a single-stage sample

Multi-Stage Sampling

•

For a fixed sample size of elements,

a multi-stage sampling design is

almost always less efficient than a

simple random sample (though often

more feasible)

•

Variance estimation methods for

complex sample designs must be

used to obtain correct standard

errors

Multiple-Frame Sampling

Quota Sampling

•

A nonprobability sampling method

(although randomness is sometimes

part of the design) in which a

prespecified number of surveys is

obtained from specific subgroups of a

target population (e.g., Republicans,

Democrats)

•

Introduces unknown sampling biases

into survey estimates

Chain-Referral Sampling

•

Snowball sampling methods for sampling in

rare/hard-to-reach populations

•

One or more persons having the trait of

interest serve as seeds and identify others

•

Persons with many connections are likely to

be included, whereas isolated persons may

not be included at all

•

Information about network connections in

the sample can be used to weight sample

units (respondent-driven sampling, which is

premised on Markov-chain theory)

Recruitment Network

Equilibrium

Planning a Survey

Planning A Survey

1.

Statement of objectives

2.

Target population

3.

The frame

4.

Sample design

5.

Method of measurement

6.

Measurement instrument

7.

Selection and training of fieldworkers

8.

The pretest (pilot)

9.

Organization of fieldwork

10.

Organization of data management

11.

Data analysis

Some Basic Concepts of

Statistics

Finite Population Correction

•

Most statistical theory is premised on an

underlying infinite population

•

Sampling theory and practice is founded on

the assumption of sampling from a finite

population

•

In the general framework of finite

population sampling, samples of size $n$

are taken from a population of size $N$

•

In the finite population case, the variance

estimate of a statistical estimator must be

adjusted due to the fact that not all data

from a finite population are observed, using

the finite population correction (fpc)

Finite Population Correction

•

For simple random samples (without

replacement) the fpc is expressed as

$1 - n/N$ or $1 - f$

•

where $f = n/N$ is the sampling fraction or

rate

Finite Population Correction

•

The fpc is, therefore, the fraction of a

finite population that is not sampled

•

Because the fpc is literally a factor in

the calculation of an estimate of

variance for an estimated finite

population parameter, the estimated

variance is reduced to zero if $n = N$

Finite Population Correction

•

When $n$ is small relative to $N$, the fpc is

close to unity

•

In samples of very large populations $f$

is very small and the fpc may be

ignored

–

Ignore if $1 - n/N > .95$

•

Although the fpc is applicable for

estimation, it often is not necessary for

many inferential uses such as statistical

significance testing (e.g., comparison

between sampled subgroups).

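Estimate of Population Mean

$\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

$\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

$2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$

Estimate of Population

Proportion

$\hat{p} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$

$\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}$

where $\hat{q} = 1 - \hat{p}$

$2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}}$

Estimate of Population Total

$\hat{\tau} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$

$\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$

$2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$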

Case Study #1

Case Study Activity

•

In small groups, address the

following questions in relation to

Case Study #1 relying only on the

material that was discussed thus far

in the semester

1.

Has the surveyor committed any

serious error(s)?

2.

If so, what type and why? If not, why?


This content covers essential concepts of survey research, statistics, and sampling methods. It delves into elements of the sampling problem, technical terms, and how to select a sample for surveys. The discussions revolve around population parameters, sampling procedures, and the control of information quantity in a sample. Key questions addressed include the selection of sampling units and the error of estimation in survey research.

Uploaded by flaugher_j on Sep 13, 2024


Presentation Transcript

EMR 6500: Survey Research Dr. Chris L. S. Coryn Lyssa N. Wilson Spring 2015

Agenda Elements of the sampling problem Some basic concepts of statistics Case Study #1

Elements of the Sampling Problem

Technical Terms (Again) An element is an object on which a measurement is taken A population is a collection of elements to which an inference is made A sample is a collection of sampling units drawn from a frame or frames Sampling units are nonoverlapping collections of elements from the population that cover the entire population A frame is a list of sampling units

How to Select the Sample: The Design of the Survey Sample

How to Select the Sample The objective of sampling is to estimate population parameters such as the mean, proportion, or total The quantity of information is controlled by the number of units included in a sample and the method used to select a sample

How to Select the Sample The primary questions addressed by sampling theory are: What sampling procedure should be used? What number of sampling units should be included in a sample? The answer to both depends on how much information one is willing to buy

How to Select the Sample If $\theta$ is the population parameter of interest and $\hat{\theta}$ is an estimator of $\theta$, then a bound on the error of estimation, $B$, should be specified that represents the difference in absolute value between $\theta$ and $\hat{\theta}$: Error of estimation $= |\theta - \hat{\theta}| < B$

How to Select the Sample A probability, $(1-\alpha)$, specifies the fraction of times in repeated samples that the error of estimation is less than $B$: $P[\text{Error of estimation} < B] = 1-\alpha$

How to Select a Sample Typically $B$ is set to $2\sigma_{\hat{\theta}}$ and, therefore, $(1-\alpha)$ will be approximately .95. Once a bound, $B$, has been specified, along with its associated probability, $(1-\alpha)$, different sampling designs can be compared to determine which is most efficient for a particular purpose
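The claim that a bound of two standard errors gives a probability of roughly .95 can be checked by simulation. The sketch below is illustrative only: the population (5,000 normally distributed values), the sample size, and the function name `two_sigma_coverage` are assumptions made for the example, not part of the course material.

```python
import random
import statistics

def two_sigma_coverage(population, n, reps=2000, seed=0):
    """Fraction of repeated simple random samples whose mean lies
    within B = 2 * sigma of the population mean (hypothetical helper)."""
    rng = random.Random(seed)
    N = len(population)
    theta = statistics.fmean(population)
    S2 = statistics.variance(population)      # population variance, divisor N - 1
    sigma = ((1 - n / N) * S2 / n) ** 0.5     # standard error of the sample mean
    B = 2 * sigma
    hits = sum(
        abs(statistics.fmean(rng.sample(population, n)) - theta) < B
        for _ in range(reps)
    )
    return hits / reps

# Assumed illustrative population: 5,000 values drawn once from a normal distribution.
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(5000)]
coverage = two_sigma_coverage(population, n=100)
print(round(coverage, 3))
```

The printed fraction should land near .95, which is the sense in which $(1-\alpha)$ is "approximately .95" when $B$ is two standard errors.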

Probability Sampling Statistical estimation requires randomness in sampling designs so that properties of statistical estimators can be assessed probabilistically Sampling designs based on planned randomness are probability samples

Simple Random Sampling The basic probability sampling design, simple random sampling, consists of selecting a group of n sampling units in such a way that all samples of size n have the same probability of selection
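A minimal sketch of drawing a simple random sample from a frame, assuming the frame is available as a Python list. The frame contents and the function name `simple_random_sample` are hypothetical; `random.sample` draws without replacement so that every subset of size n is equally likely.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n sampling units without replacement; every sample of
    size n has the same probability of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical frame of 200 sampling units.
frame = [f"unit-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```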

Stratified Random Sampling A stratified random sample is one obtained by separating the population elements into discrete, nonoverlapping groups, called strata, and then selecting a simple random sample from each stratum

Stratified Random Sampling The principal reasons for using stratified random sampling rather than simple random sampling are: 1. Stratification may produce a smaller bound on the error of estimation than would be produced by a simple random sample of the same size (this is particularly true if measurements within strata are homogeneous) 2. The cost per observation may be reduced by stratification of the population elements into convenient groupings 3. Estimates of population parameters may be desired for subgroups of the population (these subgroups should then be identifiable strata)
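Stratified random sampling can be sketched as grouping the elements by stratum and then taking a simple random sample inside each group. The example below assumes proportional allocation (the same sampling fraction in every stratum), which is one common choice and is not prescribed by the slides; the student data and the name `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, fraction, seed=None):
    """Split elements into nonoverlapping strata, then draw a simple
    random sample from each stratum (proportional allocation assumed)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in elements:
        strata[stratum_of(element)].append(element)
    sample = {}
    for name, members in strata.items():
        n_h = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, n_h)
    return sample

# Hypothetical population: 300 undergraduate and 100 graduate students.
students = [("undergrad", i) for i in range(300)] + [("grad", i) for i in range(100)]
sample = stratified_sample(students, lambda s: s[0], fraction=0.1, seed=7)
print({name: len(units) for name, units in sample.items()})
```

Because each stratum is sampled separately, the per-stratum samples can also be used for the subgroup estimates mentioned in reason 3.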

Cluster Sampling Cluster sampling is a less costly alternative to simple or stratified random sampling if the cost of obtaining a frame that lists all population elements is very high or if the cost of obtaining observations increases as the distance separating elements increases

Cluster Sampling Cluster sampling is an effective design for obtaining a specified amount of information under the following conditions: 1. A good frame listing all population elements is not available or is very costly to obtain, but a frame listing clusters is easily obtained 2. The cost of obtaining observations increases as the distance separating the elements increases

Cluster Sampling Clusters typically consist of herds, households, or other units of clustering (e.g., an orange tree forms a cluster of oranges for investigating insect infestations) A farm herd contains a cluster of livestock for estimating proportions of diseased animals Elements within a cluster are often physically close together and hence tend to have similar characteristics and the measurement on one element within a cluster may be correlated with the measurement on another

Stratified sampling:
• Each element of the population is in exactly one stratum
• Take a simple random sample from every stratum
• Variance of the estimate depends on the variability within strata
• For greatest precision, individual elements within each stratum should have similar values, but stratum means should differ from each other as much as possible

Cluster sampling:
• Each element of the population is in exactly one cluster
• Take a simple random sample of clusters; observe all elements within clusters in the sample
• Variance of the estimate depends primarily on the variability between clusters
• For greatest precision, individual elements within each cluster should be heterogeneous, and cluster means should be similar to one another
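One-stage cluster sampling, as contrasted with stratified sampling here, can be sketched as follows: take a simple random sample of clusters from a frame of clusters, then observe every element in each selected cluster. The herd data and the function name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m clusters by simple random sampling and keep
    all elements of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # the frame lists clusters, not elements
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 8 herds of differing sizes.
herds = {
    f"herd-{h}": [f"animal-{h}-{i}" for i in range(5 + h)]
    for h in range(1, 9)
}
sample = cluster_sample(herds, 3, seed=3)
print({name: len(animals) for name, animals in sample.items()})
```

Only a list of herds is needed to draw the sample, which is why the design suits situations where a frame of individual elements is unavailable or costly.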

Systematic Sampling Systematic sampling involves random selection of one element from the first $k$ elements and then selecting every $k$th element thereafter

Systematic Sampling Systematic sampling is a useful alternative to simple random sampling for the following reasons: 1. Systematic sampling is easier to perform in the field and hence is less subject to selection errors by field-workers than are either simple random samples or stratified random samples, especially if a good frame is not available 2. Systematic sampling can provide greater information per unit cost than simple random sampling can provide for certain populations with certain patterns in the arrangement of elements
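A minimal sketch of a 1-in-k systematic sample, assuming the frame is an ordered Python list. The function name `systematic_sample` and the frame of 100 numbered units are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k elements, then take
    every k-th element thereafter."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # 0-based position within the first k elements
    return frame[start::k]

# Hypothetical ordered frame of 100 units.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=5)
print(sample)
```

Only one random number is drawn, which is part of why the design is easy to carry out in the field.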

Multi-Stage Sampling Sampling conducted in stages, often taking into account the hierarchical (nested) structure of a population Primary sampling units (PSUs) are sampled first (e.g., cities) Secondary sampling units (SSUs) are sampled next (e.g., city blocks) Ultimate sampling units (actual elements) are sampled last (e.g., households) Especially useful when no frame can be established for a single-stage sample

Multi-Stage Sampling For a fixed sample size of elements, a multi-stage sampling design is almost always less efficient than a simple random sample (though often more feasible) Variance estimation methods for complex sample designs must be used to obtain correct standard errors
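The staged selection of cities, city blocks, and households can be sketched with nested simple random samples. This is only an illustration of the selection step under assumed data: the nested dictionary, the equal sample sizes at each stage, and the name `multistage_sample` are hypothetical, and the sketch does not attempt the complex-design variance estimation mentioned above.

```python
import random

def multistage_sample(cities, n_cities, n_blocks, n_households, seed=None):
    """Sample PSUs (cities), then SSUs (blocks) within each sampled city,
    then ultimate units (households) within each sampled block."""
    rng = random.Random(seed)
    selected = []
    for city in rng.sample(sorted(cities), n_cities):            # stage 1: PSUs
        blocks = cities[city]
        for block in rng.sample(sorted(blocks), n_blocks):       # stage 2: SSUs
            for household in rng.sample(blocks[block], n_households):  # stage 3
                selected.append((city, block, household))
    return selected

# Hypothetical population: 5 cities, 6 blocks per city, 20 households per block.
cities = {
    f"city-{c}": {
        f"block-{b}": [f"hh-{c}-{b}-{h}" for h in range(20)]
        for b in range(6)
    }
    for c in range(5)
}
sample = multistage_sample(cities, n_cities=2, n_blocks=3, n_households=4, seed=11)
print(len(sample))
```

A household list is needed only for the blocks that were actually selected, which is the feasibility advantage when no single-stage frame exists.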

Multiple-Frame Sampling

Quota Sampling A nonprobability sampling method (although randomness is sometimes part of the design) in which a prespecified number of surveys is obtained from specific subgroups of a target population (e.g., Republicans, Democrats) Introduces unknown sampling biases into survey estimates

Chain-Referral Sampling Snowball sampling methods for sampling in rare/hard-to-reach populations One or more persons having the trait of interest serve as seeds and identify others Persons with many connections are likely to be included, whereas isolated persons may not be included at all Information about network connections in the sample can be used to weight sample units (respondent-driven sampling, which is premised on Markov-chain theory)

Recruitment Network [figure: diagram of a chain-referral recruitment network]

Equilibrium [figure: percentage of population (0% to 100%) plotted against recruitment wave (0 to 10)]

Planning a Survey

Planning A Survey 1. Statement of objectives 2. Target population 3. The frame 4. Sample design 5. Method of measurement 6. Measurement instrument 7. Selection and training of fieldworkers 8. The pretest (pilot) 9. Organization of fieldwork 10. Organization of data management 11. Data analysis

Some Basic Concepts of Statistics

Finite Population Correction Most statistical theory is premised on an underlying infinite population Sampling theory and practice is founded on the assumption of sampling from a finite population In the general framework of finite population sampling, sample sizes of size n are taken from a population of size N In the finite population case, the variance estimate of a statistical estimator must be adjusted due to the fact that not all data from a finite population are observed, using the finite population correction (fpc)

Finite Population Correction For simple random samples (without replacement) the fpc is expressed as $1 - \frac{n}{N}$ or $1 - f$, where $f = \frac{n}{N}$ is the sampling fraction or rate

Finite Population Correction The fpc is, therefore, the fraction of a finite population that is not sampled Because the fpc is literally a factor in the calculation of an estimate of variance for an estimated finite population parameter, the estimated variance is reduced to zero if n = N

Finite Population Correction When n is small relative to N, the fpc is close to unity In samples of very large populations f is very small and the fpc may be ignored Ignore if 1-n/N>.95 Although the fpc is applicable for estimation, it often is not necessary for many inferential uses such as statistical significance testing (e.g., comparison between sampled subgroups).
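The fpc and the ".95" rule of thumb can be expressed in a few lines. The function names `fpc` and `fpc_negligible` are hypothetical, and the sample and population sizes are made up for illustration.

```python
def fpc(n, N):
    """Finite population correction 1 - n/N for a simple random
    sample of size n drawn without replacement from N units."""
    if not 0 < n <= N:
        raise ValueError("require 0 < n <= N")
    return 1 - n / N

def fpc_negligible(n, N, threshold=0.95):
    """Rule of thumb from the slide: ignore the fpc if 1 - n/N > .95."""
    return fpc(n, N) > threshold

print(fpc(100, 10_000), fpc_negligible(100, 10_000))  # small sampling fraction
print(fpc(500, 1_000), fpc_negligible(500, 1_000))    # half the population sampled
print(fpc(1_000, 1_000))                              # census: n = N
```

With n = N the correction is zero, matching the statement that the estimated variance vanishes when the whole population is observed.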

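Estimate of Population Mean $\hat{\mu} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, with estimated variance $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ and bound on the error of estimation $2\sqrt{\hat{V}(\bar{y})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$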
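A small sketch of the mean estimator under simple random sampling: the sample mean, its estimated variance with the fpc factor (1 - n/N), and the bound of two standard errors. The data values, the population size N = 80, and the name `estimate_mean` are made up for illustration.

```python
import statistics

def estimate_mean(sample, N):
    """Return (sample mean, estimated variance of the mean, bound),
    using the fpc factor (1 - n/N) and the sample variance s^2."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    var_hat = (1 - n / N) * s2 / n
    bound = 2 * var_hat ** 0.5
    return ybar, var_hat, bound

# Hypothetical sample of n = 8 from a population of N = 80.
ybar, var_hat, bound = estimate_mean([2, 4, 4, 4, 5, 5, 7, 9], N=80)
print(ybar, round(var_hat, 4), round(bound, 4))
```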

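Estimate of Population Proportion $\hat{p} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, with estimated variance $\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}$, where $\hat{q} = 1 - \hat{p}$, and bound on the error of estimation $2\sqrt{\hat{V}(\hat{p})} = 2\sqrt{\left(1 - \frac{n}{N}\right)\frac{\hat{p}\hat{q}}{n-1}}$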
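The proportion estimator follows the same pattern when each observation is coded 1 (has the attribute) or 0 (does not). In this sketch the responses, the population size N = 2000, and the name `estimate_proportion` are hypothetical.

```python
def estimate_proportion(sample, N):
    """Return (p-hat, estimated variance, bound) for 0/1 observations,
    using the fpc factor (1 - n/N) and p*q / (n - 1)."""
    n = len(sample)
    p_hat = sum(sample) / n
    q_hat = 1 - p_hat
    var_hat = (1 - n / N) * p_hat * q_hat / (n - 1)
    bound = 2 * var_hat ** 0.5
    return p_hat, var_hat, bound

# Hypothetical sample: 30 "yes" responses among n = 100, from N = 2000.
responses = [1] * 30 + [0] * 70
p_hat, var_hat_p, bound_p = estimate_proportion(responses, N=2000)
print(p_hat, round(var_hat_p, 6), round(bound_p, 4))
```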

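Estimate of Population Total $\hat{\tau} = N\bar{y} = \frac{N}{n}\sum_{i=1}^{n} y_i$, with estimated variance $\hat{V}(\hat{\tau}) = \hat{V}(N\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$ and bound on the error of estimation $2\sqrt{\hat{V}(N\bar{y})} = 2\sqrt{N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}}$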
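The total estimator scales the sample mean by N, so its estimated variance is N squared times the variance of the mean. As before, the data, N = 80, and the name `estimate_total` are assumptions made for the example.

```python
import statistics

def estimate_total(sample, N):
    """Return (estimated total N * ybar, estimated variance, bound)
    under simple random sampling without replacement."""
    n = len(sample)
    ybar = statistics.fmean(sample)
    s2 = statistics.variance(sample)   # sample variance, divisor n - 1
    tau_hat = N * ybar
    var_hat = N ** 2 * (1 - n / N) * s2 / n
    bound = 2 * var_hat ** 0.5
    return tau_hat, var_hat, bound

# Hypothetical sample of n = 8 from a population of N = 80.
tau_hat, var_hat_t, bound_t = estimate_total([2, 4, 4, 4, 5, 5, 7, 9], N=80)
print(tau_hat, round(var_hat_t, 2), round(bound_t, 2))
```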

Case Study #1

Case Study Activity In small groups, address the following questions in relation to Case Study #1 relying only on the material that was discussed thus far in the semester 1. Has the surveyor committed any serious error(s)? 2. If so, what type and why? If not, why?
