Understanding Sampling Theory: An Introduction to Statistical Surveys
Statistical surveys play a crucial role in collecting information to fulfill specific needs across various fields such as population demographics, labor statistics, agriculture, industry, and trade. A statistical survey investigates unknown characteristics of a population, focusing on parameters such as the total, mean, proportion, and ratio. While a census provides a comprehensive view by enumerating the entire population, sampling offers a practical alternative by studying a representative part of it. Sampling reduces effort, time, and cost while still providing reliable insights for decision-making.
Presentation Transcript
SAMPLING THEORY: an introduction
Dr. Mathachan Pathiyil, Associate Professor, Department of Statistics, Nirmala College, Muvattupuzha
Statistical Surveys
The purpose of a survey is the collection of information to satisfy a definite need. The need to collect data arises in all walks of life. The data we need may be about:
1. Population (total, sex, age, migration, rate of growth, literacy, religion)
2. Labour (number of employees, hours of work, wages, strikes, unemployment)
3. Agriculture (area under different crops, forests, agricultural income, manures, cultivation practices)
4. Industry (turnover, production, capital investment, water consumption, pollution)
5. Trade (wholesale/retail prices, demand, profit/loss), etc.
A statistical survey is an investigation carried out by an agency or individual to study the nature of the unknown characteristics of a population. We undertake surveys for a variety of purposes. However, in most cases our interest is concentrated on four important unknown values (or parameters) of the population under study:
1. the population total
2. the population mean
3. the population proportion
4. the population ratio
Proportion: relates a part to the whole (proportion of smokers, males, defectives, distinctions).
Ratio: relates one part to another part (sex ratio, import/export ratio, birth/death ratio).
Population: the aggregate of all objects about which we want to collect information (houses in an area, students in a class, fishes in a lake, viewers of a specific TV programme, a normal population with mean 50 and SD 5).
Characteristic: any aspect of the population about which we want to collect information (colour, height, life length, yield, political affiliation, income, employment). Characteristics are of two types: variables and attributes.
There are two ways of collecting information: the census method (complete enumeration method) and the sampling method.
Census: the method of collecting data from each and every unit of the population.
Merits of the census method:
- The results are more representative, accurate and reliable.
- The results are free from sampling errors.
- Census data may be used as a basis for various other surveys.
However, despite these advantages, the census method is not widely used in practice:
- The effort, money and time required to complete a census are very large.
- There is no way of checking the errors in the data except through a re-survey or sample checks.
- A census is practically impossible for an individual researcher or a small organization.
- If the population is infinite or the enumeration is destructive in nature, a census cannot be used.
Sampling: the method of collecting data from a representative part of the population only.
Merits of the sampling method:
- Reduced cost, time and labour.
- Greater scope (fewer trained investigators, lower administrative cost and less equipment are needed).
- Greater accuracy.
- If the population is hypothetical or infinite, only sampling is possible.
- It is always possible to determine the extent of sampling error.
Demerits of the sampling method:
- If a proper sampling method is not chosen, the results may be misleading.
- The chances of sampling error are high.
- When the population is small, sampling cannot be used effectively.
- When information is needed from each and every unit in the population (voters' list preparation, income-tax assessment, college admissions), sampling cannot be used.
Neither sampling nor census admits universal application. Census and sampling will produce identical conclusions when the population is perfectly homogeneous. It is a curious fact that the results from a carefully planned, well-executed sample survey are expected to be more accurate than those from a census, because non-sampling errors can be controlled better with a smaller, well-trained field staff. The aim of sampling theory is to make sampling more effective, so that the answer to a particular question is obtained in a quick, valid, efficient and economical way.
Errors in Surveys
Two major types of error can arise when a survey is conducted to make observations on a characteristic defined over the population: sampling errors and non-sampling errors.
Sampling error is the error that arises from drawing inferences about the population on the basis of a few observations taken from it. This error is inherent and unavoidable in any sample survey. It can be decreased by increasing the sample size: the standard error (S.E.) is inversely proportional to the square root of the sample size. Sampling errors are absent in census surveys. Some reasons for sampling errors are:
- faulty selection of the sample (purposive or judgment sampling, use of an inappropriate sampling scheme such as srs for heterogeneous populations),
- substitution (when difficulties arise, the investigator may substitute a convenient member of the population),
- faulty identification of the sampling units (especially frequent in area surveys or crop surveys).
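To see the inverse square-root relationship numerically, the Python sketch below draws repeated simple random samples (without replacement) from a hypothetical normal population with mean 50 and SD 5, and measures the spread of the resulting sample means. The population, the seeds and the number of repetitions are illustrative assumptions. Quadrupling n from 25 to 100 should roughly halve the standard error.

```python
import random
import statistics


def empirical_se(population, n, reps=2000, seed=1):
    """Standard deviation of the sample mean over `reps` srswor samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)


# Hypothetical population of 10,000 values from a normal distribution (mean 50, SD 5)
gen = random.Random(42)
population = [gen.gauss(50, 5) for _ in range(10_000)]

se_25 = empirical_se(population, 25)    # theory: about 5 / sqrt(25) = 1.0
se_100 = empirical_se(population, 100)  # theory: about 5 / sqrt(100) = 0.5
print(round(se_25, 3), round(se_100, 3), round(se_100 / se_25, 2))
```

Because the population is large relative to n, the finite population correction is negligible here, so the ratio stays close to 1/2.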
Non-sampling errors are more serious and arise from mistakes made in the acquisition of data. They are present in both sample surveys and census surveys and can occur at any stage of planning, execution and analysis. Some reasons for non-sampling errors are:
- faulty planning or definitions (faulty objectives, faulty questionnaire, errors in measurement, lack of trained investigators),
- errors due to non-response (respondents not at home, unable to answer, or refusing to answer the questions),
- response errors (the respondent may misunderstand a question and furnish false data, prestige bias, investigator bias),
- errors in coverage (inclusion of units that should be excluded, or exclusion of units that should be included in the survey),
- compiling errors (errors in coding, editing, tabulation),
- publication errors (errors in printing, presentation).
Classification of Sampling Techniques
Sampling Techniques
- Probability Sampling: Simple Random Sampling, Stratified Sampling, Systematic Sampling, Cluster Sampling, PPS Sampling
- Non-probability Sampling: Convenience Sampling, Judgmental Sampling, Quota Sampling, Snowball Sampling
Non-probability sampling and probability sampling
Non-probability sampling: a method of selecting samples in which the selection of units into the sample depends entirely on the judgment of the sampler (investigator).
Probability sampling: a scientific method of selecting samples from the population. In this procedure, each unit in the population has a definite, pre-assigned, non-zero probability of being selected into the sample.
Non-probability sampling
Convenience sampling attempts to obtain a sample of convenient elements. Usually the sample is restricted to a part of the population that is readily available. Often, respondents are selected because they happen to be in the right place at the right time. Examples:
- use of students or members of social organizations
- mall-intercept interviews without qualifying the respondents
- fruits on the top of the containers
- "people on the street" interviews
Judgmental sampling is a method of sampling in which the sample elements are selected from the population based on the judgment of the researcher. Examples:
- party members selected in voting behavior research
- expert witnesses used in court
- purchase engineers selected in industrial marketing research
Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment. For example, with sex as the control characteristic and a sample of 1000:

Control characteristic (Sex) | Population composition (%) | Sample composition (Number)
Male                         | 48                         | 480
Female                       | 52                         | 520
Total                        | 100                        | 1000
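The first stage of quota sampling is plain arithmetic: each control category receives a share of the total sample equal to its share of the population. A minimal Python sketch of that step (the function name is hypothetical) reproduces the 480/520 split above. The second stage, choosing the actual respondents by convenience or judgment, is not something code can capture.

```python
def quota_sizes(composition_pct, n):
    """Split a total sample size n across control categories in proportion
    to their population percentages (rounded to whole respondents)."""
    return {category: round(n * pct / 100) for category, pct in composition_pct.items()}


quotas = quota_sizes({"Male": 48, "Female": 52}, 1000)
print(quotas)  # {'Male': 480, 'Female': 520}
```

With less convenient percentages the rounded quotas may not add up exactly to n, and one category would need a manual adjustment.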
Snowball sampling: an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals.
Disadvantages of non-probability sampling
All non-probability sampling procedures suffer from the drawbacks of favouritism, personal biases and prejudices of the investigator. Satisfactory results can be expected only if the investigator is well experienced and free from bias.
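The referral process in snowball sampling can be sketched as a wave-by-wave (breadth-first) expansion from the initial respondents. In the Python sketch below, the referral network, the seed respondents and the function name are all hypothetical. The seeds are fixed rather than drawn at random so that the output is reproducible.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Grow a sample from initial respondents by following their referrals,
    one wave at a time, until target_size respondents have been interviewed."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                     # interview this respondent
        for ref in referrals.get(person, []):     # others they name in the target population
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return sample


# Hypothetical referral network: who each respondent names
network = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"], "D": [], "E": ["F", "G"], "F": [], "G": []}
print(snowball_sample(network, seeds=["A", "B"], target_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

Note how F and G, who are reachable only through later referrals, never enter the sample: who gets included depends entirely on the referral structure, which is one source of the biases described above.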
Probability Sampling Schemes
Simple Random Sampling, Stratified Random Sampling, Systematic Sampling, Cluster Sampling, PPS (probability proportional to size) Sampling
Advantages of probability sampling:
- The sample will be representative of the population with respect to the variables of interest.
- Probability samples are more accurate than non-probability samples (they remove conscious and unconscious sampling bias).
- Probability samples permit the development of a theory for the estimation of population parameters.
- Probability samples allow us to determine the accuracy of the sample estimates.
Every survey has three important stages: planning, execution and analysis. Execution of the survey is entirely practical work, and sampling theory plays no role in it. Sampling theory is concerned with the other two stages, namely planning and analysis.
Principal steps in sample surveys
A sample survey may be considered an organized fact-finding procedure. While developing a sampling design (planning, execution and analysis), we must pay attention to the following points:
1. Stating the objectives of the survey clearly
2. Defining the population to be sampled (covered)
3. Defining the sampling units and preparing the sampling frame
4. Deciding the data to be collected
5. Choosing the methods of collecting the data
6. Preparing the questionnaire
7. Selecting the sample
8. Organizing the field work
9. Summarizing and analysing the data collected
The Language of Sampling
Population: the theoretical aggregation of specified elements defined for a given survey, delimited in time and space.
Sample element: a case or single unit that is selected from a population and measured in some way for the study (e.g., a person or a thing).
Sampling frame: a specific list containing the names or addresses of all elements in the population. From this list, the researcher selects units to create the study sample.
Sample: a set of cases or units drawn from a population and used to draw conclusions (generalizations or inferences) about the unknown aspects of the population.
Estimator: a calculating scheme or formula (a statistic) for obtaining an appropriate value of the population parameter from the sample observations.
Estimate: the particular value of an estimator for a given sample.
Expected value: the average of all possible estimates produced by an estimator over repeated trials of a sampling scheme.
Bias: the difference between the expected value of the estimator and the true value of the parameter.
Precision: the closeness of the estimates to their expected value. The variance of an estimator is usually used to measure precision.
Accuracy: the closeness of the estimates to the true value of the parameter. The mean square error (MSE) of the estimator is used to measure accuracy.
Since $\mathrm{MSE} = \text{Variance} + (\text{Bias})^2$, precision and accuracy coincide only when the estimator is unbiased.
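These definitions can be checked exactly on a tiny population by enumerating every possible srswor sample. The Python sketch below uses a hypothetical population {2, 4, 6, 8} with n = 2 and compares the sample mean with a deliberately biased estimator of the mean (the largest observation). The function name and the choice of the biased estimator are illustrative assumptions.

```python
from itertools import combinations
from statistics import fmean


def sampling_properties(population, n, estimator):
    """Exact expected value, bias, variance and MSE of an estimator of the
    population mean, found by enumerating every possible srswor sample of size n."""
    true_mean = fmean(population)
    estimates = [estimator(s) for s in combinations(population, n)]
    expected = fmean(estimates)
    variance = fmean((e - expected) ** 2 for e in estimates)  # precision
    mse = fmean((e - true_mean) ** 2 for e in estimates)      # accuracy
    return {"expected": expected, "bias": expected - true_mean, "variance": variance, "mse": mse}


population = [2, 4, 6, 8]                             # hypothetical population, mean 5
unbiased = sampling_properties(population, 2, fmean)  # sample mean
biased = sampling_properties(population, 2, max)      # largest observation (biased on purpose)
for name, props in [("sample mean", unbiased), ("sample max", biased)]:
    print(name, {k: round(v, 3) for k, v in props.items()})
```

For the sample mean, the bias is zero and the MSE equals the variance. For the sample maximum, the MSE (5.0) splits into a variance of about 2.222 plus a squared bias of about 2.778.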
Simple Random Sampling (srs)
Simple random sampling is the simplest method of probability sampling. The sample is drawn unit by unit, with equal probability of selection for each unit at each draw.
If the unit selected is returned to the population after enumeration, before the next draw, the procedure is called simple random sampling with replacement (srswr).
If the unit selected is removed from the population after enumeration, before the next draw, the procedure is called simple random sampling without replacement (srswor).
If N is the population size and n is the sample size, there are $N^n$ possible (ordered) srswr samples and $\binom{N}{n}$ possible srswor samples.
Since srswor samples provide more precise and accurate estimates of the population parameters than srswr samples, srswor is preferred.
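The two counts can be confirmed by brute-force enumeration for a small case. The Python sketch below uses N = 5 and n = 2 as illustrative values: ordered draws in which units may repeat give the srswr samples, and unordered sets of distinct units give the srswor samples.

```python
from itertools import combinations, product
from math import comb

N, n = 5, 2
units = range(1, N + 1)

srswr_samples = list(product(units, repeat=n))  # ordered draws; a unit may appear twice
srswor_samples = list(combinations(units, n))   # unordered sets of distinct units

print(len(srswr_samples), N ** n)       # 25 25
print(len(srswor_samples), comb(N, n))  # 10 10
```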
Procedure for selecting an srs
1. Define the population and select a suitable sampling frame.
2. Assign each element a number from 1 to N.
3. Generate n different random numbers between 1 and N.
4. The numbers generated denote the elements to be included in the sample.
The lottery method and the random number table method are the two procedures available for selecting an srs.
Lottery Method
This is the most popular and simplest method. All the items of the population are numbered on separate slips of paper of the same size, shape and colour. The slips are folded, placed in a container and shuffled thoroughly. Slips are then drawn at random, one by one, until the required number of units has been selected into the sample.
Table of Random Numbers
Since the lottery method is impractical when the population is large, the alternative is to use a table of random numbers. Several standard tables of random numbers are available:
1. Tippett's table
2. Fisher and Yates' table
3. Kendall and Smith's table
Selection of an srs using random number tables
1. Identify and define the population.
2. Determine the desired sample size.
3. Assign every individual on the list a consecutive number from 1 to N, the population size.
4. Select an arbitrary starting number in the table of random numbers.
5. For the selected number, locate the unit in the population bearing that number and select it into the sample. Skip numbers greater than N and, for srswor, numbers that have already been selected.
6. Go to the next number in the column of the table and repeat step 5 until the desired number of individuals has been selected for the sample.
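The table-reading rule can be sketched in a few lines of Python. The "column" of two-digit numbers below is hypothetical and does not come from any published table. Numbers outside 1..N and repeated numbers are skipped, which produces an srswor sample.

```python
def select_from_table(table_numbers, N, n):
    """Select an srswor sample of n labels from 1..N by reading random numbers
    in order, skipping numbers outside 1..N and numbers already chosen."""
    chosen = []
    for number in table_numbers:
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of random numbers before reaching n units")


# Hypothetical two-digit entries read down one column of a random number table
column = [73, 18, 95, 42, 18, 7, 66, 31, 54, 20]
print(select_from_table(column, N=60, n=5))  # [18, 42, 7, 31, 54]
```

When a computer is available, Python's `random.sample(range(1, N + 1), n)` performs the same srswor selection using a pseudorandom generator instead of a printed table.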
Estimation of Parameters using srswor
Let $(y_1, y_2, \ldots, y_n)$ be the srswor sample of observations taken from the population under study. Then the sample mean is
$$\bar{y}_{srs} = \frac{1}{n}\sum_{i=1}^{n} y_i .$$
It is an unbiased and consistent estimator of the population mean $\bar{Y}$. If N is the population size and
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(y_i - \bar{y}_{srs}\right)^2$$
is the sample variance, then an unbiased estimator of the variance of $\bar{y}_{srs}$ is
$$\hat{V}(\bar{y}_{srs}) = \frac{N-n}{Nn}\, s^2 .$$
An estimate of the population total Y is $\hat{Y} = N\bar{y}_{srs}$, and an unbiased estimator of its variance is
$$\hat{V}(\hat{Y}) = N^2\,\frac{N-n}{Nn}\, s^2 .$$
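The four srswor formulas above translate directly into code. The Python sketch below applies them to a hypothetical sample of five observations from a population of N = 50. The function name and the data are illustrative.

```python
from statistics import fmean, variance


def srswor_estimates(sample, N):
    """Mean, total and their unbiased variance estimates from an srswor sample."""
    n = len(sample)
    ybar = fmean(sample)                # estimate of the population mean
    s2 = variance(sample)               # sample variance with divisor n - 1
    var_mean = (N - n) / (N * n) * s2   # estimated V(ybar)
    total = N * ybar                    # estimate of the population total
    var_total = N ** 2 * var_mean       # estimated V(N * ybar)
    return ybar, var_mean, total, var_total


sample = [12, 15, 9, 14, 10]  # hypothetical observations
ybar, v_mean, total, v_total = srswor_estimates(sample, N=50)
print(ybar, round(v_mean, 3), total, round(v_total, 1))
```

Here $s^2 = 26/4 = 6.5$, so $\hat{V}(\bar{y}) = (45/250)(6.5) = 1.17$, the estimated total is 600, and its estimated variance is $2500 \times 1.17 = 2925$.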
Advantages of srs:
- easy to conduct
- requires minimal knowledge of the population to be sampled
- simple estimators for the parameters
Drawbacks of srs:
- When the population is not homogeneous with respect to the characteristic under survey, an srs need not be a good representative of it.
Homogeneous and heterogeneous populations
If all members of a population are identical, the population is considered to be homogeneous. That is, the characteristics of any one individual in the population are the same as those of any other individual (little or no variation among individuals). When individual members of a population differ from each other, the population is considered to be heterogeneous (significant variation among individuals), e.g. students in a school, students in a college, workers in a factory.
Stratified Sampling
When the population is heterogeneous in nature, stratified random sampling is used. It is a two-step process. First, the population is partitioned into subpopulations, or strata. The strata should be mutually exclusive and collectively exhaustive, so that every population element is assigned to one and only one stratum and no population element is omitted. Next, elements are selected from each stratum by a random sampling procedure, usually srs, and pooled together to give the stratified sample. A major objective of stratified sampling is to increase precision without increasing cost.
The elements within a stratum should be as homogeneous as possible, while the elements in different strata should be as heterogeneous as possible. The stratification variables (auxiliary variables) should also be closely related to the characteristic of interest (the survey variable or study variable). For example:
- If height is the study variable, weight or age may be taken as the auxiliary variable.
- If the volume of timber is the study variable, the girth or height of the trees may be taken as the auxiliary variable.
Procedure for Drawing a Stratified Random Sample
1. Define the population and the sampling frame.
2. Select the stratification variable(s) and the number of strata, L.
3. Divide the entire population into L non-overlapping subdivisions, called strata, based on the stratification variable.
4. In each stratum, number the elements from 1 to $N_h$ (the population size of stratum h).
5. Determine the sample size of each stratum, $n_h$, where $\sum_{h=1}^{L} n_h = n$.
6. From each stratum, select a simple random sample of $n_h$ units and pool them together to get the required stratified sample.
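Steps 4 to 6 can be sketched in Python as an independent srswor within each stratum followed by pooling. The frame of 30 factories, the three strata and the stratum sample sizes below are hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw an srswor of sizes[h] units independently from each stratum h and pool them."""
    rng = random.Random(seed)
    pooled = []
    for h, units in strata.items():
        chosen = rng.sample(units, sizes[h])   # srs without replacement within stratum h
        pooled.extend((h, u) for u in chosen)  # keep the stratum label with each unit
    return pooled


# Hypothetical frame of 30 factories stratified by size (L = 3)
strata = {
    "large": list(range(1, 6)),     # N_1 = 5
    "medium": list(range(6, 16)),   # N_2 = 10
    "small": list(range(16, 31)),   # N_3 = 15
}
sizes = {"large": 2, "medium": 3, "small": 5}  # n_h, summing to n = 10
sample = stratified_sample(strata, sizes, seed=3)
print(sample)
```

Keeping the stratum label with each selected unit is a design choice: the estimators that follow need the sample split by stratum.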
Reasons for stratification
- Administrative convenience.
- Sampling problems may differ markedly in different parts of the population (in surveying factories, firms may be grouped into large/medium/small, individual/group, private/government).
- Stratification may produce a gain in precision in the estimates of the parameters of the whole population (when the population is highly heterogeneous).
Estimation of Parameters using a Stratified Sample
Let the population of size N be divided into L strata of sizes $N_h$, $h = 1, 2, \ldots, L$, so that $\sum_{h=1}^{L} N_h = N$. We take random samples of size $n_h$ from the hth stratum, $h = 1, 2, \ldots, L$, so that $\sum_{h=1}^{L} n_h = n$. Let $y_{hi}$ denote the ith observation in the sample taken from the hth stratum, $i = 1, 2, \ldots, n_h$; $h = 1, 2, \ldots, L$, and let
$$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}, \quad h = 1, 2, \ldots, L$$
be the sample mean from the hth stratum. Then an unbiased estimator of the population mean $\bar{Y}$ is
$$\bar{y}_{st} = \frac{1}{N}\sum_{h=1}^{L} N_h\, \bar{y}_h .$$
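The estimator $\bar{y}_{st}$ weights each stratum mean by its population size $N_h$, not by its sample size. The Python sketch below computes it for hypothetical stratum sizes and observations, and contrasts it with the unweighted mean of the pooled sample. The two differ here because the sample is not allocated in proportion to $N_h$.

```python
from statistics import fmean


def stratified_mean(stratum_sizes, stratum_samples):
    """ybar_st = (1/N) * sum over h of N_h * ybar_h, where ybar_h is the
    mean of the sample drawn from stratum h."""
    N = sum(stratum_sizes.values())
    return sum(N_h * fmean(stratum_samples[h]) for h, N_h in stratum_sizes.items()) / N


sizes = {"large": 20, "medium": 30, "small": 50}                                 # hypothetical N_h, N = 100
samples = {"large": [80, 90, 85], "medium": [40, 45], "small": [10, 12, 14, 8]}  # hypothetical y_hi
y_st = stratified_mean(sizes, samples)
naive = fmean([y for ys in samples.values() for y in ys])  # ignores the stratum weights
print(y_st, round(naive, 2))  # 35.25 42.67
```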
Allocation of sample size in different strata
Once stratified random sampling is fixed as the sampling strategy, the question arises of deciding the sample size $n_h$ for the hth stratum, $h = 1, 2, \ldots, L$. The important methods of allocation in stratified sampling are:
1. equal allocation
2. proportional allocation
3. optimum allocation
Equal Allocation
$$n_h = \frac{n}{L}, \quad h = 1, 2, \ldots, L$$
Proportional Allocation
$$n_h = n\,\frac{N_h}{N}, \quad h = 1, 2, \ldots, L$$
Optimum Allocation
In optimum allocation procedures, we resort to constrained minimization techniques. Here we consider linear cost functions. The standard procedures are:
- minimizing the variance of the estimator for a given total cost of the survey,
- minimizing the total cost of the survey for a given variance of the estimator,
- minimizing the variance of the estimator for a given sample size (Neyman optimum allocation).
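The Python sketch below compares the three allocations for n = 100. For Neyman allocation it uses the standard result $n_h = n N_h S_h / \sum_{h} N_h S_h$, where $S_h$ is the standard deviation within stratum h. The stratum sizes and standard deviations are hypothetical, and the cost-based variants of optimum allocation are not shown.

```python
def equal_allocation(n, L):
    """n_h = n / L for every stratum."""
    return [n / L] * L


def proportional_allocation(n, N_h):
    """n_h = n * N_h / N."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]


def neyman_allocation(n, N_h, S_h):
    """n_h = n * N_h * S_h / sum(N_h * S_h): larger and more variable strata get more units."""
    total = sum(Nh * Sh for Nh, Sh in zip(N_h, S_h))
    return [n * Nh * Sh / total for Nh, Sh in zip(N_h, S_h)]


N_h = [200, 300, 500]  # hypothetical stratum sizes
S_h = [30, 10, 4]      # hypothetical stratum standard deviations
n = 100
for name, alloc in [("equal", equal_allocation(n, len(N_h))),
                    ("proportional", proportional_allocation(n, N_h)),
                    ("Neyman", neyman_allocation(n, N_h, S_h))]:
    print(name, [round(x) for x in alloc])
```

Proportional allocation gives 20, 30 and 50 units. Neyman allocation shifts most of the sample (about 55 units) to the small but highly variable first stratum. In practice the fractional $n_h$ are rounded, which may require a small adjustment so that they still sum to n.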
Advantages of Stratification
- Administrative convenience
- More representative samples
- Estimation with greater accuracy
- Different sampling designs can be used in different strata
Systematic Sampling
Systematic sampling is a technique in which only the first unit is selected randomly and the rest are selected automatically according to a predefined pattern. The sample is chosen by selecting a random starting point and then picking every kth element in succession from the sampling frame. The sampling interval, k, is determined by dividing the population size N by the required sample size n and rounding the result to the nearest integer. For example, suppose there are 5,000 elements in the population and a sample of 50 is desired. In this case the sampling interval, k, is 100. A random number (r, the random start) between 1 and 100 is selected. If, for example, r = 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on up to 4923.
Procedure for Drawing a Systematic Sample
1. Define the population and select a suitable sampling frame.
2. Assign each element a number from 1 to N.
3. Determine the sampling interval k = N/n. If k is a fraction, round it to the nearest integer.
4. Select a random number, r, between 1 and k.
5. The elements with the following numbers then constitute the systematic random sample: r, r + k, r + 2k, r + 3k, ..., r + (n-1)k.
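The procedure above fits in a short Python function. It is a minimal sketch: the function name and the optional `start` argument, which reproduces the r = 23 example, are illustrative. When k has been rounded up, the last labels r + (n-1)k can exceed N. This sketch simply drops them, so the sample can come out slightly smaller than n.

```python
import random


def systematic_sample(N, n, start=None, seed=None):
    """Labels r, r + k, ..., r + (n-1)k, with k = N/n rounded to the nearest integer."""
    k = max(1, int(N / n + 0.5))  # sampling interval, rounded half up
    # use the given start, otherwise draw the random start r from 1..k
    r = start if start is not None else random.Random(seed).randint(1, k)
    # labels beyond N (possible when k was rounded up) are dropped
    return [r + j * k for j in range(n) if r + j * k <= N]


sample = systematic_sample(5000, 50, start=23)  # the r = 23 example from the text
print(sample[:6], sample[-1], len(sample))      # [23, 123, 223, 323, 423, 523] 4923 50
```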
Things to remember before using the systematic sampling scheme
- The efficiency of systematic sampling depends on the order in which the units are arranged in the population.
- If the units in the population show an increasing/decreasing trend as their labels increase, the systematic sample means will also show the same tendency (rank lists, salary lists).
- If the population is almost periodic/cyclic in nature, the efficiency of systematic sampling depends on the value of the sampling interval k (rainfall, market days, peak traffic hours).
Cluster Sampling
The target population is first divided into mutually exclusive and collectively exhaustive subpopulations, or clusters. Then a random sample of clusters is selected using a probability sampling technique such as srs. For each selected cluster, either all the elements are enumerated (single-stage) or a random sample of elements is drawn from the selected cluster and only those elements are enumerated (two-stage). Ideally, each cluster should be a small-scale representation of the population.
Types of Cluster Sampling
Cluster sampling: single-stage sampling, two-stage sampling, multistage sampling.
Example 1: Population: students in a college; Clusters: courses/batches; Elements: each student.
Example 2: Population: people in Kerala; Clusters: districts/villages/panchayats/municipalities; Elements: houses/schools/individuals.
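Single-stage and two-stage cluster sampling differ only in what happens inside each selected cluster. The Python sketch below follows the college example: batches are the clusters and student IDs are the elements. The batch names, the IDs, the function name and its parameters are all hypothetical. Clusters are chosen by srswor. In the two-stage case, a further srswor of `m_units` students is drawn within each chosen batch.

```python
import random


def cluster_sample(clusters, m, m_units=None, seed=None):
    """Select m clusters by srswor. Single stage (m_units is None): enumerate every
    element of each chosen cluster. Two-stage: take an srswor of m_units elements
    within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # first stage: srs of clusters
    if m_units is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], min(m_units, len(clusters[c]))) for c in chosen}


# Hypothetical college: clusters are batches, elements are student IDs
college = {
    "BSc Physics": ["P1", "P2", "P3", "P4"],
    "BSc Statistics": ["S1", "S2", "S3"],
    "BA English": ["E1", "E2", "E3", "E4", "E5"],
    "BCom": ["C1", "C2", "C3", "C4"],
}
single = cluster_sample(college, m=2, seed=7)
two = cluster_sample(college, m=2, m_units=2, seed=7)
print(single)
print(two)
```

Multistage sampling, as in the Kerala example, repeats the same idea at more levels (for instance districts, then villages, then houses).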