Sampling Methods in Business Analytics

 
Sampling
Polling, and estimating proportions
Choosing a sample size
Sampling methods
  Stratified sampling, cluster sampling
Sampling problems
  Non-response bias, measurement bias
Optimization (Excel’s “Solver”)
Adverse selection
 
DECS 430-A
Business Analytics I: Class 5
Polling
 
If the individuals in the population differ in some
qualitative way, we often wish to estimate the
proportion / fraction / percentage of the population
with some given property.
For example: We track the sex of purchasers of our
product, and find that, across 400 recent
purchasers, 240 were female. What do we estimate
to be the proportion of all purchasers who are
female, and how much do we trust our estimate?
First, the Estimate
 
Let
\[ \hat{p} = \frac{240}{400} = 0.6 = 60\%. \]
Obviously, this will be our estimate for the
population proportion.
But how much can this estimate be trusted?
And Now, the Trick
 
Imagine that each woman is represented by a
“1”, and each man by a “0”.
Then the proportion (of the sample or
population) which is female is just the mean of
these numeric values, and so estimating a
proportion is just a special case of what we’ve
already done!
The Result
 
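Estimating a mean:
\[ \bar{x} \pm (\sim 2) \cdot \frac{s}{\sqrt{n}} \]
Estimating a proportion:
\[ \hat{p} \pm (\sim 2) \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
[When all of the numeric values are either 0 or 1, s takes the
special form shown above.]

The example:
\[ 0.6 \pm (\sim 2) \cdot \sqrt{\frac{0.6(1-0.6)}{400}}, \quad \text{or } 60\% \pm 4.8\%. \]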
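The purchaser example can be checked numerically. The sketch below is illustrative only: the function name `proportion_interval` is made up for this note, and it assumes the "(~2)" multiplier is the usual 1.96 for a 95% confidence level.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Return (estimate, margin of error) for a sample proportion.

    z=1.96 is an assumed value for the slides' "(~2)" multiplier.
    """
    p_hat = successes / n
    # Standard error of a proportion: sqrt(p(1-p)/n)
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * std_err

# 240 female purchasers out of 400 sampled
p_hat, margin = proportion_interval(240, 400)
print(f"estimate = {p_hat:.1%}, margin of error = {margin:.1%}")
# prints "estimate = 60.0%, margin of error = 4.8%"
```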
Multiple-Choice Questions
 
If the Republican Party’s candidate were to be
chosen today, which one would you most prefer?
Romney, Cain, Bachmann, Perry, Gingrich,
Santorum, Paul, Huntsman, none
The results are reported as if 9 separate “yes/no”
questions had been asked.
If the Republican Party’s candidate were to be
chosen today, which of these would have your
approval?
The same reporting method is used.
Choice of Sample Size
 
Set a “target” margin of error for your
estimate, based on your judgment as to how
small will be small enough for those who will
be using the estimate to make decisions.
There’s no magic formula here, even though
this is a very important choice: Too large, and
your study is useless; too small, and you’re
wasting money.
Estimating a Proportion: Polling
 
Pick the target margin of error.
Why do news organizations always use 3% or
4% during the election season?
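Because that’s the largest they can get away with.
Whatever the true proportion, the margin of error is largest when the estimate is 0.5:
\[ (\sim 2) \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le (\sim 2) \cdot \sqrt{\frac{0.5(1-0.5)}{n}} \approx \frac{1}{\sqrt{n}} \]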
So, for example, n=400 (resp., 625, or 1112) assures a
margin of error of no more than 5% (resp., 4%, or 3%).
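The sample sizes quoted for polling follow from the worst case of a 50/50 split. A minimal sketch, assuming the multiplier is taken as exactly 2 (which is what reproduces the slide's numbers); the helper name `worst_case_sample_size` is hypothetical:

```python
import math

def worst_case_sample_size(target_margin, z=2.0):
    """Smallest n with z * sqrt(0.5 * 0.5 / n) <= target_margin.

    z=2.0 is an assumption matching the slides' "(~2)".
    """
    raw = (z * 0.5 / target_margin) ** 2
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(raw, 6))

for target in (0.05, 0.04, 0.03):
    print(f"{target:.0%} margin -> n = {worst_case_sample_size(target)}")
```

With the more precise multiplier 1.96 the required sizes come out slightly smaller, so the figures above are on the safe side.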
Estimating a Mean: Choice of Sample Size

Set the target margin of error.
Solve
\[ (\sim 2) \cdot \frac{s}{\sqrt{n}} = \text{target} \]
for n.
From whence comes s?
From historical data (previous studies) or from
a pilot study (small initial survey).
Example: target = $25, s ≈ $180. Set n = 207.
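Solving the margin-of-error equation for n gives n = ((~2) · s / target)². The sketch below works the $25 / $180 example, again assuming a multiplier of exactly 2; `sample_size_for_mean` is a name invented for illustration.

```python
def sample_size_for_mean(s, target_margin, z=2.0):
    """Solve z * s / sqrt(n) = target_margin for n (not yet rounded).

    z=2.0 is an assumption matching the slides' "(~2)".
    """
    return (z * s / target_margin) ** 2

n_raw = sample_size_for_mean(s=180, target_margin=25)
print(f"required n is about {n_raw:.2f}")
```

The unrounded answer is about 207.36, which the slide reports as n = 207; rounding up to 208 would be the strictly conservative choice.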
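The “Square-Root” Effect: Choice of Sample Size after an Initial Study

Given the results of a study, to cut the margin
of error in half requires roughly 4 times the
original sample size.
And generally, the sample size required to
achieve a desired margin of error is
\[ n_{\text{required}} = \left( \frac{\text{original margin of error}}{\text{desired margin of error}} \right)^{2} \cdot (\text{original sample size}) \]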
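As a quick illustration of the square-root effect, the sketch below rescales an initial study. The function name and the choice of the 400-person, 4.8% purchaser poll as the starting point are assumptions made for this example.

```python
def rescaled_sample_size(original_n, original_margin, desired_margin):
    """Sample size needed to move from original_margin to desired_margin."""
    return (original_margin / desired_margin) ** 2 * original_n

# Halving a 4.8% margin obtained from 400 respondents
print(rescaled_sample_size(400, 0.048, 0.024))
```

Halving the margin quadruples the required sample, here from 400 to about 1,600.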
How to Read Presidential-Race Polls
 
When reading political polls, remember that
the margin of error in an estimate of the “gap”
between the two leading candidates is roughly
twice as large as the poll's reported margin of
error.
The margin of error in the estimated “change
in the gap” from one poll to the next is nearly
three times as large as the poll's reported
margin of error.
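The "twice" and "nearly three times" rules can be sketched from the variance of a difference between two shares in the same poll, Var(p1 − p2) = [p1 + p2 − (p1 − p2)²] / n, which accounts for the negative correlation between the two candidates' shares. The sketch assumes a reported margin based on the worst case p = 0.5, two independent polls of equal size with similar shares, and hypothetical shares of 48% and 46%.

```python
import math

def reported_margin(n, z=1.96):
    """Margin of error a poll typically reports (worst case p = 0.5)."""
    return z * 0.5 / math.sqrt(n)

def gap_margin(p1, p2, n, z=1.96):
    """Margin of error for the gap p1 - p2 within a single poll."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

def change_in_gap_margin(p1, p2, n, z=1.96):
    """Margin for the change in the gap between two independent polls
    (assumed: equal sample sizes and similar shares in both polls)."""
    return math.sqrt(2) * gap_margin(p1, p2, n, z)

n = 1000  # hypothetical poll size
base = reported_margin(n)
print(f"gap:           {gap_margin(0.48, 0.46, n) / base:.2f} x reported margin")
print(f"change in gap: {change_in_gap_margin(0.48, 0.46, n) / base:.2f} x reported margin")
```

When the two leaders together hold nearly all of the vote, the first ratio approaches 2 and the second approaches 2√2 ≈ 2.83.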
Summary
 
Whenever you give an estimate or prediction to someone, or accept an
estimate or prediction from someone, in order to facilitate risk analysis
be sure the estimate is accompanied by its margin of error:
A 95%-confidence interval is
(your estimate) ± (~2) · (one standard-deviation’s-worth of uncertainty inherent in the way the estimate was made)
If you’re estimating a mean using simple random sampling:
\[ \bar{x} \pm (\sim 2) \cdot \frac{s}{\sqrt{n}} \]
If you’re estimating a proportion using simple random sampling:
\[ \hat{p} \pm (\sim 2) \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
How Will the Data be Collected?
 
Primary Goals: No bias, high precision, low cost
Simple random sampling with replacement
  Typically implemented via systematic sampling
Simple random sampling without replacement
  Typically done if a population list is available
Stratified sampling
  Done if the population consists of subgroups with relative within-group homogeneity
Cluster sampling
  Done if the population consists of (typically geographic) subgroups with substantial within-group heterogeneity
Specialized approaches (e.g., tagging the U-Haul fleet)
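To make the difference between two of these methods concrete, here is a minimal sketch contrasting simple random sampling without replacement and stratified sampling with proportional allocation. The population, the "retail"/"business" strata, and both function names are hypothetical; proportional allocation is only one of several ways to divide a stratified sample.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Simple random sampling without replacement from a population list."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n, rng):
    """Sample each stratum separately, in proportion to its size."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Rounded proportional share; totals can differ from n by rounding.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical population: 600 retail and 400 business customers
population = [("retail", i) for i in range(600)] + [("business", i) for i in range(400)]

srs = simple_random_sample(population, 50, rng)
strat = stratified_sample(population, lambda unit: unit[0], 50, rng)
print("retail customers in stratified sample:",
      sum(1 for unit in strat if unit[0] == "retail"))
```

The stratified sample always contains exactly 30 retail and 20 business customers, whereas the split in the simple random sample varies from draw to draw.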
Non-Response Bias
 
One of the difficulties in surveying people (whether by
mail, telephone, or direct approach) is that some
choose not to respond. Assume that you have decided
to conduct a study which requires a sample size of 100.
If you only expect 10% of those surveyed to respond to
your questionnaire, what should you do?
A naïve answer is, "Simply send out 1000
questionnaires!"
Unfortunately, the demographics of respondents and
nonrespondents may differ substantially. To base
estimates for the entire population merely on the data
collected from respondents therefore might leave you
exposed to substantial sampling bias.
 
A form of stratified sampling is typically used to overcome
non-response bias. An initial mass mailing of questionnaires takes place,
with identifying codes placed on each questionnaire (or its return
envelope). When the submission deadline for responses is reached,
estimates can be made for the stratum of "people who respond to
the initial mailing." Crossing these people off the mailing list (by
cross-referencing the codes on their responses) leaves a list of
people all of whom are now known to be in the other "people who
don't respond" stratum. The initial response rate is used to
estimate the relative sizes of the two strata.
A sample of those who didn't respond is now recontacted, using a
more expensive approach designed to obtain responses from
everyone. (The expense is typically related to an incentive of some
kind.) Their data provides estimates for the second stratum, and
the study can then be completed.
See “Nonresponse_Bias.xls” for an example.
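The two-stratum procedure described above can be sketched as a weighted average of the two stratum estimates. All numbers below are invented for illustration and are not taken from Nonresponse_Bias.xls. The margin of error treats the stratum weights as known and ignores finite-population corrections, which is a simplifying assumption.

```python
import math

def two_stratum_estimate(n_mailed, n_resp, resp_mean, resp_sd,
                         n_follow, follow_mean, follow_sd, z=1.96):
    """Combine initial respondents and a follow-up sample of nonrespondents."""
    # The initial response rate estimates the relative sizes of the strata.
    w_resp = n_resp / n_mailed
    w_non = 1 - w_resp
    estimate = w_resp * resp_mean + w_non * follow_mean
    # Simplified standard error: weights treated as fixed constants.
    std_err = math.sqrt((w_resp * resp_sd) ** 2 / n_resp
                        + (w_non * follow_sd) ** 2 / n_follow)
    return estimate, z * std_err

# Hypothetical study: 1000 mailed, 100 respond, 50 nonrespondents recontacted
estimate, margin = two_stratum_estimate(
    n_mailed=1000, n_resp=100, resp_mean=40.0, resp_sd=10.0,
    n_follow=50, follow_mean=30.0, follow_sd=12.0)
print(f"population estimate = {estimate:.1f} ± {margin:.1f}")
```

Because nonrespondents make up 90% of the population in this made-up case, the combined estimate sits much closer to the follow-up mean than to the mean of the initial respondents.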
 
Measurement Bias
 
Asking sensitive questions
Software piracy
Sexual activities
Tax fraud
People will lie
Allow them to hide behind a mask of
randomness
 
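Randomized Response Surveys

Inverted response. Flip two coins: If both are tails, answer the following question untruthfully; otherwise, answer the question truthfully.
  Have you ever shoplifted?
Innocuous response. Flip two coins: If at least one is a head, go to A; otherwise, go to B.
  A: Flip a coin. Have you ever shoplifted?
  B: Flip a coin. Did you get a head?

In both designs: Pr(answer actual question) = 75%, Pr(head on a coin flip) = 50%, sample size = 1,000.
Inverted response: "Y" rate 55.00% (95%-confidence limits 51.92% to 58.08%, margin of error 3.08%); estimate 60.00% (95%-confidence limits 53.83% to 66.17%, margin of error 6.17%).
Innocuous response: "Y" rate 57.50% (95%-confidence limits 54.44% to 60.56%, margin of error 3.06%); estimate 60.00% (95%-confidence limits 55.91% to 64.09%, margin of error 4.09%).

Larger samples are required for the same precision …
But the bias can be completely eliminated.
See Sampling.xls for details.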
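The arithmetic behind a randomized response survey can be sketched as follows. Two designs are covered: an "inverted response" design, in which a respondent answers truthfully with probability 75% and untruthfully otherwise, and an "innocuous response" design, in which a respondent answers the sensitive question with probability 75% and otherwise answers "did your coin flip come up heads?". The function names are invented here, and the inputs (a sample of 1,000 with observed "Y" rates of 55% and 57.5%) are the deck's example values; this is a sketch, not the workbook's implementation.

```python
import math

def inverted_response(y_rate, n, p_truth=0.75, z=1.96):
    """Pr(Y) = p_truth * p + (1 - p_truth) * (1 - p); solve for p."""
    slope = 2 * p_truth - 1
    estimate = (y_rate - (1 - p_truth)) / slope
    # Uncertainty in the observed "Y" rate is magnified by 1 / slope.
    margin = z * math.sqrt(y_rate * (1 - y_rate) / n) / slope
    return estimate, margin

def innocuous_response(y_rate, n, p_actual=0.75, p_head=0.5, z=1.96):
    """Pr(Y) = p_actual * p + (1 - p_actual) * p_head; solve for p."""
    estimate = (y_rate - (1 - p_actual) * p_head) / p_actual
    margin = z * math.sqrt(y_rate * (1 - y_rate) / n) / p_actual
    return estimate, margin

for name, (est, moe) in {
    "inverted": inverted_response(0.55, 1000),
    "innocuous": innocuous_response(0.575, 1000),
}.items():
    print(f"{name}: estimate = {est:.2%}, margin of error = {moe:.2%}")
```

Both designs recover an estimate of 60%, with margins of about 6.17% and 4.09% respectively. A direct question asked of 1,000 truthful respondents would give roughly 3%, which is the price paid in precision for removing the incentive to lie.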
 
Optimization
 
 
 
Using Excel’s “Solver” add-in
 
Take My Car. Please!
 
Have I got a deal for you! I've got this great used car, and I
might be willing to sell.
The actual value of the car depends on how well it has been
maintained, and this is of course only known to me: Expressed
in terms of the car's value to me, you believe it to be equally
likely to be worth any amount between $0 and $5000.
You, who would utilize the car to a greater extent than I,
would derive 50% more value from ownership (e.g., if it's
worth $3000 to me, then it's worth $4500 to you).
How much are you willing to offer me? (I'll interpret your offer
as "take-it-or-leave-it.")
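One way to reason about the offer is to compute the buyer's expected profit. The sketch below assumes the seller accepts exactly when the offer is at least the car's value to the seller, which is an interpretation of "take-it-or-leave-it" rather than something the slide states; the function name is hypothetical.

```python
def expected_profit(offer, max_value=5000.0, buyer_premium=1.5):
    """Buyer's expected profit from a take-it-or-leave-it offer.

    Assumes the seller's value V is uniform on [0, max_value] and the
    seller accepts whenever V <= offer.
    """
    accepted_up_to = min(max(offer, 0.0), max_value)
    accept_prob = accepted_up_to / max_value
    # Given acceptance, V is uniform on [0, accepted_up_to].
    mean_value_if_accepted = accepted_up_to / 2
    return accept_prob * (buyer_premium * mean_value_if_accepted - offer)

for offer in (1000, 2500, 3000, 5000):
    print(f"offer ${offer}: expected profit ${expected_profit(offer):,.0f}")
```

Under these assumptions every positive offer loses money on average: conditional on acceptance the car is worth only half the offer to the seller, hence 75% of the offer to the buyer.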
 
Adverse Selection
 
You are subject to adverse selection whenever
1. You offer to engage in a transaction with
another party, and that party can either accept
or refuse your offer.
2. The other party holds information not yet
available to you concerning the value to you of
the transaction.
3. The other party is most likely to accept the offer
(i.e., to select to do the deal) when the
information is "bad news" (i.e., adverse) to you.
 
Adverse Selection: Dealing with It
 
We need to be able to compute E[ V | V ≤ v ]. For normally-distributed
uncertainty, this can be done analytically.
(See Adverse_Selection_plus.xls)
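For a normally distributed value, the conditional mean below a cutoff has a standard closed form: E[V | V ≤ v] = μ − σ · φ(z) / Φ(z), with z = (v − μ) / σ, where φ and Φ are the standard normal density and cumulative distribution. The sketch below implements that textbook formula; it is not taken from Adverse_Selection_plus.xls, and the $2,500 / $1,000 inputs are hypothetical.

```python
import math

def normal_mean_given_below(v, mu, sigma):
    """E[V | V <= v] for V ~ Normal(mu, sigma)."""
    z = (v - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # phi(z)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # Phi(z)
    return mu - sigma * pdf / cdf

# Hypothetical: value ~ Normal(2500, 1000); the deal happens only if V <= 2500
print(round(normal_mean_given_below(2500, 2500, 1000), 2))
```

Conditioning on acceptance pulls the expected value well below the unconditional mean, which is the adverse-selection effect in numerical form.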
Adverse Selection: Examples
 
Making a buyout offer
Setting an insurance premium
  getting (forcing) healthy young people to carry insurance is critical to the ACA
Giving bid/ask quotes
Auctions with objective value uncertainty
  contracting (unknown costs)
  natural resource sales (unknown supply)
  the “Winner’s Curse”
  debt auctions (unknown post-auction market price)
Here’s another Saturday night …
  mothers teach daughters to avoid giving bad signals
Course Finale
 
We’ve covered …
Enough probability to get you started in FINC-430,
OPNS-430, and other courses dealing with risk.
Enough statistics to begin DECS-431, on regression
analysis.
Enough warning to provide a bit of protection against
common errors.
Good luck, and bon voyage!
Sampling plays a crucial role in estimating proportions and making informed decisions in business analytics. From polling to estimating proportions, this class explores sampling techniques, sample size determination, and potential biases. Learn about choosing a sample size, stratified and cluster sampling, handling non-response bias, and using tools like Excel's Solver for optimization in decision-making. Explore how to estimate proportions effectively and understand the importance of selecting an appropriate sample size based on margin of error considerations.

  • Sampling Methods
  • Business Analytics
  • Sample Size
  • Estimating Proportions
  • Optimization

Uploaded on Sep 18, 2024 | 3 Views



