Efficient Bootstrap Computation
A discussion of improving estimation accuracy and reducing estimator variance through post-sampling and combined pre- and post-sampling adjustments in bootstrap computation. The chapter uses the geometrical representation of the bootstrap together with modified resampling strategies to get more accurate answers from a given number of bootstrap samples.
Presentation Transcript
Efficient Bootstrap Computation. Based on "An Introduction to the Bootstrap" by Efron and Tibshirani, Chapter 23. M.Sc. seminar in statistics, TAU, June 2017. By Aitan Birati.
Agenda: Introduction; A geometrical representation for the bootstrap (Chapter 20 highlights); Post-sampling adjustments; Pre- and post-sampling adjustments; Summary.
Introduction. In this chapter we present techniques that improve the accuracy of bootstrap estimates and reduce the variance of the resulting estimators. The author divides the chapter into two types of techniques: post-sampling adjustments, and combined pre- and post-sampling adjustments.
Introduction — general idea. Sample $x = (x_1, x_2, \dots, x_n)$ from a population $F$; the statistic of interest is $f(x)$. We want $\theta = E_G[f(x)] = \int f(x)\,dG(x)$, where $G$ is the probability measure of $x$. We usually use the bootstrap: resample to obtain $x^*_1, \dots, x^*_B$ and estimate $\hat\theta_B = \frac{1}{B}\sum_{b=1}^{B} f(x^*_b)$, with the bias estimated by $\widehat{\mathrm{bias}} = \hat\theta_B - f(x)$.
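To make the general idea concrete, here is a minimal sketch (not from the book's code) of the standard bootstrap estimate and the associated bias estimate; the data, the statistic (the sample mean), and all names are illustrative.

```python
# Minimal sketch: standard bootstrap estimate of E[f(x*)] and bootstrap bias estimate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=50)           # observed sample x = (x_1, ..., x_n)

def f(sample):                          # statistic of interest, here the sample mean
    return sample.mean()

B = 2000
theta_star = np.array([f(rng.choice(x, size=x.size, replace=True)) for _ in range(B)])

theta_hat_B = theta_star.mean()         # (1/B) * sum_b f(x*_b)
bias_hat = theta_hat_B - f(x)           # bootstrap bias estimate
print(theta_hat_B, bias_hat)
```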
Agenda: Introduction; A geometrical representation for the bootstrap (Chapter 20 highlights); Post-sampling adjustments; Pre- and post-sampling adjustments; Summary.
A geometrical representation for the bootstrap (Chapter 20 highlights). Consider the functional statistic $\hat\theta = T(\hat F)$, where $\hat F$ denotes the empirical distribution function putting mass $1/n$ on each of the data points $(x_1, x_2, \dots, x_n)$. Rather than thinking of $\hat\theta$ as a function of the values $(x_1, \dots, x_n)$, consider the amount of mass put on each $x_i$. Let $P^* = (P^*_1, \dots, P^*_n)^T$ be a vector of probabilities, $0 \le P^*_i \le 1$, $\sum_i P^*_i = 1$. We define $\hat\theta^* = T(P^*)$, and write $P^0 = (1/n, 1/n, \dots, 1/n)^T$.
Bootstrap sampling. Sampling with replacement from $(x_1, \dots, x_n)$ is equivalent to drawing $nP^*$ from a multinomial distribution with $n$ draws: $P^* \sim \mathrm{Mult}(n, P^0)/n$, with $P^*_i = \#\{x^*_j = x_i\}/n$, $i = 1, \dots, n$. The variance of the statistic $T(P^*)$ is its variance under this multinomial distribution; in particular $P^*$ has mean $P^0$ and covariance $(\mathrm{diag}(P^0) - P^0 P^{0T})/n$, written $P^* \sim (P^0,\ (\mathrm{diag}(P^0) - P^0 P^{0T})/n)$.
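The multinomial view can be checked directly in code. The sketch below (an illustration with made-up data and a weighted-mean statistic) generates bootstrap replications both by resampling with replacement and by drawing $P^* \sim \mathrm{Mult}(n, P^0)/n$, and compares their variances.

```python
# Sketch: resampling with replacement is equivalent to drawing P* ~ Mult(n, P0)/n.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
n = x.size
P0 = np.full(n, 1.0 / n)

def T(P):                            # functional statistic T(P): weighted mean
    return np.dot(P, x)

B = 1000
# (a) ordinary resampling with replacement, converted to a resampling vector
rep_a = [T(np.bincount(rng.integers(0, n, size=n), minlength=n) / n) for _ in range(B)]
# (b) multinomial resampling vector P* ~ Mult(n, P0)/n
rep_b = [T(rng.multinomial(n, P0) / n) for _ in range(B)]

print(np.var(rep_a), np.var(rep_b))  # both approximate var T(P*) under the multinomial
```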
Agenda: Introduction; A geometrical representation for the bootstrap (Chapter 20 highlights); Post-sampling adjustments; Pre- and post-sampling adjustments; Summary.
Post-sampling adjustment — theory. Now we use the control-function technique. Suppose we have a function $g(z)$ that approximates $f(z)$ and whose integral with respect to $G$ is known. Then we can write $\int f(z)\,dG = \int g(z)\,dG + \int [f(z) - g(z)]\,dG$, and estimate $\hat e_1 = \int g(z)\,dG + \frac{1}{B}\sum_{b=1}^{B}[f(z_b) - g(z_b)]$, so that $\mathrm{Var}(\hat e_1) = \frac{1}{B}\,\mathrm{var}[f(z) - g(z)]$.
Post-sampling adjustment — theory. If $g(z)$ is a good approximation to $f(z)$, then $\mathrm{var}[f(z) - g(z)] < \mathrm{var}[f(z)]$. As a result, the control function produces an estimate with similar bias but lower variance for the same number of samples $B$.
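As a quick illustration of the control-function identity, the sketch below estimates $E[f(z)]$ for $f(z) = e^z$ under $G = N(0,1)$ using the control $g(z) = 1 + z + z^2/2$, whose integral under $G$ is $1.5$; this example is ours, not the book's.

```python
# Sketch of the control-function estimate: e1 = ∫ g dG + (1/B) Σ_b [f(z_b) - g(z_b)].
import numpy as np

rng = np.random.default_rng(2)
B = 5000
z = rng.normal(size=B)

f = np.exp(z)                      # integrand whose mean we want (true value exp(0.5))
g = 1 + z + z**2 / 2               # approximation with known integral: ∫ g dG = 1.5

e0 = f.mean()                      # plain Monte Carlo estimate
e1 = 1.5 + (f - g).mean()          # control-function estimate

print(e0, e1)
print(f.var() / B, (f - g).var() / B)   # var(e1) << var(e0) when g approximates f well
```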
Post-sampling adjustment. Four types of estimates will be presented: the standard bootstrap, the re-centering bootstrap, the least-squares control function, and the permutation (balanced) bootstrap.
Bootstrap. Suppose $f(z)$ is a real-valued function of a possibly vector-valued argument $z$, and $G(z)$ is the probability measure of $z$. We wish to estimate $E_G[f(z)] = \int f(z)\,dG(z)$. The standard bootstrap estimate is $\hat e_0 = \frac{1}{B}\sum_{b=1}^{B} f(z_b)$, with $\mathrm{Var}(\hat e_0) = \frac{1}{B}\,\mathrm{var}[f(z)]$.
Re-centering / control function. A linear function $c_0 + c^T P^*$ is an effective and convenient form for the control function: $\mathrm{Mean}(c_0 + c^T P^*) = c_0 + c^T P^0$ and $\mathrm{Var}(c_0 + c^T P^*) = c^T \Sigma\, c$, where $\Sigma$ is the covariance of $P^*$. Write $T(P^*) = c_0 + c^T P^* + [T(P^*) - (c_0 + c^T P^*)]$. How do we choose the best $c_0$ and $c$? For the estimate itself, choose them so that $c_0 + c^T P^0 = T(P^0)$; for variance reduction, see the next slides.
Re-centering / control function. If we use $c_0 + c^T P^*$ as the control function for estimating $E[T(P^*)]$, then $\hat e_1 = E(c_0 + c^T P^*) + \frac{1}{B}\sum_b [T(P^*_b) - c_0 - c^T P^*_b] = c_0 + c^T P^0 + \frac{1}{B}\sum_b [T(P^*_b) - (c_0 + c^T P^*_b)]$. Writing $\bar P^* = \frac{1}{B}\sum_b P^*_b$ and taking the linear function to agree with $T$ so that $c_0 + c^T P^0 = T(P^0)$ and $c_0 + c^T \bar P^* \approx T(\bar P^*)$, this becomes $\hat e_1 \approx \frac{1}{B}\sum_b T(P^*_b) + T(P^0) - T(\bar P^*)$, and the corresponding bias estimate is $\widehat{\mathrm{bias}}_B = \hat e_1 - T(P^0) = \frac{1}{B}\sum_b T(P^*_b) - T(\bar P^*)$, i.e. the replications are re-centered at $T(\bar P^*)$ rather than $T(P^0)$.
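A minimal sketch of the re-centered bias estimate $\frac{1}{B}\sum_b T(P^*_b) - T(\bar P^*)$ next to the standard one, using an illustrative nonlinear statistic (the squared weighted mean) and made-up data.

```python
# Sketch: standard vs. re-centered (control-function) bootstrap bias estimates.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(size=30)
n = x.size
P0 = np.full(n, 1.0 / n)

def T(P):                               # nonlinear functional statistic: squared weighted mean
    return np.dot(P, x) ** 2

B = 1000
P_star = rng.multinomial(n, P0, size=B) / n      # B resampling vectors, rows sum to 1
theta_star = np.array([T(P) for P in P_star])
P_bar = P_star.mean(axis=0)                      # average resampling vector

bias_std = theta_star.mean() - T(P0)             # standard bias estimate
bias_recentered = theta_star.mean() - T(P_bar)   # re-centered bias estimate
print(bias_std, bias_recentered)
```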
Variance estimation. Decompose $T(P^*) = c_0 + c^T P^* + [T(P^*) - (c_0 + c^T P^*)]$. Then, approximately, $\mathrm{var}(T(P^*)) = \mathrm{var}(c_0 + c^T P^*) + \frac{1}{B}\sum_b (T(P^*_b) - c_0 - c^T P^*_b)^2 + \frac{2}{B}\sum_b (c_0 + c^T P^*_b)(T(P^*_b) - c_0 - c^T P^*_b)$, where the last term estimates $2\,\mathrm{cov}(g(z),\, f(z) - g(z))$. Choosing the proper $c_0$ and $c$ makes this cross term zero and lowers the variance.
Least-squares control function. Take the least-squares fit of $T(P^*_b)$ on $P^*_b$: $T(P^*_b) \approx \hat c_0 + \hat c^T P^*_b$. With this choice the residuals are orthogonal to the fitted values, so the cross term vanishes and $\mathrm{var}(T(P^*)) \approx \mathrm{var}(\hat c_0 + \hat c^T P^*) + \frac{1}{B}\sum_b (T(P^*_b) - \hat c_0 - \hat c^T P^*_b)^2$. The resulting estimates are $\hat e_1 = \hat c_0 + \hat c^T P^0 + \frac{1}{B}\sum_b [T(P^*_b) - (\hat c_0 + \hat c^T P^*_b)]$ and $\widehat{\mathrm{bias}}_B = \hat e_1 - T(P^0)$.
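The least-squares variant can be sketched as follows; the statistic and data are illustrative, and the fit uses numpy's least squares (no intercept column is needed because the rows of $P^*$ sum to one, so constants are already in the column space).

```python
# Sketch: least-squares control function — regress T(P*_b) on P*_b and look at
# the variance decomposition into fitted part + residual part.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=25)
n = x.size
P0 = np.full(n, 1.0 / n)

def T(P):
    return np.dot(P, x) ** 2            # nonlinear statistic

B = 2000
P_star = rng.multinomial(n, P0, size=B) / n
t = np.array([T(P) for P in P_star])

coef, *_ = np.linalg.lstsq(P_star, t, rcond=None)  # least-squares fit of t on P*
fitted = P_star @ coef
resid = t - fitted                                  # orthogonal to fitted values by construction

print(np.mean((fitted - fitted.mean()) * resid))    # cross term is (numerically) zero
print(t.var(), resid.var())                         # residual variance << total variance
print(1 - resid.var() / t.var())                    # R^2: share explained by the linear fit
```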
Permutation bootstrap. In many bootstrap runs $\bar P^* \ne P^0$. When $T(P^*)$ is a linear statistic, we can generate bootstrap samples that keep $\bar P^* = P^0$ exactly by using the permutation (balanced) bootstrap. Rather than sampling with replacement, we concatenate $B$ copies of the vector $(x_1, x_2, \dots, x_n)$ into a string of length $nB$, take a random permutation of it, and cut it into $B$ consecutive blocks of $n$ values, giving $B$ samples. For these $B$ samples $\bar P^* = P^0$, so the linear component of the bias estimate is exactly zero.
Permutation bootstrap (illustration). The $nB$ concatenated values $(x_1, \dots, x_n, x_1, \dots, x_n, \dots)$ are permuted and arranged as a $B \times n$ array; each row is one balanced bootstrap sample. The estimates are $\hat e_B = \frac{1}{B}\sum_{b=1}^{B} T(P^*_b)$ and $\widehat{\mathrm{bias}}_B = \frac{1}{B}\sum_{b=1}^{B} T(P^*_b) - T(P^0)$.
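A sketch of the balanced (permutation) bootstrap as described above; the data and statistic are illustrative.

```python
# Sketch: balanced bootstrap — concatenate B copies of x, permute, cut into B rows of n.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=15)
n, B = x.size, 200

pool = np.tile(np.arange(n), B)          # indices 0..n-1 repeated B times
rng.shuffle(pool)
samples = pool.reshape(B, n)             # each row is one balanced bootstrap sample

P_star = np.array([np.bincount(row, minlength=n) / n for row in samples])
print(np.allclose(P_star.mean(axis=0), 1.0 / n))   # Pbar* equals P0 exactly

theta_star = np.array([x[row].mean() for row in samples])
print(theta_star.mean() - x.mean())      # bias estimate for a linear statistic: 0 up to rounding
```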
$R^2$: the squared correlation between $T(P^*)$ and its linear approximation, $R^2 = \mathrm{var}(\hat c_0 + \hat c^T P^*)/\mathrm{var}(T(P^*))$. It estimates the proportion of the variance of $T(P^*)$ explained by the linear approximation.
Agenda: Introduction; A geometrical representation for the bootstrap (Chapter 20 highlights); Post-sampling adjustments; Pre- and post-sampling adjustments; Summary.
Importance sampling — theory. We want $e = \int f(z)\,g(z)\,dz$; the standard estimate is $\hat e_0 = \frac{1}{B}\sum_{b=1}^{B} f(z_b)$ with $z_b \sim g$. Suppose $h(z)$ is roughly proportional to $f(z)\,g(z)$; $h$ is called the importance sampler. Then $e = \int \frac{f(z)\,g(z)}{h(z)}\, h(z)\,dz$.
Importance sampling — theory. Since $e = \int \frac{f(z)\,g(z)}{h(z)}\, h(z)\,dz$, we can sample $z_b \sim h$ and use $\hat e_1 = \frac{1}{B}\sum_{b=1}^{B} \frac{f(z_b)\,g(z_b)}{h(z_b)}$. Then $E(\hat e_1) = E_h\!\left[\frac{f(z)\,g(z)}{h(z)}\right] = E_g[f(z)]$, $\mathrm{Var}(\hat e_1) = \frac{1}{B}\,\mathrm{var}_h\!\left[\frac{f(z)\,g(z)}{h(z)}\right]$, and $\mathrm{Var}(\hat e_0) = \frac{1}{B}\,\mathrm{var}_g[f(z)]$.
Example: estimating an upper tail probability (97.5%) by simulation. With $z \sim N(0,1)$ we want $\mathrm{Prob}\{z > 1.96\} = \int 1_{\{z > 1.96\}}\,\varphi(z)\,dz$. We compare the direct estimate $\hat e_0 = \frac{1}{B}\sum_b 1_{\{z_b > 1.96\}}$, $z_b \sim N(0,1)$, with the importance-sampling estimate $\hat e_1 = \frac{1}{B}\sum_b 1_{\{y_b > 1.96\}}\,\varphi(y_b)/\varphi(y_b - 1.96)$, $y_b \sim N(1.96, 1)$. (The original slide tabulates the simulated means and variances of $\hat e_0$ and $\hat e_1$ and their variance ratio.)
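The tail-probability comparison can be reproduced in a few lines; this is our sketch of the simulation, using the closed-form likelihood ratio $\varphi(y)/\varphi(y-1.96) = \exp(1.96^2/2 - 1.96\,y)$.

```python
# Sketch: Prob{z > 1.96} for z ~ N(0,1), estimated directly (e0) and by
# importance sampling from h = N(1.96, 1) (e1).
import numpy as np

rng = np.random.default_rng(6)
B = 2000
c = 1.96

# e0: sample from the target density g = N(0,1)
z = rng.normal(size=B)
f0 = (z > c).astype(float)
e0 = f0.mean()

# e1: sample from the importance sampler h = N(c, 1) and reweight by g/h
y = rng.normal(loc=c, size=B)
w = np.exp(c * c / 2 - c * y)              # likelihood ratio phi(y) / phi(y - c)
f1 = (y > c) * w
e1 = f1.mean()

print(e0, f0.var() / B)                    # high relative variance
print(e1, f1.var() / B)                    # same target (~0.025), much smaller variance
```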
Bootstrap tail probabilities. We want $\mathrm{Prob}\{\hat\theta^* > c\} = 1 - \alpha$, where $\hat\theta^* = T(P^*)$. The standard bootstrap estimate is $\widehat{\mathrm{Prob}}\{\hat\theta^* > c\} = \frac{1}{B}\sum_{b=1}^{B} 1_{\{T(P^*_b) > c\}}$.
Bootstrap tail probabilities — weight definition. As importance sampler $h$ we take a rescaled multinomial with mean $\tilde P$ instead of $P^0 = (1/n, 1/n, \dots, 1/n)^T$: $\hat g_{\tilde P}(P^*)$ is the probability mass function of $\mathrm{Mult}(n, \tilde P)/n$, which has mean $\tilde P$. Intuitively we would like to choose $\tilde P$ so that the event $\hat\theta^* > c$ has about a 50% chance of occurring under $\hat g_{\tilde P}$. We therefore put more weight on the larger sample values and less weight on the smaller ones, so that the bootstrap distribution of $\hat\theta^*$ is shifted toward $c$.
Bootstrap tail probabilities — example. With $\hat g_{\tilde P}(P^*)$ the probability mass function of $\mathrm{Mult}(n, \tilde P)/n$, run the following steps (a code sketch follows below): (1) define the cutoff $c$, e.g. by running a large standard bootstrap so that $\frac{1}{B}\sum_b 1_{\{T(P^*_b) > c\}} = 1 - \alpha$; (2) find a tilting parameter $\lambda$ that suits $c$, i.e. centers the tilted bootstrap distribution of $T(P^*)$ near $c$; (3) form the tilted weight vector $\tilde P_i = \exp(\lambda x_i)\big/\sum_j \exp(\lambda x_j)$; (4) draw $P^*_b \sim \mathrm{Mult}(n, \tilde P)/n$, compute $T(P^*_b)$, and estimate $\widehat{\mathrm{Prob}}\{T(P^*) > c\} = \frac{1}{B}\sum_{b=1}^{B} 1_{\{T(P^*_b) > c\}}\,\frac{\hat g_{P^0}(P^*_b)}{\hat g_{\tilde P}(P^*_b)}$.
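Below is a sketch of these steps under simplifying assumptions: the statistic is the plain mean (so the slide's tilting weights $\exp(\lambda x_i)$ apply directly), the cutoff $c$ is set by a normal approximation rather than a preliminary bootstrap, and $\lambda$ is solved for numerically with scipy's brentq; the data and names are illustrative.

```python
# Sketch: importance-sampled bootstrap tail probability via exponential tilting.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.normal(size=40)
n = x.size
P0 = np.full(n, 1.0 / n)

c = x.mean() + 2 * x.std() / np.sqrt(n)   # illustrative tail cutoff for T(P*) = mean

def tilted_mean(lam):                     # mean of the statistic under the tilted weights
    w = np.exp(lam * x)
    return np.dot(w / w.sum(), x)

lam = brentq(lambda l: tilted_mean(l) - c, 0.0, 50.0)   # lambda so the tilted mean equals c
P_tilde = np.exp(lam * x)
P_tilde /= P_tilde.sum()

B = 2000
counts = rng.multinomial(n, P_tilde, size=B)            # n*P*_b drawn under the tilted sampler
theta = counts @ x / n                                  # T(P*_b)
log_ratio = counts @ (np.log(P0) - np.log(P_tilde))     # log of g_P0(P*_b) / g_Ptilde(P*_b)
est = np.mean((theta > c) * np.exp(log_ratio))          # importance-sampled tail probability
print(est)
```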
Agenda: Introduction; A geometrical representation for the bootstrap (Chapter 20 highlights); Post-sampling adjustments; Pre- and post-sampling adjustments; Summary.
Summary. In this presentation I presented two techniques that can reduce the variance of bootstrap estimates for a given number of bootstrap samples: control functions and importance sampling. For some applications these are very effective tools.
Thank you!