Dealing with Range Anxiety in Mean Estimation

Slide Note

Dealing with range anxiety in mean estimation involves exploring methods to improve accuracy when estimating the mean value of a random variable based on sampled data. Various techniques such as quantile truncation, quantile estimation, and reducing dynamic range are discussed. The goal is to reduce error and improve the precision of mean estimation in scenarios with limited data access and dynamic ranges. Overall, the focus is on mitigating the challenges posed by range anxiety to enhance the reliability of mean estimation processes.

lottie Follow

Uploaded on Jul 27, 2024 | 3 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Dealing with Range Anxiety in Mean Estimation Vitaly Feldman work done while at IBM Research - Almaden

Mean estimation Given ? i.i.d. samples of random variable ? ?1,?2, ,?? estimate ?[?] Use the empirical mean 1 ? ??? ? Error ~ ? for ? = ?[?2] ? ?2 ???[?] = Limited access to data samples: One bit per sample Statistical query model 2

One bit per sample 1 bit-per-sample oracle:[Ben-David, Dichterman 2000] For input distribution ? over domain ? Given ?:? 0,1 , outputs ?(?) for fresh ? ? Distributed/low-communication learning/stats [Rajagopal,Wainwright,Varaiya 06; Zhang,Duchi,Jordan,Wainwright 13; Steinhardt,Duchi 15; Steinhardt,Valiant,Wager 16; Suresh,Yu,McMahan,Kumar 17] Sensor networks [Luo 05; Ribeiro,Giannakis 06; .] 3

Range anxiety Assume ? [0,?] Unbiased estimator: For each sample: Pick ? uniformly in [0,?] define ??? = Ind(? > ?) Given responses ?1, ,?? return ? 1 ? ??? Unbiased: ??,?[? ??? ] = ?[?] ? ? ? ? ? ? ? ? ? Standard deviation: ? ?[?2] ? ?2 ? ?[?2] ? Grows with ? and could be much worse than 4

Reducing dynamic range ? 0 ?/2 ?/2 ?/4 ? = ?? ?=0 ? 2 ?? ? Ind ? 2? < ? ? ?0= ? Ind ? 2? 1 ? ?? ?/2? 1 ?/ Error for ?? : ? ?2 ?/2?2 ? ?2 ?/2?+1 Chebyshev: Pr ? > ?/2? ? ?? ? ?2 ?/ + ? ?2 ? ? 2 = log ?3/2 +? Total error: ? Still depends on ? ! 5

Quantile truncation Let ? = 1 ? quantile of ?: Pr ? ? = ? ? ?2 ? 1. ? 2. ? ? Ind ? > ? ? ? ?2 1 ? truncation to [0,?] gives error: For ? = log ?3/2? ?2 ? 6

Quantile estimation Find ? such that Pr ? ? = ? Pr ? ? ? For every ? can estimate Pr ? ? within Find 1 ? quantile ? via approximate binary search Discretize [0,?] with step ? ? log ?/? /? Get ? log ?/? and discretization error ? ? 7

Centering ? ?2 ? ? ? Error so far: log ?3/2 log + ? 1 4,3 Let ?0 be such that Pr ? ?0 Define ?+= ? Ind ? ?0, ? = ? Ind(? < ?0) Then: ? ?+ 4 2+ ? ? 2 5 ???[?] Proof: ?0 ?[?] 2? Given ? s.t. ? ?2 ?2 and any ? > 0, the algo has log ?3log ?/? ? estimation error O ? + ? 8

Statistical problems ?1,?2, ,?? ? over ?

Statistical query model [Kearns 93] ?1 ?1 ?2 ?2 ?? ?? SQ algorithm STAT?(?) oracle ?1:? 0,1 ?1 ?? ??1? ? ? is the proxy for the number of samples ? 1/ ? Not faithful for lower bounds!

Simulating ideal SQ oracle Ideal oracle iSTAT?(?): tolerance ???? ??(?) /? Beyond current lower bound techniques VSTAT?(?): [F.,Grigorescu,Reyzin,Vempala,Xiao 13] For ?:? [0,1] returns ? 1 ?, ? ? (1 ? ? ) ? ? ? ? max Possible to analyze and prove lower bounds! Can be simulated using ?(?)1-bit-per-sample queries Can simulate up to ?1-bit-per-sample queries iSTAT?(?) can be implemented using VSTAT?( ?(?)) 11

Conclusions Large range/heavy tails are a hassle when access to data is limited (Asymptotically) reasonable approach for dealing with it To do: better/simpler/more practical algorithms 12

Dealing with Range Anxiety in Mean Estimation

Download Presentation

Presentation Transcript

Related

More Related Content