Dealing with Range Anxiety in Mean Estimation

Vitaly Feldman
(work done while at IBM Research - Almaden)
  13. SQ applications
  • Noise-tolerant learning [Kearns 93; BFKV 96; F.,Balcan 13]
  • Differentially private data analysis [Dinur,Nissim 03; BDMN 05; DMNS 06]
      ◦ Local model [KLNRS 08]
  • Distributed/low communication/streaming ML
      ◦ Theory [Ben-David,Dichterman 98; BBFM 12; FGRVX 13; Steinhardt,G.Valiant,Wager 15]
      ◦ Practice [CKLYBNO 06; RSKSW 10; SLB+ 11; ACDL 14; …]
  • Evolvability [F. 08; F. 09; Kanade,Wortman,L.Valiant 10; Kanade 11; P.Valiant 11; …]
  • Adaptive data analysis [DFHPRR 14; Hardt,Ullman 14; Steinke,Ullman 15; F.,Steinke 17; …]
 

This talk addresses "range anxiety" in mean estimation: when access to data is limited (for example, one bit per sample), the error of naive unbiased estimators grows with the range $B$ of the random variable rather than with its standard deviation. Techniques such as reducing the dynamic range, quantile truncation, quantile estimation, and centering bring the error back toward the standard $\sigma/\sqrt{n}$ rate, at the cost of polylogarithmic factors.

  • Range Anxiety
  • Mean Estimation
  • Quantile Truncation
  • Dynamic Range
  • Estimation Methods

Uploaded on Jul 27, 2024

Presentation Transcript


  1. Dealing with Range Anxiety in Mean Estimation Vitaly Feldman work done while at IBM Research - Almaden

  2. Mean estimation
Given $n$ i.i.d. samples $x_1, x_2, \ldots, x_n$ of a random variable $x$, estimate $\mathbf{E}[x]$. Use the empirical mean $\frac{1}{n}\sum_i x_i$; its error is $\sim \sigma/\sqrt{n}$, where $\sigma^2 = \mathrm{Var}[x] = \mathbf{E}[x^2] - \mathbf{E}[x]^2$. This talk considers limited access to the data samples: one bit per sample, and the statistical query model.
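As a quick sanity check on the $\sigma/\sqrt{n}$ rate (not from the slides; `empirical_mean_error` is a hypothetical helper), the average error of the empirical mean should halve when the sample size quadruples:

```python
import random

def empirical_mean_error(sigma_scale, n, trials=2000, seed=0):
    """Average |empirical mean - true mean| over repeated experiments
    with n i.i.d. N(0, sigma_scale^2) samples (true mean is 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma_scale) for _ in range(n)]
        total += abs(sum(xs) / n)
    return total / trials

# Error shrinks roughly like sigma / sqrt(n): quadrupling n halves it.
e1 = empirical_mean_error(1.0, 100)
e2 = empirical_mean_error(1.0, 400)
```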

  3. One bit per sample
1-bit-per-sample oracle [Ben-David, Dichterman 2000]: for an input distribution $D$ over a domain $X$, given a query $\phi : X \to \{0,1\}$, the oracle outputs $\phi(x)$ for a fresh sample $x \sim D$.
Distributed/low-communication learning/stats [Rajagopal,Wainwright,Varaiya 06; Zhang,Duchi,Jordan,Wainwright 13; Steinhardt,Duchi 15; Steinhardt,Valiant,Wager 16; Suresh,Yu,McMahan,Kumar 17]. Sensor networks [Luo 05; Ribeiro,Giannakis 06; …].

  4. Range anxiety
Assume $x \in [0,B]$. Unbiased estimator: for each sample, pick $t$ uniformly in $[0,B]$ and define $\phi(x) = \mathrm{Ind}(x > t)$; given the response bits $b_1, \ldots, b_n$, return $B \cdot \frac{1}{n}\sum_i b_i$. Unbiased: $\mathbf{E}_{x,t}[B \cdot \phi(x)] = \mathbf{E}[x]$. Standard deviation per sample: $\sqrt{B \cdot \mathbf{E}[x] - \mathbf{E}[x]^2}$, which grows with $B$ and can be much worse than $\sigma$.
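The randomized-threshold estimator from this slide can be sketched as follows (a minimal simulation; `one_bit_mean` is a hypothetical helper name, and a deliberately loose range bound $B$ is used to illustrate the problem):

```python
import random

def one_bit_mean(xs, B, rng):
    """Slide 4's scheme: for each sample x, draw t ~ Uniform[0, B] and
    transmit the single bit Ind(x > t); B times the fraction of 1-bits
    is an unbiased estimate of E[x]."""
    bits = [1 if x > rng.uniform(0.0, B) else 0 for x in xs]
    return B * sum(bits) / len(bits)

rng = random.Random(1)
B = 100.0  # loose upper bound on the range
xs = [rng.uniform(0.0, 1.0) for _ in range(200_000)]  # E[x] = 0.5, sigma ~ 0.29
est = one_bit_mean(xs, B, rng)
# The estimate is unbiased, but its standard deviation scales like
# sqrt(B * E[x] / n), far worse than the sigma / sqrt(n) rate of the
# empirical mean when B is large.
```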

  5. Reducing dynamic range
Decompose $x = \sum_{i=0}^{\log_2 B} x_i$, where $x_i = x \cdot \mathrm{Ind}(B/2^{i+1} < x \le B/2^i)$ (with a residual level for the smallest values), and estimate each $x_i$ separately. Level $i$ has range $B/2^i$, and by Chebyshev $\Pr[x > B/2^{i+1}] \le \mathbf{E}[x^2] \cdot 2^{2i+2}/B^2$, so each level can be estimated with error $O(\sqrt{\mathbf{E}[x^2]/n_i})$. Splitting the samples across the $\log_2 B$ levels gives total error $O\big(\log^{3/2}(B) \cdot \sqrt{\mathbf{E}[x^2]/n}\big)$. Still depends on $B$!
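The dyadic decomposition above can be sanity-checked directly: the levels partition $(0, B]$, so exactly one component is nonzero and they sum back to $x$ (a minimal sketch; `dyadic_levels` is a hypothetical helper name):

```python
def dyadic_levels(x, B, k):
    """Split x in (0, B] into k+1 dyadic components that sum back to x:
    level i keeps x when B/2^(i+1) < x <= B/2^i; the last (residual)
    level keeps x when x <= B/2^k."""
    levels = []
    for i in range(k):
        lo, hi = B / 2 ** (i + 1), B / 2 ** i
        levels.append(x if lo < x <= hi else 0.0)
    levels.append(x if x <= B / 2 ** k else 0.0)  # residual level
    return levels

B, k = 64.0, 6
for x in (0.3, 1.0, 5.0, 63.0):
    parts = dyadic_levels(x, B, k)
    assert abs(sum(parts) - x) < 1e-12          # decomposition is exact
    assert sum(1 for p in parts if p) == 1      # exactly one level is active
```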

  6. Quantile truncation
Let $\theta$ be the $1-\delta$ quantile of $x$: $\Pr[x \ge \theta] = \delta$. Then:
1. $\theta \le \sqrt{\mathbf{E}[x^2]/\delta}$;
2. $\mathbf{E}[x \cdot \mathrm{Ind}(x > \theta)] \le \sqrt{\mathbf{E}[x^2] \cdot \delta}$ (Cauchy–Schwarz).
Truncation to $[0,\theta]$ therefore replaces the range $B$ by $\theta$ at the cost of a small bias; for $\delta = 1/n$ this gives error $O\big(\log^{3/2}(n) \cdot \sqrt{\mathbf{E}[x^2]/n}\big)$.

  7. Quantile estimation
Goal: find $\theta$ such that $\Pr[x \ge \theta] \approx \delta$. For every threshold $t$, $\Pr[x \ge t]$ can be estimated from one-bit queries $\mathrm{Ind}(x \ge t)$, so the $1-\delta$ quantile can be found by approximate binary search: discretize $[0,B]$ with step $\eta$, use $O(\log(B/\eta))$ probability estimates, and incur discretization error at most $\eta$.
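The binary search above can be sketched as follows. This is a simplification of the slide's setting: a true one-bit oracle would spend fresh samples on each threshold query, while here one fixed sample set is reused for all queries, and the helper names are hypothetical:

```python
import random

def estimate_tail_prob(xs, t):
    """One-bit query: fraction of samples with x >= t (each sample
    contributes only the single bit Ind(x >= t))."""
    return sum(1 for x in xs if x >= t) / len(xs)

def approx_quantile(xs, B, delta, steps=40):
    """Binary search on [0, B] for a threshold theta with Pr[x >= theta] ~ delta."""
    lo, hi = 0.0, B
    for _ in range(steps):
        mid = (lo + hi) / 2
        if estimate_tail_prob(xs, mid) > delta:
            lo = mid   # tail still too heavy: move the threshold up
        else:
            hi = mid
    return (lo + hi) / 2

rng = random.Random(2)
xs = [rng.uniform(0.0, 10.0) for _ in range(100_000)]
theta = approx_quantile(xs, 10.0, 0.1)   # true 0.9-quantile of U[0,10] is 9.0
```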

  8. Centering
The error so far scales with $\sqrt{\mathbf{E}[x^2]}$ rather than $\sigma$. Let $x_0$ be such that $\Pr[x \ge x_0] \in [1/4, 3/4]$. Define $x_+ = (x - x_0)\cdot\mathrm{Ind}(x \ge x_0)$ and $x_- = (x_0 - x)\cdot\mathrm{Ind}(x < x_0)$. Then $\mathbf{E}[x_+^2] + \mathbf{E}[x_-^2] \le \mathrm{Var}[x] + 4\,\mathrm{Var}[x] = 5\,\mathrm{Var}[x]$. Proof: by Chebyshev, $|x_0 - \mathbf{E}[x]| \le 2\sigma$, so $\mathbf{E}[(x - x_0)^2] = \mathrm{Var}[x] + (x_0 - \mathbf{E}[x])^2 \le 5\,\mathrm{Var}[x]$.
Theorem: given $\sigma$ such that $\mathrm{Var}[x] \le \sigma^2$ and any $\eta > 0$, the algorithm has estimation error $O\big(\sigma \sqrt{\log^3(n)\log(B/\eta)/n} + \eta\big)$.
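The key inequality of this slide, $\mathbf{E}[(x-x_0)^2] \le 5\,\mathrm{Var}[x]$ for any $x_0$ between the $1/4$ and $3/4$ quantiles, can be checked numerically (a minimal sketch with hypothetical helper names; the Gaussian test distribution is an assumption for illustration):

```python
import random

def second_moment_about(xs, x0):
    """Empirical E[(x - x0)^2]."""
    return sum((x - x0) ** 2 for x in xs) / len(xs)

rng = random.Random(3)
xs = [rng.gauss(5.0, 2.0) for _ in range(50_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

# Any x0 with Pr[x >= x0] in [1/4, 3/4] is within 2 standard deviations
# of the mean (Chebyshev), so E[(x - x0)^2] = Var[x] + (x0 - E[x])^2
# is at most 5 Var[x].
x0 = sorted(xs)[len(xs) // 4]   # empirical 0.25-quantile: Pr[x >= x0] ~ 3/4
assert second_moment_about(xs, x0) <= 5 * var
```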

  9. Statistical problems
Samples $x_1, x_2, \ldots, x_n$ drawn i.i.d. from a distribution $D$ over a domain $X$.

  10. Statistical query model [Kearns 93]
An SQ algorithm asks queries $\phi_i : X \to [0,1]$ to an oracle $\mathrm{STAT}_D(\tau)$, which answers each query with any value $v_i$ such that $|v_i - \mathbf{E}_{x \sim D}[\phi_i(x)]| \le \tau$. The tolerance $\tau \approx 1/\sqrt{n}$ makes $n$ the proxy for the number of samples. Not faithful for lower bounds!
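A STAT oracle can be mocked up for experimentation (a sketch under assumptions: `StatOracle` is a hypothetical class name, the adversarial slack is simulated with uniform noise, and the true expectation is approximated from many samples rather than computed exactly):

```python
import random

class StatOracle:
    """STAT_D(tau): answers a query phi: X -> [0,1] with any value within
    additive tolerance tau of E_{x~D}[phi(x)]."""
    def __init__(self, sampler, tau, n_ref=100_000, seed=0):
        self.sampler = sampler      # draws a fresh x ~ D
        self.tau = tau
        self.n_ref = n_ref          # samples used to approximate E[phi(x)]
        self.rng = random.Random(seed)

    def query(self, phi):
        # Approximate the true expectation, then add slack within +/- tau.
        approx = sum(phi(self.sampler()) for _ in range(self.n_ref)) / self.n_ref
        return approx + self.rng.uniform(-self.tau, self.tau)

rng = random.Random(4)
oracle = StatOracle(lambda: rng.uniform(0.0, 1.0), tau=0.01)
v = oracle.query(lambda x: x)   # E[x] = 0.5 for U[0,1]
```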

  11. Simulating the ideal SQ oracle
Ideal oracle $\mathrm{iSTAT}_D(n)$: tolerance $\sqrt{\mathrm{Var}_D[\phi(x)]/n}$; beyond current lower-bound techniques. $\mathrm{VSTAT}_D(n)$ [F.,Grigorescu,Reyzin,Vempala,Xiao 13]: for $\phi : X \to [0,1]$ with $p = \mathbf{E}[\phi(x)]$, returns $v$ such that $|v - p| \le \max\big\{\frac{1}{n}, \sqrt{\frac{p(1-p)}{n}}\big\}$. Possible to analyze and prove lower bounds! $\mathrm{VSTAT}_D(n)$ can be simulated using $O(n)$ 1-bit-per-sample queries and can simulate up to $n$ 1-bit-per-sample queries; $\mathrm{iSTAT}_D(n)$ can be implemented using $\mathrm{VSTAT}_D(O(n))$.
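The VSTAT tolerance from the slide is simple enough to write out directly (`vstat_tolerance` is a hypothetical helper name; the formula itself is the one stated above):

```python
import math

def vstat_tolerance(p, n):
    """Tolerance of VSTAT_D(n) on a query with true expectation p in [0,1]:
    max(1/n, sqrt(p(1-p)/n)), per [F.,Grigorescu,Reyzin,Vempala,Xiao 13]."""
    return max(1.0 / n, math.sqrt(p * (1.0 - p) / n))

# For balanced queries the tolerance matches the 1/sqrt(n) sampling error;
# for very biased queries (p near 0 or 1) it tightens toward the 1/n floor.
n = 10_000
mid = vstat_tolerance(0.5, n)      # sqrt(0.25 / n) = 0.005
edge = vstat_tolerance(1e-6, n)    # dominated by the 1/n floor
```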

  12. Conclusions
Large range/heavy tails are a hassle when access to data is limited. This talk presented an (asymptotically) reasonable approach for dealing with it. To do: better/simpler/more practical algorithms.
