Distribution Testing Algorithms for Property and Equivalence Testing

Slide Note
Embed
Share

Distributional Property Testing involves determining if a sample satisfies a given property or is from a specific distribution. The ANACONDA algorithm and other methods are used to test for uniformity, identity, and equivalence of distributions in various domains. Results show complexities and challenges in avoiding lower bounds.


Uploaded on Aug 04, 2024 | 2 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. ANACONDA: A Non-Adaptive CONDitional Sampling Algorithm for Distribution Testing Gautam Kamath Simons Institute University of Waterloo ACM-SIAM Symposium on Discrete Algorithms (SODA 2019) January 6, 2019 Joint work with Christos Tzamos (UW Madison) ChrHISSSSStos

  2. Distributional Property Testing Given samples from a distribution ?, does it satisfy some property ?, or is it far from all distributions which do?

  3. Distributional Property Testing Given sample access to an unknown distribution ? over [?], determine whether ? = ?, or ?TV?,? ? Discrete domain [?], ? is large ?: some distribution of interest ?TV = total variation distance =1 ?: distance parameter, think of as a small constant Correct with probability 2/3 2 1-distance

  4. Distributional Property Testing Uniformity Testing: Is ? the uniform distribution on [?]? Identity Testing: Is ? equal to some known distribution ?? AKA goodness-of-fit, one-sample testing Equivalence Testing: Given samples from ?,?, are they equal? AKA closeness, two-sample testing

  5. Distribution Testing Results SAMP (?1/2/?2) [1,2] Uniformity Identity Equivalence [1] Paninski, Transactions on Information Theory 2008 [2] G. Valiant, P. Valiant, FOCS 2014

  6. Distribution Testing Results SAMP (?1/2/?2) [1,2] Uniformity (?1/2/?2) [2] Identity Equivalence [1] Paninski, Transactions on Information Theory 2008 [2] G. Valiant, P. Valiant, FOCS 2014

  7. Distribution Testing Results SAMP (?1/2/?2) [1,2] Uniformity (?1/2/?2) [2] Identity (max(?2/3/?4/3, ?/?2)) [3] Equivalence All complexities are ??, for 0 < ? < 1 Strongly sublinear, but still polynomial in ? Still costly if ? is very large! Can we avoid information-theoretic lower bounds? [1] Paninski, Transactions on Information Theory 2008 [2] G. Valiant, P. Valiant, FOCS 2014 [3] Chan, Diakonikolas, G. Valiant, P. Valiant, SODA 2014

  8. Conditional Sampling: COND A stronger query access to distributions [Chakraborty, Fischer, Goldhirsh, Matsliah 13], [Canonne, Ron, Servedio 14] Algorithm chooses query set ? [?] Receives sample from ?, conditioned on being from ? ??(?) = ?(?)/?(?) Many testing problems become dramatically cheaper!

  9. Distribution Testing with Conditional Samples SAMP COND (1/?2) [4] (?1/2/?2) Uniformity (?1/2/?2) Identity (max(?2/3/?4/3, ?/?2)) Equivalence [4] Canonne, Ron, Servedio, SODA 2014

  10. Distribution Testing with Conditional Samples SAMP COND (1/?2) [4] (1/?2) [5] (?1/2/?2) Uniformity (?1/2/?2) Identity (max(?2/3/?4/3, ?/?2)) Equivalence [4] Canonne, Ron, Servedio, SODA 2014 [5] Falahatgar, Jafarpour, Orlitsky, Pichapati, Suresh, COLT 2015

  11. Distribution Testing with Conditional Samples SAMP COND (1/?2) [4] (1/?2) [5] (?1/2/?2) Uniformity (?1/2/?2) Identity (max(?2/3/?4/3, ?/?2)) log (1)log? [5,6] Equivalence Uniformity and identity testing become very cheap Lose dependence on ? Equivalence is qualitatively different: doubly-logarithmic in ? UB of ??(loglog?) [5], LB of loglog? [6] [4] Canonne, Ron, Servedio, SODA 2014 [5] Falahatgar, Jafarpour, Orlitsky, Pichapati, Suresh, COLT 2015 [6] Acharya, Canonne, K., RANDOM 2015

  12. The Power of Adaptivity? COND: adaptive CONDitional queries Algorithm submits ?1, observes ?1 ??1, submits ?2, observes ?2 ??2, etc. How much power is due to adaptivity? NACOND: Non-Adaptive CONDitional queries Algorithm submits ?1, ,??, then observes (?1, ,??) (??1, ,???)

  13. Previously known about NACOND SAMP COND NACOND (1/?2) (?1/2/?2) ?(poly(log?,1/?)) [7] (log?) [6] Uniformity (1/?2) (?1/2/?2) Identity (max(?2/3/?4/3, ?/?2)) log (1)log? Equivalence [6] Acharya, Canonne, K., RANDOM 2015 [7] Chakraborty, Fischer, Goldhirsh, Matsliah, ITCS 2013

  14. Previously known about NACOND SAMP COND NACOND (1/?2) (?1/2/?2) ?(poly(log?,1/?)) [7] (log?) [6] Uniformity (1/?2) (?1/2/?2) ?(poly(log?,1/?)) [7] (log?) Identity (max(?2/3/?4/3, ?/?2)) log (1)log? Equivalence [6] Acharya, Canonne, K., RANDOM 2015 [7] Chakraborty, Fischer, Goldhirsh, Matsliah, ITCS 2013

  15. Previously known about NACOND SAMP COND NACOND (1/?2) (?1/2/?2) ?(poly(log?,1/?)) [7] (log?) [6] Uniformity (1/?2) (?1/2/?2) ?(poly(log?,1/?)) [7] (log?) Identity (max(?2/3/?4/3, ?/?2)) ?(max(?2/3/?4/3, ?/?2)) (log?) log (1)log? Equivalence Uniformity and identity are between SAMP and COND Polylogarithmic, versus polynomial and constant Complexity of equivalence testing was wide open Should it be polynomial or polylogarithmic? [6] Acharya, Canonne, K., RANDOM 2015 [7] Chakraborty, Fischer, Goldhirsh, Matsliah, ITCS 2013

  16. Results SAMP COND NACOND (1/?2) (?1/2/?2) log ? ?2 Uniformity ? (log?) (1/?2) log2? ?2 (?1/2/?2) Identity ? (log?) log12? ?2 (log?) (max(?2/3/?4/3, ?/?2)) log (1)log? Equivalence ? Equivalence testing is polylogarithmic in ? Nearly-optimal algorithm for testing uniformity Identity testing is via reduction from uniformity [CFGM 13] Simple to describe and intuitive algorithm for uniformity, equivalence

  17. Results SAMP COND NACOND (1/?2) (?1/2/?2) log ? ?2 Uniformity ? (log?) (1/?2) log2? ?2 (?1/2/?2) Identity ? (log?) log12? ?2 (log?) (max(?2/3/?4/3, ?/?2)) log (1)log? Equivalence ? Equivalence testing is polylogarithmic in ? Nearly-optimal algorithm for testing uniformity Identity testing is via reduction from uniformity [CFGM 13] Simple to describe and intuitive algorithm for uniformity, equivalence

  18. Uniformity Testing: Two Simple Cases Error is a single spike 1 ?+ ?, ? ? = Query ? = [?], ? ? = 1 With ? = (1/?) samples: ??1 = ? 1 1 ? ? ? 1 = ? 1 1 ?+ ? ? 1 ? 100

  19. Uniformity Testing: Two Simple Cases Error is a spread uniformly ? ? =(1 2?) ? ? = ?,? for random ?,? [?] W.c.p, pick opposite signs: ? ? = 2/? With ? = (1/?2) samples: ??? ? ? 100 1 ? 2+? 2 1 ? ? 2

  20. Common observations Choose a set ? of the appropriate size Randomly, in general With reasonable probability, one element will stick out ? ? s.t. ??? 1/|?| Take (1/?2) NACOND queries from ? By DKW inequality, empirical probabilities are off by at most Reduced from ?TV-testing (expensive) to -testing (cheap) What is the appropriate size of ?? Guess! ? 100

  21. ANACONDA Given: ?,? (NACOND oracle or explicit description), ? 1. Choose ?, a random power of 2 in {1,2,4, ,?/2,?} 2. Choose ? [?], a random set of size ? 3. Repeatedly query ?, check if ? ?, If so, output ? ? 4. Repeat 1,2,3 several times Non-adaptive Same algorithm works for uniformity, equivalence ??? ??? 0

  22. Uniformity Testing Analysis Sketch Key lemma: With reasonable probability (over choice of ?), ? ? ? ? 1 ? ? s.t. = ? |?| If two vectors are far in 1, a random set is likely to witness a relatively large distance in If many small discrepancies ( small witness set ): ? = 2, good prob. of biases which are oppositely signed and suff. large If not many small discrepancies ( large witness set ): ? ? with ? ? 1 ?= ? w.p. 1+? ? ? by Markov s inequality ? ? 1 log ? (counting argument) ? ? ?

  23. Conclusions and Open Questions The first polylogarithmic non-adaptive algorithm for equivalence Near-optimal algorithm for uniformity, better algorithm for identity One unified algorithm Open Questions Optimal sample complexity of identity and equivalence? Tolerant testing? Adaptive tolerant uniformity testing drops from ?(?/log?) to ?(1) [4] Independence Testing? [Bhattacharyya Chakraborty 18]: ??(?5loglog?) in restricted adaptive model

  24. Thanks!

Related


More Related Content