Exploration of Learning and Privacy Concepts in Machine Learning
An overview of learning and privacy in machine learning: local differential privacy (LDP), the statistical query (SQ) model, PAC learning, margin complexity, and known equivalence and separation results. The talk covers non-interactive learning, error bounds, and the efficiency of learning algorithms under these models.
Presentation Transcript
Learning without interaction requires separation
Vitaly Feldman (Google Brain), with Amit Daniely (Hebrew University)
Local Differential Privacy (LDP) [KLNRS 08]
A protocol is ε-LDP if, for every user i holding input x_i, each message z_i sent to the server is produced by a local ε-DP randomizer R_i applied to x_i (and, in the interactive setting, to previously exchanged messages).
[Diagram: users 1, …, n each apply a local randomizer to their input and send the resulting messages to the server.]
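As a concrete illustration of a local randomizer (not part of the slides), here is binary randomized response, the simplest ε-DP randomizer a user could apply before messaging the server; the function names and the debiasing helper are mine:

```python
import math
import random

def randomized_response(bit: int, eps: float) -> int:
    """Binary randomized response: an eps-LDP local randomizer.

    Reports the true bit with probability e^eps / (e^eps + 1) and flips
    it otherwise, so the likelihood ratio of any output between any two
    inputs is at most e^eps.
    """
    p_true = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p_true else 1 - bit

def debiased_mean(reports, eps):
    """Unbiased estimate of the true bit frequency from noisy reports."""
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)
```

The server only ever sees the randomized bits; averaging and debiasing recovers the population frequency with error shrinking as 1/(ε√n).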
Non-interactive LDP
Each user sends a single randomized message computed from their own input alone; the server never sends queries back.
[Diagram: users 1, …, n send one message each to the server, with no back-and-forth.]
PAC learning
PAC model [Valiant 84]: let C be a set of binary classifiers over a domain X. A is a PAC learning algorithm for C if, for every f ∈ C and every distribution D over X, given i.i.d. examples (x_i, f(x_i)) with x_i ∼ D, A outputs a hypothesis h such that w.h.p. Pr_{x∼D}[h(x) ≠ f(x)] ≤ α.
Distribution-specific learning: D is fixed and known to A.
Statistical query model [Kearns 93]
D is a distribution over X, C ⊆ {±1}^X, and P is the distribution of (x, f(x)) for x ∼ D.
An SQ algorithm issues queries φ_i : X × {±1} → [0,1]; the SQ oracle answers each with a value v_i satisfying |v_i − E_P[φ_i]| ≤ τ, where τ is the tolerance of the query (answering one query from data takes roughly n = O(1/τ²) samples).
[KLNRS 08] Simulations with high success probability:
• an ε-LDP protocol with n messages can be simulated by O(n) SQ queries, with tolerance degrading polynomially in ε and n;
• an SQ algorithm with t queries of tolerance τ can be simulated by an ε-LDP protocol with n = O(t·log t/(ε²τ²)) samples/messages.
The simulations are non-interactive if and only if the queries are non-adaptive.
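A minimal sketch (names mine) of the second simulation direction: each user evaluates the query on their own example, rounds the value to a bit without bias, and releases it through randomized response; the server's debiased average then answers the statistical query.

```python
import math
import random

def ldp_sq_answer(samples, phi, eps):
    """Answer one statistical query E[phi(x, y)] under eps-LDP.

    Each user holds one labeled example (x, y), computes phi(x, y) in
    [0, 1], rounds it to a bit while preserving its mean, then sends the
    bit through binary randomized response. The server averages the
    noisy bits and debiases; with n users the answer is accurate to
    tolerance on the order of 1/(eps * sqrt(n)).
    """
    p = math.exp(eps) / (math.exp(eps) + 1.0)
    reports = []
    for (x, y) in samples:
        v = phi(x, y)                              # query value in [0, 1]
        bit = 1 if random.random() < v else 0      # unbiased rounding
        noisy = bit if random.random() < p else 1 - bit
        reports.append(noisy)
    avg = sum(reports) / len(reports)
    return (avg - (1 - p)) / (2 * p - 1)           # unbiased estimate of E[phi]
```

Since each user's message depends only on their own example and the fixed query, simulating a batch of non-adaptive queries this way yields a non-interactive protocol, matching the slide's equivalence.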
Known results
C is SQ-learnable efficiently (non-adaptively) if and only if it is learnable efficiently with ε-LDP (non-interactively).
Examples:
• Yes: halfspaces/linear classifiers [Dunagan, Vempala 04]
• No: parity functions [Kearns 93]
• Yes, non-adaptively: Boolean conjunctions [KLNRS 08]
There exists a class C (masked parity) that is
1. SQ/LDP-learnable efficiently over the uniform distribution on {0,1}^d, but
2. requires an exponential number of samples to learn non-interactively by an LDP algorithm [KLNRS 08].
Open question [KLNRS 08]: does the separation hold for distribution-independent learning?
Margin Complexity
Margin complexity of C over X, MC(C): the smallest M such that there exists an embedding ψ : X → B_2(1) (the unit ball) under which every f ∈ C is linearly separable with margin 1/M.
[Diagram: positive examples {x : f(x) = +1} and negative examples {x : f(x) = −1} separated by a hyperplane with margin 1/M.]
Lower bound
Thm: Let C be a negation-closed set of classifiers. Any non-interactive 1-LDP algorithm that learns C with error α < 1/2 and constant success probability needs n = Ω̃(MC(C)^{2/3}) samples.
Corollaries:
• Decision lists over {0,1}^d: n = 2^{Ω(d^{1/3})} [via the margin bound of Buhrman, Vereshchagin, de Wolf 07], although (interactively) learnable with n = poly(d/α) [Kearns 93].
• Linear classifiers over {0,1}^d: n = 2^{Ω(d)} [via Goldmann, Håstad, Razborov 92; Sherstov 07], although (interactively) learnable with n = poly(d/α) [Dunagan, Vempala 04].
Upper bound
Thm: For every C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − δ using n = poly(MC(C), 1/α, 1/ε, log(1/δ)) samples.
Instead of a fixed, known D the algorithm can use: public unlabeled samples from D, or (interactive) LDP access to unlabeled samples from D.
The lower bound holds even against this hybrid model.
Lower bound technique
Thm: Let C be a negation-closed set of classifiers. If there exists a non-adaptive SQ algorithm that uses q queries of tolerance 1/q to learn C with error α < 1/2, then MC(C) = O(q^{3/2}).
Correlation dimension of C, CSQdim(C) [F. 08]: the smallest q for which there exist q functions φ_1, …, φ_q : X → [−1,1] such that for every f ∈ C and every distribution D there is an i with |E_{x∼D}[f(x)·φ_i(x)]| ≥ 1/q.
Thm [F. 08; Kallweit, Simon 11]: MC(C) = O(CSQdim(C)^{3/2}).
Proof
Claim: if there exists a non-adaptive SQ algorithm A that uses q queries of tolerance 1/q to learn C with error α < 1/2, then CSQdim(C) = O(q).
Let φ_1, …, φ_q : X × {±1} → [0,1] be the (non-adaptive) queries of A. Decompose each query into a label-independent part and a correlation part:
φ(x, b) = (φ(x,1) + φ(x,−1))/2 + b·(φ(x,1) − φ(x,−1))/2,
so that
E_{(x,b)∼P}[φ(x, f(x))] = E_{x∼D}[(φ(x,1) + φ(x,−1))/2] + E_{x∼D}[f(x)·(φ(x,1) − φ(x,−1))/2].
If |E_{x∼D}[f(x)·(φ_i(x,1) − φ_i(x,−1))/2]| is below the tolerance, then E[φ_i(x, f(x))] ≈ E[φ_i(x, −f(x))]. If this holds for all i ∈ [q], the oracle's answers for targets f and −f are indistinguishable, so A cannot tell them apart and, since C is negation-closed, cannot achieve error < 1/2.
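The even/odd split used in this proof is a pointwise identity, so it can be checked mechanically; a small sketch (names mine):

```python
def decompose(phi):
    """Split a query phi(x, b), b in {-1, +1}, into a label-independent
    ("even") part and a correlation ("odd") part, so that
    phi(x, b) = even(x) + b * odd(x) holds exactly."""
    even = lambda x: (phi(x, 1) + phi(x, -1)) / 2
    odd = lambda x: (phi(x, 1) - phi(x, -1)) / 2
    return even, odd

# Taking expectations over (x, b) = (x, f(x)) gives
#   E[phi(x, f(x))] = E[even(x)] + E[f(x) * odd(x)];
# only the second (correlation) term can distinguish f from -f,
# which is exactly what the indistinguishability argument exploits.
```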
Upper bound
Thm (restated): For every C and distribution D there exists a non-adaptive ε-LDP algorithm that learns C over D with error α and success probability 1 − δ using n = poly(MC(C), 1/α, 1/ε, log(1/δ)) samples.
Recall: MC(C) is the smallest M such that some embedding ψ : X → B_2(1) makes every f ∈ C linearly separable with margin 1/M.
Thm [Arriaga, Vempala 99; Ben-David, Eiron, Simon 02]: for every f ∈ C, a random projection into R^k with k = O(MC(C)²·log(1/(αδ))) ensures that with probability 1 − δ, a 1 − α fraction of the points remain linearly separable with margin Ω(1/MC(C)).
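A minimal numpy sketch of the random-projection step (the scaling and names are mine; the theorem's exact constants are not reproduced):

```python
import numpy as np

def random_project(X, k, rng):
    """Johnson-Lindenstrauss-style Gaussian projection into R^k,
    scaled by 1/sqrt(k) so that inner products, norms and (w.h.p.)
    margins are approximately preserved."""
    d = X.shape[1]
    A = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ A
```

After projecting, the learner only needs to find a large-margin halfspace in k = poly(MC(C)) dimensions, which is what makes the sample complexity depend on MC(C) rather than on the original dimension.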
Algorithm
Perceptron: if sign(⟨w_t, x⟩) ≠ b then update w_{t+1} ← w_t + b·x.
The expected update over (x, b) ∼ P is
E[1{sign(⟨w_t, x⟩) ≠ b}·b·x] = E[(1 − b·sign(⟨w_t, x⟩))/2 · b·x] = E[b·x]/2 − E[sign(⟨w_t, x⟩)·x]/2
(using b² = 1). The first term does not depend on w_t, so it can be queried once up front; the second term is independent of the label, so all label-dependent queries are non-adaptive.
Each step reduces to estimating a mean vector with small ℓ₂ error:
• under LDP [Duchi, Jordan, Wainwright 13]
• with SQs [F., Guzmán, Vempala 15]
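A toy sketch of the resulting algorithm (all names and the noise model are mine): the perceptron's expected update is assembled from two mean vectors, the label-dependent one queried only once, and Gaussian noise stands in for a real LDP/SQ mean estimator.

```python
import numpy as np

def noisy_mean(vectors, rng, sigma=0.1):
    """Stand-in for a private mean estimator: true mean plus noise.
    A real implementation would run an LDP mean-estimation protocol
    (e.g. in the style of Duchi-Jordan-Wainwright) in its place."""
    m = np.mean(vectors, axis=0)
    return m + sigma * rng.standard_normal(m.shape)

def sq_perceptron(X, y, steps, rng, lr=1.0):
    """Perceptron driven by estimated expected updates, following the
    slide's decomposition: E[b*x]/2 depends on labels but not on w
    (one query up front), while -E[sign(<w,x>)*x]/2 depends on w but
    not on labels, so the label-dependent queries are non-adaptive."""
    n, d = X.shape
    w = np.zeros(d)
    label_term = noisy_mean(y[:, None] * X, rng) / 2   # E[b*x]/2, queried once
    for _ in range(steps):
        sign_term = noisy_mean(np.sign(X @ w)[:, None] * X, rng) / 2
        w = w + lr * (label_term - sign_term)
    return w
```

On linearly separable data the estimated update still points toward the true separator on average, so the noisy iterates converge to a low-error halfspace.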
Conclusions
• New approach to lower bounds for non-interactive LDP
  o reduction to margin-complexity lower bounds
• Lower bounds for classical learning problems
• Same results for communication-constrained protocols
  o also equivalent to SQ
• Interaction is necessary for learning
Open problems:
  o distribution-independent learning in poly(MC(C))
  o lower bounds against protocols with 2+ rounds
  o stochastic convex optimization
Paper: https://arxiv.org/abs/1809.09165