Cryptographic Reductions and Learning in Computational Complexity

 
CS 224: ALGORITHMS FOR DATA SCIENCE
LECTURE 19 (11/13)

Course topic map: tensor decomposition, sum-of-squares, robustness, supervised learning, computational complexity, statistical physics, generative modeling
 
Presentation Transcript


1. COMPUTATIONAL COMPLEXITY
CS 224: ALGORITHMS FOR DATA SCIENCE
LECTURE 19 (11/13)
CRYPTOGRAPHIC REDUCTIONS AND LEARNING

2. Early connections to cryptography (pseudorandom functions, public-key crypto, agnostic noise)
Crypto lower bound for learning MLPs over Gaussians (Goldreich's PRG, Daniely-Vardi lifting)
Other results and outlook (Continuous LWE, complexity recap)

Disclaimer: I'm not a cryptographer, take Prof. Barak's CS 127 if you want to learn the crypto fundamentals for real!

3. CRYPTOGRAPHY AND LEARNING
From Valiant's "A theory of the learnable"

4. PSEUDORANDOM FUNCTION FAMILIES (PRF)
Roughly, a pseudorandom function family (PRF) is a family F of 2^n functions {0,1}^n → {0,1} which fools any polynomial-time adversary in the following sense: f is promised to come from one of two scenarios:
1. (Random) f is sampled uniformly from the set of all Boolean functions
2. (Pseudorandom) f is sampled from the PRF F
The adversary can query f polynomially many times at arbitrary inputs and must distinguish with nontrivial advantage which scenario we are in.
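
To make the distinguishing game concrete, here is a toy sketch in Python. All names and parameters are illustrative, and the keyed hash below is only a heuristic stand-in for a PRF, not a construction from the lecture.

import hashlib, os, random

N_BITS = 16  # toy input length

def truly_random_oracle():
    # Scenario 1: lazily sample a uniformly random Boolean function
    table = {}
    def f(x: int) -> int:
        if x not in table:
            table[x] = random.getrandbits(1)
        return table[x]
    return f

def prf_oracle():
    # Scenario 2: a keyed function drawn from a (heuristic) PRF family
    key = os.urandom(16)
    def f(x: int) -> int:
        return hashlib.sha256(key + x.to_bytes(N_BITS // 8, "big")).digest()[0] & 1
    return f

def adversary(oracle, n_queries: int = 100) -> str:
    # The adversary may query the oracle polynomially many times and must guess
    # the scenario; PRF security says no poly-time adversary does noticeably
    # better than this random guess.
    _ = [oracle(random.getrandbits(N_BITS)) for _ in range(n_queries)]
    return random.choice(["random", "pseudorandom"])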

5. PRFS FROM ONE-WAY FUNCTIONS
A one-way function (OWF) is a function f: {0,1}^n → {0,1}^n which is efficiently computable but hard to invert, i.e. given y = f(x) for a random x, it is computationally hard to find x' such that f(x') = y.
(minimal assumption necessary for cryptography to be possible)
Thm [Goldreich-Goldwasser-Micali '84]: Pseudorandom function families exist if one-way functions exist.
Corollary [Valiant '84]: Poly-sized circuits are hard to PAC learn, even given the ability to query them at arbitrary inputs ("learning from membership queries").
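
A minimal sketch of the tree construction behind the GGM theorem, assuming a length-doubling PRG G(s) = (G_0(s), G_1(s)). SHA-256 is used here purely as a toy stand-in; the actual construction builds the PRG from a one-way function.

import hashlib

def G(seed: bytes):
    # Toy length-doubling PRG: seed -> (left half, right half)
    return (hashlib.sha256(b"0" + seed).digest(),
            hashlib.sha256(b"1" + seed).digest())

def ggm_prf(key: bytes, x_bits: str) -> bytes:
    # f_key(x): walk down a depth-|x| binary tree, at each input bit keeping
    # the corresponding half of the PRG output
    s = key
    for b in x_bits:
        s = G(s)[int(b)]
    return s

# One member f_key of an exponentially large family, evaluated at x = 010110
print(ggm_prf(b"secret-key", "010110").hex()[:16])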

6. PRFS FROM ONE-WAY FUNCTIONS
Corollary [Valiant '84]: Poly-sized circuits are hard to PAC learn, even given the ability to query them at arbitrary inputs ("learning from membership queries").
Proof: a polynomial-time PAC learning algorithm would yield a polynomial-time adversary for distinguishing whether the underlying function is truly random or from a PRF.
Pros:
- assumption is very mild
- learner is very powerful
Cons:
- worst-case lower bound
- class of functions too powerful
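
In pseudocode, the reduction looks as follows. Here pac_learn is a hypothetical membership-query PAC learner (not something defined in the lecture), and the accuracy threshold is illustrative.

import random

def distinguisher(oracle, pac_learn, n_bits: int = 16, n_test: int = 200) -> str:
    # Run the learner, letting it issue membership queries to the oracle
    hypothesis = pac_learn(oracle)
    # Test the hypothesis on fresh random inputs
    correct = sum(hypothesis(x) == oracle(x)
                  for x in (random.getrandbits(n_bits) for _ in range(n_test)))
    # A truly random function is unpredictable on unseen inputs (about 50%
    # accuracy), so nontrivial accuracy is evidence the oracle is a PRF member
    return "pseudorandom" if correct / n_test > 0.6 else "random"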

7. PUBLIC-KEY ENCRYPTION
Stronger assumption, but hardness for weaker classes of functions.
Three algorithms: gen, enc, and dec
- gen(1^n): outputs public key Pk and secret key Sk
- enc(Pk, m): (randomly) encrypts m ∈ {0,1} into poly(n) bits under Pk
- dec(Sk, c): decrypts a ciphertext c under Pk
Correctness: dec(Sk, enc(Pk, m)) = m with all but negligible failure probability
Security: can't distinguish between encryptions of 0 and 1, i.e. for all poly-time algorithms A,
|Pr_{Pk,Sk}[A(Pk, enc(Pk, 1)) = 1] - Pr_{Pk,Sk}[A(Pk, enc(Pk, 0)) = 1]| ≤ 1/poly(n)
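
Schematically, in Python type signatures (placeholders only, not a concrete scheme), the interface and its correctness requirement are:

from typing import Callable, Tuple

PublicKey = SecretKey = Ciphertext = bytes
Gen = Callable[[int], Tuple[PublicKey, SecretKey]]   # gen(1^n) -> (Pk, Sk)
Enc = Callable[[PublicKey, int], Ciphertext]         # enc(Pk, m) for m in {0, 1}
Dec = Callable[[SecretKey, Ciphertext], int]         # dec(Sk, c) -> m

def check_correctness(gen: Gen, enc: Enc, dec: Dec, n: int = 128, trials: int = 100) -> bool:
    # dec(Sk, enc(Pk, m)) should return m (with all but negligible probability)
    pk, sk = gen(n)
    return all(dec(sk, enc(pk, m)) == m for _ in range(trials) for m in (0, 1))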

8. PUBLIC-KEY ENCRYPTION (EXAMPLES)
Trapdoor function: a one-way function such that there exists an efficient algorithm A that takes as input f(x) and a trapdoor string t and outputs x' for which f(x') = f(x).
Example: RSA
- f(x) = x^e mod N for N = pq, with e relatively prime to φ(N) = (p-1)(q-1)
- trapdoor: d = e^(-1) mod φ(N)
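
A toy numeric instance of the RSA trapdoor function (textbook-sized parameters, for illustration only, obviously not secure):

p, q = 61, 53
N = p * q                   # 3233
phi = (p - 1) * (q - 1)     # phi(N) = 3120
e = 17                      # gcd(e, phi(N)) = 1
d = pow(e, -1, phi)         # trapdoor: d = e^(-1) mod phi(N) = 2753

x = 65
y = pow(x, e, N)            # forward direction is easy
assert pow(y, d, N) == x    # with the trapdoor d, inverting is easy too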

9. PUBLIC-KEY ENCRYPTION (EXAMPLES)
Trapdoor function: a one-way function such that there exists an efficient algorithm A that takes as input f(x) and a trapdoor string t and outputs x' for which f(x') = f(x).
- gen(1^n): Pk is a trapdoor function f, secret key Sk is the trapdoor t
- enc(Pk, m): encrypts a message m by mapping it through f
- dec(Sk, c): decrypts a ciphertext c by inverting f using t
Note: this doesn't quite work as stated; need to define e.g. hard-core predicates.

10. PUBLIC-KEY ENCRYPTION + HARDNESS OF LEARNING
Lemma [Kearns-Valiant '94]: Given a public-key cryptosystem (gen, enc, dec), if there is a function class C that contains all the decryption functions dec(Sk, ·) over all choices of key, then if C is distribution-free PAC-learnable in polynomial time, there is a polynomial-time distinguisher between encryptions of 0 and 1.
Proof: train on examples generated via
1. Pick either m = 0 or m = 1, each w.p. 1/2
2. Generate the labeled example (enc(Pk, m), m)
A PAC learning algorithm would return a good approximation to the decryption function and thus violate the security assumption.
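
As a sketch, the reduction can be phrased as the following procedure. Here pac_learn is again a hypothetical learner, and gen, enc stand for the cryptosystem's algorithms; all parameters are illustrative.

import random

def distinguisher_accuracy(gen, enc, pac_learn, n: int = 128,
                           n_train: int = 1000, n_test: int = 200) -> float:
    pk, sk = gen(n)
    # Labeled examples (enc(Pk, m), m): the target concept is dec(Sk, .)
    train = [(enc(pk, m), m) for m in (random.randint(0, 1) for _ in range(n_train))]
    h = pac_learn(train)
    # If h approximates the decryption function, it separates fresh encryptions
    # of 0 and 1 with nontrivial advantage, violating the security assumption
    test = [random.randint(0, 1) for _ in range(n_test)]
    return sum(h(enc(pk, m)) == m for m in test) / n_test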

11. PUBLIC-KEY ENCRYPTION + HARDNESS OF LEARNING
Examples of hardness results proved using this observation:
- [Kearns-Valiant '94]: poly-sized Boolean formulas, deterministic finite automata of poly size, poly-sized threshold circuits with constant depth
- [Klivans-Sherstov '06]: intersections of poly many halfspaces, poly-size depth-2 circuits with majority gates, depth-3 poly-size arithmetic circuits
- [Applebaum-Barak-Wigderson '09]: Boolean functions that only depend on log n variables in the input (juntas)
Drawback: the input distribution is unnatural (a distribution over encryptions)

12. DISTRIBUTION-SPECIFIC HARDNESS: NOISE HELPS
For learning problems where there is label noise, e.g. agnostic learning, it is relatively easy to prove hardness even when the input distribution is a known nice distribution like Unif({±1}^n).
Recall learning parity with noise (LPN): for a noise rate η > 0 and a random subset S ⊆ [n], we are given a dataset (x_1, y_1), ..., (x_m, y_m) generated as follows: x_i ~ Unif({±1}^n), and y_i = χ_S(x_i) w.p. 1-η and y_i = -χ_S(x_i) otherwise, where χ_S(x) = ∏_{j∈S} x_j.
LPN, and in particular its larger-finite-field-size variant Learning with Errors (LWE), forms the basis of various post-quantum cryptosystems (more on this at the end).
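
For concreteness, a generator for LPN samples as defined above (parameters are illustrative):

import numpy as np

def lpn_samples(n: int = 64, m: int = 2000, eta: float = 0.1, seed: int = 0):
    rng = np.random.default_rng(seed)
    S = rng.choice(n, size=n // 2, replace=False)   # hidden subset S
    x = rng.choice([-1, 1], size=(m, n))            # x_i ~ Unif({+-1}^n)
    clean = np.prod(x[:, S], axis=1)                # chi_S(x_i)
    flips = rng.random(m) < eta                     # flip each label w.p. eta
    y = np.where(flips, -clean, clean)
    return x, y

x, y = lpn_samples()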

13. DISTRIBUTION-SPECIFIC HARDNESS: NOISE HELPS
For learning problems where there is label noise, e.g. agnostic learning, it is relatively easy to prove hardness even when the input distribution is a known nice distribution like Unif({±1}^n).
Lemma [Kalai-Klivans-Mansour-Servedio '06]: A polynomial-time algorithm for agnostically learning halfspaces over the uniform distribution implies a polynomial-time algorithm for learning parity with noise.
Proof idea: the majority function correlates with the parity function nontrivially, so if one could approximately learn the former, one could distinguish a noisy parity dataset from purely random labels (see notes for details).
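
A quick empirical check of the kind of correlation the proof idea leans on: over uniform ±1 inputs, the majority of k bits has a nonzero correlation (of magnitude roughly k^(-1/2)) with their parity. This is only an illustration; see the notes for the actual argument.

import itertools, math

def corr_majority_parity(k: int) -> float:
    # Exact expectation of MAJ(x) * parity(x) over x in {+-1}^k (k odd)
    total = 0
    for x in itertools.product([-1, 1], repeat=k):
        parity = math.prod(x)
        majority = 1 if sum(x) > 0 else -1
        total += parity * majority
    return total / 2**k

print([round(corr_majority_parity(k), 4) for k in (3, 5, 7, 9)])
# magnitudes 0.5, 0.375, 0.3125, 0.2734 (with alternating sign)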

14. TAKEAWAYS SO FAR
Typically in the past, cryptographic assumptions have been great for proving lower bounds for learning worst-case functions, either over worst-case input distributions or over a benign input distribution but with label noise.
Additionally, the settings are all discrete in nature!
Rest of lecture:
- Cryptographic lower bound for learning neural networks over Gaussian inputs
Briefly at the end:
- Cryptographic lower bound for learning mixtures of Gaussians

15. Early connections to cryptography (pseudorandom functions, public-key crypto, agnostic noise)
Crypto lower bound for learning MLPs over Gaussians (Goldreich's PRG, Daniely-Vardi lifting)
Other results and outlook (Continuous LWE, complexity recap)

16. CRYPTO HARDNESS FOR MLPS
Lemma [Daniely-Vardi '20]: Assuming security of Goldreich's pseudorandom generator, MLPs of depth three are hard to learn under Gaussian inputs.
Cryptographic hardness for learning a real-valued function over a "nice distribution" without label noise!
Unlike the CSQ lower bound we saw previously, this hardness result pertains to arbitrary polynomial-time computation, though the class of functions is more complex: depth 3 vs depth 2.
(see board)
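
For intuition, here is a minimal sketch of a Goldreich-style local PRG: each output bit applies a fixed small predicate to a random subset of the seed bits. The specific predicate and parameters below are illustrative assumptions, not the exact ones used by Daniely-Vardi.

import random

def goldreich_prg(seed_bits, m: int, k: int = 5, graph_seed: int = 0):
    n = len(seed_bits)
    rng = random.Random(graph_seed)
    subsets = [rng.sample(range(n), k) for _ in range(m)]   # public hypergraph

    def P(z):  # a commonly studied local predicate: XOR of 3 bits plus an AND
        return z[0] ^ z[1] ^ z[2] ^ (z[3] & z[4])

    return [P([seed_bits[i] for i in S]) for S in subsets]

seed = [random.getrandbits(1) for _ in range(64)]
print(goldreich_prg(seed, m=16))

Roughly, the lifting encodes such predicate evaluations into a small MLP evaluated on Gaussian inputs, so an efficient learner for those MLPs would yield a distinguisher for the PRG.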

17. SUBSEQUENT WORK
[C-Gollakota-Klivans-Meka '22]: Daniely-Vardi's lifting can be refined to get rid of the outer activation, and can be extended to give both full SQ lower bounds and lower bounds in the membership query model.
[Daniely-Srebro-Vardi '23]: A modification of the reduction shows that even in the smoothed complexity setting, depth-4 MLPs are hard to learn over Gaussian inputs.

18. Early connections to cryptography (pseudorandom functions, public-key crypto, agnostic noise)
Crypto lower bound for learning MLPs over Gaussians (Goldreich's PRG, Daniely-Vardi lifting)
Other results and outlook (Continuous LWE, complexity recap)

19. OTHER RECENT CRYPTO LOWER BOUNDS
Lemma [Bruna-Regev-Song-Tang '21]: Under the assumption that there are no polynomial-time quantum algorithms for certain (worst-case) lattice problems, learning mixtures of Gaussians is hard.
Key insight: if one takes the parallel pancakes instance but with equally spaced components, it looks like solving noisy linear equations "mod 1".
This is the continuous analogue of solving noisy linear equations over finite fields ("learning with errors (LWE)"), which is known to be hard under the same assumption.
[Gupte-Vafa-Vaikuntanathan '22]: direct reduction from LWE
Subsequent applications to hardness of learning halfspaces, e.g. [Tiegel '23]
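
A sketch of the "noisy linear equations mod 1" view (continuous LWE); the parameters here are illustrative:

import numpy as np

def clwe_samples(n: int = 32, m: int = 1000, gamma: float = 4.0,
                 beta: float = 0.05, seed: int = 0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)                    # hidden unit direction
    x = rng.standard_normal((m, n))           # Gaussian inputs
    z = (gamma * (x @ w) + beta * rng.standard_normal(m)) % 1.0
    return x, z                               # labels are noisy <w, x> "mod 1"

x, z = clwe_samples()
# Conditioning on z near 0 makes the projection of x onto w concentrate near
# integer multiples of 1/gamma: the equally spaced "parallel pancakes" picture.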

20. COMPLEXITY RECAP
Strong connections between cryptography / average-case complexity and ML theory, but practically relevant reductions are tricky because the former are typically discrete and either correspond to unnatural input distributions or require agnostic noise to simulate.
The bulk of work on lower bounds has been in restricted models like SQ (surprisingly robust; the main exception is Gaussian elimination): hardness is based on engineering instances with structural features that preclude certain algorithms, which is convenient by virtue of recipes, e.g. parity <-> low-degree approximation, Gaussian mixtures <-> moment methods / dimensionality reduction.
These instances show (surprisingly!) that not only do those algorithms fail, but any algorithm will fail.

21. COMPLEXITY RECAP
Lower bounds are a powerful way of introspecting about modeling choices, e.g. once upon a time distribution-free agnostic learning was a great target to shoot for, but it turned out to be too good to be true.
Realizing a model is too general often requires taking a computational, rather than a statistical, lens.
Even seemingly benign settings (continuous functions, Gaussian inputs, smoothed MLP weights) can be computationally hard!

22. THE FINAL STRETCH
- sum-of-squares optimization
- physics, inference, and sampling
- robust statistics
- tensors
- supervised learning
- computational complexity
