Learning-Based Low-Rank Approximations and Linear Sketches

Piotr Indyk, Ali Vakilian, Yang Yuan
MIT

Linear Sketches

Many algorithms are obtained using linear sketches:
Input: represented by x (or A)
Sketching: compress x into Sx, S = sketch matrix (see the example below)
Computation: on Sx
Examples:
Dimensionality reduction (e.g., Johnson-Lindenstrauss lemma)
Streaming algorithms: matrices are implicit, e.g. hash functions
Compressed sensing
Linear algebra (regression, low-rank approximation, ...)
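
As a concrete (purely illustrative) instance of the sketch-then-compute pattern, the snippet below compresses vectors with a random Gaussian S in the spirit of the Johnson-Lindenstrauss lemma; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 200                       # original and sketched dimensions

# Sketch matrix S: i.i.d. Gaussian entries, scaled so that E[||Sx||^2] = ||x||^2
S = rng.normal(size=(m, n)) / np.sqrt(m)

x = rng.normal(size=n)
y = rng.normal(size=n)

# Sketching: compress x and y into m dimensions
Sx, Sy = S @ x, S @ y

# Computation happens on the sketches; distances are preserved up to a (1 ± ε) factor
print(np.linalg.norm(x - y), np.linalg.norm(Sx - Sy))
```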
 
 
Learned Linear Sketches

S is almost always a random matrix
Independent, FJLT, sparse, ...
Pros: simple, efficient, worst-case guarantees
Cons: does not adapt to data
Why not learn S from examples?
Dimensionality reduction: e.g., PCA (see the toy example below)
Compressed sensing: talk by Ali Mousavi
Autoencoders: x → Sx → x'
Streaming algorithms: talk by Ali Vakilian
Linear algebra? This talk
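
A toy illustration of learning S from examples, assuming future inputs come from the same distribution as the training set: here S is simply the top-k PCA directions of the training data. This is only a warm-up sketch, not the method developed later in this talk, and the data distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, k = 500, 1000, 20

# Hypothetical data distribution: vectors near a fixed k-dimensional subspace
basis = rng.normal(size=(k, n))
X_train = rng.normal(size=(N, k)) @ basis + 0.01 * rng.normal(size=(N, n))

# "Learn" S from examples: top-k right singular vectors of the training data (PCA)
_, _, Vt = np.linalg.svd(X_train - X_train.mean(0), full_matrices=False)
S = Vt[:k]                                   # k x n learned sketch matrix

# A fresh vector from the same distribution is captured almost perfectly by Sx
x = rng.normal(size=k) @ basis
Sx = S @ x
print(np.linalg.norm(x - S.T @ Sx) / np.linalg.norm(x))   # small relative error
```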
 
 
 
 
Low Rank Approximation

Singular Value Decomposition (SVD)
Any matrix A = U Σ V, where:
U has orthonormal columns
Σ is diagonal
V has orthonormal rows

Rank-k approximation: A_k = U_k Σ_k V_k
Equivalently: A_k = argmin_{rank-k B} ||A - B||_F (checked numerically below)
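
A quick numerical check of the Eckart-Young fact above using NumPy's SVD; the matrix here is arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 80))
k = 10

# Truncated SVD: A_k = U_k Σ_k V_k
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Eckart-Young: the optimal rank-k Frobenius error equals the norm of the tail singular values
print(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2)))   # the two values agree
```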
 
 
 
Approximate Low Rank Approximation

Instead of A_k = argmin_{rank-k B} ||A - B||_F, output a rank-k matrix A' such that
||A - A'||_F ≤ (1 + ε) ||A - A_k||_F
Hopefully more efficient than computing the exact A_k
Sarlos'06, Clarkson-Woodruff'09,'13, ...
See Woodruff'14 for a survey

Most of these algorithms use linear sketches SA
S can be dense (FJLT) or sparse (0/+1/-1)
We focus on sparse S (see the construction below)
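
One common sparse choice for S is a CountSketch-style matrix with a single random ±1 entry per column. The construction below is a minimal sketch of that idea, not necessarily the exact distribution analyzed in the cited papers.

```python
import numpy as np

def sparse_sketch(m, n, rng):
    """CountSketch-style S: each column has one nonzero entry, chosen as +1 or -1."""
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)        # hash each column to a random row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per column
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(3)
A = rng.normal(size=(1000, 50))
S = sparse_sketch(m=40, n=1000, rng=rng)
SA = S @ A                                   # m x d sketch, computable in one pass over A
```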

Sarlos-Clarkson-Woodruff

Streaming algorithm (two passes):
Compute SA (first pass)
Compute orthonormal V that spans the rowspace of SA
Compute AV^T (second pass)
Return SCW(S, A) := [AV^T]_k V
Space:
Suppose that A is n x d and S is m x n
Then SA is m x d and AV^T is n x m
Space proportional to m
Theory: m = O(k/ε)
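
An in-memory transcription of the two-pass procedure above (the streaming aspect is ignored); the truncation [·]_k is computed with an SVD, and the shapes are hypothetical.

```python
import numpy as np

def scw(S, A, k):
    """SCW(S, A) = [A V^T]_k V, where V spans the rowspace of SA."""
    SA = S @ A                                         # first pass: sketch SA (m x d)
    _, _, V = np.linalg.svd(SA, full_matrices=False)   # rows of V are orthonormal
    AVt = A @ V.T                                      # second pass: n x m
    U, s, Wt = np.linalg.svd(AVt, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wt[:k] @ V             # best rank-k approx of AV^T, mapped back

rng = np.random.default_rng(4)
A = rng.normal(size=(500, 300))
S = rng.normal(size=(40, 500))                         # m x n; theory suggests m = O(k/ε)
A_approx = scw(S, A, k=10)
print(np.linalg.norm(A - A_approx))
```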
 
 
 
 
Learning-Based Low-Rank Approximation

Sample matrices A_1, ..., A_N
Find S that minimizes Σ_i ||A_i - SCW(S, A_i)||_F
Use S happily ever after (as long as the data come from the same distribution)
Details:
Use sparse matrices S
Random support, optimize values
Optimize using SGD in PyTorch (see the sketch below)
Need to differentiate the objective above w.r.t. S
Represent the SVD as a sequence of power-method applications (each is differentiable)
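
A rough PyTorch sketch of this training loop, under several assumptions: the training matrices come from a synthetic low-rank-plus-noise distribution (sample_A is hypothetical), S has one learnable nonzero per column at a fixed random position, and, for brevity, gradients flow through torch.linalg.svd rather than through the unrolled power-method iterations mentioned above.

```python
import torch

torch.manual_seed(0)
n, d, m, k = 100, 80, 20, 10

# Hypothetical training distribution: low-rank matrices plus a little noise
def sample_A(batch):
    return torch.randn(batch, n, k) @ torch.randn(batch, k, d) + 0.1 * torch.randn(batch, n, d)

train = sample_A(32)

# Sparse S with random support: one learnable nonzero per column, at a fixed random row
rows = torch.randint(0, m, (n,))               # random support, kept fixed
vals = torch.randn(n, requires_grad=True)      # values optimized by SGD

def scw(S, A, k):
    # Differentiable SCW(S, A) = [A V^T]_k V; here gradients go through
    # torch.linalg.svd instead of unrolled power-method iterations.
    _, _, Vh = torch.linalg.svd(S @ A, full_matrices=False)
    U, s, Wh = torch.linalg.svd(A @ Vh.T, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wh[:k] @ Vh

opt = torch.optim.SGD([vals], lr=1e-2, momentum=0.9)
for step in range(200):
    S = torch.zeros(m, n).index_put((rows, torch.arange(n)), vals)   # assemble sparse S
    loss = sum(torch.linalg.norm(A - scw(S, A, k)) for A in train) / len(train)
    opt.zero_grad()
    loss.backward()
    opt.step()
```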
 
Evaluation

Datasets:
Videos: MIT Logo, Friends, Eagle
Hyperspectral images (HS-SOD)
TechTC-300
200/400 training, 100 testing
Optimize the matrix S
Compute the empirical recovery error Σ_i ( ||A_i - SCW(S, A_i)||_F - ||A_i - [A_i]_k||_F ); see the helper below
Compare to random matrices S
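
The empirical recovery error could be computed along the following lines. This is illustrative only; it reuses the scw helper sketched after the Sarlos-Clarkson-Woodruff slide, and ||A - A_k||_F is taken from the tail singular values.

```python
import numpy as np

def recovery_error(S, A_test, k):
    """Sum over test matrices of ||A - SCW(S, A)||_F - ||A - A_k||_F."""
    total = 0.0
    for A in A_test:
        s = np.linalg.svd(A, compute_uv=False)
        best_rank_k = np.sqrt(np.sum(s[k:] ** 2))   # ||A - A_k||_F from tail singular values
        total += np.linalg.norm(A - scw(S, A, k)) - best_rank_k
    return total
```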
 

Results

[Figure: recovery-error plots for k = 10 on the Tech, MIT Logo, and Hyper datasets]

Fallback option

Learned matrices work (much) better, but no guarantees per matrix
Solution: combine S with random rows R (see the sketch below)
Lemma: augmenting R with an additional (learned) matrix S cannot increase the error of SCW
"Sketch monotonicity"
The algorithm inherits worst-case guarantees from R
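
A minimal way to realize the fallback, reusing the scw helper from above: stack the learned rows on top of fresh random rows and pass the combined matrix to SCW. The shapes and the "learned" matrix here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 1000, 300, 10
A = rng.normal(size=(n, d))

S_learned = rng.normal(size=(20, n))    # placeholder for a trained sketch matrix
R = rng.normal(size=(20, n))            # fresh random rows: the worst-case safety net

S_mixed = np.vstack([S_learned, R])     # mixed sketch = learned rows + random rows
A_approx = scw(S_mixed, A, k)           # scw as sketched earlier; guarantees come from R
```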
 
 
Mixed matrices - results

k    m    Sketch     Logo    Hyper    Tech
10   20   Learned    0.1     0.52     2.95
10   20   Mixed      0.2     0.78     3.73
10   20   Random     2.09    2.92     7.99
10   40   Learned    0.04    0.28     1.16
10   40   Mixed      0.05    0.34     1.31
10   40   Random     0.45    1.12     3.28

Conclusions/Questions

Learned sketches can improve the accuracy/measurement tradeoff for low-rank approximation
Improves space
Questions:
Improve time
Requires sketching on both sides, more complicated
Guarantees:
Sampling complexity
Minimizing loss function (provably)
Other sketch-monotone algorithms?

Wacky idea

A general approach to learning-based algorithms:
Take a randomized algorithm
Learn random bits from examples
The last two talks can be viewed as instantiations of this approach
Random partitions → learning-based partitions
Random matrices → learning-based matrices
"Super-derandomization": a more efficient algorithm with weaker guarantees
(as opposed to a less efficient algorithm with stronger guarantees)