Learning-Based Low-Rank Approximations and Linear Sketches

Piotr Indyk, Ali Vakilian, Yang Yuan
MIT

Linear Sketches

Many algorithms are obtained using linear sketches:
Input: represented by x (or A)
Sketching: compress x into Sx, S = sketch matrix (see the example below)
Computation: on Sx
Examples:
Dimensionality reduction (e.g., Johnson-Lindenstrauss lemma)
Streaming algorithms: matrices are implicit, e.g. hash functions
Compressed sensing
Linear algebra (regression, low-rank approximation, ...)
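
As a concrete (purely illustrative) instance of the sketch-then-compute pattern, the snippet below compresses vectors with a random Gaussian S in the spirit of the Johnson-Lindenstrauss lemma; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 200                       # original and sketched dimensions

# Sketch matrix S: i.i.d. Gaussian entries, scaled so that E[||Sx||^2] = ||x||^2
S = rng.normal(size=(m, n)) / np.sqrt(m)

x = rng.normal(size=n)
y = rng.normal(size=n)

# Sketching: compress x and y into m dimensions
Sx, Sy = S @ x, S @ y

# Computation happens on the sketches; distances are preserved up to a (1 ± ε) factor
print(np.linalg.norm(x - y), np.linalg.norm(Sx - Sy))
```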
 
 
Learned Linear Sketches

S is almost always a random matrix
Independent, FJLT, sparse, ...
Pros: simple, efficient, worst-case guarantees
Cons: does not adapt to data
Why not learn S from examples?
Dimensionality reduction: e.g., PCA (see the toy example below)
Compressed sensing: talk by Ali Mousavi
Autoencoders: x → Sx → x'
Streaming algorithms: talk by Ali Vakilian
Linear algebra? This talk
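
A toy illustration of learning S from examples, assuming future inputs come from the same distribution as the training set: here S is simply the top-k PCA directions of the training data. This is only a warm-up sketch, not the method developed later in this talk, and the data distribution is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, k = 500, 1000, 20

# Hypothetical data distribution: vectors near a fixed k-dimensional subspace
basis = rng.normal(size=(k, n))
X_train = rng.normal(size=(N, k)) @ basis + 0.01 * rng.normal(size=(N, n))

# "Learn" S from examples: top-k right singular vectors of the training data (PCA)
_, _, Vt = np.linalg.svd(X_train - X_train.mean(0), full_matrices=False)
S = Vt[:k]                                   # k x n learned sketch matrix

# A fresh vector from the same distribution is captured almost perfectly by Sx
x = rng.normal(size=k) @ basis
Sx = S @ x
print(np.linalg.norm(x - S.T @ Sx) / np.linalg.norm(x))   # small relative error
```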
 
 
 
 
Low Rank Approximation

Singular Value Decomposition (SVD)
Any matrix A = U Σ V, where:
U has orthonormal columns
Σ is diagonal
V has orthonormal rows

Rank-k approximation: A_k = U_k Σ_k V_k
Equivalently: A_k = argmin_{rank-k B} ||A - B||_F (checked numerically below)
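
A quick numerical check of the Eckart-Young fact above using NumPy's SVD; the matrix here is arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 80))
k = 10

# Truncated SVD: A_k = U_k Σ_k V_k
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k]

# Eckart-Young: the optimal rank-k Frobenius error equals the norm of the tail singular values
print(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2)))   # the two values agree
```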
 
 
 
Approximate Low Rank Approximation

Instead of A_k = argmin_{rank-k B} ||A - B||_F, output a rank-k matrix A' such that
||A - A'||_F ≤ (1 + ε) ||A - A_k||_F
Hopefully more efficient than computing the exact A_k
Sarlos'06, Clarkson-Woodruff'09,'13, ...
See Woodruff'14 for a survey

Most of these algorithms use linear sketches SA
S can be dense (FJLT) or sparse (0/+1/-1)
We focus on sparse S (see the construction below)
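
One common sparse choice for S is a CountSketch-style matrix with a single random ±1 entry per column. The construction below is a minimal sketch of that idea, not necessarily the exact distribution analyzed in the cited papers.

```python
import numpy as np

def sparse_sketch(m, n, rng):
    """CountSketch-style S: each column has one nonzero entry, chosen as +1 or -1."""
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)        # hash each column to a random row
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per column
    S[rows, np.arange(n)] = signs
    return S

rng = np.random.default_rng(3)
A = rng.normal(size=(1000, 50))
S = sparse_sketch(m=40, n=1000, rng=rng)
SA = S @ A                                   # m x d sketch, computable in one pass over A
```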

Sarlos-Clarkson-Woodruff

Streaming algorithm (two passes):
Compute SA (first pass)
Compute orthonormal V that spans the rowspace of SA
Compute AV^T (second pass)
Return SCW(S, A) := [AV^T]_k V
Space:
Suppose that A is n x d and S is m x n
Then SA is m x d and AV^T is n x m
Space proportional to m
Theory: m = O(k/ε)
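
An in-memory transcription of the two-pass procedure above (the streaming aspect is ignored); the truncation [·]_k is computed with an SVD, and the shapes are hypothetical.

```python
import numpy as np

def scw(S, A, k):
    """SCW(S, A) = [A V^T]_k V, where V spans the rowspace of SA."""
    SA = S @ A                                         # first pass: sketch SA (m x d)
    _, _, V = np.linalg.svd(SA, full_matrices=False)   # rows of V are orthonormal
    AVt = A @ V.T                                      # second pass: n x m
    U, s, Wt = np.linalg.svd(AVt, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wt[:k] @ V             # best rank-k approx of AV^T, mapped back

rng = np.random.default_rng(4)
A = rng.normal(size=(500, 300))
S = rng.normal(size=(40, 500))                         # m x n; theory suggests m = O(k/ε)
A_approx = scw(S, A, k=10)
print(np.linalg.norm(A - A_approx))
```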
 
 
 
 
Learning-Based Low-Rank Approximation

Sample matrices A_1, ..., A_N
Find S that minimizes Σ_i ||A_i - SCW(S, A_i)||_F
Use S happily ever after (as long as the data come from the same distribution)
Details:
Use sparse matrices S
Random support, optimize values
Optimize using SGD in PyTorch (see the sketch below)
Need to differentiate the objective above w.r.t. S
Represent the SVD as a sequence of power-method applications (each is differentiable)
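
A rough PyTorch sketch of this training loop, under several assumptions: the training matrices come from a synthetic low-rank-plus-noise distribution (sample_A is hypothetical), S has one learnable nonzero per column at a fixed random position, and, for brevity, gradients flow through torch.linalg.svd rather than through the unrolled power-method iterations mentioned above.

```python
import torch

torch.manual_seed(0)
n, d, m, k = 100, 80, 20, 10

# Hypothetical training distribution: low-rank matrices plus a little noise
def sample_A(batch):
    return torch.randn(batch, n, k) @ torch.randn(batch, k, d) + 0.1 * torch.randn(batch, n, d)

train = sample_A(32)

# Sparse S with random support: one learnable nonzero per column, at a fixed random row
rows = torch.randint(0, m, (n,))               # random support, kept fixed
vals = torch.randn(n, requires_grad=True)      # values optimized by SGD

def scw(S, A, k):
    # Differentiable SCW(S, A) = [A V^T]_k V; here gradients go through
    # torch.linalg.svd instead of unrolled power-method iterations.
    _, _, Vh = torch.linalg.svd(S @ A, full_matrices=False)
    U, s, Wh = torch.linalg.svd(A @ Vh.T, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Wh[:k] @ Vh

opt = torch.optim.SGD([vals], lr=1e-2, momentum=0.9)
for step in range(200):
    S = torch.zeros(m, n).index_put((rows, torch.arange(n)), vals)   # assemble sparse S
    loss = sum(torch.linalg.norm(A - scw(S, A, k)) for A in train) / len(train)
    opt.zero_grad()
    loss.backward()
    opt.step()
```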
 
Evaluation

Datasets:
Videos: MIT Logo, Friends, Eagle
Hyperspectral images (HS-SOD)
TechTC-300
200/400 training, 100 testing
Optimize the matrix S
Compute the empirical recovery error Σ_i ( ||A_i - SCW(S, A_i)||_F - ||A_i - [A_i]_k||_F ); see the helper below
Compare to random matrices S
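
The empirical recovery error could be computed along the following lines. This is illustrative only; it reuses the scw helper sketched after the Sarlos-Clarkson-Woodruff slide, and ||A - A_k||_F is taken from the tail singular values.

```python
import numpy as np

def recovery_error(S, A_test, k):
    """Sum over test matrices of ||A - SCW(S, A)||_F - ||A - A_k||_F."""
    total = 0.0
    for A in A_test:
        s = np.linalg.svd(A, compute_uv=False)
        best_rank_k = np.sqrt(np.sum(s[k:] ** 2))   # ||A - A_k||_F from tail singular values
        total += np.linalg.norm(A - scw(S, A, k)) - best_rank_k
    return total
```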
 

Results

[Figure: recovery-error plots for k = 10 on the Tech, MIT Logo, and Hyper datasets]

Fallback option

Learned matrices work (much) better, but no guarantees per matrix
Solution: combine S with random rows R (see the sketch below)
Lemma: augmenting R with an additional (learned) matrix S cannot increase the error of SCW
"Sketch monotonicity"
The algorithm inherits worst-case guarantees from R
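
A minimal way to realize the fallback, reusing the scw helper from above: stack the learned rows on top of fresh random rows and pass the combined matrix to SCW. The shapes and the "learned" matrix here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, k = 1000, 300, 10
A = rng.normal(size=(n, d))

S_learned = rng.normal(size=(20, n))    # placeholder for a trained sketch matrix
R = rng.normal(size=(20, n))            # fresh random rows: the worst-case safety net

S_mixed = np.vstack([S_learned, R])     # mixed sketch = learned rows + random rows
A_approx = scw(S_mixed, A, k)           # scw as sketched earlier; guarantees come from R
```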
 
 
Mixed matrices - results

k    m    Sketch     Logo    Hyper    Tech
10   20   Learned    0.1     0.52     2.95
10   20   Mixed      0.2     0.78     3.73
10   20   Random     2.09    2.92     7.99
10   40   Learned    0.04    0.28     1.16
10   40   Mixed      0.05    0.34     1.31
10   40   Random     0.45    1.12     3.28

Conclusions/Questions

Learned sketches can improve the accuracy/measurement tradeoff for low-rank approximation
Improves space
Questions:
Improve time
Requires sketching on both sides, more complicated
Guarantees:
Sampling complexity
Minimizing loss function (provably)
Other sketch-monotone algorithms?

Wacky idea

A general approach to learning-based algorithms:
Take a randomized algorithm
Learn random bits from examples
The last two talks can be viewed as instantiations of this approach
Random partitions → learning-based partitions
Random matrices → learning-based matrices
"Super-derandomization": a more efficient algorithm with weaker guarantees
(as opposed to a less efficient algorithm with stronger guarantees)