Self-Supervised Learning of Pretext-Invariant Representations

This presentation discusses Pretext-Invariant Representation Learning (PIRL), an approach in self-supervised learning (SSL). Traditional pretext tasks, such as predicting the order of shuffled image patches, yield covariant representations and push the network to retain information that is not semantically relevant. PIRL instead learns invariant representations by making the representations of an image and of its transformed versions similar. A noise-contrastive (invariant) loss and a memory bank are used to enforce this similarity, so that representations stay consistent over time and across different views of the same image.


Presentation Transcript


  1. Self-Supervised Learning of Pretext-Invariant Representations. Ishan Misra, Laurens van der Maaten (Facebook AI Research). Presenter: Shovito Barua Soumma. Date: April 16, 2024. CVPR 2020 (1,312 citations as of today).

  2. Background: The performance of SSL depends on the pretext task. Traditional SSL leads to covariant representations that vary with the applied transformations. PIRL aims to learn invariant representations by making the representations of transformed versions of the same image similar, and implements this using a jigsaw puzzle-solving task as the pretext transformation (see the sketch below).
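
Below is a minimal, hypothetical sketch of the jigsaw transform used as the pretext transformation: the image is cropped into a 3x3 grid of patches that are then randomly shuffled. The function name, tensor shapes, and the omission of per-patch jitter are simplifying assumptions, not the authors' implementation.

```python
import torch

def jigsaw_transform(image: torch.Tensor, grid: int = 3):
    """Split a (C, H, W) image into grid x grid patches and shuffle them.

    Returns the shuffled patches and the permutation that was applied.
    A minimal sketch; the actual pipeline also crops and jitters each patch.
    """
    c, h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = [
        image[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid)
        for j in range(grid)
    ]
    perm = torch.randperm(len(patches))           # random patch order t
    shuffled = [patches[k] for k in perm.tolist()]
    return torch.stack(shuffled), perm            # (9, C, H/3, W/3), permutation
```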

  3. Prior Approach: the network is trained to predict the order of the shuffled patches, i.e., which transformation t (drawn from some distribution over the set of transformations T) was applied. Such covariant losses encourage the network φ(·) to learn image representations that contain information about t, thereby encouraging it to retain information that is not semantically relevant. A sketch of this covariant objective follows.
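
For contrast, here is a hedged sketch of the covariant (prior) objective: a classification head predicts which permutation t was applied to the patches and is trained with cross-entropy. The head architecture, feature dimension, and number of permutation classes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PermutationPredictor(nn.Module):
    """Covariant pretext head: predicts the index of the applied permutation."""
    def __init__(self, feat_dim: int = 128, num_perms: int = 1000):
        super().__init__()
        self.classifier = nn.Linear(9 * feat_dim, num_perms)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, 9, feat_dim) -> logits over permutation classes
        return self.classifier(patch_feats.flatten(1))

def covariant_loss(logits: torch.Tensor, perm_index: torch.Tensor) -> torch.Tensor:
    # Solving this task forces the representation to encode the transformation t,
    # which is exactly the property PIRL tries to avoid.
    return F.cross_entropy(logits, perm_index)
```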

  4. Invariant Loss Function: D_N ⊂ D is a set of N negative samples drawn from the dataset D; the factor 1/|D_N| gives every negative sample equal weight in the final loss function. The resulting score h(·, ·) acts as a sigmoid-like binary classifier that decides whether a pair of representations comes from the same image or from the negative (noise) samples, as sketched below.
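
A minimal sketch of the noise-contrastive score h(·, ·) this slide describes, assuming cosine similarity and a temperature tau as in standard NCE-style losses; the function and argument names are mine.

```python
import torch
import torch.nn.functional as F

def nce_score(v1: torch.Tensor, v2: torch.Tensor,
              negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """h(v1, v2): probability that (v1, v2) come from the same image.

    v1, v2:     (dim,) representations of the positive pair
    negatives:  (N, dim) representations of the N negative samples D_N
    """
    pos = torch.exp(F.cosine_similarity(v1, v2, dim=0) / tau)
    neg = torch.exp(F.cosine_similarity(v2.unsqueeze(0), negatives, dim=1) / tau).sum()
    return pos / (pos + neg)
```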

  5. Invariant Loss Function. Positive term (maximize similarity): encourages the representations f(v_I) and g(v_{I^t}) to be similar, enforcing invariance between the original and transformed representations. Negative term (minimize similarity between I^t and other images I'): encourages the transformed representation g(v_{I^t}) to be dissimilar from the representations f(v_{I'}) of other images. A sketch of the combined loss follows.
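
Reusing the hypothetical nce_score helper from the sketch above, the positive and negative terms can be combined into an NCE-style loss roughly as follows (a sketch under the same assumptions, not the reference implementation).

```python
import torch

def nce_loss(m_I: torch.Tensor, g_It: torch.Tensor,
             memory_negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """L_NCE: pull g(v_{I^t}) toward m_I, push it away from other images' features."""
    # Positive term: maximize similarity of the pair (m_I, g(v_{I^t})).
    loss = -torch.log(nce_score(m_I, g_It, memory_negatives, tau))
    # Negative terms: minimize similarity to each negative representation m_{I'}.
    for m_neg in memory_negatives:
        loss = loss - torch.log(1.0 - nce_score(m_neg, g_It, memory_negatives, tau))
    return loss
```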

  6. Memory Bank: a caching system. For each image I, it stores a representation m_I in the memory bank, computed as an exponential moving average of the representations f(v_I) seen for that image I in previous training iterations: m_I ← ρ · m_I + (1 − ρ) · f(v_I). A sketch follows.
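
A minimal sketch of such a memory bank, assuming a momentum coefficient rho and unit-normalized feature vectors; the class name, the rho value, and the re-normalization step are illustrative assumptions.

```python
import torch

class MemoryBank:
    """Stores one moving-average feature vector m_I per training image."""
    def __init__(self, num_images: int, dim: int, rho: float = 0.5):
        self.rho = rho
        self.bank = torch.zeros(num_images, dim)

    @torch.no_grad()
    def update(self, indices: torch.Tensor, feats: torch.Tensor) -> None:
        # m_I <- rho * m_I + (1 - rho) * f(v_I), then re-normalize to unit length.
        new = self.rho * self.bank[indices] + (1.0 - self.rho) * feats
        self.bank[indices] = torch.nn.functional.normalize(new, dim=1)
```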

  7. Memory Bank: the first loss term encourages the transformed representation g(v_{I^t}) to be similar to the memory bank representation m_I of the original image, which enforces learning invariant features. The second term reinforces that the features f(v_I) extracted by the head align well with the historically aggregated features stored in the memory bank, ensuring that the learned features are consistent over time and across different views of the same image.

  8. Memory Bank: the hyperparameter λ balances aligning the transformed and original image representations against keeping the original image representation consistent with its historical (memory bank) value. By tuning λ, one can control the emphasis on learning invariance (higher λ) versus maintaining a stable, consistent representation during ongoing training (lower λ). A sketch of the λ-weighted combination follows.
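
Putting the pieces together, the λ-weighted combination described on the last two slides can be sketched as follows, reusing the hypothetical nce_loss helper from above (lam = 0.5 here is only an illustrative default).

```python
import torch

def pirl_loss(f_vI: torch.Tensor, g_vIt: torch.Tensor, m_I: torch.Tensor,
              memory_negatives: torch.Tensor, lam: float = 0.5,
              tau: float = 0.07) -> torch.Tensor:
    """Total loss: invariance term weighted by lam, consistency term by (1 - lam)."""
    # Align the transformed image's features g(v_{I^t}) with the memory m_I.
    invariance = nce_loss(m_I, g_vIt, memory_negatives, tau)
    # Keep the original image's features f(v_I) consistent with their history.
    consistency = nce_loss(m_I, f_vI, memory_negatives, tau)
    return lam * invariance + (1.0 - lam) * consistency
```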
