Self-Supervised Learning of Pretext-Invariant Representations

This presentation discusses Pretext-Invariant Representation Learning (PIRL), an approach in self-supervised learning (SSL). Traditional pretext tasks, such as predicting the order of shuffled image patches, yield covariant representations and push the network to retain information that is not semantically relevant. PIRL instead learns invariant representations by making the representations of an image and of its transformed versions similar. A noise-contrastive (invariant) loss and a memory bank are used to enforce this similarity, so that representations stay consistent over time and across different views of the same image.


Presentation Transcript


  1. Self-Supervised Learning of Pretext-Invariant Representations. Ishan Misra, Laurens van der Maaten (Facebook AI Research). Presenter: Shovito Barua Soumma. Date: April 16, 2024. CVPR 2020 (1,312 citations as of today).

  2. Background: The performance of SSL depends on the pretext task. Traditional SSL leads to covariant representations that vary with the applied transformations. PIRL aims to learn invariant representations by making the representations of transformed versions of the same image similar, and implements this using a jigsaw puzzle-solving task as the pretext transformation (see the sketch below).
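
Below is a minimal, hypothetical sketch of the jigsaw transform used as the pretext transformation: the image is cropped into a 3x3 grid of patches that are then randomly shuffled. The function name, tensor shapes, and the omission of per-patch jitter are simplifying assumptions, not the authors' implementation.

```python
import torch

def jigsaw_transform(image: torch.Tensor, grid: int = 3):
    """Split a (C, H, W) image into grid x grid patches and shuffle them.

    Returns the shuffled patches and the permutation that was applied.
    A minimal sketch; the actual pipeline also crops and jitters each patch.
    """
    c, h, w = image.shape
    ph, pw = h // grid, w // grid
    patches = [
        image[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
        for i in range(grid)
        for j in range(grid)
    ]
    perm = torch.randperm(len(patches))           # random patch order t
    shuffled = [patches[k] for k in perm.tolist()]
    return torch.stack(shuffled), perm            # (9, C, H/3, W/3), permutation
```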

  3. Prior Approach: the network is trained to predict the order of the shuffled patches, i.e., which transformation t (drawn from some distribution over the set of transformations T) was applied. Such covariant losses encourage the network φ(·) to learn image representations that contain information about t, thereby encouraging it to retain information that is not semantically relevant. A sketch of this covariant objective follows.
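
For contrast, here is a hedged sketch of the covariant (prior) objective: a classification head predicts which permutation t was applied to the patches and is trained with cross-entropy. The head architecture, feature dimension, and number of permutation classes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PermutationPredictor(nn.Module):
    """Covariant pretext head: predicts the index of the applied permutation."""
    def __init__(self, feat_dim: int = 128, num_perms: int = 1000):
        super().__init__()
        self.classifier = nn.Linear(9 * feat_dim, num_perms)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, 9, feat_dim) -> logits over permutation classes
        return self.classifier(patch_feats.flatten(1))

def covariant_loss(logits: torch.Tensor, perm_index: torch.Tensor) -> torch.Tensor:
    # Solving this task forces the representation to encode the transformation t,
    # which is exactly the property PIRL tries to avoid.
    return F.cross_entropy(logits, perm_index)
```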

  4. Invariant Loss Function: D_N ⊂ D is a set of N negative samples drawn from the dataset D; the factor 1/|D_N| gives every negative sample equal weight in the final loss function. The resulting score h(·, ·) acts as a sigmoid-like binary classifier that decides whether a pair of representations comes from the same image or from the negative (noise) samples, as sketched below.
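
A minimal sketch of the noise-contrastive score h(·, ·) this slide describes, assuming cosine similarity and a temperature tau as in standard NCE-style losses; the function and argument names are mine.

```python
import torch
import torch.nn.functional as F

def nce_score(v1: torch.Tensor, v2: torch.Tensor,
              negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """h(v1, v2): probability that (v1, v2) come from the same image.

    v1, v2:     (dim,) representations of the positive pair
    negatives:  (N, dim) representations of the N negative samples D_N
    """
    pos = torch.exp(F.cosine_similarity(v1, v2, dim=0) / tau)
    neg = torch.exp(F.cosine_similarity(v2.unsqueeze(0), negatives, dim=1) / tau).sum()
    return pos / (pos + neg)
```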

  5. Invariant Loss Function. Positive term (maximize similarity): encourages the representations f(v_I) and g(v_{I^t}) to be similar, enforcing invariance between the original and transformed representations. Negative term (minimize similarity between I^t and other images I'): encourages the transformed representation g(v_{I^t}) to be dissimilar from the representations f(v_{I'}) of other images. A sketch of the combined loss follows.
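
Reusing the hypothetical nce_score helper from the sketch above, the positive and negative terms can be combined into an NCE-style loss roughly as follows (a sketch under the same assumptions, not the reference implementation).

```python
import torch

def nce_loss(m_I: torch.Tensor, g_It: torch.Tensor,
             memory_negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """L_NCE: pull g(v_{I^t}) toward m_I, push it away from other images' features."""
    # Positive term: maximize similarity of the pair (m_I, g(v_{I^t})).
    loss = -torch.log(nce_score(m_I, g_It, memory_negatives, tau))
    # Negative terms: minimize similarity to each negative representation m_{I'}.
    for m_neg in memory_negatives:
        loss = loss - torch.log(1.0 - nce_score(m_neg, g_It, memory_negatives, tau))
    return loss
```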

  6. Memory Bank: a caching system. For each image I, it stores a representation m_I in the memory bank, computed as an exponential moving average of the representations f(v_I) seen for that image I in previous training iterations: m_I ← ρ · m_I + (1 − ρ) · f(v_I). A sketch follows.
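
A minimal sketch of such a memory bank, assuming a momentum coefficient rho and unit-normalized feature vectors; the class name, the rho value, and the re-normalization step are illustrative assumptions.

```python
import torch

class MemoryBank:
    """Stores one moving-average feature vector m_I per training image."""
    def __init__(self, num_images: int, dim: int, rho: float = 0.5):
        self.rho = rho
        self.bank = torch.zeros(num_images, dim)

    @torch.no_grad()
    def update(self, indices: torch.Tensor, feats: torch.Tensor) -> None:
        # m_I <- rho * m_I + (1 - rho) * f(v_I), then re-normalize to unit length.
        new = self.rho * self.bank[indices] + (1.0 - self.rho) * feats
        self.bank[indices] = torch.nn.functional.normalize(new, dim=1)
```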

  7. Memory Bank: the first loss term encourages the transformed representation g(v_{I^t}) to be similar to the memory bank representation m_I of the original image, which enforces learning invariant features. The second term reinforces that the features f(v_I) extracted by the head align well with the historically aggregated features stored in the memory bank, ensuring that the learned features are consistent over time and across different views of the same image.

  8. Memory Bank: the hyperparameter λ balances aligning the transformed and original image representations against keeping the original image representation consistent with its historical (memory bank) value. By tuning λ, one can control the emphasis on learning invariance (higher λ) versus maintaining a stable, consistent representation during ongoing training (lower λ). A sketch of the λ-weighted combination follows.
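
Putting the pieces together, the λ-weighted combination described on the last two slides can be sketched as follows, reusing the hypothetical nce_loss helper from above (lam = 0.5 here is only an illustrative default).

```python
import torch

def pirl_loss(f_vI: torch.Tensor, g_vIt: torch.Tensor, m_I: torch.Tensor,
              memory_negatives: torch.Tensor, lam: float = 0.5,
              tau: float = 0.07) -> torch.Tensor:
    """Total loss: invariance term weighted by lam, consistency term by (1 - lam)."""
    # Align the transformed image's features g(v_{I^t}) with the memory m_I.
    invariance = nce_loss(m_I, g_vIt, memory_negatives, tau)
    # Keep the original image's features f(v_I) consistent with their history.
    consistency = nce_loss(m_I, f_vI, memory_negatives, tau)
    return lam * invariance + (1.0 - lam) * consistency
```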
