Positional Encoding in Transformers for Deep Learning in NLP

 
Positional Encoding
(in Transformers)
 
 
Rasoul Akhavan Mahdavi
CS886-002: Deep Learning for NLP
 
Outline

Motivation
Approaches
Positional Embedding
Discussion
Conclusion
 
Motivation
 
Recurrent networks have an inherent notion of order, but transformers don't!
 
 
Motivation
 
Words flow simultaneously through the encoder/decoder stacks
 
 
We need a notion of position!
 
 
Approaches?
 
Linear Position Assignment (1, 2, 3, …)
Positions beyond the sentence lengths seen in training cannot be interpreted

Floats in a range ([0, 1])
The same value means something different depending on sentence length (see the example and sketch below)
 
"I love this course"                      →  0      0.33   0.66   1
"I love many things but not this course"  →  0      0.142  0.285  0.428  0.571  0.714  0.857  1
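As a minimal sketch of both schemes (the helper function names are illustrative, not from the presentation), applied to the two example sentences above:

```python
def linear_positions(tokens):
    """Assign 1, 2, 3, ...: values grow without bound, and positions longer than
    any sentence seen during training have no learned interpretation."""
    return list(range(1, len(tokens) + 1))

def normalized_positions(tokens):
    """Assign evenly spaced floats in [0, 1]: the same value lands on a different
    word depending on sentence length."""
    n = len(tokens)
    return [round(i / (n - 1), 3) for i in range(n)]

short_sentence = "I love this course".split()
long_sentence = "I love many things but not this course".split()

print(linear_positions(long_sentence))       # [1, 2, 3, 4, 5, 6, 7, 8]
print(normalized_positions(short_sentence))  # [0.0, 0.333, 0.667, 1.0]  -> "love" at 0.333
print(normalized_positions(long_sentence))   # [0.0, 0.143, 0.286, ...]  -> "love" at 0.143
```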
 
Properties
 
Unique positional encoding for each time step
Reasonable notion of relative distance
Independent of sentence size
Deterministic
 
Sinusoidal/Cosinusoidal Positional Embedding
 
Position will be a vector (instead of a number)
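For reference, the construction these slides describe is the one from the original Transformer paper (Vaswani et al., 2017): position $t$ becomes a $d_{\text{model}}$-dimensional vector whose coordinate pairs are sines and cosines at geometrically decreasing frequencies,

$$
PE_{(t,\,2i)} = \sin\!\left(\frac{t}{10000^{2i/d_{\text{model}}}}\right),
\qquad
PE_{(t,\,2i+1)} = \cos\!\left(\frac{t}{10000^{2i/d_{\text{model}}}}\right),
$$

where $i$ indexes the sine/cosine pair, i.e. each pair has frequency $\omega_i = 1/10000^{2i/d_{\text{model}}}$.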
 
Intuition?
 
 
Visualizing Encodings
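A short sketch (NumPy/Matplotlib, not part of the slides; assumes the formula above) that reproduces the kind of heatmap this slide visualizes, with one row per position and one column per embedding dimension:

```python
import numpy as np
import matplotlib.pyplot as plt

def positional_encoding(max_len, d_model):
    """Build a (max_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(max_len)[:, None]            # shape (max_len, 1)
    dims = np.arange(d_model)[None, :]                 # shape (1, d_model)
    # Dimensions (2i, 2i+1) share the frequency 1 / 10000^(2i / d_model).
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=128)
plt.pcolormesh(pe, cmap="RdBu")
plt.xlabel("embedding dimension")
plt.ylabel("position")
plt.colorbar()
plt.show()
```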
 
 
Rough Proof

For each frequency $\omega_k$ there is a matrix $M$ with

$$
M \cdot \begin{pmatrix} \sin(\omega_k t) \\ \cos(\omega_k t) \end{pmatrix}
= \begin{pmatrix} \sin(\omega_k (t+\varphi)) \\ \cos(\omega_k (t+\varphi)) \end{pmatrix},
\qquad
M = \begin{pmatrix} \cos(\omega_k \varphi) & \sin(\omega_k \varphi) \\ -\sin(\omega_k \varphi) & \cos(\omega_k \varphi) \end{pmatrix}
$$

No "t" involved: $M$ is built from the offset $\varphi$ alone, so shifting by a relative distance is the same linear transformation at every position.
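A quick numeric check of this claim, as a sketch: for a single frequency omega, the matrix M is built from the offset phi alone, yet it maps the encoding at any position t to the encoding at t + phi.

```python
import numpy as np

omega, phi = 1.0 / 10000 ** 0.25, 3.0          # one frequency and a fixed offset
M = np.array([[ np.cos(omega * phi), np.sin(omega * phi)],
              [-np.sin(omega * phi), np.cos(omega * phi)]])   # note: no t anywhere

for t in (0.0, 7.0, 123.0):                     # the same M works at every position
    pe_t = np.array([np.sin(omega * t), np.cos(omega * t)])
    pe_shifted = np.array([np.sin(omega * (t + phi)), np.cos(omega * (t + phi))])
    assert np.allclose(M @ pe_t, pe_shifted)
print("M depends only on the offset phi, not on the position t")
```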
 
Visualizing Encodings
 
Consistent Relative Distance
[Figure: distance between every pair of positional encodings, with both axes indexed by position]
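A companion check, again as a sketch under the same formulation: the distance between two positional encodings depends only on their offset, not on where in the sequence they sit, which is the pattern the figure shows.

```python
import numpy as np

def encode(t, d_model=64):
    """Sinusoidal encoding of a single position t (sines first, then cosines)."""
    i = np.arange(d_model // 2)
    angles = t / 10000 ** (2 * i / d_model)
    return np.concatenate([np.sin(angles), np.cos(angles)])

offset = 5
print(np.linalg.norm(encode(3) - encode(3 + offset)))     # both prints give
print(np.linalg.norm(encode(40) - encode(40 + offset)))   # the same distance
```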
 
Discussion
 
The positional embedding is added to the token embedding. Why add instead of concatenate? (see the sketch after these questions)

Does it last throughout the layers?

Why sines and cosines?
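On the first question, here is a minimal sketch (illustrative shapes and variable names, not from the slides) of how the encoding enters the encoder: it is summed element-wise with the token embeddings, so the width stays d_model, whereas concatenation would enlarge every downstream weight matrix.

```python
import numpy as np

d_model, vocab_size, seq_len = 512, 10000, 8
rng = np.random.default_rng(0)

# Token embeddings (learned in a real model; random here for illustration).
embedding_table = rng.normal(size=(vocab_size, d_model))
token_ids = rng.integers(0, vocab_size, size=seq_len)
token_emb = embedding_table[token_ids]                     # (seq_len, d_model)

# Sinusoidal positional encodings of the same shape.
pos = np.arange(seq_len)[:, None]
dim = np.arange(d_model)[None, :]
angles = pos / 10000 ** (2 * (dim // 2) / d_model)
pos_emb = np.where(dim % 2 == 0, np.sin(angles), np.cos(angles))

# Added, not concatenated: the shape is unchanged, and every dimension now
# carries a mixture of content and position information.
encoder_input = token_emb + pos_emb                        # (seq_len, d_model)
print(encoder_input.shape)                                 # (8, 512)
```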
 
Conclusion
 
Why positional embedding is needed
Drawbacks of simple approaches
Sinusoidal/Cosinusoidal Positional Embedding
 
References
 
https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
https://timodenk.com/blog/linear-relationships-in-the-transformers-positional-encoding/
https://jalammar.github.io/illustrated-transformer/
https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
https://github.com/tensorflow/tensor2tensor/issues/1591
https://www.reddit.com/r/MachineLearning/comments/cttefo/d_positional_encoding_in_transformer/
 
Thank you!
Questions?