Understanding LSTMs for Deep Learning: A Visual Overview
Delve into the intricate workings of Long Short-Term Memory (LSTM) networks with a series of visual aids and explanations by Dhruv Batra. Explore the intuition behind LSTMs, including memory cells, forget gates, input gates, memory updates, and output gates, shedding light on how these mechanisms enable the model to retain and process information over extended periods.
Presentation Transcript
ECE 6504: Deep Learning for Perception. Topics: LSTMs (intuition and variants); Lua / Torch tutorial [Abhishek]. Dhruv Batra, Virginia Tech.
Administrivia: HW3 is out today and due in two weeks. Please, please start early. https://computing.ece.vt.edu/~f15ece6504/homework3/
RNN: the basic block diagram. (All RNN/LSTM figures in this deck are from Christopher Olah's post: http://colah.github.io/posts/2015-08-Understanding-LSTMs/.)
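As a concrete companion to the block diagram (a minimal sketch; the names, shapes, and values are illustrative assumptions, not from the slides), a vanilla RNN step just mixes the current input with the previous hidden state through a nonlinearity:

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, b):
    # One vanilla RNN step: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + b)
    return np.tanh(Wxh @ x + Whh @ h_prev + b)

# Illustrative sizes: input dim 3, hidden dim 4, unrolled over 5 steps.
rng = np.random.default_rng(0)
Wxh = rng.normal(size=(4, 3))
Whh = rng.normal(size=(4, 4))
b = np.zeros(4)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):
    h = rnn_step(x, h, Wxh, Whh, b)
```

The same weights are reused at every time step; that weight sharing is what the unrolled diagram depicts.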
Key problem: learning long-term dependencies is hard. Backpropagating through many repeated steps multiplies many Jacobian factors, so gradients tend to vanish or explode.
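A toy illustration of that scaling argument (my sketch, not from the slides): if each backward step multiplies the gradient by a typical gain, the effect over T steps is geometric.

```python
# Gradient through T steps scales roughly like gain**T.
for gain, T in ((0.9, 100), (1.1, 100)):
    print(f"gain {gain} over {T} steps -> {gain ** T:.3e}")
# gain 0.9 -> ~2.7e-05 (vanishes); gain 1.1 -> ~1.4e+04 (explodes)
```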
Meet LSTMs: how about we explicitly encode memory?
LSTMs intuition: memory cell — the cell state / memory that runs through the chain.
LSTMs intuition: forget gate — should we continue to remember this bit of information or not?
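In the notation of Olah's post (which the slide figures follow), the forget gate is a sigmoid over the previous hidden state and the current input, producing a value in [0, 1] for each entry of the cell state:

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```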
LSTMs intuition: input gate — should we update this bit of information or not? If so, with what?
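In the same notation, the input gate i_t decides which entries to update, while a tanh layer proposes the candidate values to write:

```latex
i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)
```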
LSTMs intuition: memory update — forget that + memorize this.
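"Forget that + memorize this" is exactly the cell-state update, with ⊙ denoting elementwise multiplication:

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```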
LSTMs intuition: output gate — should we output this bit of information to deeper layers?
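The output gate then filters a squashed copy of the cell state into the hidden output that deeper layers (and the next time step) see:

```latex
o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad
h_t = o_t \odot \tanh(C_t)
```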
LSTMs: a pretty sophisticated cell.
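Putting the four pieces together, a single LSTM step in numpy might look like the sketch below (the stacked weight layout and names are my illustrative assumptions, not the slides' prescription):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, C_prev, W, b):
    # W maps [h_{t-1}, x_t] to the stacked pre-activations of the
    # forget, input, candidate, and output blocks: shape (4H, H + D).
    z = W @ np.concatenate([h_prev, x]) + b
    H = h_prev.shape[0]
    f = sigmoid(z[0*H:1*H])          # forget gate
    i = sigmoid(z[1*H:2*H])          # input gate
    C_tilde = np.tanh(z[2*H:3*H])    # candidate memory
    o = sigmoid(z[3*H:4*H])          # output gate
    C = f * C_prev + i * C_tilde     # memory update
    h = o * np.tanh(C)               # filtered output
    return h, C
```

Fusing the four gate blocks into one matrix multiply is a common implementation convenience; mathematically it is the same four equations as above.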
LSTM variant #1: peephole connections — let the gates see the cell state / memory.
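In Olah's formulation of the peephole variant, the forget and input gates additionally see C_{t-1}, and the output gate sees C_t:

```latex
f_t = \sigma\left(W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f\right), \quad
i_t = \sigma\left(W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i\right), \quad
o_t = \sigma\left(W_o \cdot [C_t, h_{t-1}, x_t] + b_o\right)
```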
LSTM variant #2: coupled gates — only memorize the new if forgetting the old.
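Coupling ties the input gate to the forget gate, so the cell writes new information exactly where it drops old information:

```latex
C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t
```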
LSTM variant #3: Gated Recurrent Units (GRUs). Changes: no explicit memory cell (memory = hidden output), and a single gate z both memorizes the new and forgets the old.
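The GRU equations as given in Olah's post: an update gate z_t interpolates between the old hidden state and a candidate, and a reset gate r_t controls how much of the old state feeds the candidate:

```latex
z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right), \quad
r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right), \quad
\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right), \quad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```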
RMSProp intuition: gradients point in the direction of steepest ascent locally, which is not necessarily where we want to go long term. There is also a mismatch in gradient magnitudes: where the magnitude is large we should travel a small distance, and where it is small we should travel a large distance. (Image credit: Geoffrey Hinton.)
RMSProp intuition: keep track of previous gradients to get an idea of their magnitudes over the batch, then divide the current gradient by this accumulated magnitude.
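A minimal sketch of that update (hyperparameter names and defaults are assumptions in the spirit of Hinton's lecture formulation, not values from the slides): keep an exponential moving average of squared gradients and scale each step by its root:

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    # Accumulate a running average of squared gradient magnitudes...
    cache = decay * cache + (1.0 - decay) * grad ** 2
    # ...and divide the step by its root: big recent gradients -> small
    # steps, small recent gradients -> large steps, matching the intuition.
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```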