Recurrent Neural Networks for Classification Models

Explore the application of Recurrent Neural Networks (RNNs) in classification models, and understand how an RNN captures dependencies between records to improve model performance. Learn about the independence assumption in the training data and why an RNN is needed to model dependencies between sequential records in a table.

  • Neural Networks
  • RNN
  • Classification Models
  • Data Science


Presentation Transcript


  1. COMP5331 Other Classification Models: Recurrent Neural Network (RNN). Prepared by Raymond Wong. Presented by Raymond Wong (raywong@cse).

  2. Other Classification Models: Support Vector Machine (SVM), Neural Network, Recurrent Neural Network.

  3. Neural Network. Training data (input attributes x1 and x2, output attribute d):
        x1: 0 0 1 1
        x2: 0 1 0 1
        d:  0 1 1 1
     The network takes the inputs x1 and x2 and produces the output y. We train the model starting from the first record.

  4. We then train the model with the second record (x1 = 0, x2 = 1, d = 1).

  5. We then train the model with the third record (x1 = 1, x2 = 0, d = 1).

  6. We then train the model with the fourth record (x1 = 1, x2 = 1, d = 1).

  7. After the last record, we train the model with the first record again.

  8. Neural Network: training the model with one record is independent of training it with another record. In other words, we assume that the records in the table are independent.
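As a concrete illustration of this record-by-record training, here is a minimal sketch (not from the slides: it uses a single sigmoid neuron in place of the full neural network, with assumed initial weights and learning rate). Each weight update depends only on the current record, which is exactly the independence assumption.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical sketch: per-record training on the table (x1, x2) -> d above.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b = 0.1, -0.1, 0.0   # assumed initial weights (not from the slides)
lr = 0.5                      # assumed learning rate

for _ in range(2000):
    for (x1, x2), d in data:                     # one record at a time
        y = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = (y - d) * y * (1 - y)             # gradient of the squared error
        w1 -= lr * grad * x1                     # the update uses only the
        w2 -= lr * grad * x2                     # current record
        b  -= lr * grad

print([round(sigmoid(w1 * x1 + w2 * x2 + b), 2) for (x1, x2), _ in data])
```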

  9. In some cases, the current record is related to the previous records in the table; that is, the records are dependent. We also want to capture this dependency in the model, and we can use a new model, the recurrent neural network, for this purpose.

  10. Neural Network: the network takes the input attributes x1 and x2 and produces the output attribute y.

  11. Neural Network: record 1 is now an input vector x_1 = (x_{1,1}, x_{1,2}). The network takes the vector x_1 and produces the output y_1.

  12. Neural Network: input vector x_1, output attribute y_1.

  13. Recurrent Neural Network (RNN): a neural network with a loop. The RNN takes the input vector x_1 and produces the output y_1.

  14. Recurrent Neural Network (RNN): input vector x_1, output attribute y_1.

  15. Unfolded representation of the RNN: at timestamp 1 the RNN maps x_1 to y_1, at timestamp 2 it maps x_2 to y_2, at timestamp 3 it maps x_3 to y_3, and so on up to timestamp t, where it maps x_t to y_t.

  16. The RNN also maintains an internal state variable: at timestamp t-1 it produces y_{t-1} and the state s_{t-1}, at timestamp t it produces y_t and the state s_t, and at timestamp t+1 it produces y_{t+1} and the state s_{t+1}.

  17. The internal state is passed forward: at timestamp t, the RNN takes the input x_t together with the previous state s_{t-1} and produces the output y_t and the new state s_t, which is passed on to timestamp t+1.
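The unfolded view on the last few slides can be summarized by a small helper (a hypothetical sketch, not from the slides): the same cell function is applied at every timestamp, and the internal state produced at one timestamp is fed into the next.

```python
# Sketch of the unfolded RNN: `cell` is any function mapping (x_t, s_prev) to
# (y_t, s_t); the state produced at one timestamp is carried to the next.
def run_unfolded(cell, inputs, s0=0.0):
    s, outputs = s0, []
    for x_t in inputs:            # timestamps t = 1, 2, ..., T
        y_t, s = cell(x_t, s)
        outputs.append(y_t)
    return outputs
```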

  18. Limitation: the RNN may have to memorize many past events/values, and, due to its more complex structure, it is more time-consuming to train.

  19. RNN variants: (1) Basic RNN, (2) Traditional LSTM, (3) GRU.

  20. Basic RNN: the basic RNN is very simple. It contains only a single activation function (e.g., tanh or ReLU).

  21. The unfolded RNN again: at each timestamp t, the unit takes x_t and s_{t-1} and produces y_t and s_t.

  22. In the basic RNN, each unit in the unfolded view is a basic RNN unit.

  23. The basic RNN unit at each timestamp is a memory unit.

  24. Inside the memory unit there is a single activation function, usually tanh or ReLU.

  25. The memory unit computes
        s_t = tanh(W · [x_t, s_{t-1}] + b)
        y_t = s_t
      In this example, W = [0.7  0.3  0.4] and b = 0.4.
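A minimal Python sketch of this memory unit, assuming (as in the slides' running example) that x_t has two components, so W = [0.7, 0.3, 0.4] multiplies [x_{t,1}, x_{t,2}, s_{t-1}]:

```python
import math

# Basic RNN memory unit: s_t = tanh(W . [x_t, s_{t-1}] + b), y_t = s_t.
# W and b are the example values from the slide.
W, b = (0.7, 0.3, 0.4), 0.4

def basic_rnn_cell(x_t, s_prev):
    net = W[0] * x_t[0] + W[1] * x_t[1] + W[2] * s_prev + b
    s_t = math.tanh(net)
    y_t = s_t                     # the basic RNN outputs its state directly
    return y_t, s_t
```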

  26. In the following, we want to compute the (weight) values in the basic RNN. Like the neural network, the basic RNN model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation; in the basic RNN, Error Backward Propagation can be solved by an existing optimization tool (as for the Neural Network).

  27. Consider this example with two timestamps:
        Time     t=1   t=2
        x_{t,1}  0.1   0.7
        x_{t,2}  0.4   0.9
        y        0.3   0.5
      We use the basic RNN to do the training, first for t = 1 and then for t = 2.

  28. When t = 1: the memory unit takes x_1 and the initial state s_0 (with y_0 = 0 and s_0 = 0) and computes s_1 = tanh(W · [x_1, s_0] + b) and y_1 = s_1; s_1 is then passed on to timestamp t = 2.

  29. Step 1 (Input Forward Propagation), t = 1, with y_0 = 0 and s_0 = 0:
        s_1 = tanh(W · [x_1, s_0] + b)
            = tanh(0.7 · 0.1 + 0.3 · 0.4 + 0.4 · 0 + 0.4)
            = tanh(0.59)
            = 0.5299
        y_1 = s_1 = 0.5299
        Error = y_1 - y = 0.5299 - 0.3 = 0.2299

  30. After timestamp t = 1 we have s_1 = 0.5299 and y_1 = 0.5299 (with W = [0.7  0.3  0.4] and b = 0.4); s_1 is carried forward to timestamp t = 2.

  31. Step 1 (Input Forward Propagation), t = 2, with s_1 = 0.5299:
        s_2 = tanh(W · [x_2, s_1] + b)
            = tanh(0.7 · 0.7 + 0.3 · 0.9 + 0.4 · 0.5299 + 0.4)
            = tanh(1.3720)
            = 0.8791
        y_2 = s_2 = 0.8791
        Error = y_2 - y = 0.8791 - 0.5 = 0.3791
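The two forward-propagation steps above can be reproduced in a few lines (a sketch using only the values given on the slides):

```python
import math

# Forward propagation for the worked example: W = [0.7, 0.3, 0.4], b = 0.4,
# s_0 = 0, inputs x_1 = (0.1, 0.4), x_2 = (0.7, 0.9), targets 0.3 and 0.5.
W, b = (0.7, 0.3, 0.4), 0.4
inputs = [(0.1, 0.4), (0.7, 0.9)]
targets = [0.3, 0.5]

s = 0.0                                          # s_0 = 0
for (x1, x2), d in zip(inputs, targets):
    s = math.tanh(W[0] * x1 + W[1] * x2 + W[2] * s + b)
    y = s                                        # y_t = s_t
    print(round(y, 4), round(y - d, 4))          # 0.5299 0.2299, then 0.8791 0.3791
```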

  32. RNN variants (recap): (1) Basic RNN, (2) Traditional LSTM, (3) GRU.

  33. Traditional LSTM. Disadvantages of the basic RNN: the model is too simple and cannot simulate the human brain well, and it is not easy for the basic RNN model to converge (i.e., it may take a very long time to train).

  34. Traditional LSTM: before giving the details of the brain analogy, recall that there is an internal state variable (the variable s_t) that stores the memory (a value). The next RNN to be described is the LSTM (Long Short-Term Memory) model.

  35. Traditional LSTM: it simulates the brain process. Forget feature: it can decide to forget a portion of the internal state variable. Input feature: it can decide to let in a portion of the input variable, and it can decide the strength of the input (via an activation function), called the weight of the input.

  36. Traditional LSTM. Output feature: it can decide to output a portion of the model's output, and it can decide the strength of the output (via an activation function), called the weight of the output.

  37. Traditional LSTM: the process consists of the following components, each realized as a gate: forget (forget gate), input (input gate), input activation (input activation gate), internal state (internal state gate), output (output gate), and final output (final output gate).

  38. The unfolded RNN again: at timestamp t, the unit takes x_t and s_{t-1} and produces y_t and s_t.

  39. In the traditional LSTM, each unit in the unfolded view is a traditional LSTM unit.

  40. The traditional LSTM unit at each timestamp is a memory unit.

  41. Forget gate:
        f_t = σ(W_f · [x_t, y_{t-1}] + b_f)
      where σ is the sigmoid function, σ(net) = 1 / (1 + e^(-net)). In this example, W_f = [0.7  0.3  0.4] and b_f = 0.4.

  42. Input gate:
        i_t = σ(W_i · [x_t, y_{t-1}] + b_i)
      In this example, W_i = [0.2  0.3  0.4] and b_i = 0.2.

  43. Input activation gate:
        a_t = tanh(W_a · [x_t, y_{t-1}] + b_a)
      where tanh(net) = (e^(2·net) - 1) / (e^(2·net) + 1). In this example, W_a = [0.4  0.2  0.1] and b_a = 0.5.
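For reference, here is a small sketch of the two activation functions used by these gates, written out exactly as in the formulas above (the explicit expression for tanh gives the same value as math.tanh):

```python
import math

# Sigmoid: y = 1 / (1 + e^(-net)); used by the forget, input, and output gates.
def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# tanh: y = (e^(2*net) - 1) / (e^(2*net) + 1); used by the input activation gate.
def tanh(net):
    return (math.exp(2 * net) - 1) / (math.exp(2 * net) + 1)
```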

  44. So far the memory unit computes the forget gate f_t, the input gate i_t, and the input activation gate a_t from x_t and y_{t-1}.

  45. Internal state gate:
        s_t = f_t · s_{t-1} + i_t · a_t

  46. The memory unit combines the previous state s_{t-1} (scaled by f_t) with the input activation a_t (scaled by i_t) to produce the new internal state s_t.

  47. Output gate:
        o_t = σ(W_o · [x_t, y_{t-1}] + b_o)
      In this example, W_o = [0.8  0.9  0.2] and b_o = 0.3.

  48. Final output gate:
        y_t = o_t · tanh(s_t)

  49. The complete traditional LSTM memory unit at timestamp t:
        f_t = σ(W_f · [x_t, y_{t-1}] + b_f)
        i_t = σ(W_i · [x_t, y_{t-1}] + b_i)
        a_t = tanh(W_a · [x_t, y_{t-1}] + b_a)
        s_t = f_t · s_{t-1} + i_t · a_t
        o_t = σ(W_o · [x_t, y_{t-1}] + b_o)
        y_t = o_t · tanh(s_t)
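Putting the gates together, here is a minimal sketch of one step of this traditional LSTM memory unit, again assuming x_t has two components so that each weight vector multiplies [x_{t,1}, x_{t,2}, y_{t-1}]. The weight values are the example values from the slides; the two input vectors in the demo loop are borrowed from the basic RNN example purely for illustration.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Example weights from the slides.
Wf, bf = (0.7, 0.3, 0.4), 0.4   # forget gate
Wi, bi = (0.2, 0.3, 0.4), 0.2   # input gate
Wa, ba = (0.4, 0.2, 0.1), 0.5   # input activation gate
Wo, bo = (0.8, 0.9, 0.2), 0.3   # output gate

def gate(w, b, x_t, y_prev):
    # Weighted sum over [x_{t,1}, x_{t,2}, y_{t-1}] plus the bias.
    return w[0] * x_t[0] + w[1] * x_t[1] + w[2] * y_prev + b

def lstm_cell(x_t, y_prev, s_prev):
    f_t = sigmoid(gate(Wf, bf, x_t, y_prev))     # how much old state to keep
    i_t = sigmoid(gate(Wi, bi, x_t, y_prev))     # how much new input to admit
    a_t = math.tanh(gate(Wa, ba, x_t, y_prev))   # candidate input value
    s_t = f_t * s_prev + i_t * a_t               # internal state gate
    o_t = sigmoid(gate(Wo, bo, x_t, y_prev))     # output gate
    y_t = o_t * math.tanh(s_t)                   # final output gate
    return y_t, s_t

# Input forward propagation over two illustrative timestamps, with y_0 = s_0 = 0.
y, s = 0.0, 0.0
for x_t in [(0.1, 0.4), (0.7, 0.9)]:
    y, s = lstm_cell(x_t, y, s)
```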

  50. In the following, we want to compute the (weight) values in the traditional LSTM. Like the neural network, the traditional LSTM model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation; in the traditional LSTM, Error Backward Propagation can be solved by an existing optimization tool (as for the Neural Network).
