
Recurrent Neural Networks for Classification Models
Explore the application of Recurrent Neural Networks (RNN) in classification models and understand how an RNN captures dependencies between records for improved model performance. Learn about the independence assumption in training data and why an RNN is needed to model dependencies between sequential records in a table.
Presentation Transcript
COMP5331 Other Classification Models: Recurrent Neural Network (RNN). Prepared and presented by Raymond Wong (raywong@cse).
Other Classification Models: Support Vector Machine (SVM), Neural Network, Recurrent Neural Network.
Neural Network. Training data (input attributes x1 and x2, output attribute d):
x1  x2  d
0   0   0
0   1   1
1   0   1
1   1   1
The neural network takes the input attributes x1 and x2 and produces the output y. We train the model starting from the first record, then with the second, third, and fourth records in turn. After the last record, we train the model with the first record again.
Neural Network. Here, training the model with one record is independent of training it with another record. This means that we assume the records in the table are independent.
In some cases, the current record is related to the previous records in the table, so the records in the table are dependent. We also want to capture this dependency in the model. We can use a new model, called the recurrent neural network, for this purpose.
Neural Network. We can group the input attributes of record 1 (x1,1 and x1,2) into a single input vector x1. The neural network then takes the input vector x1 and produces the output attribute y1.
Recurrent Neural Network. A Recurrent Neural Network (RNN) is a neural network with a loop: it takes the input vector x1, produces the output y1, and feeds information back into itself for the next input.
Unfolded representation of the RNN: at timestamp 1 the RNN takes input x1 and produces output y1; at timestamp 2 it takes x2 and produces y2; at timestamp 3 it takes x3 and produces y3; and in general, at timestamp t it takes xt and produces yt.
At each timestamp t, the RNN cell takes the input xt, produces the output yt, and maintains an internal state variable st. The state st-1 from timestamp t-1 is passed into the cell at timestamp t, and st is passed on to the cell at timestamp t+1.
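To make the unfolding concrete, here is a minimal Python sketch (the names are illustrative, not from the slides) of how the same memory unit is applied at every timestamp, threading the internal state from one timestamp to the next. Here `step` stands for whichever memory unit is used (the basic RNN or the LSTM described later); for the LSTM, the value passed forward would also include the previous output yt-1.

def run_unfolded(step, xs, s0):
    # Apply the same RNN memory unit at timestamps 1, 2, ..., t,
    # passing the internal state produced at one timestamp
    # into the unit at the next timestamp.
    s = s0
    outputs = []
    for x_t in xs:
        y_t, s = step(x_t, s)   # the unit returns (yt, st)
        outputs.append(y_t)
    return outputs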
Limitations: the RNN may have to memorize a lot of past events/values, and due to its more complex structure, it is more time-consuming to train.
RNN variants: 1. Basic RNN, 2. Traditional LSTM, 3. GRU.
Basic RNN. The basic RNN is very simple: it contains only a single activation function (e.g., tanh or ReLU).
In the unfolded chain, each cell at timestamps t-1, t, and t+1 is a basic RNN memory unit. Inside the memory unit there is a single activation function, usually tanh or ReLU.
Basic RNN memory unit at timestamp t, with example weights W = [0.7, 0.3, 0.4] and bias b = 0.4:
st = tanh(W · [xt, st-1] + b)
yt = st
Here [xt, st-1] denotes the concatenation of the input vector xt and the previous internal state st-1.
In the following, we want to compute the (weight) values in the basic RNN. Similar to the neural network, the basic RNN model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation. In the basic RNN, Error Backward Propagation can be handled by an existing optimization tool (as for the Neural Network).
Consider this example with two timestamps; we use the basic RNN to do the training.
Time    xt,1   xt,2   y
t = 1   0.1    0.4    0.3
t = 2   0.7    0.9    0.5
Step 1 (Input Forward Propagation), with W = [0.7, 0.3, 0.4], b = 0.4, and initial values y0 = 0 and s0 = 0.

When t = 1:
s1 = tanh(W · [x1, s0] + b)
   = tanh(0.7 · 0.1 + 0.3 · 0.4 + 0.4 · 0 + 0.4)
   = tanh(0.59)
   = 0.5299
y1 = s1 = 0.5299
Error = y1 - y = 0.5299 - 0.3 = 0.2299

When t = 2:
s2 = tanh(W · [x2, s1] + b)
   = tanh(0.7 · 0.7 + 0.3 · 0.9 + 0.4 · 0.5299 + 0.4)
   = tanh(1.3720)
   = 0.8791
y2 = s2 = 0.8791
Error = y2 - y = 0.8791 - 0.5 = 0.3791
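As a check on the arithmetic above, here is a minimal Python sketch of the basic RNN forward pass (Step 1) for this two-timestamp example; the variable names are illustrative, not from the slides.

import math

W = [0.7, 0.3, 0.4]              # weights applied to [xt,1, xt,2, st-1]
b = 0.4                          # bias
xs = [[0.1, 0.4], [0.7, 0.9]]    # inputs at t = 1 and t = 2
targets = [0.3, 0.5]             # desired outputs y

s = 0.0                          # initial internal state s0
for x_t, target in zip(xs, targets):
    # st = tanh(W · [xt, st-1] + b), yt = st
    net = W[0] * x_t[0] + W[1] * x_t[1] + W[2] * s + b
    s = math.tanh(net)
    y_t = s
    print(f"net = {net:.4f}, y_t = {y_t:.4f}, error = {y_t - target:.4f}")
    # prints net = 0.5900, y_t = 0.5299, error = 0.2299
    # then   net = 1.3720, y_t = 0.8791, error = 0.3791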
RNN variants: 1. Basic RNN, 2. Traditional LSTM, 3. GRU.
Traditional LSTM. Disadvantage of the basic RNN: the basic RNN model is too simple. It cannot simulate our human brain very well, and it is not easy for the basic RNN model to converge (i.e., it may take a very long time to train).
Traditional LSTM. Before we give the details of this "brain", we want to emphasize that there is an internal state variable (i.e., the variable st) that stores the memory (i.e., a value). The next RNN to be described is the LSTM (Long Short-Term Memory) model.
Traditional LSTM. It can simulate the brain process. Forget feature: it can decide to forget a portion of the internal state variable. Input feature: it can decide to input a portion of the input variable into the model, and it can decide the strength of the input (via the activation function), called the weight of the input.
Traditional LSTM. Output feature: it can decide to output a portion of the output of the model, and it can decide the strength of the output (via the activation function), called the weight of the output.
Traditional LSTM. The model includes the following steps, each with its own gate: the forget component (forget gate), the input component (input gate), the input activation component (input activation gate), the internal state component (internal state gate), the output component (output gate), and the final output component (final output gate).
In the unfolded chain, each cell at timestamps t-1, t, and t+1 is a Traditional LSTM memory unit: it takes the input xt, the previous output yt-1, and the previous internal state st-1, and produces the output yt and the new internal state st.
Forget gate, with example weights Wf = [0.7, 0.3, 0.4] and bias bf = 0.4:
ft = σ(Wf · [xt, yt-1] + bf)
where σ is the sigmoid function σ(net) = 1 / (1 + e^(-net)).
Input gate, with example weights Wi = [0.2, 0.3, 0.4] and bias bi = 0.2:
it = σ(Wi · [xt, yt-1] + bi)
Input activation gate, with example weights Wa = [0.4, 0.2, 0.1] and bias ba = 0.5:
at = tanh(Wa · [xt, yt-1] + ba)
where tanh is the hyperbolic tangent function tanh(net) = (e^(2·net) - 1) / (e^(2·net) + 1).
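As a small aside, the two activation functions used by the LSTM gates can be written directly from the formulas above. This is a minimal Python sketch (function names are illustrative), with the hand-rolled tanh checked against Python's math.tanh.

import math

def sigmoid(net):
    # Sigmoid from the forget/input/output gates: 1 / (1 + e^(-net))
    return 1.0 / (1.0 + math.exp(-net))

def tanh(net):
    # tanh from the input activation gate: (e^(2·net) - 1) / (e^(2·net) + 1)
    return (math.exp(2 * net) - 1) / (math.exp(2 * net) + 1)

print(sigmoid(0.0))                  # 0.5
print(tanh(0.59), math.tanh(0.59))   # both about 0.5299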
Internal state gate:
st = ft · st-1 + it · at
Output gate, with example weights Wo = [0.8, 0.9, 0.2] and bias bo = 0.3:
ot = σ(Wo · [xt, yt-1] + bo)
Final output gate:
yt = ot · tanh(st)
In the following, we want to compute the (weight) values in the traditional LSTM. Similar to the neural network, the traditional LSTM model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation. In the traditional LSTM, Error Backward Propagation can be handled by an existing optimization tool (as for the Neural Network).
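To tie the six gate equations together, here is a minimal Python sketch of Input Forward Propagation through one Traditional LSTM memory unit. The gate weights and biases are the example values given on the slides; the input vector and the initial values y0 and s0 are illustrative assumptions, and the function names are not from the slides.

import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def lstm_step(x_t, y_prev, s_prev, Wf, bf, Wi, bi, Wa, ba, Wo, bo):
    # One Traditional LSTM memory unit, following the gate equations above.
    v = [x_t[0], x_t[1], y_prev]          # concatenation [xt, yt-1]
    f_t = sigmoid(dot(Wf, v) + bf)        # forget gate
    i_t = sigmoid(dot(Wi, v) + bi)        # input gate
    a_t = math.tanh(dot(Wa, v) + ba)      # input activation gate
    s_t = f_t * s_prev + i_t * a_t        # internal state gate
    o_t = sigmoid(dot(Wo, v) + bo)        # output gate
    y_t = o_t * math.tanh(s_t)            # final output gate
    return y_t, s_t

# Example gate weights and biases from the slides.
Wf, bf = [0.7, 0.3, 0.4], 0.4
Wi, bi = [0.2, 0.3, 0.4], 0.2
Wa, ba = [0.4, 0.2, 0.1], 0.5
Wo, bo = [0.8, 0.9, 0.2], 0.3

# Illustrative input and initial values (assumed, not from the slides).
y0, s0 = 0.0, 0.0
y1, s1 = lstm_step([0.1, 0.4], y0, s0, Wf, bf, Wi, bi, Wa, ba, Wo, bo)
print(y1, s1)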