Recurrent Neural Networks for Classification Models

Explore the application of Recurrent Neural Networks (RNNs) in classification models, and understand how an RNN captures dependencies between records to improve model performance. Learn about the independence assumption in the training data and why an RNN is needed to model dependencies between sequential records in a table.

  • Neural Networks
  • RNN
  • Classification Models
  • Data Science


Presentation Transcript


  1. COMP5331 Other Classification Models: Recurrent Neural Network (RNN). Prepared by Raymond Wong. Presented by Raymond Wong (raywong@cse).

  2. Other Classification Models: Support Vector Machine (SVM), Neural Network, Recurrent Neural Network.

  3. Neural Network. Training data (input attributes x1 and x2, output attribute d):
        x1: 0 0 1 1
        x2: 0 1 0 1
        d:  0 1 1 1
     The network takes the inputs x1 and x2 and produces the output y. We train the model starting from the first record.

  4. We then train the model with the second record (x1 = 0, x2 = 1, d = 1).

  5. We then train the model with the third record (x1 = 1, x2 = 0, d = 1).

  6. We then train the model with the fourth record (x1 = 1, x2 = 1, d = 1).

  7. After the last record, we train the model with the first record again.

  8. Neural Network: training the model with one record is independent of training it with another record. In other words, we assume that the records in the table are independent.
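As a concrete illustration of this record-by-record training, here is a minimal sketch (not from the slides: it uses a single sigmoid neuron in place of the full neural network, with assumed initial weights and learning rate). Each weight update depends only on the current record, which is exactly the independence assumption.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical sketch: per-record training on the table (x1, x2) -> d above.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1, w2, b = 0.1, -0.1, 0.0   # assumed initial weights (not from the slides)
lr = 0.5                      # assumed learning rate

for _ in range(2000):
    for (x1, x2), d in data:                     # one record at a time
        y = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = (y - d) * y * (1 - y)             # gradient of the squared error
        w1 -= lr * grad * x1                     # the update uses only the
        w2 -= lr * grad * x2                     # current record
        b  -= lr * grad

print([round(sigmoid(w1 * x1 + w2 * x2 + b), 2) for (x1, x2), _ in data])
```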

  9. In some cases, the current record is related to the previous records in the table; that is, the records are dependent. We also want to capture this dependency in the model, and we can use a new model, the recurrent neural network, for this purpose.

  10. Neural Network: the network takes the input attributes x1 and x2 and produces the output attribute y.

  11. Neural Network: record 1 is now an input vector x_1 = (x_{1,1}, x_{1,2}). The network takes the vector x_1 and produces the output y_1.

  12. Neural Network: input vector x_1, output attribute y_1.

  13. Recurrent Neural Network (RNN): a neural network with a loop. The RNN takes the input vector x_1 and produces the output y_1.

  14. Recurrent Neural Network (RNN): input vector x_1, output attribute y_1.

  15. Unfolded representation of the RNN: at timestamp 1 the RNN maps x_1 to y_1, at timestamp 2 it maps x_2 to y_2, at timestamp 3 it maps x_3 to y_3, and so on up to timestamp t, where it maps x_t to y_t.

  16. The RNN also maintains an internal state variable: at timestamp t-1 it produces y_{t-1} and the state s_{t-1}, at timestamp t it produces y_t and the state s_t, and at timestamp t+1 it produces y_{t+1} and the state s_{t+1}.

  17. The internal state is passed forward: at timestamp t, the RNN takes the input x_t together with the previous state s_{t-1} and produces the output y_t and the new state s_t, which is passed on to timestamp t+1.
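The unfolded view on the last few slides can be summarized by a small helper (a hypothetical sketch, not from the slides): the same cell function is applied at every timestamp, and the internal state produced at one timestamp is fed into the next.

```python
# Sketch of the unfolded RNN: `cell` is any function mapping (x_t, s_prev) to
# (y_t, s_t); the state produced at one timestamp is carried to the next.
def run_unfolded(cell, inputs, s0=0.0):
    s, outputs = s0, []
    for x_t in inputs:            # timestamps t = 1, 2, ..., T
        y_t, s = cell(x_t, s)
        outputs.append(y_t)
    return outputs
```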

  18. Limitation: the RNN may have to memorize many past events/values, and, due to its more complex structure, it is more time-consuming to train.

  19. RNN variants: (1) Basic RNN, (2) Traditional LSTM, (3) GRU.

  20. Basic RNN: the basic RNN is very simple. It contains only a single activation function (e.g., tanh or ReLU).

  21. The unfolded RNN again: at each timestamp t, the unit takes x_t and s_{t-1} and produces y_t and s_t.

  22. In the basic RNN, each unit in the unfolded view is a basic RNN unit.

  23. The basic RNN unit at each timestamp is a memory unit.

  24. Inside the memory unit there is a single activation function, usually tanh or ReLU.

  25. The memory unit computes
        s_t = tanh(W · [x_t, s_{t-1}] + b)
        y_t = s_t
      In this example, W = [0.7  0.3  0.4] and b = 0.4.
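A minimal Python sketch of this memory unit, assuming (as in the slides' running example) that x_t has two components, so W = [0.7, 0.3, 0.4] multiplies [x_{t,1}, x_{t,2}, s_{t-1}]:

```python
import math

# Basic RNN memory unit: s_t = tanh(W . [x_t, s_{t-1}] + b), y_t = s_t.
# W and b are the example values from the slide.
W, b = (0.7, 0.3, 0.4), 0.4

def basic_rnn_cell(x_t, s_prev):
    net = W[0] * x_t[0] + W[1] * x_t[1] + W[2] * s_prev + b
    s_t = math.tanh(net)
    y_t = s_t                     # the basic RNN outputs its state directly
    return y_t, s_t
```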

  26. In the following, we want to compute the (weight) values in the basic RNN. Like the neural network, the basic RNN model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation; in the basic RNN, Error Backward Propagation can be solved by an existing optimization tool (as for the Neural Network).

  27. Consider this example with two timestamps:
        Time     t=1   t=2
        x_{t,1}  0.1   0.7
        x_{t,2}  0.4   0.9
        y        0.3   0.5
      We use the basic RNN to do the training, first for t = 1 and then for t = 2.

  28. When t = 1: the memory unit takes x_1 and the initial state s_0 (with y_0 = 0 and s_0 = 0) and computes s_1 = tanh(W · [x_1, s_0] + b) and y_1 = s_1; s_1 is then passed on to timestamp t = 2.

  29. Step 1 (Input Forward Propagation), t = 1, with y_0 = 0 and s_0 = 0:
        s_1 = tanh(W · [x_1, s_0] + b)
            = tanh(0.7 · 0.1 + 0.3 · 0.4 + 0.4 · 0 + 0.4)
            = tanh(0.59)
            = 0.5299
        y_1 = s_1 = 0.5299
        Error = y_1 - y = 0.5299 - 0.3 = 0.2299

  30. After timestamp t = 1 we have s_1 = 0.5299 and y_1 = 0.5299 (with W = [0.7  0.3  0.4] and b = 0.4); s_1 is carried forward to timestamp t = 2.

  31. Step 1 (Input Forward Propagation), t = 2, with s_1 = 0.5299:
        s_2 = tanh(W · [x_2, s_1] + b)
            = tanh(0.7 · 0.7 + 0.3 · 0.9 + 0.4 · 0.5299 + 0.4)
            = tanh(1.3720)
            = 0.8791
        y_2 = s_2 = 0.8791
        Error = y_2 - y = 0.8791 - 0.5 = 0.3791
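The two forward-propagation steps above can be reproduced in a few lines (a sketch using only the values given on the slides):

```python
import math

# Forward propagation for the worked example: W = [0.7, 0.3, 0.4], b = 0.4,
# s_0 = 0, inputs x_1 = (0.1, 0.4), x_2 = (0.7, 0.9), targets 0.3 and 0.5.
W, b = (0.7, 0.3, 0.4), 0.4
inputs = [(0.1, 0.4), (0.7, 0.9)]
targets = [0.3, 0.5]

s = 0.0                                          # s_0 = 0
for (x1, x2), d in zip(inputs, targets):
    s = math.tanh(W[0] * x1 + W[1] * x2 + W[2] * s + b)
    y = s                                        # y_t = s_t
    print(round(y, 4), round(y - d, 4))          # 0.5299 0.2299, then 0.8791 0.3791
```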

  32. RNN variants (recap): (1) Basic RNN, (2) Traditional LSTM, (3) GRU.

  33. Traditional LSTM. Disadvantages of the basic RNN: the model is too simple and cannot simulate the human brain well, and it is not easy for the basic RNN model to converge (i.e., it may take a very long time to train).

  34. Traditional LSTM: before giving the details of the brain analogy, recall that there is an internal state variable (the variable s_t) that stores the memory (a value). The next RNN to be described is the LSTM (Long Short-Term Memory) model.

  35. Traditional LSTM: it simulates the brain process. Forget feature: it can decide to forget a portion of the internal state variable. Input feature: it can decide to let in a portion of the input variable, and it can decide the strength of the input (via an activation function), called the weight of the input.

  36. Traditional LSTM. Output feature: it can decide to output a portion of the model's output, and it can decide the strength of the output (via an activation function), called the weight of the output.

  37. Traditional LSTM: the process consists of the following components, each realized as a gate: forget (forget gate), input (input gate), input activation (input activation gate), internal state (internal state gate), output (output gate), and final output (final output gate).

  38. The unfolded RNN again: at timestamp t, the unit takes x_t and s_{t-1} and produces y_t and s_t.

  39. In the traditional LSTM, each unit in the unfolded view is a traditional LSTM unit.

  40. The traditional LSTM unit at each timestamp is a memory unit.

  41. Forget gate:
        f_t = σ(W_f · [x_t, y_{t-1}] + b_f)
      where σ is the sigmoid function, σ(net) = 1 / (1 + e^(-net)). In this example, W_f = [0.7  0.3  0.4] and b_f = 0.4.

  42. Input gate:
        i_t = σ(W_i · [x_t, y_{t-1}] + b_i)
      In this example, W_i = [0.2  0.3  0.4] and b_i = 0.2.

  43. Input activation gate:
        a_t = tanh(W_a · [x_t, y_{t-1}] + b_a)
      where tanh(net) = (e^(2·net) - 1) / (e^(2·net) + 1). In this example, W_a = [0.4  0.2  0.1] and b_a = 0.5.
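For reference, here is a small sketch of the two activation functions used by these gates, written out exactly as in the formulas above (the explicit expression for tanh gives the same value as math.tanh):

```python
import math

# Sigmoid: y = 1 / (1 + e^(-net)); used by the forget, input, and output gates.
def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# tanh: y = (e^(2*net) - 1) / (e^(2*net) + 1); used by the input activation gate.
def tanh(net):
    return (math.exp(2 * net) - 1) / (math.exp(2 * net) + 1)
```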

  44. So far the memory unit computes the forget gate f_t, the input gate i_t, and the input activation gate a_t from x_t and y_{t-1}.

  45. Internal state gate:
        s_t = f_t · s_{t-1} + i_t · a_t

  46. The memory unit combines the previous state s_{t-1} (scaled by f_t) with the input activation a_t (scaled by i_t) to produce the new internal state s_t.

  47. Output gate:
        o_t = σ(W_o · [x_t, y_{t-1}] + b_o)
      In this example, W_o = [0.8  0.9  0.2] and b_o = 0.3.

  48. Final output gate:
        y_t = o_t · tanh(s_t)

  49. The complete traditional LSTM memory unit at timestamp t:
        f_t = σ(W_f · [x_t, y_{t-1}] + b_f)
        i_t = σ(W_i · [x_t, y_{t-1}] + b_i)
        a_t = tanh(W_a · [x_t, y_{t-1}] + b_a)
        s_t = f_t · s_{t-1} + i_t · a_t
        o_t = σ(W_o · [x_t, y_{t-1}] + b_o)
        y_t = o_t · tanh(s_t)
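Putting the gates together, here is a minimal sketch of one step of this traditional LSTM memory unit, again assuming x_t has two components so that each weight vector multiplies [x_{t,1}, x_{t,2}, y_{t-1}]. The weight values are the example values from the slides; the two input vectors in the demo loop are borrowed from the basic RNN example purely for illustration.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Example weights from the slides.
Wf, bf = (0.7, 0.3, 0.4), 0.4   # forget gate
Wi, bi = (0.2, 0.3, 0.4), 0.2   # input gate
Wa, ba = (0.4, 0.2, 0.1), 0.5   # input activation gate
Wo, bo = (0.8, 0.9, 0.2), 0.3   # output gate

def gate(w, b, x_t, y_prev):
    # Weighted sum over [x_{t,1}, x_{t,2}, y_{t-1}] plus the bias.
    return w[0] * x_t[0] + w[1] * x_t[1] + w[2] * y_prev + b

def lstm_cell(x_t, y_prev, s_prev):
    f_t = sigmoid(gate(Wf, bf, x_t, y_prev))     # how much old state to keep
    i_t = sigmoid(gate(Wi, bi, x_t, y_prev))     # how much new input to admit
    a_t = math.tanh(gate(Wa, ba, x_t, y_prev))   # candidate input value
    s_t = f_t * s_prev + i_t * a_t               # internal state gate
    o_t = sigmoid(gate(Wo, bo, x_t, y_prev))     # output gate
    y_t = o_t * math.tanh(s_t)                   # final output gate
    return y_t, s_t

# Input forward propagation over two illustrative timestamps, with y_0 = s_0 = 0.
y, s = 0.0, 0.0
for x_t in [(0.1, 0.4), (0.7, 0.9)]:
    y, s = lstm_cell(x_t, y, s)
```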

  50. In the following, we want to compute the (weight) values in the traditional LSTM. Like the neural network, the traditional LSTM model has two steps: Step 1 (Input Forward Propagation) and Step 2 (Error Backward Propagation). In the following, we focus on Input Forward Propagation; in the traditional LSTM, Error Backward Propagation can be solved by an existing optimization tool (as for the Neural Network).
