L7: Neural Network 101 — DNN and GNN


Basics of neural networks, including DNNs and GNNs, their optimization opportunities, and their applications in machine learning. Presented by Cong (Callie) Hao, Assistant Professor at the Georgia Institute of Technology.




Presentation Transcript


  1. L7: Neural Network 101 - DNN and GNN
  Cong (Callie) Hao (callie.hao@ece.gatech.edu), Assistant Professor, ECE, Georgia Institute of Technology
  Sharc-lab @ Georgia Tech: https://sharclab.ece.gatech.edu/

  2. Outline
  Introduction to Machine Learning
  Neural Networks
  o Multi-Layer Perceptron (MLP)
  o Convolutional Neural Network (CNN)
  o Recurrent Neural Network (RNN)
  o Graph Neural Network (GNN)
  Optimization Opportunities

  3. Why Machine Learning? Why Now?
  Big Data
  o Large unstructured data sets flood us every day
  Data Science
  o Extract knowledge/insight from data
  Machine Learning
  o For specific tasks, resembles human intelligence

  4. Overview of the Machine Learning Process
  Training: train the desired model; let the machine learn intelligence
  o Computationally intensive
  o Huge amount of data, long training time (ImageNet: millions of training examples, weeks to train)
  o Runs on GPUs and ASICs
  Inference: infers things about new data based on the training
  o Computationally intensive
  o Real-time, on mobile devices and IoT devices
  o Runs on ASICs, GPUs, and FPGAs
  o Where most acceleration work focuses, but we shouldn't ignore training!

  5. Machine Learning Training is Heavy

  6. Outline (next: Neural Networks)

  7. Classification
  Identify to which of a set of categories an observation belongs
  o Or rather, which category is the most dominant
  Classification process
  o Provide examples of classes: training data (labels)
  o Adjust model weights for different input data: the training process
  o Feed new input data to the model: make predictions for classification
  [Figure: linear classification]

  8. What About Non-Linearly Separable Data?
  Requires non-linearity
  o Called an activation function, e.g., the Rectified Linear Unit (ReLU)
  This is a neural network!
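To make the activation concrete, here is a minimal sketch of ReLU in C (the sample inputs in main are illustrative, not from the slides):

```c
#include <stdio.h>

/* Rectified Linear Unit: passes positive values through, clamps
   negatives to zero. This is what adds non-linearity to the network. */
float relu(float x) {
    return x > 0.0f ? x : 0.0f;
}

int main(void) {
    float inputs[4] = {-2.0f, -0.5f, 0.0f, 3.0f};  /* illustrative values */
    for (int i = 0; i < 4; i++)
        printf("relu(%5.1f) = %.1f\n", inputs[i], relu(inputs[i]));
    return 0;
}
```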

  9. Basic Perceptron
  https://pub.towardsai.net/perceptron-a-basic-neural-network-model-for-deep-learning-21aea56e3216

  10. Multi-Layer Perceptron (MLP)
  Terminology
  o Input layer, hidden layer(s), output layer
  o Deep Neural Network (DNN): more than one hidden layer
  o X: input features; W: weights (filter); b: bias; Y: label
  o Activation function (at each hidden layer)

  11. Forward and Backward Propagation
  Forward propagation (FP)
  o Given the parameters and the input data, compute the label
  o Happens in inference
  Backward propagation (BP)
  o Determines the desired model parameters (weights) during training
  o Given the input data (X) and the target label (T), computes the parameters (W and b)
  o Happens in training
  Training needs both FP and BP
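The deck does not spell out how BP "computes the parameters"; the conventional rule is gradient descent on a loss (the loss L and learning rate eta below are the standard names, assumed here rather than taken from the slides):

```latex
% Gradient-descent update applied during backward propagation
% (L = loss between prediction Y and target T, \eta = learning rate;
%  both symbols are assumed, not named on the slides)
W \leftarrow W - \eta \,\frac{\partial L}{\partial W},
\qquad
b \leftarrow b - \eta \,\frac{\partial L}{\partial b}
```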

  12. Forward Propagation Computation
  Take a 2-layer MLP as an example:
  h = σ(x W1 + b1),  y = σ(h W2 + b2)
  Each layer is a vector-matrix multiplication followed by an activation function.
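As a sketch of what one such layer looks like in code, the following C function computes y = σ(xW + b) with a sigmoid activation (the activation choice, dimension names, and row-major layout are illustrative assumptions; the slide only shows the math):

```c
#include <math.h>

/* One dense layer: y = sigma(x * W + b).
   x: [in], W: [in x out] row-major, b: [out], y: [out]. */
void dense_forward(const float *x, int in, int out,
                   const float *W, const float *b, float *y) {
    for (int j = 0; j < out; j++) {
        float sum = b[j];
        for (int i = 0; i < in; i++)
            sum += x[i] * W[i * out + j];   /* vector-matrix multiplication */
        y[j] = 1.0f / (1.0f + expf(-sum));  /* activation function (sigmoid) */
    }
}

/* A 2-layer MLP is then just two calls:
   dense_forward(x, IN, HID, W1, b1, h);
   dense_forward(h, HID, OUT, W2, b2, y);  */
```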

  13. Different Types of Neural Networks
  Real-world ML models are much more complicated than an MLP:
  o Large number of layers: Deep Neural Networks (DNN)
  o Weight sharing on image data: Convolutional Neural Networks (CNN)
  o Time dependence: Recurrent Neural Networks (RNN)
  o Graph data: Graph Neural Networks (GNN)
  o Complex-valued weights and activations: Complex-Valued Neural Networks (CVNN)

  14. Outline (next: Convolutional Neural Network). Reference: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks

  15. Convolutional Neural Network
  Could we use an MLP for images?
  o Consider an image of size 250 x 250
  o Vectorize the 2D image into a 1D vector as the input features
  o Each hidden node then requires 250 x 250 = 62,500 weights
  o What about multiple hidden layers? A bigger image?
  o Too many weights: computationally and memory expensive
  Can we better exploit the structure of images and also reduce the number of weights?

  16. CNN: Translation Invariance
  Observation: even if the input image shifts, the output (classification) stays unchanged.
  o A CAT is still a CAT!
  o If a feature is useful in some locations during training, detectors for that feature will be available in all locations during testing.
  Translation invariance: if a detector (filter) learned a useful feature for detecting "CAT", it will capture "CAT" wherever it is located in an image at testing time.
  [Udacity Course 730, Deep Learning (L3: Convolutional Neural Networks)]

  17. Basic Structure of CNN
  https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

  18. CNN Terminologies
  o Filter (kernel): 3x3, 5x5, etc.
  o Receptive field
  o Fully-connected (FC) layers
  o Feature maps (activations, tensors)

  19. A Closer Look at CNN Computation
  [Figure: a 3x3 kernel with weights 1-9 slides over a 5x5 input with entries a-y. For the top-left window, the output element is 1*a + 2*b + 3*c + 4*f + 5*g + 6*h + 7*k + 8*l + 9*m.]

  20. A Closer Look at CNN Computation
  [Figure: the same 3x3 kernel slides to the next window of the 5x5 input; each output element is again the sum of nine element-wise products.]

  21. A Closer Look at CNN Computation
  [Figure: the sliding-window computation shown for one output channel, one input channel, and one kernel channel.]

  22. CNN Computation with Multiple Channels
  [Figure: a 5x5 input with three channels (a1-y1, a2-y2, a3-y3) convolved with a three-channel 3x3 kernel (weights 1.1-9.1, 1.2-9.2, 1.3-9.3); the per-channel products are summed into one output channel.]

  23. CNN Computation with Multiple Channels
  // Sum over all 3 input channels: each output element accumulates
  // 3 x 3 x 3 products. Note: with a 3x3 kernel and no padding, the
  // 5x5 input from the figure yields a 3x3 output; the bounds of 5
  // below assume the input A is zero-padded to 7x7.
  for (int h = 0; h < 5; h++) {
      for (int w = 0; w < 5; w++) {
          float sum = 0;
          for (int ci = 0; ci < 3; ci++) {       // input channels
              for (int m = 0; m < 3; m++) {      // kernel rows
                  for (int n = 0; n < 3; n++) {  // kernel columns
                      sum += A[ci][h+m][w+n] * W[ci][m][n];
                  }
              }
          }
          B[h][w] = sum;
      }
  }
  [Figure: the same three-channel input and kernel as on the previous slide.]

  24. CNN Computation with Multiple Channels
  [Figure: two 3-channel kernels applied to the same 3-channel input; Kernel 1 produces Output Channel 1 and Kernel 2 produces Output Channel 2.]
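A sketch of how the loop nest from slide 23 extends to multiple output channels: one kernel per output channel, each kernel spanning all input channels. The sizes (CI = 3, CO = 2, 5x5 input, 3x3 kernel, no padding, 3x3 output) follow the figures; the function name and array layout are illustrative:

```c
#define CI 3   /* input channels, as in the figures  */
#define CO 2   /* output channels (one per kernel)   */

/* Multi-channel, multi-kernel convolution: B[co] is the sum, across
   all CI input channels, of the sliding-window products with W[co]. */
void conv2d(const float A[CI][5][5],
            const float W[CO][CI][3][3],
            float B[CO][3][3]) {
    for (int co = 0; co < CO; co++)          /* one kernel per output channel */
        for (int h = 0; h < 3; h++)
            for (int w = 0; w < 3; w++) {
                float sum = 0;
                for (int ci = 0; ci < CI; ci++)   /* sum across input channels */
                    for (int m = 0; m < 3; m++)
                        for (int n = 0; n < 3; n++)
                            sum += A[ci][h+m][w+n] * W[co][ci][m][n];
                B[co][h][w] = sum;
            }
}
```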

  25. Outline (next: Recurrent Neural Network)

  26. Recurrent Neural Network
  So far, the neural network only considers the current input. What about time?
  o We assume all inputs are independent; this may not be a good assumption for certain tasks
  o Time sequences, for example speech and handwriting
  The input is a sequence. How do we learn the sequential information?
  o Make the neural network consider the previous output together with the current input
  Example (translation): "This is a sentence" -> "Dies ist ein Satz"; "Callie is teaching her class" -> ???

  27. Recurrent Neural Network
  A class of neural networks with backward edges
  o Can learn sequential information
  o Indirectly factors in all previous inputs
  o Vanishing effect: inputs from long ago have very little effect
  o Only short-term memory, like humans (for example, myself)
  [Image source: https://wiki.tum.de/display/lfdv/Recurrent+Neural+Networks+-+Combination+of+RNN+and+CNN]
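The backward edge can be sketched in C as one recurrent step, h_t = tanh(Wx x_t + Wh h_prev + b), combining the current input with the previous hidden state. The sizes NX and NH and the tanh activation are illustrative; the slide shows the structure, not the equations:

```c
#include <math.h>

#define NX 4   /* input size  (illustrative) */
#define NH 3   /* hidden size (illustrative) */

/* One recurrent step: the hidden state h carries information forward. */
void rnn_step(const float x[NX], const float h_prev[NH],
              const float Wx[NH][NX], const float Wh[NH][NH],
              const float b[NH], float h[NH]) {
    for (int i = 0; i < NH; i++) {
        float sum = b[i];
        for (int j = 0; j < NX; j++) sum += Wx[i][j] * x[j];       /* current input */
        for (int j = 0; j < NH; j++) sum += Wh[i][j] * h_prev[j];  /* backward edge */
        h[i] = tanhf(sum);
    }
}
/* Repeated multiplication through Wh over many steps is also why old
   inputs fade: the vanishing effect mentioned above. */
```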

  28. Long Short-Term Memory (LSTM)
  How about long-term memory? Can a neural network learn the patterns of both types of memory?
  o Long + short
  Instead of only using the current input and the previous output, also add a memory component
  o If the memory carries information from long ago, it directly affects the current output

  29. Long Short-Term Memory (LSTM)
  [Figure: the LSTM cell, with a long-term memory path and the previous (short-term) output feeding into the current step.]
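The slides show the LSTM diagram without equations; for reference, the standard cell computes the following (c_t is the long-term memory, h_{t-1} the previous short-term output, and \odot denotes element-wise multiplication; this is the textbook formulation, not taken from the deck):

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)            % forget gate
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)            % input gate
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)     % candidate memory
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t   % long-term memory update
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)            % output gate
h_t = o_t \odot \tanh(c_t)                        % new short-term output
```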

  30. Outline (next: Graph Neural Network)

  31. Graph Neural Network (GNN)
  Traditional neural networks are designed for simple sequences and grids, e.g., speech/text.
  [Slide credit: http://web.stanford.edu/class/cs224w]

  32. Graph Neural Network (GNN)
  Reality: a lot of real-world data does not live on grids
  o Arbitrary size and complex topological structure
  o No fixed node ordering or reference point
  o Often dynamic, with multimodal features
  Examples: economic networks, protein interaction networks, social networks
  [Image credit: Madhavicmu / Wikimedia Commons / CC-BY-SA-4.0]

  33. Graph Neural Network (GNN)
  Main idea: pass messages between pairs of nodes and aggregate
  [Slide credit: Structured deep models: Deep learning on graphs and beyond]

  34. How is a GNN Computed?
  Key idea: generate node embeddings based on local network neighborhoods
  o Node embedding: a vector representing the node's features
  o Each node aggregates its neighbors' embeddings, then transforms the result with an MLP
  [Slide credit: http://web.stanford.edu/class/cs224w]
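A minimal sketch of one such layer in C, assuming mean aggregation over an adjacency matrix. The aggregator choice, the sizes N and F, and the omission of the per-node MLP weights are all simplifications for illustration; the slides do not fix these choices:

```c
#define N 5   /* number of nodes (illustrative) */
#define F 8   /* embedding size  (illustrative) */

/* One GNN layer: each node's new embedding is built from the mean of
   its neighbors' embeddings (its local network neighborhood). */
void gnn_layer(const int adj[N][N],    /* adj[u][v] = 1 if edge u-v */
               const float emb[N][F],
               float out[N][F]) {
    for (int u = 0; u < N; u++) {
        float agg[F] = {0};
        int deg = 0;
        for (int v = 0; v < N; v++)     /* aggregate the neighborhood */
            if (adj[u][v]) {
                for (int k = 0; k < F; k++) agg[k] += emb[v][k];
                deg++;
            }
        for (int k = 0; k < F; k++)
            out[u][k] = deg ? agg[k] / deg : emb[u][k];
        /* a real layer would now pass out[u] through a small MLP
           (weights omitted here for brevity) */
    }
}
```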

  35. Outline (next: Optimization Opportunities)

  36. Opportunities for ML Acceleration
  Four basic principles:
  o Specialization
  o Parallelism
  o Memory localization and optimization
  o Reducing overhead
  What else?
  o Huge model size
  o And it's getting crazier!

  37. Huge ML Model Size
  https://moon-walker.medium.com/ai-service%EB%A5%BC-%EC%9C%84%ED%95%9C-%ED%98%84%EC%8B%A4%EC%A0%81%EC%9D%B8-%EC%A0%91%EA%B7%BC-%EB%B0%A9%EB%B2%95-3-massive-ai-inference-94f75b0fc64f

  38. Model Compression
  https://neuralmagic.com/blog/pruning-overview/
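Pruning, the compression technique covered in the linked post, removes weights that contribute little. A minimal sketch of magnitude pruning in C (the threshold is an assumed parameter; real pruners typically derive it from a target sparsity and fine-tune the model afterwards):

```c
#include <math.h>

/* Magnitude pruning: zero out weights whose absolute value falls below
   a threshold, so they can be skipped or stored sparsely. */
int prune(float *w, int n, float threshold) {
    int pruned = 0;
    for (int i = 0; i < n; i++)
        if (fabsf(w[i]) < threshold) {
            w[i] = 0.0f;
            pruned++;
        }
    return pruned;   /* number of weights removed */
}
```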

  39. Data Quantization
  From 32-bit floating point (FP32) to 8-bit integer (INT8)
  Intuition: 1.0001 and 0.9992 make (in most cases) no difference to an ML model's output
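A sketch of one common INT8 scheme, symmetric linear quantization, in C: map the range [-max|w|, +max|w|] onto [-127, 127] with a single scale factor. Per-tensor scaling is an assumption for simplicity; deployed flows such as the TensorRT one linked below also use per-channel scales and calibration:

```c
#include <math.h>
#include <stdint.h>

/* Symmetric linear quantization FP32 -> INT8.
   Afterwards, each real value is approximately q[i] * scale. */
void quantize_int8(const float *w, int n, int8_t *q, float *scale) {
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++)                 /* find the dynamic range */
        if (fabsf(w[i]) > max_abs) max_abs = fabsf(w[i]);
    *scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
    for (int i = 0; i < n; i++)                 /* round to nearest int8 */
        q[i] = (int8_t)lroundf(w[i] / *scale);
}
```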

  40. Data Quantization
  https://developer.nvidia.com/blog/achieving-fp32-accuracy-for-int8-inference-using-quantization-aware-training-with-tensorrt/

  41. Summary
  o Introduction to ML
  o MLP, CNN, RNN, GNN
  o ML Acceleration Opportunities
