Artificial Neural Networks (ANN) and Perceptron in Machine Learning

 
Artificial Neural Networks

What are Artificial Neural Networks (ANN)?

"Colored neural network" by Glosser.ca - Own work, derivative of File:Artificial neural network.svg. Licensed under CC BY-SA 3.0 via Commons - https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg#/media/File:Colored_neural_network.svg
Why ANN?
 
The nature of the target function is unknown.

Interpretability of the learned function is not important.

Slow training time is acceptable.
 
Perceptron
 
LMS / Delta Rule for Learning a Perceptron Model

We need to learn the weight vector w for a given problem. The delta rule is not the perceptron rule; the perceptron rule is rarely used nowadays. The delta rule is gradient descent on the squared-error loss

    E(w) = 1/2 Σ_d (t_d - o_d)^2

where t_d is the target and o_d is the unthresholded output for training example d. Initialize w to small random values, then repeat until satisfied:

    w_i <- w_i + η Σ_d (t_d - o_d) x_i,d

where η is the learning rate.
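Below is a minimal NumPy sketch of the delta rule as batch gradient descent on a linear unit. The dataset, learning rate, and epoch count are placeholders for illustration, not the values used in the lecture demo.

    import numpy as np

    def delta_rule_train(X, t, eta=0.1, n_epochs=100):
        """Batch gradient descent with the LMS / delta rule on a linear unit.

        X: (n_samples, n_features) inputs; t: (n_samples,) targets.
        """
        rng = np.random.default_rng(0)
        w = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights
        for _ in range(n_epochs):
            o = X @ w                     # unthresholded outputs o_d for all examples
            grad = -(t - o) @ X           # dE/dw for E = 1/2 sum_d (t_d - o_d)^2
            w -= eta * grad               # step opposite the gradient
        return w

    # Tiny synthetic example (hypothetical data, just to show the call):
    X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
    t = np.array([1.0, 1.0, 1.0, 0.0])
    w = delta_rule_train(X, t)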
 
Demo of Simple Synthetic Dataset
 
Convex problem! Guaranteed convergence!
Problems with Perceptron ANN
 
Only works for linearly separable data.
Solution? – Multi-layer networks.

For a very large, terabyte-scale dataset, a single gradient computation over all the examples can take days.
Solution? – Stochastic Gradient Descent.
 
Stochastic Gradient Descent (SGD)

Approximate the gradient using a small number of examples – maybe just one data point. For a small enough learning rate, this approximates true gradient descent arbitrarily closely. Try modifying the demo code at home to implement SGD.
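A minimal sketch of the single-sample SGD variant of the delta rule, in the same hypothetical setup as the batch sketch above; the shuffling and the fixed epoch count are my own choices for illustration.

    import numpy as np

    def sgd_train(X, t, eta=0.1, n_epochs=100, seed=0):
        """Single-sample stochastic gradient descent with the delta rule."""
        rng = np.random.default_rng(seed)
        w = rng.normal(scale=0.01, size=X.shape[1])
        for _ in range(n_epochs):
            for i in rng.permutation(len(X)):     # visit examples in random order
                o = X[i] @ w                      # output for this single example
                w += eta * (t[i] - o) * X[i]      # gradient step on one example's error
        return w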
 
Non-Linear Decision Boundary?
 
Derivative of sigmoid?
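For reference, the logistic sigmoid is σ(x) = 1 / (1 + e^(-x)), and its derivative has the convenient form dσ/dx = σ(x)(1 - σ(x)). A quick check in code (function names are my own):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(x):
        s = sigmoid(x)
        return s * (1.0 - s)   # dσ/dx = σ(x) * (1 - σ(x))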
Questions about the Sigmoid Unit?
 
How do we connect the neurons?
For this lesson, a linear chain – a multilayer feedforward network.
Outside this lesson: pretty much anything you like.
 
How do we train?
 
Backpropagation algorithm
 
Backpropagation Algorithm
 
Each layer does two things:

Compute the derivative of E w.r.t. its parameters. Why? That is the gradient needed to update those parameters.

Compute the derivative of E w.r.t. its input. The reason for this will become obvious when we do it: it is what gets passed back so the previous layer can apply the chain rule.
 
Dealing with Vector Data
 
Partial derivatives change to gradients.

Scalar multiplication changes to vector-matrix products, or sometimes even tensor-vector products.
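As a rough NumPy sketch of these two jobs, here is forward and backward for the two-layer chain from the slides (linear, sigmoid, linear, sigmoid, squared error); each backward step produces both the gradient w.r.t. that layer's weights and the gradient w.r.t. its input. The shapes and variable names are my own, not the lecture's.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, W1, W3):
        """z1 = W1 x, z2 = sigmoid(z1), z3 = W3 z2, z4 = sigmoid(z3)."""
        z1 = W1 @ x
        z2 = sigmoid(z1)
        z3 = W3 @ z2
        z4 = sigmoid(z3)
        return z1, z2, z3, z4

    def backward(x, t, z2, z4, W3):
        """Gradients of E = 1/2 * ||t - z4||^2 w.r.t. both weight matrices."""
        dE_dz4 = z4 - t                      # derivative of the squared error
        dE_dz3 = dE_dz4 * z4 * (1 - z4)      # back through the output sigmoid
        dE_dW3 = np.outer(dE_dz3, z2)        # gradient w.r.t. layer-2 weights
        dE_dz2 = W3.T @ dE_dz3               # gradient w.r.t. layer-2 input (passed back)
        dE_dz1 = dE_dz2 * z2 * (1 - z2)      # back through the hidden sigmoid
        dE_dW1 = np.outer(dE_dz1, x)         # gradient w.r.t. layer-1 weights
        return dE_dW1, dE_dW3

    # Example with arbitrary sizes (4 inputs, 3 hidden units, 2 outputs):
    rng = np.random.default_rng(0)
    x, t = rng.normal(size=4), rng.normal(size=2)
    W1, W3 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3))
    z1, z2, z3, z4 = forward(x, W1, W3)
    dW1, dW3 = backward(x, t, z2, z4, W3)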
 
Problems
 
Many stacked sigmoid units – vanishing gradients.
Solution: ReLU units, or pretraining using unsupervised learning.

Local optima – a non-convex problem.
Solution: momentum, SGD, small weights for initialization.

Overfitting.
Solution: use validation data for early stopping, weight decay.

Lots of parameter tuning.
Solution: use several thousand computers to try many parameter settings and pick the best.
 
Lack of Interpretability
 
Demo on Face Pose Estimation
 
Input representation
Downsample image and divide by 255.
 
Output representation
1-of-4 (one-hot) encoding over the four pose classes.

Other learning parameters
Learning rate – 0.3, momentum – 0.
Single-sample SGD.
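A rough sketch of this input/output representation; the downsampled resolution (30x32 here) and the class ordering are assumptions for illustration, not values given on the slide.

    import numpy as np

    POSES = ["straight", "left", "right", "up"]    # assumed class ordering

    def encode_input(image):
        """Downsample a grayscale image to ~30x32 and scale pixels into [0, 1]."""
        img = np.asarray(image, dtype=np.float64)
        h_step = max(1, img.shape[0] // 30)        # crude strided downsampling;
        w_step = max(1, img.shape[1] // 32)        # assumes the image is at least 30x32
        small = img[::h_step, ::w_step][:30, :32]
        return (small / 255.0).ravel()             # flattened network input vector

    def encode_output(pose):
        """1-of-4 (one-hot) target vector for a pose label."""
        target = np.zeros(len(POSES))
        target[POSES.index(pose)] = 1.0
        return target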
 
Let’s see the code
 
Demo on Face Pose Estimation
 
[Figure: visualizations of the Layer 1 and Layer 2 weights, with the four outputs Right, Left, Up, and Straight.]
 
Expressive Power
 
Two layers of sigmoid units – any Boolean function (see the XOR sketch below for a concrete instance).

A two-layer network with sigmoid units in the hidden layer and (unthresholded) linear units in the output layer – any bounded continuous function (Cybenko 1989, Hornik et al. 1989).

A three-layer network, where the output layer again has linear units – any function (Cybenko 1988).

So multi-layer sigmoid units are the ultimate supervised learning tool, right? Nope.
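As a concrete (hand-constructed) instance of the Boolean-function claim, here is a tiny two-layer sigmoid network that computes XOR. The weights are chosen by hand and scaled so the sigmoids saturate near 0 or 1; nothing here is learned.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def xor_net(x1, x2):
        """Hidden unit h1 ~ OR(x1, x2), h2 ~ NAND(x1, x2); output ~ AND(h1, h2) = XOR."""
        h1 = sigmoid(20 * x1 + 20 * x2 - 10)     # ~OR
        h2 = sigmoid(-20 * x1 - 20 * x2 + 30)    # ~NAND
        return sigmoid(20 * h1 + 20 * h2 - 30)   # ~AND of the two hidden units

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, round(float(xor_net(a, b))))   # prints the XOR truth table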
 
Deep Learning
 
Shallow sigmoid ANNs need to be very wide ("fat").

Instead we can go deep and thin. But then we have vanishing gradients!

Solution: use ReLUs.
 
Still too Many Parameters
 
Consider a 1-megapixel image classified over 1000 categories. Even a single fully connected layer needs about 10^6 x 10^3 = 10^9 weights – one billion parameters.

Convolutional Neural Networks help us scale to large images with very few parameters.
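A back-of-the-envelope comparison of the two options, using hypothetical filter and channel sizes for the convolutional case:

    # Fully connected layer: every input pixel connects to every output class.
    n_pixels = 1000 * 1000              # 1-megapixel grayscale image
    n_classes = 1000
    fc_weights = n_pixels * n_classes
    print(f"fully connected: {fc_weights:,} weights")    # 1,000,000,000

    # Convolutional layer: a small filter bank shared across all image positions.
    kernel_h, kernel_w = 7, 7           # hypothetical filter size
    in_channels, out_channels = 1, 64   # hypothetical channel counts
    conv_weights = kernel_h * kernel_w * in_channels * out_channels
    print(f"convolutional:   {conv_weights:,} weights")  # 3,136

The same shared filters are applied at every image position, which is also where the translation invariance mentioned below comes from.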
 
Convolutional Neural Network
 
AlexNet: ImageNet classification challenge ILSVRC 2012 winner
 
Benefits of CNNs
 
The number of weights is now much less than 1 million for a 1-megapixel image.
Because the small set of weights is shared, different parts of the image all serve as training data for the same filters, so we effectively have several orders of magnitude more data per weight.
We get translation invariance for free.
Fewer parameters take less memory, so all the computations can be carried out in memory on a GPU or across multiple processors.
 
CNN toolboxes
 
MatConvNet – MATLAB-based, focusing primarily on 2D images.
TensorFlow – Python; supports a wide variety of neural networks.
Torch – Lua (PyTorch for Python); supports a lot of things but not very popular for computer vision.
Caffe – de facto framework for image-classification-style problems.
Lots of others (MXNet, Theano, Lasagne, Keras, …).
 
Thank you
 
Feel free to email me your questions at aravindh.mahendran@new.ox.ac.uk

Strongly recommend this book for basics.
 
References
 
Cybenko 1989 - https://www.dartmouth.edu/~gvc/Cybenko_MCSS.pdf
Cybenko 1988 - Continuous Valued Neural Networks with Two Hidden Layers are Sufficient (Technical Report), Department of Computer Science, Tufts University, Medford, MA.
Fukushima 1980 - http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf
Hinton 2006 - http://www.cs.toronto.edu/~fritz/absps/ncfast.pdf
Hornik et al. 1989 - http://www.sciencedirect.com/science/article/pii/0893608089900208
Krizhevsky et al. 2012 - http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
LeCun 1998 - http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
Tom Mitchell, Machine Learning, 1997.
"MatConvNet - Convolutional Neural Networks for MATLAB", A. Vedaldi and K. Lenc, Proc. of the ACM Int. Conf. on Multimedia, 2015.
