Understanding Backpropagation in Adaptive Networks

Learn about backpropagation (BP) in adaptive networks: a systematic way to compute gradients, reinvented in 1986 for the multilayer perceptron (MLP), with applications to two-layer and three-layer adaptive networks.

  • Backpropagation
  • Adaptive Networks
  • Gradient Computation
  • MLP
  • Neural Networks


Presentation Transcript


  1. Backpropagation (BP)
     J.-S. Roger Jang, jang@mirlab.org, http://mirlab.org/jang
     MIR Lab, CSIE Dept., National Taiwan University, 2025/4/19

  2. Basics in Differentiation
     Definition: for y = f(x),
       y' = f'(x) = lim_{Δx→0} [f(x + Δx) − f(x)] / Δx
     Basic formulas:
       y = x^m   →  y' = m·x^(m−1)
       y = e^x   →  y' = e^x
       y = ln(x) →  y' = 1/x
     Extensions:
       Chain rule:   y = f(g(x))   →  y' = f'(g(x))·g'(x)
       Product rule: y = f(x)·g(x) →  y' = f'(x)·g(x) + f(x)·g'(x)
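These rules can be checked numerically with a central-difference approximation. A minimal Python sketch; the test point x = 2.0 and the sample functions are my own choices, not from the slide:

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
# (x^m)' = m x^(m-1), here m = 3
assert abs(numerical_derivative(lambda t: t**3, x) - 3 * x**2) < 1e-4
# (e^x)' = e^x
assert abs(numerical_derivative(math.exp, x) - math.exp(x)) < 1e-4
# (ln x)' = 1/x
assert abs(numerical_derivative(math.log, x) - 1 / x) < 1e-4
# Product rule: (f g)' = f' g + f g', with f = x^2, g = sin(x)
fg = lambda t: t**2 * math.sin(t)
expected = 2 * x * math.sin(x) + x**2 * math.cos(x)
assert abs(numerical_derivative(fg, x) - expected) < 1e-4
print("all derivative rules verified")
```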

  3. Adaptive Networks
     An adaptive network is a feedforward network in which each node can implement any function; the MLP is only a special case.
     Example network:
       Node 1: u = x + a·x·y
       Node 2: v = ln(x·y) + b
       Node 3: o = u² + v³/c
     Network inputs: x, y.  Network output: o.  Network parameters: a, b, c.
     Overall function: o = F((x, y); a, b, c)

  4. Simple Derivatives
     Review of derivatives: y = f(x), dy/dx = f'(x)
     Network representation: x → [f(.)] → y

  5. Chain Rule for One-input Composite Functions
     With y = f(x) and z = g(y), so that z = g(f(x)):
       dz/dx = (dz/dy)·(dy/dx) = g'(y)·f'(x)
     Network representation: x → [f(.)] → y → [g(.)] → z
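A quick numerical check of the one-input chain rule; the concrete f and g below are illustrative choices, not from the slide:

```python
import math

# Illustrative composite: f(x) = x**2, g(y) = sin(y), so z = sin(x**2).
def f(x): return x**2
def g(y): return math.sin(y)

def dz_dx(x):
    # Chain rule: dz/dx = g'(f(x)) * f'(x) = cos(x**2) * 2x
    return math.cos(f(x)) * 2 * x

def dz_dx_numeric(x, h=1e-6):
    # Central-difference approximation for comparison
    return (g(f(x + h)) - g(f(x - h))) / (2 * h)

print(dz_dx(1.3), dz_dx_numeric(1.3))  # the two values agree
```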

  6. Chain Rule for Two-input Composite Functions
     With y = f(x), z = g(x), and u = h(y, z):
       du/dx = (∂u/∂y)·(dy/dx) + (∂u/∂z)·(dz/dx)
     Network representation: x feeds f(.) and g(.), whose outputs y and z feed h(. , .) to produce u.
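The two-input version can be verified the same way; f, g, and h below are example functions of my own choosing:

```python
import math

# Example: y = f(x) = x**2, z = g(x) = e^x, u = h(y, z) = y * z.
def total_derivative(x):
    # du/dx = (∂u/∂y)·(dy/dx) + (∂u/∂z)·(dz/dx)
    y, z = x**2, math.exp(x)
    dy_dx, dz_dx = 2 * x, math.exp(x)
    du_dy, du_dz = z, y            # partials of h(y, z) = y*z
    return du_dy * dy_dx + du_dz * dz_dx

def total_derivative_numeric(x, h=1e-6):
    u = lambda t: t**2 * math.exp(t)   # u written directly as a function of x
    return (u(x + h) - u(x - h)) / (2 * h)
```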

  7. Backpropagation in Adaptive Networks
     Backpropagation (BP) is a systematic way to compute gradients from the output toward the input of an adaptive network.
       • Reinvented in 1986 for the MLP; also known as "ordered derivatives".
       • BP is a way to compute the gradient; it is not gradient descent itself.

  8. BP in Two-layer Adaptive Networks
     Inputs x, y; parameters a, b, c; output o:
       Node 1: u = x + a·x·y
       Node 2: v = ln(x·y) + b
       Node 3: o = u² + v³/c
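A hand-coded BP pass for a network of this shape can be checked against finite differences. The node functions below are one possible reading of the slide, so treat them as assumptions:

```python
import math

# Assumed node functions:
#   Node 1: u = x + a*x*y    Node 2: v = ln(x*y) + b    Node 3: o = u**2 + v**3/c
def forward(x, y, a, b, c):
    u = x + a * x * y
    v = math.log(x * y) + b
    return u**2 + v**3 / c

def backprop(x, y, a, b, c):
    """Gradients of o w.r.t. the parameters a, b, c via the chain rule."""
    u = x + a * x * y
    v = math.log(x * y) + b
    do_da = 2 * u * (x * y)        # (∂o/∂u)·(∂u/∂a)
    do_db = 3 * v**2 / c           # (∂o/∂v)·(∂v/∂b)
    do_dc = -v**3 / c**2           # o depends on c directly
    return do_da, do_db, do_dc
```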

  9. BP in Three-layer Adaptive Networks
     Inputs x, y; first layer p, q; second layer u, v; output o:
       Node 1: p = ln x + e^y
       Node 2: q = x² + y³
       Node 3: u = p/q
       Node 4: v = p·q
       Node 5: o = u·ln v + v·ln u

  10. BP from Next Layer to Current Layer
     General formula for backpropagation, assuming o is the network's final output, α is a parameter in node 1, and x (node 1's output) feeds next-layer nodes with outputs y1, y2, y3:
       ∂o/∂x = (∂o/∂y1)·(∂y1/∂x) + (∂o/∂y2)·(∂y2/∂x) + (∂o/∂y3)·(∂y3/∂x)
     ∂o/∂x is the derivative in the current layer; ∂o/∂y1, ∂o/∂y2, ∂o/∂y3 are derivatives in the next layer. Derivatives therefore propagate backward, layer by layer.
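The recursion on this slide is just a weighted sum of next-layer derivatives; a minimal sketch, with made-up derivative values:

```python
# ∂o/∂x is the sum over next-layer nodes y_i of (∂o/∂y_i)·(∂y_i/∂x).
def backprop_step(do_dy, dy_dx):
    """Accumulate ∂o/∂x from next-layer derivatives and local derivatives."""
    return sum(d_o * d_y for d_o, d_y in zip(do_dy, dy_dx))

do_dy = [0.5, -1.0, 2.0]   # ∂o/∂y1, ∂o/∂y2, ∂o/∂y3 (already computed downstream)
dy_dx = [1.0, 0.25, 0.5]   # ∂y1/∂x, ∂y2/∂x, ∂y3/∂x (local derivatives)
print(backprop_step(do_dy, dy_dx))  # 0.5 - 0.25 + 1.0 = 1.25
```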

  11. BP in Two-layer MLP
     Inputs x1, x2; hidden outputs y1, y2; network output o, with σ the sigmoid, hidden weights w and biases b, output weights v and bias c:
       y1 = σ(w11·x1 + w12·x2 + b1) = 1 / (1 + e^−(w11·x1 + w12·x2 + b1))
       y2 = σ(w21·x1 + w22·x2 + b2) = 1 / (1 + e^−(w21·x1 + w22·x2 + b2))
       o  = σ(v1·y1 + v2·y2 + c)    = 1 / (1 + e^−(v1·y1 + v2·y2 + c))
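The forward pass and one backward step for an MLP of this shape can be sketched as follows; the weight and bias names (w, b, v, c) are my own, and the output-layer gradient uses σ' = σ(1 − σ):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def forward(x1, x2, w, b, v, c):
    """Two-layer MLP: y_i = σ(w_i1·x1 + w_i2·x2 + b_i), o = σ(v1·y1 + v2·y2 + c)."""
    y1 = sigmoid(w[0][0] * x1 + w[0][1] * x2 + b[0])
    y2 = sigmoid(w[1][0] * x1 + w[1][1] * x2 + b[1])
    o = sigmoid(v[0] * y1 + v[1] * y2 + c)
    return y1, y2, o

def grad_output_weights(x1, x2, w, b, v, c):
    """∂o/∂v_j = o·(1 − o)·y_j, using σ' = σ(1 − σ)."""
    y1, y2, o = forward(x1, x2, w, b, v, c)
    return o * (1 - o) * y1, o * (1 - o) * y2
```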

  12. BP in Three-layer MLP
     Inputs x1, x2, x3; first hidden layer y1, y2, y3; second hidden layer z1, z2; output o, with σ the sigmoid:
       y_i = σ(w_i1·x1 + w_i2·x2 + w_i3·x3 + b_i) = 1 / (1 + e^−(w_i1·x1 + w_i2·x2 + w_i3·x3 + b_i))
       z_i = σ(v_i1·y1 + v_i2·y2 + v_i3·y3 + c_i) = 1 / (1 + e^−(v_i1·y1 + v_i2·y2 + v_i3·y3 + c_i))
       o   = σ(u1·z1 + u2·z2 + d)                 = 1 / (1 + e^−(u1·z1 + u2·z2 + d))

  13. Gradient Vanishing in DNN
     Gradient vanishing arises from cascaded sigmoidal functions. With
       x2 = σ(w2·x1 + b2),  x3 = σ(w3·x2 + b3),  x4 = σ(w4·x3 + b4),
     and y = σ(x), y' = y(1 − y), the chain rule gives
       ∂x4/∂x1 = (∂x2/∂x1)·(∂x3/∂x2)·(∂x4/∂x3)
               = x2(1 − x2)·w2 · x3(1 − x3)·w3 · x4(1 − x4)·w4
               ≤ (1/4)·w2 · (1/4)·w3 · (1/4)·w4 = (1/4)³·w2·w3·w4
     since σ(1 − σ) ≤ 1/4.
     Solutions:
       • Different learning rates for different layers
       • Skip connections
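The shrinking product can be demonstrated directly; the depth-10 chain, unit weights, and zero biases below are my own test values:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Chain x_{k+1} = σ(w·x_k + b).  Each factor of ∂x_n/∂x_1 is
# x_{k+1}·(1 − x_{k+1})·w, and σ(1 − σ) ≤ 1/4, so the product shrinks.
def chain_gradient(x1, weights, biases):
    x, grad = x1, 1.0
    for w, b in zip(weights, biases):
        x_next = sigmoid(w * x + b)
        grad *= x_next * (1 - x_next) * w   # local derivative of this layer
        x = x_next
    return grad

g = chain_gradient(0.5, [1.0] * 10, [0.0] * 10)
print(abs(g))  # magnitude at most (1/4)**10, since each factor is ≤ 1/4
```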

  14. Exercises
     Derive the derivative of tanh(x/2) in terms of sigmoid(x):
       • Given y = sigmoid(x), express the derivative y' in terms of y (i.e., y' = y(1 − y)).
       • Express tanh(x/2) in terms of sigmoid(x).
       • Using the above, find the derivative of tanh(x/2).
