
Understanding Receptive Fields in Deep Learning
An overview of receptive fields in deep convolutional networks such as ResNets: how each convolutional layer grows the receptive field, why downsampling inside the network is needed to handle large image inputs efficiently, and how batch normalization and residual connections make very deep models easier to train and generalize better. Today's Transformers are, at heart, still residual networks.
Presentation Transcript
CS 7150 Deep Learning
Link to student presentation: https://docs.google.com/presentation/d/1UqelvVWBRpaNc7655ysjn8ybnbgkfucxBuwTB_Kr4qY/edit?usp=sharing
Historical Context
In the wake of AlexNet:
- How to achieve depth > 5 layers?
- How to train with better generalization?
- How to converge faster?
A torrent of innovations:
- Fully convolutional networks
- Batchnorm
- Residual connections
Today: Transformers are still residual networks.
Topics
- Receptive field
- Highway networks
Receptive fields
For a convolution with kernel size K, each element in the output depends on a K x K receptive field in the input.
[Figure: input and output feature maps]
Slide credit: D Fouhey & J Johnson
Receptive fields
Each successive convolution adds K - 1 to the receptive field size. With L layers the receptive field size is 1 + L * (K - 1).
Careful: "receptive field in the input" vs. "receptive field in the previous layer"; hopefully clear from context!
[Figure: input and output feature maps]
Slide credit: D Fouhey & J Johnson
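A minimal Python sketch of this formula (not from the slides; purely illustrative): the receptive field of a stack of stride-1 convolutions grows only linearly in the number of layers.

```python
def receptive_field(num_layers, kernel_size):
    # Stacking L stride-1 convolutions of kernel size K gives a
    # receptive field of 1 + L * (K - 1) pixels in the input.
    return 1 + num_layers * (kernel_size - 1)

print(receptive_field(10, 3))  # ten 3x3 convs only see a 21x21 window
```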
Receptive fields
Problem: For large images we need many layers for each output to see the whole image.
Solution: Downsample inside the network.
[Figure: input and output feature maps]
Slide credit: D Fouhey & J Johnson
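To see why downsampling helps, here is a rough sketch (not from the slides; the helper name and layer configurations are made up for illustration) of the standard receptive-field recursion: each layer's contribution of K - 1 is multiplied by the product of the strides of the layers before it, so strided layers make the receptive field grow much faster.

```python
def receptive_field_with_strides(layers):
    # layers: list of (kernel_size, stride) pairs from input to output.
    # Each layer adds (K - 1) * jump, where jump is the cumulative stride so far.
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field_with_strides([(3, 1)] * 10))        # 21: ten stride-1 3x3 convs
print(receptive_field_with_strides([(3, 2), (3, 1)] * 5)) # 187: stride 2 every other layer
```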
Batch Normalization
Input: $x$ of shape $N \times D$; learnable scale and shift parameters $\gamma, \beta$ of shape $D$.
Per-channel mean, shape $D$: $\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{i,j}$
Per-channel variance, shape $D$: $\sigma_j^2 = \frac{1}{N}\sum_{i=1}^{N} (x_{i,j} - \mu_j)^2$
Normalized $x$, shape $N \times D$: $\hat{x}_{i,j} = \dfrac{x_{i,j} - \mu_j}{\sqrt{\sigma_j^2 + \varepsilon}}$
Output, shape $N \times D$: $y_{i,j} = \gamma_j \hat{x}_{i,j} + \beta_j$
Learning $\gamma = \sigma$, $\beta = \mu$ will recover the identity function (in expectation).
Problem: The estimates $\mu$ and $\sigma$ depend on the minibatch; we can't do this at test time!
Slide credit: D Fouhey & J Johnson
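A minimal NumPy sketch of the training-time computation above (illustrative only; the function and variable names are mine, and a real implementation such as torch.nn.BatchNorm2d also tracks running statistics):

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    # x: (N, D) minibatch; gamma, beta: (D,) learnable scale and shift.
    mu = x.mean(axis=0)                    # per-channel mean, shape (D,)
    var = x.var(axis=0)                    # per-channel variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized x, shape (N, D)
    return gamma * x_hat + beta            # output, shape (N, D)
```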
Batch Normalization: Test-Time
Same computation as above, but the per-channel mean and variance are replaced by (running) averages of the values seen during training rather than statistics of the current minibatch.
Slide credit: D Fouhey & J Johnson
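A test-time sketch under the same assumptions as the previous snippet; running_mu and running_var stand for the running averages accumulated during training (the exponential-moving-average update in the comment is the common convention, not something stated on the slide):

```python
import numpy as np

def batchnorm_eval(x, gamma, beta, running_mu, running_var, eps=1e-5):
    # Uses the stored running statistics instead of minibatch statistics.
    # During training these are typically updated as, e.g.,
    #   running_mu = (1 - m) * running_mu + m * mu   (momentum m)
    x_hat = (x - running_mu) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```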
Batch Normalization: Test-Time
With the running statistics fixed, batchnorm becomes a linear operator during testing! It can be fused with the previous fully-connected or conv layer.
Slide credit: D Fouhey & J Johnson
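To make the "linear operator" point concrete, here is a hedged sketch of fusing eval-mode batchnorm into a preceding fully-connected layer (function and variable names are illustrative). Since test-time BN computes y = a * x + b with a = gamma / sqrt(var + eps) and b = beta - a * mu, it can be folded into the layer's weight and bias:

```python
import numpy as np

def fuse_fc_bn(W, c, gamma, beta, running_mu, running_var, eps=1e-5):
    # FC layer: z = W @ x + c, with W of shape (D_out, D_in), c of shape (D_out,).
    a = gamma / np.sqrt(running_var + eps)  # per-channel scale, shape (D_out,)
    W_fused = a[:, None] * W                # scale each output row of W
    c_fused = a * (c - running_mu) + beta   # fold the mean and shift into the bias
    return W_fused, c_fused                 # BN(W @ x + c) == W_fused @ x + c_fused
```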
Residual Networks
Solution: Change the network so learning identity functions with extra layers is easy!
Plain block: conv, ReLU, conv, ReLU, computing H(x) directly.
Residual block: F(x) is computed by conv, ReLU, conv; an additive shortcut adds the input x back, and a final ReLU follows: H(x) = F(x) + x.
If you set the weights inside F(x) to 0, the whole block will compute the identity function!
[Figure: plain block vs. residual block with additive shortcut]
He et al., "Deep Residual Learning for Image Recognition", CVPR 2016
Slide credit: D Fouhey & J Johnson
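A minimal PyTorch sketch of such a residual block (assuming PyTorch; the class name and the batchnorm layers, which the common ResNet design uses, are my additions rather than details read off the slide):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # F(x): conv -> ReLU -> conv, with batchnorm after each conv
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Additive shortcut: H(x) = F(x) + x, followed by ReLU
        return F.relu(out + x)
```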
Residual Networks
A residual network is a stack of many residual blocks.
Regular design, like VGG: each residual block has two 3x3 convs.
The network is divided into stages: the first block of each stage halves the resolution (with a stride-2 conv) and doubles the number of channels.
[Figure: full architecture, bottom to top: input, 7x7 conv (64 channels, stride 2), pool, stages of 3x3-conv residual blocks with 64, 128, ..., 512 channels, pool, FC-1000, softmax]
He et al., "Deep Residual Learning for Image Recognition", CVPR 2016
Slide credit: D Fouhey & J Johnson
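A rough sketch of that stage structure in PyTorch (again illustrative: the helper names are mine, and the 1x1 projection on the shortcut is the standard way to match shapes when the first block of a stage changes resolution and channel count):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection on the shortcut when spatial size or channel count changes
        self.proj = None
        if stride != 1 or in_ch != out_ch:
            self.proj = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        shortcut = x if self.proj is None else self.proj(x)
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + shortcut)

def make_stage(in_ch, out_ch, num_blocks):
    # First block halves the resolution (stride 2) and changes the channel count;
    # the remaining blocks keep the shape unchanged.
    blocks = [ResBlock(in_ch, out_ch, stride=2)] + \
             [ResBlock(out_ch, out_ch) for _ in range(num_blocks - 1)]
    return nn.Sequential(*blocks)
```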