LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision

Wireless Network Communication, Dr. Yu Wang
Mina Gabriel - CIS 5639
Agenda
 
1. Introduction to the DNN problem in Edge Computing
2. Deep Compression
3. Limitations of the model-grained approach
4. Introduction to the LegoDNN architecture
5. Paper contributions
6. Evaluation (LegoDNN vs. baseline models)
7. Related Work
8. Conclusion
9. Questions
Deep Neural Network (DNN)
 
Deep Neural Networks (DNNs) generally optimize for performance in terms of accuracy.

As a result, DNNs are huge and have parameters on the order of millions.

The very popular AlexNet has around 62 million parameters; a trained AlexNet takes around 200MB of space.
 
Credit: Xu et al., 2019; Wikipedia: AlexNet
Mobile vision systems are ubiquitous.
 
Mobile Vision Applications:
Image Classification
Semantic Segmentation
Object Detection
Anomaly Detection
Pose Estimation
Action Recognition
 
The emergence of AI chipsets enables on-device deep learning on mobile vision systems.
 
Intel Movidius VPU
 
Credit: https://github.com/LINC-BIT/legodnn
Mobile vision systems are ubiquitous.
 
Two Issues:
Memory
Battery
 
The emergence of AI chipsets enables on-device deep learning on mobile vision systems.
 
Apple A11 Bionic Neural Engine
Intel Movidius VPU
 
Credit: Wikipedia: Apple A11, Raspberry Pi, Intel Movidius VPU
Motivation
 
Energy consumption is dominated by memory access.
 
Operation                  Energy [pJ]
32-bit int ADD             0.1
32-bit float ADD           0.9
32-bit SRAM cache access   5
32-bit DRAM memory access  640

Large networks do not fit in on-chip storage and hence require the more costly DRAM accesses.
If the deep model were small enough to fit in SRAM, energy consumption would be reduced.
 
Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
Deep Compression
 
Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
 
35× decrease in AlexNet size: from 240MB to 6.9MB
Deep Compression: Network Pruning
 
Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
 
Three steps (a code sketch follows the list):
1. Train the model: we start by learning the connectivity via normal network training.
2. Prune the small-weight connections: all connections with weights below a threshold are removed from the network.
3. Retrain: finally, we retrain the network to learn the final weights for the remaining sparse connections.
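A minimal sketch of this prune-and-retrain loop in PyTorch, assuming an already-trained model; the threshold value and the mask-based retraining step are illustrative choices, not Han et al.'s exact implementation.

```python
# Magnitude-based pruning sketch: zero out small weights, then keep them
# zero during retraining so the surviving sparse connections are fine-tuned.
import torch

def prune_small_weights(model: torch.nn.Module, threshold: float = 1e-2):
    """Zero connections whose |weight| < threshold; return boolean masks."""
    masks = {}
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            mask = param.abs() >= threshold   # keep only large-magnitude weights
            param.data.mul_(mask)             # prune: small weights become zero
            masks[name] = mask
    return masks

def apply_masks_after_step(model: torch.nn.Module, masks):
    # Call after each optimizer step during retraining so pruned
    # connections stay removed while remaining weights keep learning.
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
```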
Deep Compression: Weight Sharing
 
Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
 
Each filter's weights are partitioned into k clusters using k-means.

This reduces the weight representation to per-weight cluster indices plus a small codebook of shared centroid values (a code sketch follows below).
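A minimal sketch of weight sharing via k-means, assuming a single weight tensor; the cluster count and the use of scikit-learn are illustrative assumptions, not the paper's exact implementation.

```python
# Weight sharing sketch: store log2(k)-bit cluster indices plus a k-entry
# codebook instead of a 32-bit float per weight.
import numpy as np
from sklearn.cluster import KMeans

def share_weights(weights: np.ndarray, k: int = 16):
    """Cluster weights into k centroids; return (codebook, index tensor)."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10).fit(flat)
    codebook = km.cluster_centers_.ravel()        # k shared float values
    indices = km.labels_.reshape(weights.shape)   # small integer per weight
    return codebook, indices

def reconstruct(codebook, indices):
    # Dequantize: every weight becomes its cluster's shared centroid.
    return codebook[indices]

w = np.random.randn(64, 3, 3).astype(np.float32)
codebook, idx = share_weights(w, k=8)
w_hat = reconstruct(codebook, idx)  # same shape, only 8 distinct values
```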
Deep Compression: Quantization and Huffman Coding
 
Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
 
In their experiments, they found no loss of accuracy when weights are quantized to 8 bits instead of 32 bits.

Huffman coding: frequent weight values are assigned fewer bits for representation, while rare weight values are assigned more bits (a code sketch follows below).

The full compression pipeline saves 35× to 49× in parameter storage with no loss of accuracy.
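A minimal Huffman-coding sketch over quantized weight indices, assuming the cluster indices produced by weight sharing; the heapq-based tree construction is a standard textbook approach, not the paper's exact code.

```python
# Huffman sketch: compute per-symbol code lengths (tree depths); frequent
# symbols end up with shorter codes than a fixed-width encoding.
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits}."""
    freq = Counter(symbols)
    if len(freq) == 1:                 # degenerate case: a single symbol
        return {next(iter(freq)): 1}
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes all their symbols one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

indices = [0, 0, 0, 0, 1, 1, 2, 3]           # toy quantized-weight indices
lengths = huffman_code_lengths(indices)
total_bits = sum(lengths[s] for s in indices)
print(lengths, total_bits)                   # 14 bits vs. 16 at fixed width
```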
Limitation of model-grained scaling
 
 
We need to be able to scale the model size based on the available resources.
The previous techniques suffer severe accuracy loss once their size is changed.

Online Scaling: to provide online scaling, these techniques need to first transform a DNN model into several descendant models of smaller sizes offline, and then select one of them to upgrade or downgrade the model size at run-time.
Limitation of model-grained scaling
 
 
(a) Incurring large accuracy loss; (b) leaving considerable proportions of resources unused
Limitation of model-grained scaling
 
 
An ideal scaling that fully uses the available resources requires a dauntingly large number of descendant models, which gives rise to two major challenges in practice:

First challenge: construct a large number of descendant models (e.g., 1 million) in a moderate amount of time, such as hours.
Second challenge: form the right combination of these descendant models to maximize accuracy given the available resources.
 
LegoDNN
 
https://github.com/LINC-BIT/legodnn
 
LegoDNN: a framework based on block structure (as in ResNet); the offline training environment extracts and represents block information.
LegoDNN
 
https://github.com/LINC-BIT/legodnn
 
LegoDNN: blocks are independent of the other parts of the DNN; at runtime, dynamic selection of descendant blocks is supported.
Contribution of this paper
 
 
Fast generation of a large scaling space via block re-training: descendant blocks are re-trained independently of the whole network.
Fine-grained DNN scaling via block switching: run-time optimal selection of descendant blocks for maximum accuracy; unlike filter pruning, there is no need to retrain the entire DNN.
Implementation and evaluation: they design and develop TensorFlow applications, and evaluate against state-of-the-art techniques.
I also found a PyTorch repository which was created 3 months ago
 
Summary of experimental results:
1. Implemented and tested on smartphones (Huawei) and embedded devices (Raspberry Pi).
2. Compared LegoDNN to other models:
   a. 1,296x to 279,936x more descendant models.
   b. 76% less training time than other techniques.
   c. 31.74% increase in accuracy.
   d. 71.07% energy saved on switching models.
3. Tested multi-DNN scenarios.
4. Tested on multiple applications.
Section 2: Motivation - How to enlarge the model exploration spaces through blocks?
 
Limitation 1: limited exploration of model choices. Traditional model-grained scaling techniques treat a DNN as a whole and transform it into several descendant models of different sizes.

In three popular DNNs (VGG16, YOLOv3, and ResNet18), block-grained scaling provides as many as 7K to 1.6M model sizes by just transforming the original blocks into 5 to 40 descendant blocks each (see the worked example below).
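A quick worked example of where such counts come from: with b independently scalable blocks and v descendant variants per block, the assembled-model space is v^b. The variant count of 6 below is an assumption chosen because it reproduces the slide's 7K and 1.6M figures for 5-block and 8-block models.

```python
# Block-grained scaling space: one variant choice per block multiplies out.
def scaling_space(num_blocks: int, variants_per_block: int) -> int:
    return variants_per_block ** num_blocks

print(scaling_space(5, 6))  # 7,776     (~7K,   e.g., a 5-block model)
print(scaling_space(8, 6))  # 1,679,616 (~1.6M, e.g., an 8-block model)
```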
Section 2: Motivation - How to accelerate the training of block-grained models?
 
Limitation 2: resource-intensive and time-consuming training. Traditional model-grained accuracy is fine, but the approach suffers from long block generation times.

When handling large datasets such as ImageNet or COCO2017, the generation of a block takes several hours.
Section 2: Motivation - How to optimally assemble descendant blocks on the fly?
 
Limitation 3: focus on offline structured pruning. Given the large search space of descendant models, an efficient scaling approach is required to quickly find the optimal combination of blocks under run-time resource dynamics.
Section 3: Design of LegoDNN
 
Overview: Offline vs Online
 
https://github.com/LINC-BIT/legodnn
Section 3: Design of LegoDNN
 
Overview: Offline stage
Block Extractor: extracts blocks from the original model's convolution layers.
Descendant Block Generator: responsible for pruning and generating descendant blocks of multiple sizes.
Block Retrainer: retrains descendant blocks to improve their accuracy (more about this on the next slide).
Block Profiler: encapsulates statistical information about each block (a data-structure sketch follows below):
  Size
  Memory
  Accuracy
  Latency
 
 
https://github.com/LINC-BIT/legodnn
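A minimal sketch of the per-block statistics the Block Profiler might record, assuming the four fields named above; the dataclass layout and field names are illustrative, not LegoDNN's actual format.

```python
# Per-descendant-block profile recorded offline, consumed by the online
# Scaling Optimizer when picking a block combination.
from dataclasses import dataclass

@dataclass
class BlockProfile:
    block_id: int          # which original block this descendant replaces
    sparsity: float        # pruning level of this descendant variant
    size_mb: float         # on-disk size of the block's weights
    memory_mb: float       # peak runtime memory when the block is active
    accuracy_drop: float   # estimated accuracy loss vs. the original block
    latency_ms: float      # measured inference latency on the target device

# Example: profiles for two descendant variants of block 0.
profiles = [
    BlockProfile(0, 0.0, 4.2, 35.0, 0.000, 12.5),  # original (unpruned) block
    BlockProfile(0, 0.5, 1.1, 18.0, 0.012, 6.8),   # 50%-pruned descendant
]
```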
Section 3: Design of LegoDNN
 
Overview: offline - Block Retrainer
 
 
Section 3: Design of LegoDNN
 
Overview: Online
Latency Profiler: computes the latency reduction percentage of blocks on edge devices.
Scaling Optimizer: selects blocks to maximize accuracy while satisfying latency conditions and memory limitations (next slides).
Block Switcher: switches blocks at runtime using an optimization algorithm (following slides).
 
 
https://github.com/LINC-BIT/legodnn
Section 3: Design of LegoDNN
 
Single-DNN Optimization: at runtime, the scaling optimizer finds the optimal combination of descendant blocks under four constraints (a selection sketch follows the list):

Accuracy
Latency
Model Size
Block Scale
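A minimal sketch of such a selection, assuming per-variant (accuracy drop, latency, size) profiles like those sketched earlier; exhaustive search is shown only for clarity, since the paper's actual optimizer is not spelled out on this slide.

```python
# Pick one descendant variant per block to minimize total accuracy drop,
# subject to latency and model-size budgets.
from itertools import product

def select_blocks(variants_per_block, latency_budget_ms, size_budget_mb):
    """variants_per_block: one list per block of
    (accuracy_drop, latency_ms, size_mb) tuples for each descendant variant."""
    best, best_drop = None, float("inf")
    for combo in product(*variants_per_block):      # one variant per block
        drop = sum(v[0] for v in combo)
        latency = sum(v[1] for v in combo)
        size = sum(v[2] for v in combo)
        if latency <= latency_budget_ms and size <= size_budget_mb \
                and drop < best_drop:
            best, best_drop = combo, drop
    return best

# Toy example: two blocks, each with an original and a pruned variant.
blocks = [
    [(0.00, 12.5, 4.2), (0.012, 6.8, 1.1)],
    [(0.00, 10.0, 3.0), (0.020, 5.5, 0.9)],
]
print(select_blocks(blocks, latency_budget_ms=15.0, size_budget_mb=5.0))
```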
Section 3: Design of LegoDNN
 
Overview: Online
Multi-DNN optimization: 
When considering 𝑚DNNs (𝑚>1), the optimization objective can be extended
as:
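A hedged reconstruction of what such an extension plausibly looks like, built from the single-DNN constraints on the previous slide; the notation (s_j for the block combination chosen for DNN j, budgets T and M) is illustrative, not the paper's exact formulation.

```latex
% Sketch: maximize summed accuracy across m co-running DNNs under
% shared latency and memory budgets.
\max_{s_1,\dots,s_m} \; \sum_{j=1}^{m} \mathrm{Accuracy}(s_j)
\quad \text{s.t.} \quad
\sum_{j=1}^{m} \mathrm{Latency}(s_j) \le T,
\qquad
\sum_{j=1}^{m} \mathrm{Memory}(s_j) \le M
```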
 
Section 3: Design of LegoDNN
 
Overview: Online Block Exchanges (Switcher)
Filter pruning and low-rank decomposition directly replace the entire DNN (large I/O overhead).
NestDNN designed a descendant-model exchange to solve the previous issue, but it introduced two other issues:
  Generation time increases by 50% to 200% compared to filter pruning and low-rank decomposition.
  When the model is loaded into memory, new layers need to be constructed using deep learning libraries.
LegoDNN only needs to exchange blocks rather than the entire DNN, and all blocks can be integrated without extra operations (you can't mix and match descendant blocks with other models' blocks :) ). A switching sketch follows below.
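A minimal sketch of swapping a single block at runtime, assuming a PyTorch model whose blocks are addressable submodules; the name-based swap via setattr and the file names in the usage comment are illustrative, not LegoDNN's actual mechanism.

```python
# Exchange one block's weights without reloading the rest of the network.
import torch

def switch_block(model: torch.nn.Module, block_name: str,
                 new_block: torch.nn.Module):
    """Replace a single named block in place; only this block's parameters
    are loaded from storage, avoiding a full-model reload."""
    parent = model
    *path, leaf = block_name.split(".")
    for part in path:                    # walk to the block's parent module
        parent = getattr(parent, part)
    setattr(parent, leaf, new_block)     # swap the descendant block in

# Usage (hypothetical names): load only one block's pruned variant.
# new_block = torch.load("blocks/layer2_pruned50.pt")
# switch_block(model, "layer2", new_block)
```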
 
Section 4 EVALUATION
 
* Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. In ICLR'16, 2015.
 
Experimental Setup
Testbeds: tested on four different phones and embedded platforms.
Image Recognition: VGG16 and ResNet18; datasets are CIFAR-10 (60K images, 10 classes) and ImageNet2012 (1.2M images, 1,000 classes), respectively.
Real-Time Object Detection: YOLOv3 on COCO2017 (0.33M images, 1.5M objects, 80 classes); metric is mAP@0.5, i.e., mAP at an IoU threshold of 0.5.
Multi-DNN Scenarios: three ranges:
  Small: 1-6 running applications
  Medium: 2-8 running applications
  Large: 3-10 running applications
Baseline Models: four state-of-the-art model-grained scaling techniques:
  Filter pruning
  NestDNN
  Low-rank decomposition
  Knowledge distillation*
Section 4 EVALUATION
Evaluation of LegoDNN Components

Evaluation of Offline Model/Block Generation: LegoDNN uses the shortest generation time; the baseline models would take at least several years to generate the same number of optional models as LegoDNN.

Time (hours): VGG16, ResNet18, and YOLOv3 have 5, 5, and 8 blocks, respectively.
Section 4 EVALUATION
Evaluation of LegoDNN Components

Evaluation of Accuracy/Latency Tradeoff: first, LegoDNN consistently achieves higher accuracy (a 31.74% increase) than the baseline approaches at every available resource level. Second, on average, it achieves 52.66% higher accuracy when the available memory is smaller than 30MB.
 
Comparison of inference accuracies between LegoDNN and model scaling approaches
 
Comparison of energy consumptions in model scaling
Section 4 EVALUATION
 
Evaluation of Multi-DNN Scenario
Two mobile devices: a CPU-based smartphone and a GPU-based embedded device.
An application's available resources decrease when more co-running applications exist.
LegoDNN provides block-level scaling by switching just 3 blocks, thus still maintaining accurate models (high prediction confidences) under tighter resources.
Each application is randomly selected from VGG16, ResNet18, and YOLOv3.
Object detection and recognition have different accuracy metrics, so they used "the percentage of loss in the average accuracy".
Section 4 EVALUATION
Evaluation of Multi-DNN Scenario

Performance under Different Loads: LegoDNN achieves considerable performance improvements over the baseline approaches by delivering either lower latencies (minimum-latency scenario) or smaller accuracy losses (maximum-accuracy scenario). Overall, the approach achieves 21.62% reductions (at most 2x) in latency and 69.71% reductions in accuracy losses.
 
Performance comparison under maximum accuracy and minimum latency scenarios
Section 5 DISCUSSIONS
Applicability of LegoDNN to DNNs

LegoDNN is the first framework that supports block-grained scaling of DNNs for mobile vision systems.
It generalizes to most state-of-the-art DNNs (reported on 7 different DNN categories, e.g., ResNet-style scaling in depth (ResNet152) and width (ResNeXt29)).
LegoDNN is inapplicable when any filter in a DNN cannot be pruned.
 
Applicability of LegoDNN to seven categories of DNNs
Section 6 RELATED WORK
Approaches Similar to LegoDNN

Model-grained DNN scaling: applies structured pruning in a different way, mainly generating a limited number of models to fulfil requirements.
Scheduling of multiple DNNs: modern mobile vision systems need to execute multiple DNNs while optimizing their latency, accuracy, and energy (similar goal to LegoDNN, via a different approach).
Relationship to device-specific DNN generation and search: this line of work studies the generation of optimal DNNs for specific devices, in other words pre-generating DNN models according to the features of specific devices, unlike what LegoDNN does at run-time.
Section 7 CONCLUSION
Paper Conclusion

LegoDNN was presented in this paper as a large exploration space of DNN models for mobile vision systems.
Future work will explore the re-training of descendant blocks in a federated learning framework. Such re-training allows a descendant block to learn from multiple participants' DNNs to increase its accuracy and generalization.
 
 
QUESTIONS?