DNN Inference Optimization Challenge Overview

1. ITU-ML5G-PS-018: DNN Inference Optimization Challenge (Adlik/ZTE)
17 July 2020
https://wiki.lfai.foundation/display/ADLIK/DNN+Inference+Optimization+Challenge
Register here | Join us on Slack
Organizer / Sponsors
Liya Yuan, ZTE    yuan.liya@zte.com.cn
 
2. Background

Trained DNN models → Inference → Deploy (on-device, edge, cloud) → AI Applications

Trained DNN models today:
- High accuracy
- High complexity
- Computationally expensive
- Memory intensive

Why efficient inference matters:
- Rapidly rising data-center consumption
- Development of edge computing
- Resource constraints in edge devices

Efficient inference targets:
- Computation cost
- Memory footprint
- Latency
 
3. DNN inference optimization

Model optimization:
- Efficient model design
- Model pruning
- Model quantization
- Knowledge distillation

Kernel optimization:
- Intel MKL-DNN
- Nvidia TensorRT

Hardware optimization:
- Intel Knights Landing CPU
- Nvidia PASCAL GP100 GPU
- Google TPU
- Nvidia Tegra
 
4. Model pruning

Pruning: removing redundant connections.

Challenges:
- How to evaluate the importance of parameters?
- How to effectively reduce computation and latency?
- How to recover accuracy?

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf
https://jacobgil.github.io/deeplearning/pruning-deep-learning
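The "importance of parameters" question above can be sketched in a few lines of NumPy: rank convolution filters by a cheap importance score and zero out the weakest. The L1-norm criterion and the `keep_ratio` parameter are illustrative choices for this sketch, not a claim about any specific framework's method:

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Rank conv filters by L1 norm and zero out the weakest.

    weights: array of shape (out_channels, in_channels, kH, kW).
    The L1 criterion and keep_ratio are illustrative assumptions.
    """
    # L1 norm of each filter serves as its importance score
    importance = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    # keep the filters with the largest L1 norms
    keep = np.argsort(importance)[-n_keep:]
    mask = np.zeros(weights.shape[0], dtype=bool)
    mask[keep] = True
    pruned = weights * mask[:, None, None, None]
    return pruned, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))       # toy conv layer: 8 filters
pruned, mask = prune_filters(w, keep_ratio=0.5)
print(mask.sum())  # 4 filters survive
```

In practice the zeroed filters would then be physically removed from the layer (and the matching input channels from the next layer), which is what actually reduces computation and latency; after that, fine-tuning recovers accuracy.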
 
5. Model quantization

Quantization: reduce the precision requirements for the weights and activations.

Challenges:
- What to quantize? (weights, activations, gradients)
- To what degree? (8-bit, 4-bit, 2-bit, 1-bit)
- With what parameters? (step size, clipping value)
- How to balance the trade-off between compression and accuracy?
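The "with what parameters?" question can be made concrete with a minimal symmetric INT8 scheme in NumPy: the clipping value determines the step size (scale), and everything else follows. This is an illustrative sketch, not any particular toolkit's implementation:

```python
import numpy as np

def quantize_int8(x, clip=None):
    """Symmetric 8-bit quantization.

    clip: optional clipping value; defaults to max |x|.
    The step size (scale) follows directly from the clip value.
    """
    clip = np.abs(x).max() if clip is None else clip
    scale = clip / 127.0                               # step size
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
print(np.max(np.abs(x - x_hat)))  # rounding error, bounded by scale / 2
```

Shrinking the clip value below max |x| reduces rounding error for typical values but clips outliers, which is exactly the compression-vs-accuracy trade-off the slide asks about.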
 
6. Adlik pruning & quantization

Pipeline: pre-trained FP32 model → Pruner (prune filters) → Quantizer (quantize with calibration) → INT8 inference

Pruner: supports channel pruning and filter pruning, reducing the number of parameters and FLOPs.
Quantizer: supports 8-bit calibration quantization.

ResNet-50 results:

Model                       Top-1    Parameters   MACs        Size
baseline                    76.19%   25.61M       5.10×10^7   99MB
pruned                      75.50%   17.43M       3.47×10^7   67MB
pruned+quantized (TF-Lite)  75.3%    17.43M       3.47×10^7   18MB

https://github.com/Adlik/model_optimizer
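The "quantize with calibration" step needs a clipping value chosen from real activation statistics gathered on a calibration dataset. A minimal sketch using a simple percentile rule — note that calibration in tools like TensorRT typically minimizes KL divergence instead, and this `calibrate_clip` function and its 99.9% threshold are illustrative assumptions, not Adlik's actual method:

```python
import numpy as np

def calibrate_clip(activations, percentile=99.9):
    """Pick a clipping value from calibration activations.

    A percentile rule ignores rare outliers so they do not
    stretch the quantization range (illustrative simplification).
    """
    return np.percentile(np.abs(activations), percentile)

rng = np.random.default_rng(1)
# calibration batch with a few large outliers, as real activations often have
acts = np.concatenate([rng.normal(0, 1, 10000), [50.0, -60.0]])
clip = calibrate_clip(acts)
print(clip)  # a few units, not 60: outliers no longer dictate the range
```

Without calibration, the naive choice clip = max |x| would be 60 here, wasting almost the entire INT8 range on two outliers.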
 
7. Knowledge distillation

Knowledge distillation: model compression by teaching a smaller network, step by step, exactly what to do, using a bigger, already-trained network.

Challenges:
- Introducing multiple teachers
- Introducing a teaching assistant
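The teacher-student setup can be sketched as a loss term: the student is trained to match the teacher's temperature-softened output distribution, following Hinton et al.'s formulation. The temperature T=4 and the toy logits below are arbitrary illustrative values:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened distributions,
    scaled by T^2 as in Hinton et al. (2015)."""
    p = softmax(teacher_logits, T)          # soft targets from teacher
    q = softmax(student_logits, T)          # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) / p.shape[0]) * T * T

t = np.array([[5.0, 1.0, -1.0]])            # teacher logits
good = np.array([[4.0, 0.5, -1.5]])         # student agreeing with teacher
bad = np.array([[-1.0, 5.0, 1.0]])          # student disagreeing
print(distill_loss(good, t) < distill_loss(bad, t))  # True
```

In full training this term is combined with the ordinary cross-entropy on hard labels; the multi-teacher and teaching-assistant variants mentioned above change where the soft targets `p` come from, not the loss shape.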
 
8. Model partition

Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge
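Neurosurgeon's core idea — profile each layer on the device and in the cloud, then pick the split point that minimizes end-to-end latency — can be sketched with a toy cost model. All numbers below are invented for illustration, and the real system also accounts for energy and changing network conditions:

```python
def best_split(device_ms, cloud_ms, out_kb, input_kb, uplink_kbps):
    """Choose how many layers to run on-device before uploading.

    device_ms[i] / cloud_ms[i]: profiled latency of layer i.
    out_kb[i]: size of layer i's output activations.
    Split k = first k layers on device, the rest in the cloud.
    """
    n = len(device_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            xfer_kb = 0.0            # all on device: nothing to upload
        elif k == 0:
            xfer_kb = input_kb       # all in cloud: upload the raw input
        else:
            xfer_kb = out_kb[k - 1]  # upload layer k-1's activations
        cost = (sum(device_ms[:k]) + sum(cloud_ms[k:])
                + xfer_kb / uplink_kbps * 1000.0)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# invented profile: early layers slow on-device, activations shrink with
# depth, final layer heaviest on-device
device_ms = [20.0, 20.0, 20.0, 200.0]
cloud_ms = [2.0, 2.0, 2.0, 1.0]
out_kb = [500.0, 200.0, 20.0, 4.0]
k, cost = best_split(device_ms, cloud_ms, out_kb,
                     input_kb=150.0, uplink_kbps=1000.0)
print(k, cost)  # splits after layer 3, where activations are small
```

The sweet spot usually sits where the activations have shrunk enough that uploading them is cheap, but the remaining layers are still expensive on-device.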
 
9. Task

Target models: BERT; MobileNet-V3; ResNet-50

Design and implement an optimization solution that delivers:
- Boosted computing efficiency
- Reduced memory footprint
- Improved latency
 
10. Submitting

How?
1) Create a private GitHub repository to contain your work.
2) Add AdlikChallenge as a collaborator. The repository must be made public before the submission deadline and should remain accessible until the end of the final event of the ITU challenge.

What?
1) Source code:
- The whole optimization solution
- A performance test demo
2) A description document:
- Instructions on how to verify the optimization performance with the source code
- Other contents include but are not limited to: insight, opinion and analysis of model optimization; the selected target model and the reason for choosing it; the base model; the solution and algorithms used; description and comparison of optimization results, etc.
 
11. Evaluation criteria

Effect of model optimization (50%):
A reasonable trade-off between accuracy and efficiency. The selected model type, loss of accuracy, and compression rate of model parameters and computation will be taken into account.

Solution advantage (30%):
Whether the solution is reasonable and has sufficient practicability, innovation and universality.

Problem analysis (10%):
Whether there is deep original insight into the problem, and whether the analysis of the key elements of the problem is accurate and reasonable.

Completeness (10%):
Whether the requirements of the competition are fulfilled according to the proposed scheme and design.
 
12. Tips

1) See the problem statement on the Adlik wiki.
2) If you don't have an ITU account, please follow the guidance to create one for challenge registration.
3) Register on the ITU AI/ML in 5G Challenge website with your ITU account.
4) Fill out the ITU AI/ML in 5G Challenge Participants Survey to select problem statement ITU-ML5G-PS-018. You can enroll as a team of 1-4 members.
5) Begin to work on this problem and submit your results. We will accept submissions from July 1st, 2020; the submission deadline is October 10th, 2020.
 
13. Q & A