DNN Inference Optimization Challenge Overview

1. ITU-ML5G-PS-018: DNN Inference Optimization Challenge (Adlik/ZTE)
17 July 2020
https://wiki.lfai.foundation/display/ADLIK/DNN+Inference+Optimization+Challenge
Register here | Join us on Slack
Organizer / Sponsors
Liya Yuan, ZTE    yuan.liya@zte.com.cn
 
2. Background

Trained DNN models → Inference → Deploy (on-device, edge, cloud) → AI Applications

Trained DNN models today:
- High accuracy
- High complexity
- Computationally expensive
- Memory intensive

Why efficient inference matters:
- Rapidly rising data-center consumption
- Development of edge computing
- Resource constraints in edge devices

Efficient inference targets:
- Computation cost
- Memory footprint
- Latency
 
3. DNN inference optimization

Model optimization:
- Efficient model design
- Model pruning
- Model quantization
- Knowledge distillation

Kernel optimization:
- Intel MKL-DNN
- Nvidia TensorRT

Hardware optimization:
- Intel Knights Landing CPU
- Nvidia PASCAL GP100 GPU
- Google TPU
- Nvidia Tegra
 
4. Model pruning

Pruning: removing redundant connections.

Challenges:
- How to evaluate the importance of parameters?
- How to effectively reduce computation and latency?
- How to recover accuracy?

Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf
https://jacobgil.github.io/deeplearning/pruning-deep-learning
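The "importance of parameters" question above can be sketched in a few lines of NumPy: rank convolution filters by a cheap importance score and zero out the weakest. The L1-norm criterion and the `keep_ratio` parameter are illustrative choices for this sketch, not a claim about any specific framework's method:

```python
import numpy as np

def prune_filters(weights, keep_ratio=0.5):
    """Rank conv filters by L1 norm and zero out the weakest.

    weights: array of shape (out_channels, in_channels, kH, kW).
    The L1 criterion and keep_ratio are illustrative assumptions.
    """
    # L1 norm of each filter serves as its importance score
    importance = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    n_keep = max(1, int(round(weights.shape[0] * keep_ratio)))
    # keep the filters with the largest L1 norms
    keep = np.argsort(importance)[-n_keep:]
    mask = np.zeros(weights.shape[0], dtype=bool)
    mask[keep] = True
    pruned = weights * mask[:, None, None, None]
    return pruned, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))       # toy conv layer: 8 filters
pruned, mask = prune_filters(w, keep_ratio=0.5)
print(mask.sum())  # 4 filters survive
```

In practice the zeroed filters would then be physically removed from the layer (and the matching input channels from the next layer), which is what actually reduces computation and latency; after that, fine-tuning recovers accuracy.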
 
5. Model quantization

Quantization: reduce the precision requirements for the weights and activations.

Challenges:
- What to quantize? (weights, activations, gradients)
- To what degree? (8-bit, 4-bit, 2-bit, 1-bit)
- With what parameters? (step size, clipping value)
- How to balance the trade-off between compression and accuracy?
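The "with what parameters?" question can be made concrete with a minimal symmetric INT8 scheme in NumPy: the clipping value determines the step size (scale), and everything else follows. This is an illustrative sketch, not any particular toolkit's implementation:

```python
import numpy as np

def quantize_int8(x, clip=None):
    """Symmetric 8-bit quantization.

    clip: optional clipping value; defaults to max |x|.
    The step size (scale) follows directly from the clip value.
    """
    clip = np.abs(x).max() if clip is None else clip
    scale = clip / 127.0                               # step size
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
print(np.max(np.abs(x - x_hat)))  # rounding error, bounded by scale / 2
```

Shrinking the clip value below max |x| reduces rounding error for typical values but clips outliers, which is exactly the compression-vs-accuracy trade-off the slide asks about.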
 
6. Adlik pruning & quantization

Pipeline: pre-trained FP32 model → Pruner (prune filters) → Quantizer (quantize with calibration) → INT8 inference

Pruner: supports channel pruning and filter pruning, reducing the number of parameters and FLOPs.
Quantizer: supports 8-bit calibration quantization.

ResNet-50 results:

Model                       Top-1    Parameters   MACs        Size
baseline                    76.19%   25.61M       5.10×10^7   99MB
pruned                      75.50%   17.43M       3.47×10^7   67MB
pruned+quantized (TF-Lite)  75.3%    17.43M       3.47×10^7   18MB

https://github.com/Adlik/model_optimizer
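The "quantize with calibration" step needs a clipping value chosen from real activation statistics gathered on a calibration dataset. A minimal sketch using a simple percentile rule — note that calibration in tools like TensorRT typically minimizes KL divergence instead, and this `calibrate_clip` function and its 99.9% threshold are illustrative assumptions, not Adlik's actual method:

```python
import numpy as np

def calibrate_clip(activations, percentile=99.9):
    """Pick a clipping value from calibration activations.

    A percentile rule ignores rare outliers so they do not
    stretch the quantization range (illustrative simplification).
    """
    return np.percentile(np.abs(activations), percentile)

rng = np.random.default_rng(1)
# calibration batch with a few large outliers, as real activations often have
acts = np.concatenate([rng.normal(0, 1, 10000), [50.0, -60.0]])
clip = calibrate_clip(acts)
print(clip)  # a few units, not 60: outliers no longer dictate the range
```

Without calibration, the naive choice clip = max |x| would be 60 here, wasting almost the entire INT8 range on two outliers.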
 
7. Knowledge distillation

Knowledge distillation: model compression by teaching a smaller network, step by step, exactly what to do, using a bigger, already-trained network.

Challenges:
- Introducing multiple teachers
- Introducing a teaching assistant
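The teacher-student setup can be sketched as a loss term: the student is trained to match the teacher's temperature-softened output distribution, following Hinton et al.'s formulation. The temperature T=4 and the toy logits below are arbitrary illustrative values:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened distributions,
    scaled by T^2 as in Hinton et al. (2015)."""
    p = softmax(teacher_logits, T)          # soft targets from teacher
    q = softmax(student_logits, T)          # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q))) / p.shape[0]) * T * T

t = np.array([[5.0, 1.0, -1.0]])            # teacher logits
good = np.array([[4.0, 0.5, -1.5]])         # student agreeing with teacher
bad = np.array([[-1.0, 5.0, 1.0]])          # student disagreeing
print(distill_loss(good, t) < distill_loss(bad, t))  # True
```

In full training this term is combined with the ordinary cross-entropy on hard labels; the multi-teacher and teaching-assistant variants mentioned above change where the soft targets `p` come from, not the loss shape.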
 
8. Model partition

Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge
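Neurosurgeon's core idea — profile each layer on the device and in the cloud, then pick the split point that minimizes end-to-end latency — can be sketched with a toy cost model. All numbers below are invented for illustration, and the real system also accounts for energy and changing network conditions:

```python
def best_split(device_ms, cloud_ms, out_kb, input_kb, uplink_kbps):
    """Choose how many layers to run on-device before uploading.

    device_ms[i] / cloud_ms[i]: profiled latency of layer i.
    out_kb[i]: size of layer i's output activations.
    Split k = first k layers on device, the rest in the cloud.
    """
    n = len(device_ms)
    best_k, best_cost = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            xfer_kb = 0.0            # all on device: nothing to upload
        elif k == 0:
            xfer_kb = input_kb       # all in cloud: upload the raw input
        else:
            xfer_kb = out_kb[k - 1]  # upload layer k-1's activations
        cost = (sum(device_ms[:k]) + sum(cloud_ms[k:])
                + xfer_kb / uplink_kbps * 1000.0)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost

# invented profile: early layers slow on-device, activations shrink with
# depth, final layer heaviest on-device
device_ms = [20.0, 20.0, 20.0, 200.0]
cloud_ms = [2.0, 2.0, 2.0, 1.0]
out_kb = [500.0, 200.0, 20.0, 4.0]
k, cost = best_split(device_ms, cloud_ms, out_kb,
                     input_kb=150.0, uplink_kbps=1000.0)
print(k, cost)  # splits after layer 3, where activations are small
```

The sweet spot usually sits where the activations have shrunk enough that uploading them is cheap, but the remaining layers are still expensive on-device.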
 
9. Task

Target models: BERT; MobileNet-V3; ResNet-50

Design and implement an optimization solution that delivers:
- Boosted computing efficiency
- Reduced memory footprint
- Improved latency
 
10. Submitting

How?
1) Create a private GitHub repository to contain your work.
2) Add AdlikChallenge as a collaborator. The repository must be made public before the submission deadline and should remain accessible until the end of the final event of the ITU challenge.

What?
1) Source code:
- The whole optimization solution
- A performance test demo
2) A description document:
- Instructions on how to verify the optimization performance with the source code
- Other contents include but are not limited to: insight, opinion and analysis of model optimization; the selected target model and the reason for choosing it; the base model; the solution and algorithms used; description and comparison of optimization results, etc.
 
11. Evaluation criteria

Effect of model optimization (50%):
A reasonable trade-off between accuracy and efficiency. The selected model type, loss of accuracy, and compression rate of model parameters and computation will be taken into account.

Solution advantage (30%):
Whether the solution is reasonable and has sufficient practicability, innovation and universality.

Problem analysis (10%):
Whether there is deep original insight into the problem, and whether the analysis of the key elements of the problem is accurate and reasonable.

Completeness (10%):
Whether the requirements of the competition are fulfilled according to the proposed scheme and design.
 
12. Tips

1) See the problem statement on the Adlik wiki.
2) If you don't have an ITU account, please follow the guidance to create one for challenge registration.
3) Register on the ITU AI/ML in 5G Challenge website with your ITU account.
4) Fill out the ITU AI/ML in 5G Challenge Participants Survey to select problem statement ITU-ML5G-PS-018. You can enroll as a team of 1-4 members.
5) Begin to work on this problem and submit your results. We will accept submissions from July 1st, 2020; the submission deadline is October 10th, 2020.
 
13. Q & A