DNN Inference Optimization Challenge Overview


The DNN Inference Optimization Challenge, organized by Liya Yuan from ZTE, focuses on optimizing deep neural network (DNN) models for efficient inference on-device, at the edge, and in the cloud. The challenge addresses the need for high accuracy while minimizing data-center consumption and inference complexity. Techniques such as model pruning, quantization, and hardware optimization are explored to achieve efficient model design and reduce computational costs. Participants can engage in kernel optimization and leverage tools like Intel MKL-DNN and Nvidia TensorRT for model optimization. The challenge also covers pruning redundant connections and reducing the precision requirements for weights and activations through quantization. Notable solutions from Adlik feature channel and filter pruning, as well as 8-bit calibration quantization, showing significant parameter and FLOP reductions with minimal impact on accuracy.


Uploaded on Jul 16, 2024



Presentation Transcript


  1. ITU-ML5G-PS-018: DNN Inference Optimization Challenge (Adlik/ZTE) 17 July 2020 https://wiki.lfai.foundation/display/ADLIK/DNN+Inference+Optimization+Challenge Liya Yuan, ZTE yuan.liya@zte.com.cn Organizer Sponsors Register here Join us on Slack

  2. Background Trained DNN models are deployed for inference on-device, at the edge, and in the cloud to serve AI applications. The goal is efficient inference: keeping accuracy high while containing rapidly rising data-center consumption. The obstacles are high model complexity, computation cost, computationally expensive and memory-intensive workloads, large memory footprints, latency requirements, and, with the development of edge computing, resource constraints in edge devices.

  3. DNN inference optimization
  - Model optimization: efficient model design, model pruning, model quantization, knowledge distillation
  - Kernel optimization: Intel MKL-DNN, Nvidia TensorRT
  - Hardware optimization: Intel Knights Landing CPU, Nvidia PASCAL GP100 GPU, Google TPU, Nvidia Tegra

  4. Model pruning
  Pruning: removing redundant connections.
  Challenges:
  - How to evaluate the importance of parameters?
  - How to effectively reduce computation and latency?
  - How to recover accuracy?
  https://jacobgil.github.io/deeplearning/pruning-deep-learning
  Source: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture15.pdf
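One common answer to the "importance of parameters" question is magnitude-based pruning: treat small-magnitude weights as redundant and zero them out. A minimal NumPy sketch (the function name and the 50% sparsity target are illustrative, not part of the challenge):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold;
    # weights tied at the threshold are also removed.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

w = np.array([[0.9, -0.05, 0.3],
              [-0.02, 0.7, -0.4]])
pruned = magnitude_prune(w, 0.5)  # half the weights set to zero
```

In practice this unstructured form mainly saves storage; reducing actual computation and latency usually requires structured variants (channel or filter pruning) that remove whole rows or filters, as Adlik's pruner does.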

  5. Model quantization
  Quantization: reduce the precision requirements for weights and activations.
  Challenges:
  - What to quantize? (weights, activations, gradients)
  - To what degree? (8-bit, 4-bit, 2-bit, 1-bit)
  - With what parameters? (step size, clipping value)
  - How to balance the trade-off between compression and accuracy?

  6. Adlik pruning & quantization
  Pruner: supports channel pruning and filter pruning, reducing the number of parameters and FLOPs.
  Quantizer: supports 8-bit calibration quantization.
  Pipeline: pre-trained FP32 model -> prune (Filter Pruner) -> quantize with calibration (Quantizer) -> INT8 inference.

  ResNet-50               Top-1    Parameters   MACs        Size
  baseline                76.19%   25.61M       5.10*10^7   99MB
  pruned                  75.50%   17.43M       3.47*10^7   67MB
  pruned+quantized
  (TF-Lite)               75.3%    17.43M       3.47*10^7   18MB

  https://github.com/Adlik/model_optimizer
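The headline reductions in the Adlik ResNet-50 results follow directly from the reported numbers; a quick arithmetic check:

```python
# Reductions implied by the Adlik ResNet-50 results table.
params_base, params_pruned = 25.61, 17.43   # parameters, millions
macs_base, macs_pruned = 5.10, 3.47         # MACs, same units as the table
size_base, size_quant = 99, 18              # model size, MB

param_reduction = 1 - params_pruned / params_base   # ~32% fewer parameters
mac_reduction = 1 - macs_pruned / macs_base         # ~32% fewer MACs
size_reduction = 1 - size_quant / size_base         # ~82% smaller file
```

Note that pruning alone shrinks parameters and MACs, while the further drop from 67MB to 18MB comes from quantization: storing weights in 8 bits instead of 32 cuts the file size roughly 4x without changing the operation count.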

  7. Knowledge distillation
  Knowledge distillation: model compression by teaching a smaller network, step by step, exactly what to do, using a bigger, already-trained network.
  Challenges: introducing multiple teachers; introducing a teaching assistant.
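The core mechanism of "teaching" is matching the student's softened output distribution to the teacher's. A minimal NumPy sketch of the classic temperature-scaled distillation loss (the temperature T=4 and the example logits are illustrative):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence from the temperature-softened teacher distribution
    to the student's; the T^2 factor keeps gradient magnitudes comparable
    across temperatures, as in Hinton et al.'s formulation."""
    p = softmax(teacher_logits / T)  # teacher "soft targets"
    q = softmax(student_logits / T)  # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T

teacher = np.array([[4.0, 1.0, -2.0]])
student = np.array([[3.5, 1.5, -1.0]])
loss = distillation_loss(student, teacher)  # zero only if student matches teacher
```

In full training this term is typically combined with the ordinary cross-entropy on the hard labels; multi-teacher and teaching-assistant variants change where the soft targets `p` come from.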

  8. Model partition
  Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge

  9. Task
  Target models: BERT; MobileNet-V3; ResNet-50.
  Design and implement an optimization solution that delivers boosted computing efficiency, reduced memory footprint, and improved latency.

  10. Submitting
  How?
  1) Create a private GitHub repository to contain your work.
  2) Add AdlikChallenge as a collaborator. The repository must be made public before the submission deadline and should remain accessible until the end of the final event of the ITU challenge.
  What?
  1) Source code: the whole optimization solution and a performance test demo.
  2) A description document: instructions on how to verify the optimization performance with the source code. Other contents of the document include, but are not limited to: insight, opinion, and analysis of model optimization; the selected target model and the reason for choosing it; the base model; the solution and algorithms used; a description and comparison of the optimization results; etc.

  11. Evaluation criteria
  Effect of model optimization (50%): a reasonable trade-off between accuracy and efficiency. The selected model type, loss of accuracy, and compression rate of model parameters and computation will be taken into account.
  Solution advantage (30%): whether the solution is reasonable and has enough practicability, innovation, and universality.
  Problem analysis (10%): whether there is deep, original insight into the problem, and whether the analysis of its key elements is accurate and reasonable.
  Completeness (10%): whether the requirements of the competition are fulfilled according to the proposed scheme and design.

  12. Tips
  1) See the problem statement on the Adlik wiki.
  2) If you don't have an ITU account, please follow the guidance to create one for challenge registration.
  3) Register on the ITU AI/ML in 5G Challenge website with your ITU account.
  4) Fill out the ITU AI/ML in 5G Challenge Participants Survey to select problem statement ITU-ML5G-PS-018. You can enroll as a team of 1-4 members.
  5) Begin work on this problem and submit your results. Submissions are accepted from July 1st, 2020; the deadline is October 10th, 2020.

  13. Q & A
