Block-grained Scaling of Deep Neural Networks for Mobile Vision


This presentation explores the challenges of optimizing Deep Neural Networks (DNNs) for mobile vision systems, which stem from their large size and high energy consumption. The LegoDNN framework introduces a block-grained scaling approach that compresses DNNs to reduce memory-access energy consumption. The agenda covers deep compression, the limitations of model-grained methods, the LegoDNN architecture, the paper's contributions, evaluations, and related work, emphasizing the importance of energy-efficient mobile vision applications.



Presentation Transcript


  1. LegoDNN: Block-grained Scaling of Deep Neural Networks for Mobile Vision. Wireless Network Communication, Dr. Yu Wang. Presented by Mina Gabriel, CIS 5639.

  2. Agenda
  1. Introduction to the DNN problem in edge computing
  2. Deep compression
  3. Limitations of the model-grained approach
  4. Introduction to the LegoDNN architecture
  5. Paper contributions
  6. Evaluation (LegoDNN vs. baseline models)
  7. Related work
  8. Conclusion
  9. Questions

  3. Deep Neural Networks (DNNs)
DNNs generally optimize for performance in terms of accuracy. As a result, they are huge, with parameters in the order of millions: the very popular AlexNet has around 62 million parameters, and a trained AlexNet takes around 240MB of space. Credit: Xu et al., 2019; Wikipedia: AlexNet
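As a quick sanity check on that size (a back-of-the-envelope sketch of mine, not from the slides), 62 million 32-bit floats come out to roughly 240MB:

```python
# 62M parameters stored as 32-bit (4-byte) floats:
params = 62_000_000
size_mb = params * 4 / (1024 ** 2)  # bytes -> MiB
print(f"{size_mb:.0f} MB")          # ~237 MB, matching the ~240MB figure
```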

  4. Mobile vision systems are ubiquitous
Mobile vision applications: image classification, semantic segmentation, object detection, anomaly detection, pose estimation, and action recognition. The emergence of AI chipsets (e.g., the Intel Movidius VPU) enables on-device deep learning on mobile vision systems. Credit: https://github.com/LINC-BIT/legodnn

  5. Mobile vision systems are ubiquitous
Two issues: memory and battery. The emergence of AI chipsets (e.g., the Apple A11 Bionic Neural Engine and the Intel Movidius VPU) enables on-device deep learning on mobile vision systems. Credit: Wikipedia: Apple A11, Raspberry Pi, Intel Movidius VPU

  6. Motivation
Energy consumption is dominated by memory access:

Operation             Energy [pJ]
32-bit int ADD        0.1
32-bit float ADD      0.9
32-bit SRAM cache     5
32-bit DRAM memory    640

Large networks do not fit in on-chip storage and hence require the far more costly DRAM accesses. If a deep model were small enough to fit in SRAM, energy consumption would drop dramatically (a sketch of the arithmetic follows). Source: Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016
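To make the gap concrete, here is a back-of-the-envelope sketch (my own arithmetic, using the per-access energies above and an AlexNet-scale parameter count):

```python
# Energy estimate for fetching every weight once, using the per-access
# energies from Han et al. (picojoules per 32-bit access).
DRAM_PJ = 640   # one 32-bit DRAM access
SRAM_PJ = 5     # one 32-bit SRAM access

num_weights = 62_000_000  # AlexNet-scale parameter count (assumed)

dram_mj = num_weights * DRAM_PJ * 1e-9  # pJ -> mJ
sram_mj = num_weights * SRAM_PJ * 1e-9

print(f"DRAM: {dram_mj:.1f} mJ, SRAM: {sram_mj:.2f} mJ per full fetch")
# ~39.7 mJ vs ~0.31 mJ: roughly a 128x gap in weight-traffic energy
```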

  7. Deep Compression
35x decrease in AlexNet size, from 240MB to 6.9MB. Source: Song Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding, ICLR 2016

  8. Deep Compression: Network Pruning
Three steps (a code sketch follows):
1. Train the model: start by learning the connectivity via normal network training.
2. Prune the small-weight connections: all connections with weights below a threshold are removed from the network.
3. Retrain: finally, retrain the network to learn the final weights for the remaining sparse connections.
Source: Han et al., Deep Compression, ICLR 2016
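A minimal sketch of step 2, assuming a PyTorch model (illustrative only, not the paper's code):

```python
import torch

def magnitude_prune(model: torch.nn.Module, threshold: float):
    """Zero out all weights whose magnitude falls below `threshold`.

    Real implementations keep the returned masks so the pruned
    connections stay at zero during the retraining step.
    """
    masks = {}
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "weight" in name:
                mask = param.abs() >= threshold
                param.mul_(mask)       # remove small-weight connections
                masks[name] = mask     # reuse when retraining
    return masks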

  9. Deep Compression: Weight Sharing
The weights of each layer are partitioned into k clusters using K-means; each weight is then represented by the index of its cluster centroid, which reduces the weight representation (sketch below). Source: Han et al., Deep Compression, ICLR 2016
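A minimal sketch of weight sharing under these assumptions (scikit-learn's KMeans stands in for the clustering; k = 16 gives 4-bit indices):

```python
import numpy as np
from sklearn.cluster import KMeans

def share_weights(weights: np.ndarray, k: int = 16):
    """Cluster a layer's weights into k centroids (weight-sharing sketch).

    Each weight is replaced by its centroid; storing log2(k)-bit indices
    plus the k-entry codebook is far cheaper than 32-bit floats.
    """
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10).fit(flat)
    codebook = km.cluster_centers_.ravel()   # k shared values
    indices = km.labels_                     # one small index per weight
    return codebook[indices].reshape(weights.shape), codebook, indices
```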

  10. Deep Compression: Quantization and Huffman Coding
In their experiments, they found no loss of accuracy when quantizing to 8 bits instead of 32. Huffman coding: frequent weights are assigned fewer bits for representation, while rare weights are assigned more bits. The full compression pipeline saves 35x to 49x in parameter storage with no loss of accuracy. Source: Han et al., Deep Compression, ICLR 2016
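A minimal sketch of the Huffman step over quantized weight indices (the standard textbook construction, not the paper's implementation):

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Compute Huffman code lengths for a stream of symbols (e.g., the
    integer cluster indices from weight sharing): frequent symbols get
    short codes, rare symbols get long ones."""
    heap = [[count, [sym, 0]] for sym, count in Counter(symbols).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:] + hi[1:]:
            pair[1] += 1   # every merge adds one bit to each member's code
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: bits for sym, bits in heap[0][1:]}
```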

  11. Limitation of model-grained scaling
We need to be able to scale the model size based on the available resources, but the previous techniques suffer severe accuracy loss once a model's size is changed. Online scaling: to provide online scaling, these techniques must first transform a DNN model into several descendant models of smaller sizes offline, and then select one of them to upgrade or downgrade the model size at run time (a selection sketch follows).
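A minimal sketch of that model-grained selection step (names and record layout are my assumptions):

```python
def pick_descendant(descendants, memory_budget_mb):
    """Model-grained online scaling: from a few pre-generated descendant
    models, pick the most accurate one that fits the current budget.

    `descendants` is assumed to be a list of dicts like
    {"size_mb": 25.0, "accuracy": 0.71, "path": "resnet18_small.pt"}.
    With only a handful of descendants, the chosen model usually leaves
    part of the budget unused, which is the waste LegoDNN targets.
    """
    feasible = [d for d in descendants if d["size_mb"] <= memory_budget_mb]
    if not feasible:
        raise RuntimeError("no descendant model fits the current budget")
    return max(feasible, key=lambda d: d["accuracy"])
```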

  12. Limitation of model-grained scaling
(a) Incurring large accuracy loss; (b) leaving considerable proportions of resources unused.

  13. Limitation of model-grained scaling
An ideal scaling that fully uses the available resources requires a dauntingly large number of descendant models, which gives rise to two major challenges in practice:
1. How to construct a large number of descendant models (e.g., 1 million) in a moderate amount of time, such as hours.
2. How to form the right combination of these descendant models to maximize accuracy given the current resources.

  14. LegoDNN
LegoDNN is a framework built on block structure (e.g., ResNet blocks); the training environment captures per-block information. Credit: https://github.com/LINC-BIT/legodnn

  15. LegoDNN
Blocks are independent from the other parts of the DNN; at runtime, dynamic selection of descendant blocks is supported. Credit: https://github.com/LINC-BIT/legodnn

  16. Contributions of this paper
- Fast generation of a large scaling space via block re-training: descendant blocks are re-trained independently of the whole network.
- Fine-grained DNN scaling via block switching: run-time optimal selection of descendant blocks for maximum accuracy; unlike filter pruning, there is no need to retrain the entire DNN.
- Implementation and evaluation: the authors design and develop TensorFlow applications and evaluate them against state-of-the-art techniques. (I also found a PyTorch repository, created 3 months ago.)
Summary of experimental results:
1. Implemented and tested on smartphones (Huawei) and embedded devices (Raspberry Pi).
2. Compared LegoDNN to other models:
   a. 1,296x to 279,936x more descendant models.
   b. 76% less training time than other techniques.
   c. 31.74% increase in accuracy.
   d. 71.07% energy saved when switching models.
3. Tested in multi-DNN scenarios.
4. Tested on multiple applications.

  17. Section 2: Motivation - How to enlarge the model exploration space through blocks?
Limitation 1: limited exploration of model choices. Traditional model-grained scaling techniques treat a DNN as a whole and transform it into several descendant models of different sizes. In three popular DNNs (VGG16, YOLOv3, and ResNet18), block-grained scaling provides as many as 7K to 1.6M model sizes by just transforming the original blocks into 5 to 40 descendant blocks (see the sketch below).
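The counts line up under a simple assumption of mine: each block gets 5 pruned variants plus the original, and blocks combine freely, giving (5 + 1)^B assembled models for B blocks:

```python
# My reconstruction of the scaling-space arithmetic: with B blocks and
# v descendant variants per block (plus the original block), blocks
# combine freely into (v + 1) ** B distinct models.
v = 5  # assumed variants per block
for name, num_blocks in [("VGG16", 5), ("ResNet18", 5), ("YOLOv3", 8)]:
    print(name, (v + 1) ** num_blocks)
# VGG16 7776, ResNet18 7776, YOLOv3 1679616 -> the ~7K to ~1.6M range
```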

  18. Section 2: Motivation - How to accelerate the training of block-grained models?
Limitation 2: resource-intensive and time-consuming training. Model-grained scaling can preserve accuracy, but it suffers from long generation times: when handling large datasets such as ImageNet or COCO2017, the generation of a block takes several hours.

  19. Section 2: Motivation - How to optimally assemble descendant blocks on the fly?
Limitation 3: prior work focuses on offline structured pruning. Given the large search space of descendant models, an efficient scaling approach is required to quickly find the optimal combination of blocks under run-time resource dynamics.

  20. Section 3: Design of LegoDNN
Overview: offline vs. online stages. Credit: https://github.com/LINC-BIT/legodnn

  21. Section 3: Design of LegoDNN - Overview: Offline stage
- Block Extractor: extracts blocks from the original model's convolution layers.
- Descendant Block Generator: prunes each block to generate descendant blocks of multiple sizes.
- Block Retrainer: retrains descendant blocks to improve their accuracy (more on this in the next slide).
- Block Profiler: records statistical information about each block: size, memory, accuracy, and latency (a sketch of such a record follows).
Credit: https://github.com/LINC-BIT/legodnn
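A hypothetical sketch of what the Block Profiler might record per descendant block (field names are my assumption, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class BlockProfile:
    """One profiled descendant block (illustrative record layout)."""
    block_id: int          # which original block this descends from
    variant: int           # pruning level of this descendant
    size_mb: float         # on-disk size
    memory_mb: float       # peak inference memory
    accuracy_drop: float   # accuracy loss vs. the original block
    latency_ms: float      # measured per-block inference latency
```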

  22. Section 3: Design of LegoDNN - Overview: Offline stage - Block Retrainer

  23. Section 3: Design of LegoDNN - Overview: Online stage
- Latency Profiler: computes the latency reduction percentage of each block on the edge device.
- Scaling Optimizer: selects blocks to maximize accuracy while satisfying the latency and memory constraints (next slides).
- Block Switcher: switches blocks at runtime using an optimization algorithm (following slides).
Credit: https://github.com/LINC-BIT/legodnn

  24. Section 3: Design of LegoDNN - Single-DNN Optimization
At runtime, the Scaling Optimizer finds the optimal combination of descendant blocks under four constraints: accuracy, latency, model size, and block scale (a brute-force sketch follows).
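A minimal brute-force sketch of that selection, reusing the hypothetical BlockProfile record from the offline-stage sketch (the paper formulates this as an optimization problem; exhaustive search is shown only for clarity):

```python
from itertools import product

def select_blocks(profiles, latency_budget_ms, memory_budget_mb):
    """Choose one descendant per block so the total accuracy drop is
    minimal under the latency and memory budgets.

    `profiles[b]` lists the BlockProfile candidates for block b.
    """
    best, best_drop = None, float("inf")
    for combo in product(*profiles):          # one candidate per block
        latency = sum(p.latency_ms for p in combo)
        memory = sum(p.memory_mb for p in combo)
        drop = sum(p.accuracy_drop for p in combo)
        if (latency <= latency_budget_ms and memory <= memory_budget_mb
                and drop < best_drop):
            best, best_drop = combo, drop
    return best
```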

  25. Section 3: Design of LegoDNN - Overview: Online - Multi-DNN optimization
When considering n DNNs (n > 1), the optimization objective can be extended accordingly (a reconstruction sketch follows).
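The slide does not reproduce the formula, so here is one plausible reconstruction of mine, assuming the single-DNN objective is a 0/1 selection over per-block accuracy gains and generalizes additively across the n models:

```latex
% Reconstruction sketch (my assumption, not copied from the paper):
% x_{i,b,v} = 1 iff DNN i uses variant v for block b
\max_{x}\ \sum_{i=1}^{n}\sum_{b}\sum_{v} a_{i,b,v}\,x_{i,b,v}
\quad \text{s.t.} \quad
\sum_{i,b,v} l_{i,b,v}\,x_{i,b,v} \le L, \qquad
\sum_{i,b,v} m_{i,b,v}\,x_{i,b,v} \le M, \qquad
\sum_{v} x_{i,b,v} = 1 \;\; \forall i,b, \qquad
x_{i,b,v} \in \{0,1\}
```

where a, l, and m are per-variant accuracy, latency, and memory, and L and M are the shared latency and memory budgets across all co-running DNNs.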

  26. Section 3: Design of LegoDNN - Overview: Online - Block Exchanges (Switcher)
Filter pruning and low-rank decomposition directly replace the entire DNN (large I/O overhead). NestDNN designed descendant-model exchange to solve this issue, but introduced two other issues:
1. Generation time increases by 50% to 200% compared with filter pruning and low-rank decomposition.
2. When the model is loaded into memory, new layers need to be constructed using deep learning libraries.
LegoDNN only needs to exchange blocks rather than the entire DNN, and all blocks can be integrated without extra operations (a switching sketch follows). (You can't mix and match descendant blocks with other blocks :) )
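A minimal sketch of block switching (the naming is mine, not the paper's API; it assumes each block is a direct child module of a PyTorch model and descendant blocks were serialized whole with torch.save):

```python
import torch

def switch_block(model: torch.nn.Module, slot_name: str, block_path: str):
    """Splice a descendant block into its slot. Only one block crosses
    the storage boundary, not the entire model."""
    new_block = torch.load(block_path, weights_only=False)  # load just this block
    setattr(model, slot_name, new_block)                    # replace the slot's module
    model.eval()                                            # ready for inference
    return model
```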

  27. Section 4: EVALUATION - Experimental Setup
Testbeds: tested on four different phones and embedded platforms.
Image recognition: VGG16 and ResNet18, using CIFAR-10 (60K images, 10 classes) and ImageNet2012 (1.2M images, 1000 classes) respectively.
Real-time object detection: YOLOv3 on COCO2017 (0.33M images, 1.5M objects, 80 classes), evaluated with mAP@0.5 (IoU threshold = 0.5).
Multi-DNN scenarios, three ranges: small (1-6 running applications), medium (2-8 running applications), large (3-10 running applications).
Baseline models: four state-of-the-art model-grained scaling techniques: filter pruning, NestDNN, low-rank decomposition, and knowledge distillation*.
* Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. Compression of deep convolutional neural networks for fast and low power mobile applications. In ICLR 2016.

  28. Section 4: EVALUATION - Evaluation of LegoDNN Components
Evaluation of offline model/block generation: LegoDNN has the shortest generation time; the baseline models would take at least several years to generate the same number of optional models as LegoDNN. [Figure: generation time in hours.] VGG16, ResNet18, and YOLOv3 have 5, 5, and 8 blocks respectively.

  29. Section 4: EVALUATION - Evaluation of LegoDNN Components
Evaluation of the accuracy/latency tradeoff: first, LegoDNN consistently achieves higher accuracy (by 31.74%) than the baseline approaches at every available resource level. Second, on average it achieves 52.66% higher accuracy when the available memory is smaller than 30MB.

  30. Comparison of inference accuracies between LegoDNN and model-scaling approaches

  31. Comparison of energy consumption in model scaling

  32. Section 4: EVALUATION - Evaluation of the Multi-DNN Scenario
Two mobile devices: a CPU-based smartphone and a GPU-based embedded device. An application's available resources decrease as more co-running applications exist. LegoDNN provides block-level scaling by switching just 3 blocks, thus still maintaining accurate models (high prediction confidences) under tighter resources. Each application is randomly selected from VGG16, ResNet18, and YOLOv3. Because object detection and recognition have different accuracy metrics, the authors used the percentage of loss in average accuracy.

  33. Section 4: EVALUATION - Evaluation of the Multi-DNN Scenario
Performance under different loads: LegoDNN achieves considerable performance improvements over the baseline approaches by delivering either lower latencies (minimum-latency scenario) or smaller accuracy losses (maximum-accuracy scenario). Overall, the approach achieves 21.62% reductions (at most 2x) in latency and 69.71% reductions in accuracy losses.

  34. Performance comparison under maximum-accuracy and minimum-latency scenarios

  35. Section 5: DISCUSSIONS - Applicability of LegoDNN to DNNs
LegoDNN is the first framework that supports block-grained scaling of DNNs for mobile vision systems. It generalizes to most state-of-the-art DNNs (the paper reports on 7 different DNN categories, e.g., ResNet variants scaled in depth (ResNet152) and width (ResNeXt29)). LegoDNN is inapplicable when any filter in a DNN cannot be pruned.

  36. Applicability of LegoDNN to seven categories of DNNs

  37. Section 6: RELATED WORK - Approaches Similar to LegoDNN
Model-grained DNN scaling: applies structured pruning in a different way, mainly generating a limited number of models to fulfill requirements.
Scheduling of multiple DNNs: modern mobile vision systems need to execute multiple DNNs while optimizing their latency, accuracy, and energy (similar goal to LegoDNN, different approach).
Device-specific DNN generation and search: these techniques study the generation of optimal DNNs for specific devices; in other words, they pre-generate DNN models according to the features of specific devices, unlike LegoDNN, which adapts at run time.

  38. Section 7: CONCLUSION - Paper Conclusion
LegoDNN was presented in this paper as a framework that opens a large exploration space of DNN models for mobile vision systems. Future work will explore the re-training of descendant blocks in a federated learning framework; such re-training would allow a descendant block to learn from multiple participants' DNNs to increase its accuracy and generalization.

  39. QUESTIONS?
