Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects

Linyi Li (University of Illinois Urbana-Champaign)
Yuhao Zhang (University of Wisconsin-Madison)
Luyao Ren (Peking University)
Yingfei Xiong (Peking University)
Tao Xie (Peking University)
 
Background: DNN Architectures Defined by Python Programs

A DL program defines the DNN architecture with Python libraries (e.g., PyTorch, TensorFlow):

    input_data = tf.placeholder("float", [32, 784], name='x-input')
    input_labels = tf.placeholder("float", [32, 10], name='y-input')
    self.W_ = tf.Variable(tf.zeros([n_features, n_classes]), name='weights')
    self.b_ = tf.Variable(tf.zeros([n_classes]), name='biases')
    model_output = tf.nn.softmax(tf.matmul(self.input_data, self.W_) + self.b_)
    cost = -tf.reduce_mean(ref_input * tf.log(model_output)
                           + (1 - ref_input) * tf.log(1 - model_output), name='cost')
    self.obj_function = tf.reduce_min(tf.abs(model_output), name='obj_function')

DNN model = architecture + weights:

    weights = random_init()                          # initialize weights at random
    weights = architecture(training_data, weights)   # training phase: update weights
    result = architecture(test_data, weights)        # inference phase: get prediction
 
Numerical Defects

    result = architecture(test_data, weights)
    print(result)
    >> inf / nan

When a numerical defect is triggered, the DNN model outputs inf or nan rather than meaningful numbers.
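For instance, a Log operator downstream of a Softmax can hit this in practice. A minimal PyTorch sketch (illustrative values of ours, not the paper's benchmark):

    import torch

    logits = torch.tensor([[60.0, -60.0]])  # extreme but reachable activations
    probs = torch.softmax(logits, dim=-1)   # second entry underflows to 0.0 in float32
    log_probs = torch.log(probs)            # log(0) = -inf: the numerical defect
    mask = torch.tensor([[1.0, 0.0]])
    loss = (mask * log_probs).sum()         # 0.0 * -inf = nan (IEEE 754), so the sum is nan
    print(log_probs, loss)                  # tensor([[0., -inf]]) tensor(nan)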
 
Numerical Defects are Widespread & Disastrous

Widely exist:
One of the most common issues in DL programs [1,2]
At the architecture level: all models sharing the same architecture inherit the defects
Affect lots of developers & users, over hours, days, weeks, or months of debugging

[1] An empirical study on TensorFlow program bugs. ISSTA 2018
[2] DeepStability: A study of unstable numerical methods and their solutions in deep learning. ICSE 2022
 
Serious consequences:
Outputs of inf or nan result in system crashes
Disastrous in high-availability scenarios, e.g., DL for threat monitoring or system control
Wastes massive time and resources to fix
 
We Propose RANUM: First Holistic Framework for Assuring Numerical Reliability
Input: a DNN architecture. RANUM then runs three tasks, each solving an open problem with a key technique:

Potential-Defect Detection: locate nodes with numerical defects
  Open problem solved: how to support generic architectures, including dynamic ones?
  Key technique: static analysis with adaptive block merging

Feasibility Confirmation: (a) generate failure-exhibiting unit tests; (b) generate failure-exhibiting system tests
  Open problem solved: how to generate training data to confirm feasibility?
  Key technique: DLG attack

Fix Suggestion: suggest to add clipping operators
  Open problem solved: how to fix or bypass numerical defects automatically?
  Key technique: abstraction optimization with differentiable abstraction
 
Running Example

(DNN architectures are auto-convertible to acyclic computational graphs; the running example compiles a TensorFlow program into a graph with nodes such as x-input, weights, MatMul, Add, Softmax, Sub, Abs, and Log.)

Task: Detection of Potential Defects

A numerical defect is triggered when a vulnerable operator receives an invalid input, e.g., a negative input for the Log operator.
For all operators, detect whether their inputs can be invalid.
 
Approach: Static Analysis Based on Interval Bound Propagation

Key idea: propagate intervals of possible values through nodes.
Starting from the intervals of the input & weight nodes (e.g., [-10, 10]), induce the input interval of each operator (e.g., MatMul output [-200, 200], Add output [-210, 210], Softmax output [0, 1]).
If an operator's input interval overlaps its invalid range, alert: here the Log operators' input interval [0, 1] includes 0, so alerts are raised.
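A minimal sketch of this propagation (our toy operator set and bounds, not RANUM's implementation): each tensor is abstracted by elementwise bounds [lo, hi], and Log alerts when its input interval touches the invalid range.

    import numpy as np

    def matmul_bounds(lo_x, hi_x, W):
        # interval arithmetic for x @ W with fixed weights W
        Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
        return lo_x @ Wp + hi_x @ Wn, hi_x @ Wp + lo_x @ Wn

    def softmax_bounds(lo, hi):
        # softmax outputs lie in (0, 1); [0, 1] is a sound (loose) abstraction
        return np.zeros_like(lo), np.ones_like(hi)

    def check_log(lo, hi):
        if (lo <= 0).any():  # interval overlaps Log's invalid range (-inf, 0]
            print("ALERT: Log may receive a non-positive input")

    x_lo, x_hi = np.full((1, 4), -10.0), np.full((1, 4), 10.0)  # input node in [-10, 10]
    W = np.random.randn(4, 3)                                   # weight node (fixed here)
    z_lo, z_hi = matmul_bounds(x_lo, x_hi, W)
    p_lo, p_hi = softmax_bounds(z_lo, z_hi)
    check_log(p_lo, p_hi)    # fires: lower bound 0 is invalid for Log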
 
Too Costly!

Modern DNNs are large, and some data are large tensors: e.g., an ImageNet model input has size 3 x 224 x 224, about 1.5 x 10^5 elements.
Outcome: per-element interval abstraction is too costly. The complexity scales with (# operators) x (avg # operands) x (avg tensor size), making the analysis a few times slower than model inference.
Optimization of Static Analysis

Observation: for large tensors, the valid input interval is usually the same across different elements.
Solution: merge blocks in a tensor, adaptively and selectively. For example, a 4 x 4 tensor of per-element interval bounds can be merged into 4 block bounds such as (1,9) and (3,9).
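A minimal sketch of block merging (fixed 2 x 2 blocks here for simplicity; RANUM chooses blocks adaptively and selectively): each block's bound is the min of its elements' lower bounds and the max of their upper bounds, which is sound but coarser.

    import numpy as np

    def merge_blocks(lo, hi, bs=2):
        h, w = lo.shape
        lo_b = lo.reshape(h // bs, bs, w // bs, bs).min(axis=(1, 3))
        hi_b = hi.reshape(h // bs, bs, w // bs, bs).max(axis=(1, 3))
        return lo_b, hi_b  # one interval per block instead of per element

    lo = np.array([[2, 4, 3, 4], [3, 1, 2, 3], [3, 3, 4, 4], [1, 3, 4, 5]], float)
    hi = np.array([[8, 5, 8, 6], [9, 9, 9, 9], [7, 6, 7, 8], [9, 9, 8, 8]], float)
    print(merge_blocks(lo, hi))  # block bounds (1,9), (2,9), (1,9), (4,8): 4 instead of 16
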
Task: Fix Suggestion via Input Clipping

Observation: developers fix numerical defects by clipping, e.g.,

    Log(x)  ->  Log(clip(x, min=1e-6))

Depending on who applies the fix:
Users regularize the input: clip input nodes (e.g., x-input)
Model trainers regularize the weights: clip weight nodes (e.g., weights)
Architecture designers change the architecture: clip internal nodes (e.g., MatMul, Add, Softmax, Sub)
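A minimal sketch of the clipping fix in PyTorch (our wrapper name; it mirrors the slide's Log(clip(x, min=1e-6)) pattern):

    import torch

    def safe_log(x, eps=1e-6):
        # clamp the Log operand away from its invalid range (-inf, 0]
        return torch.log(torch.clamp(x, min=eps))

    probs = torch.tensor([[1.0, 0.0]])  # a softmax output that underflowed to 0
    print(torch.log(probs))             # tensor([[0., -inf]])     -- defect
    print(safe_log(probs))              # tensor([[0., -13.8155]]) -- clipped, finite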
 
Problem Formulation
 
Given a set of nodes to clip, generate clipping thresholds for all of those nodes such that, as long as the inputs of the specified nodes are clipped, vulnerable operators always receive valid input.

Candidate clipping sites: input nodes (e.g., x-input), weight nodes (e.g., weights), or internal nodes (e.g., MatMul, Add, Softmax, Sub).
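In symbols (our notation; the slide's formula did not survive extraction): given the set S of nodes to clip, find thresholds [l_n, u_n] for each n in S such that

    \forall v \in V_\text{vulnerable}:\;
    \mathrm{range}\big(\mathrm{input}(v) \;\big|\; x_n \in [l_n, u_n]\ \forall n \in S\big)
    \cap I_v^\text{invalid} = \emptyset
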
Example - A Successful Fix: Clip on Weight Node 2

Clipping the weight node tightens all downstream intervals, so the two vulnerable Log operators no longer receive inputs that overlap the invalid range and the alerts disappear.
How to Find the Fix? Abstraction Optimization

High-level idea: adjust the interval bounds of the nodes to be clipped so that the bounds of the vulnerable nodes move away from the invalid range (the vulnerable Log node needs a positive input value), then impose the resulting fix, here on weight node 2.
 
Abstraction Optimization: Optimizing a Differentiable Interval Bound

We construct a symbolic interval bound: the clipped nodes' interval bounds are symbolically linked to the vulnerable operator's interval bound.
This allows gradient computation, and the gradient shows the impact of a potential fix on the vulnerable operator. Gradient-based optimization then adjusts the bounds of the clipped nodes toward less overlap between the vulnerable operator's interval and the invalid range; once no overlap remains, a fix is found.
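A minimal sketch of the idea on a toy node (our setup, not RANUM's implementation): the clipping threshold t on a weight interval [-t, t] is a differentiable parameter of the propagated bounds, so gradient descent can shrink t until the Log operand's lower bound leaves the invalid range.

    import torch

    x_lo, x_hi = torch.tensor(-10.0), torch.tensor(10.0)  # input node interval
    t = torch.tensor(10.0, requires_grad=True)             # clip weights to [-t, t]

    for _ in range(200):
        # differentiable interval propagation for y = x * w + 5 (toy graph)
        y_lo = -torch.max(x_lo.abs(), x_hi.abs()) * t + 5.0
        overlap = torch.relu(1e-6 - y_lo)  # > 0 iff bound touches Log's invalid range
        if overlap == 0:
            break
        overlap.backward()
        with torch.no_grad():
            t -= 0.01 * t.grad
            t.clamp_(min=0.0)
            t.grad.zero_()

    print(f"fix: clip weights to [-{t.item():.2f}, {t.item():.2f}]")  # converges to ~0.4-0.5
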
Result Highlights

Potential-Defect Detection: RANUM detects all 79 true defects in the benchmark in 3 seconds; the existing best tool, DEBAR, detects only 48 (# detected defects: 79 vs. 48).

Feasibility Confirmation (measured by success rate on triggering numerical defects): out of 790 runs, RANUM succeeds in 733 with 17.31s average time, while random generation succeeds in 649 with 334.14s average time; that is, a 13% success-rate increase and a 19.3X time saving.

Fix Suggestion: are RANUM fixes better than human developers'? Yes for most (30 out of 53) cases.
 
Thank you! Questions welcome!

Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects

Linyi Li (University of Illinois Urbana-Champaign)
Yuhao Zhang (University of Wisconsin-Madison)
Luyao Ren (Peking University)
Yingfei Xiong (Peking University)
Tao Xie (Peking University)

github.com/llylly/RANUM
arxiv.org/abs/2302.06086
 
Appendix
 
Prior Work for DNN Numerical Reliability
 
Dynamic Detection: GRIST [2]
Generates failure-exhibiting weights & inference-phase inputs
Limitation: the generated weights may not be attainable via training
Open question: how to generate training data to confirm feasibility?
 
Static Detection: DEBAR [1]
Labels all operators with possible numerical defects
Limitation: supports only static architectures
Open question: how to support generic architectures, including dynamic ones?

No debugging approach existed
Open question: how to fix numerical defects automatically?

[1] Detecting numerical bugs in neural network architectures. ESEC/FSE 2020
[2] Exposing numerical bugs in deep learning via gradient back-propagation. ESEC/FSE 2021
 
Task 2: Feasibility Confirmation via Training Data Generation

(The framework overview above is recapped here, highlighting this task: generate training data to confirm feasibility, via the DLG attack.)
Generate Training Data to Confirm Feasibility

The detection method has false positives, so defect feasibility must be confirmed. Prior work [1] generates a failure-exhibiting unit test (an inference-phase input plus weights) that triggers the defect, but the generated weights may not be attainable via training.

[1] Exposing numerical bugs in deep learning via gradient back-propagation. ESEC/FSE 2021
Problem Formulation

Goal: generate training data to trigger the defect.
Given the DNN architecture, failure-exhibiting weights (from the existing unit test generation approach), and initial weights, find training data such that after training (the training procedure is defined by the architecture), the model weights reach the failure-exhibiting weights.
 
Connection to Deep Gradient Leakage Attack

For ease of debugging, consider one-step training with learning rate eta.
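With one-step training, the requirement reduces to matching a known gradient (notation ours, reconstructed from the slide; W_0 = initial weights, W_1 = failure-exhibiting weights, \eta = learning rate):

    W_1 = W_0 - \eta \, \nabla_W \mathcal{L}(x_\text{train}, y_\text{train}; W_0)
    \quad\Longrightarrow\quad
    \nabla_W \mathcal{L}(x_\text{train}, y_\text{train}; W_0) = (W_0 - W_1)/\eta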
 
Connection to Deep Gradient Leakage Attack

Find an input sample to match the known gradient: we adapt the Deep Leakage from Gradients (DLG) attack [1] to solve this.

[1] Deep Leakage from Gradients. NeurIPS 2019
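A minimal DLG-style sketch in PyTorch (toy linear model of ours, not RANUM's exact procedure): optimize a dummy training sample until the gradient it induces at the current weights matches the target gradient.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Linear(4, 2)  # stand-in architecture with weights W0

    # target gradient, i.e., (W0 - W1) / eta; here induced by a hidden sample
    x_true, y_true = torch.randn(1, 4), torch.randn(1, 2)
    target_grads = torch.autograd.grad(F.mse_loss(model(x_true), y_true),
                                       model.parameters())

    x = torch.randn(1, 4, requires_grad=True)  # dummy training input
    y = torch.randn(1, 2, requires_grad=True)  # dummy training label
    opt = torch.optim.LBFGS([x, y])

    def closure():
        opt.zero_grad()
        grads = torch.autograd.grad(F.mse_loss(model(x), y),
                                    model.parameters(), create_graph=True)
        match = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        match.backward()
        return match

    for _ in range(20):
        opt.step(closure)
    print("gradient-matching loss:", closure().item())  # approaches 0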
 
Results on Task 2: Generate Training Data

Measured by success rate on triggering numerical defects: out of 790 runs, RANUM succeeds in 733 with 17.31s average time, while random generation succeeds in 649 with 334.14s average time; that is, a 13% success-rate increase and a 19.3X time saving.

Slide Note

Hello, everyone, good morning! 

I’m Linyi Li from UIUC, and I’m here to introduce our work on

reliability assurance for deep neural network architectures, in other words, DNN architectures, against numerical defects.

This is joint work with Yuhao Zhang from Wisconsin, and Luyao Ren, Yingfei Xiong, and Tao Xie, the corresponding author, from Peking University.


