Optimization of Multilayer Perceptron Output with ReLU Activation Function Using MIP Approach
This research develops a systematic optimization model that takes a trained neural network with ReLU activation functions as input, reformulates its piecewise-linear output as a mixed-integer linear program (MILP), and solves it with a Mixed-Integer Programming (MIP) approach. The resulting surrogate models are intended to scale well with the size and complexity of the system.
Presentation Transcript
Optimization of multilayer perceptron output with ReLU activation function
Shashwat Koranne, Hardik Panchal, Zachary Wilson, Nick Sahinidis (Carnegie Mellon University)
Shiva Kameswaran, Niranjan Subrahmanya (ExxonMobil Corporate Strategic Research)
Problem statement
Build a systematic optimization model which:
- Incorporates a ReLU activation function based neural network as the input
- Generates a linear model of the output which can be modeled as a MILP and solved using a Mixed-Integer Programming (MIP) approach
- Produces surrogate models that scale well with the size and complexity of the system
Mixed-integer model
MIP reformulation of the max operator: the ReLU activation function is written in GAMS using big-M constraints.
Governing equations and notation (hidden layer activation, ReLU transfer function, output function) appear as images on the original slide; every node requires two binary variables.
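The governing equations themselves are not reproduced in this transcript. As a point of reference only, a standard big-M reformulation of the ReLU operator z = max(0, a) is sketched below; the symbols a, x, y, and M are illustrative notation, and the authors' GAMS formulation (which uses two binary variables per node) may differ in detail.

```latex
% Big-M encoding of x_{k,j} = \max(0, a_{k,j}) for node j in hidden layer k,
% with binary indicator y_{k,j} and a valid bound M on |a_{k,j}|.
\begin{align}
  a_{k,j} &= \sum_i w_{k,j,i}\, x_{k-1,i} + b_{k,j}
            && \text{(hidden layer pre-activation)} \\
  x_{k,j} &\ge a_{k,j}, \qquad x_{k,j} \ge 0
            && \text{(ReLU lower bounds)} \\
  x_{k,j} &\le a_{k,j} + M\,(1 - y_{k,j})
            && \text{(active branch)} \\
  x_{k,j} &\le M\, y_{k,j}, \qquad y_{k,j} \in \{0,1\}
            && \text{(inactive branch)} \\
  \hat{y} &= \sum_j w^{\mathrm{out}}_{j}\, x_{K,j} + b^{\mathrm{out}}
            && \text{(linear output function)}
\end{align}
```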
Background and approach
A Multi-Layer Perceptron (MLP) is a feedforward artificial neural network.
Objective: Optimize the MLP network using a scalable MIP approach.
- Specify network structure and train weights and biases
- Generate MIP formulation of the ReLU neural network (a sketch follows this slide)
(Figure labels from the original slide: simple architecture, deep architecture; input, hidden layer, output.)
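As an illustrative sketch only (not the authors' GAMS implementation), the MIP formulation of a trained single-hidden-layer ReLU network can be generated programmatically from its weights and biases. The weights, bounds, and big-M value below are placeholder assumptions.

```python
# Sketch: big-M MILP encoding of a trained 1-hidden-layer ReLU network in Pyomo.
# Weights, biases, input bounds, and M are illustrative placeholders, not the authors' model.
import numpy as np
from pyomo.environ import (ConcreteModel, Var, Constraint, Objective,
                           Binary, Reals, minimize, SolverFactory)

W1 = np.array([[1.0, -0.5], [0.3, 0.8], [-1.2, 0.4]])  # hidden weights (3 nodes, 2 inputs)
b1 = np.array([0.1, -0.2, 0.05])                       # hidden biases
W2 = np.array([0.7, -1.1, 0.9])                        # output weights
b2 = 0.0
M = 100.0                                              # big-M bound on pre-activations

m = ConcreteModel()
m.x = Var(range(2), bounds=(-3, 3))                    # network inputs
m.z = Var(range(3), within=Reals)                      # post-ReLU activations
m.y = Var(range(3), within=Binary)                     # ReLU on/off indicators

def pre(mdl, j):
    # Pre-activation expression of hidden node j.
    return sum(W1[j, i] * mdl.x[i] for i in range(2)) + b1[j]

m.relu_lb  = Constraint(range(3), rule=lambda mdl, j: mdl.z[j] >= pre(mdl, j))
m.relu_pos = Constraint(range(3), rule=lambda mdl, j: mdl.z[j] >= 0)
m.relu_ub1 = Constraint(range(3), rule=lambda mdl, j: mdl.z[j] <= pre(mdl, j) + M * (1 - mdl.y[j]))
m.relu_ub2 = Constraint(range(3), rule=lambda mdl, j: mdl.z[j] <= M * mdl.y[j])

# Minimize the (linear) network output over the input box.
m.obj = Objective(expr=sum(W2[j] * m.z[j] for j in range(3)) + b2, sense=minimize)

# SolverFactory('cbc').solve(m)   # any MILP solver, e.g. CBC, Gurobi, CPLEX
```

Because every constraint above is linear once the binaries are fixed, the resulting model is a MILP that an off-the-shelf solver can handle directly.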
Computational study
Goal: Optimize a GAMS model of a trained neural network with rectified linear units on a benchmark example.
Benchmark: the six-hump camel function; its algebraic form and global minima appear on the original slide (the standard form is reproduced below).
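The algebraic form shown on the slide is not reproduced in this transcript; for reference, the six-hump camel function as commonly stated in the benchmark literature is:

```latex
f(x_1, x_2) = \left(4 - 2.1\,x_1^2 + \tfrac{x_1^4}{3}\right) x_1^2
            + x_1 x_2
            + \left(-4 + 4\,x_2^2\right) x_2^2,
\qquad
f^{*} \approx -1.0316 \ \text{at}\ (x_1, x_2) \approx (\pm 0.0898,\, \mp 0.7126).
```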
Computational study
ReLU surrogate models:

                        1 hidden layer,   1 hidden layer,   3 hidden layers,
                        10 nodes          200 nodes         30 nodes
Global minimum          -1.33             -1.12             -1.17
Training time (s)       0.6               23.7              6.2
Continuous variables    20                400               60
Binary variables        46                806               126
Equations               54                1004              154
Solution time (s)       0.013             1.27              0.12
Conclusions
A feed-forward neural network with rectified linear units:
- Admits a mixed-integer programming model
- Avoids the classical issue of non-convexities induced by traditional transfer functions
- Opens neural network optimization and training to rigorous optimization
Future steps will focus on:
- The application of the MIP formulation to a wide variety of problems stemming from complex systems
- Investigation of the scalability of MIP-based ReLU models