Optimizing Deep Learning: Methods and Insights
Exploring gradient-free and derivative-free optimization methods for deep learning, including insights on the search space of deep networks and alternative approaches such as ant colony optimization and simulated annealing. Emphasizes the importance of architecture and suggests that simpler training methods may perform well on large datasets.
Gradient-free optimization for deep learning
Usman Roshan, NJIT
Derivative-free optimization
Pros:
- Can handle any activation function (for example, the sign function)
- Free from vanishing and exploding gradient problems
Cons:
- May take longer than gradient-based search
Does it work for deep learning, and what do we know there? (A minimal sketch of one such method follows.)
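To make the pros concrete, here is a minimal sketch (not from the slides) of derivative-free training by plain random search on a tiny one-hidden-layer network with the non-differentiable sign activation. The data, network size, step size, and iteration count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points, 5 features, labels in {-1, +1}.
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + 0.1)

hidden = 8
W1 = rng.standard_normal((5, hidden))
w2 = rng.standard_normal(hidden)

def error(W1, w2):
    # The sign activation has zero gradient almost everywhere, so
    # backpropagation is useless here; random search needs only this
    # zero-order evaluation of the classification error.
    pred = np.sign(np.sign(X @ W1) @ w2)
    return np.mean(pred != y)

best = error(W1, w2)
step = 0.1
for _ in range(2000):
    # Propose a small random perturbation of all weights at once.
    dW1 = step * rng.standard_normal(W1.shape)
    dw2 = step * rng.standard_normal(w2.shape)
    cand = error(W1 + dW1, w2 + dw2)
    if cand <= best:  # keep the move only if it does not increase error
        W1, w2, best = W1 + dW1, w2 + dw2, cand

print(f"training error after random search: {best:.3f}")
```

Note the trade-off named in the cons: each iteration is a full forward pass with no gradient information, so many more evaluations are typically needed than gradient steps.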
What do we know about the search space of deep networks? From "The Loss Surfaces of Multilayer Networks" (Choromanska et al., AISTATS 2015).
Other methods for deep learning optimization:
- Ant colony optimization
- Simulated annealing
Both report only minor improvements.
Previous studies show the importance of architecture. Even gradient descent may be overkill: for example, random weights go a long way in deep learning, and dropout zeros out many nodes. Perhaps even simpler training methods may be better on large datasets. (A simulated-annealing sketch follows below.)
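Since the slide names simulated annealing, here is a minimal self-contained sketch of annealing over a flat weight vector. The cooling schedule, step size, iteration count, and the toy objective are illustrative guesses, not values or tasks reported by the studies the slide refers to.

```python
import numpy as np

rng = np.random.default_rng(1)

def anneal(err, w0, n_iters=5000, step=0.1, t0=1.0):
    """Generic simulated annealing over a flat weight vector."""
    w, e = w0.copy(), err(w0)
    best_w, best_e = w, e
    for i in range(n_iters):
        t = t0 * (1.0 - i / n_iters) + 1e-9   # linear cooling toward zero
        cand = w + step * rng.standard_normal(w.shape)
        ce = err(cand)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-(ce - e) / t), which vanishes as t cools.
        if ce <= e or rng.random() < np.exp(-(ce - e) / t):
            w, e = cand, ce
            if e < best_e:
                best_w, best_e = w.copy(), e
    return best_w, best_e

# Toy usage: fit a noisy linear rule with squared error.
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.standard_normal(100)
w_hat, e_hat = anneal(lambda w: np.mean((X @ w - y) ** 2), np.zeros(5))
print(f"final squared error: {e_hat:.3f}")
```

Unlike greedy random search, the temperature-controlled acceptance of uphill moves lets annealing escape shallow local minima early on, which is the usual motivation for trying it on non-convex loss surfaces like those of deep networks.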