
How to Guess a Gradient

Utkarsh Singhal and Brian Cheung and Kartik Chandra and Jonathan Ragan-Kelley and Joshua B. Tenenbaum and Tomaso A. Poggio and Stella X. Yu

Neural Information Processing Systems Workshop on Optimization for Machine Learning, New Orleans, Louisiana, 15 December 2023


Abstract

What can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. They lie in a predictable low-dimensional subspace that depends on the network architecture and incoming features. Exploiting this structure can significantly improve gradient-free optimization schemes based on directional derivatives, which until now have struggled to scale beyond small networks trained on MNIST. We study how to narrow the gap in optimization performance between methods that calculate exact gradients and those that use directional derivatives, demonstrate new phenomena that occur when using these methods, and highlight new challenges in scaling these methods.
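The claim that gradients lie in a subspace tied to incoming features can be illustrated on a toy case. For a single linear layer with weights W (m x n) and input x, the gradient of a loss with respect to W is an outer product delta x^T, so it lies in the span of matrices u x^T. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It compares two guessed directions V for a directional-derivative estimate g_hat = (grad L . V) V: an isotropic random V over all entries of W, and a structured V = eps x^T with random eps. The layer sizes, squared-error loss, finite-difference step h, and helper names (loss, directional_derivative, guess_gradient, compare) are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy problem: one linear layer W (m x n) with a squared-error loss.
def loss(W, x, y):
    r = W @ x - y
    return 0.5 * float(r @ r)

def directional_derivative(W, x, y, V, h=1e-3):
    # Central finite difference along V: uses only loss evaluations, no gradient.
    # For this quadratic loss it is exact up to floating-point rounding.
    return (loss(W + h * V, x, y) - loss(W - h * V, x, y)) / (2 * h)

def guess_gradient(W, x, y, V):
    # Directional-derivative estimate: scale the guessed direction by the
    # slope of the loss along it.
    return directional_derivative(W, x, y, V) * V

def cosine(a, b):
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare(m=10, n=100, trials=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, n))
    x = rng.standard_normal(n)
    y = rng.standard_normal(m)
    # Exact gradient dL/dW = (Wx - y) x^T, used only to score the guesses.
    true_grad = np.outer(W @ x - y, x)
    iso, structured = [], []
    for _ in range(trials):
        # Isotropic guess: random direction in the full m*n-dimensional space.
        V_iso = rng.standard_normal((m, n))
        # Structured guess: random m-vector times the incoming features x,
        # so it lies in the same subspace as the true gradient.
        V_str = np.outer(rng.standard_normal(m), x)
        iso.append(abs(cosine(guess_gradient(W, x, y, V_iso), true_grad)))
        structured.append(abs(cosine(guess_gradient(W, x, y, V_str), true_grad)))
    return float(np.mean(iso)), float(np.mean(structured))

iso_cos, str_cos = compare()
print(f"mean |cos| isotropic: {iso_cos:.3f}, structured: {str_cos:.3f}")
```

In this toy setting the structured guess only has to search an m-dimensional space rather than an m*n-dimensional one, so its estimate is typically far better aligned with the true gradient. Larger networks, nonlinear layers, and the paper's actual guessing schemes are beyond what this sketch shows.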

Keywords

gradient, deep learning