A Sparsified Online Newton Method for Training Large Neural Networks

Staff - Faculty of Informatics

Date: 21 February 2023 / 13:15 - 14:30

USI Campus Est, room D0.02, Sector D

Speaker: Prof. Inderjit Dhillon, UT Austin

Second-order methods have enormous potential to improve the convergence of deep neural network (DNN) training, but their large memory and compute requirements make them prohibitively expensive. Furthermore, computing the descent direction requires high-precision computation for stable training, as the matrices involved often have high condition numbers. In this work, we develop a memory-efficient second-order algorithm named Sparsified Online Newton (SONew) that uses specific sparsity patterns of the gradient second-moment matrix. SONew incurs the same computational cost as first-order methods while outperforming them on large-scale benchmarks. The algorithm emerges from using the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Our mathematical analysis allows us to reduce the condition number of our sparse preconditioning matrix. We conduct large-scale experiments on a diverse set of benchmarks, including a deep autoencoder, Vision Transformers, and a Graph Neural Network, and show state-of-the-art performance compared to tuned first-order methods.
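
For readers unfamiliar with the LogDet matrix divergence mentioned in the abstract, the following is a minimal sketch of its standard definition, D(X, Y) = tr(X Y^-1) - log det(X Y^-1) - n, restricted to diagonal positive definite matrices (the simplest possible sparsity pattern). It is not the SONew algorithm itself; the function name `logdet_divergence_diag` and the diagonal restriction are assumptions made purely for illustration.

```python
import math


def logdet_divergence_diag(x, y):
    """LogDet divergence between X = diag(x) and Y = diag(y).

    Hypothetical helper for illustration only. For diagonal matrices,
    tr(X Y^-1) - log det(X Y^-1) - n reduces to a sum over the
    per-entry ratios r_i = x_i / y_i of (r_i - log r_i - 1).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("all diagonal entries must be strictly positive")

    total = 0.0
    for xi, yi in zip(x, y):
        ratio = xi / yi
        # Each term is non-negative and is zero only when xi == yi.
        total += ratio - math.log(ratio) - 1.0
    return total


if __name__ == "__main__":
    # Identical matrices have zero divergence.
    print(logdet_divergence_diag([1.0, 2.0], [1.0, 2.0]))
    # The divergence is not symmetric in its arguments.
    print(logdet_divergence_diag([2.0], [1.0]))
    print(logdet_divergence_diag([1.0], [2.0]))
```

The divergence is zero exactly when the two matrices coincide and grows as their ratio departs from the identity, which is what makes it a natural measure of how far an updated preconditioner moves from the previous one.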

Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. He is currently on leave from UT Austin and is a Distinguished Scientist at Google. Prior to that, he was Vice President and Distinguished Scientist at Amazon, and headed the Amazon Research Lab in Berkeley, California, where he and his team developed and deployed state-of-the-art machine learning methods for Amazon Search. His main research interests are in machine learning, big data, deep learning, network analysis, linear algebra, and optimization. He received his B.Tech. degree from IIT Bombay and his Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF CAREER Award. He has published over 200 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow, an AAAS Fellow, and an AAAI Fellow.

Host: Prof. Olaf Schenk