A Sparsified Online Newton Method for Training Large Neural Networks

Faculty of Informatics - Study Administration Offices

Date: 21 February 2023 / 13:15 - 14:30

USI Campus Est, room D0.02, Sector D

Speaker: Prof. Inderjit Dhillon, UT Austin

Abstract:
Second-order methods have enormous potential to improve the convergence of deep neural network (DNN) training, but are prohibitively expensive because of their large memory and compute requirements. Furthermore, computing the descent direction requires high-precision computation for stable training, as the matrices involved often have high condition numbers. In this work, we develop a memory-efficient second-order algorithm named Sparsified Online Newton (SONew) that uses specific sparsity patterns of the gradient second-moment matrix. SONew incurs the same computational cost as first-order methods while outperforming them on large-scale benchmarks. The algorithm emerges from using the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Our mathematical analysis allows us to reduce the condition number of our sparse preconditioning matrix. We conduct large-scale experiments on a diverse set of benchmarks, including a deep autoencoder, Vision Transformers, and a Graph Neural Network, and show state-of-the-art performance compared to tuned first-order methods.
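To make the role of the LogDet divergence and the sparsity constraint more concrete, here is a minimal sketch for the simplest sparsity pattern, a diagonal one. It is not the speaker's implementation. The subproblem form (minimize -log det X + tr(XH) over diagonal X), the plain accumulation of squared gradients, and all function names and parameters (`lr`, `eps`) are assumptions made for illustration only.

```python
import math


def logdet_divergence_diag(x, y):
    """LogDet divergence D(X, Y) = tr(X Y^-1) - log det(X Y^-1) - n,
    for diagonal positive-definite X and Y given as lists of diagonal entries."""
    return sum(xi / yi - math.log(xi / yi) - 1.0 for xi, yi in zip(x, y))


def diag_preconditioner(h_diag):
    """Minimizer of -log det(X) + tr(X H) over diagonal X.
    Setting the derivative -1/x_i + h_i to zero gives X_ii = 1 / H_ii."""
    return [1.0 / h for h in h_diag]


def precondition_step(params, grad, h_diag, lr=0.1, eps=1e-8):
    """One hypothetical update (assumed form, not taken from the talk):
    accumulate squared gradients into the diagonal statistics, then
    scale the gradient by the diagonal preconditioner."""
    h_diag = [h + g * g for h, g in zip(h_diag, grad)]
    x_diag = diag_preconditioner([h + eps for h in h_diag])
    params = [p - lr * x * g for p, x, g in zip(params, x_diag, grad)]
    return params, h_diag
```

With a diagonal pattern, the memory and per-step cost are linear in the number of parameters, which is the same order as a first-order method. Richer patterns such as tridiagonal or banded ones, which this sketch does not cover, would need a different solver for the same kind of subproblem.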

Biography:
Inderjit Dhillon is the Gottesman Family Centennial Professor of Computer Science and Mathematics at UT Austin, where he is also the Director of the ICES Center for Big Data Analytics. He is currently on leave from UT Austin and is a Distinguished Scientist at Google. Prior to that, he was Vice President and Distinguished Scientist at Amazon, and headed the Amazon Research Lab in Berkeley, California, where he and his team developed and deployed state-of-the-art machine learning methods for Amazon Search. His main research interests are in machine learning, big data, deep learning, network analysis, linear algebra, and optimization. He received his B.Tech. degree from IIT Bombay and his Ph.D. from UC Berkeley. Inderjit has received several awards, including the ICES Distinguished Research Award, the SIAM Outstanding Paper Prize, the Moncrief Grand Challenge Award, the SIAM Linear Algebra Prize, the University Research Excellence Award, and the NSF CAREER Award. He has published over 200 journal and conference papers, and has served on the Editorial Board of the Journal of Machine Learning Research, the IEEE Transactions on Pattern Analysis and Machine Intelligence, Foundations and Trends in Machine Learning, and the SIAM Journal on Matrix Analysis and Applications. Inderjit is an ACM Fellow, an IEEE Fellow, a SIAM Fellow, an AAAS Fellow, and an AAAI Fellow.

Host: Prof. Olaf Schenk

Contact: Faculty of Informatics - Study Administration Offices, +41 58 666 46 90
