List of works by Zeyuan Allen-Zhu

- A Convergence Theory for Deep Learning via Over-Parameterization (scientific article published on 9 November 2018)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning (scientific article published on 13 January 2020)
- Byzantine Stochastic Gradient Descent
- Can SGD Learn Recurrent Neural Networks with Provable Generalization? (scholarly article by Zeyuan Allen-Zhu & Yuanzhi Li, published 2019 in Advances in Neural Information Processing Systems 32)
- Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters (scientific article published in January 2016)
- How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
- Is Q-Learning Provably Efficient?
- LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain (scientific article published in January 2016)
- Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
- Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls (scientific article published in January 2017)
- LoRA: Low-Rank Adaptation of Large Language Models (scientific article published on 17 June 2021)
- NEON2: Finding Local Minima via First-Order Oracles
- Natasha 2: Faster Non-Convex Optimization Than SGD
- On the Convergence Rate of Training Recurrent Neural Networks (scholarly article by Zeyuan Allen-Zhu et al., published 2019 in Advances in Neural Information Processing Systems 32)
- Optimal Black-Box Reductions Between Optimization Objectives (scientific article published in January 2016)
- Sparse sign-consistent Johnson-Lindenstrauss matrices: compression with neuroscience-based constraints (scientific article)
- The Lingering of Gradients: How to Reuse Gradients Over Time
- What Can ResNet Learn Efficiently, Going Beyond Kernels?