Damek Davis

I am an associate professor of operations research at Cornell University. I received my Ph.D. in mathematics from the University of California, Los Angeles, in 2015. My PhD advisors were Wotao Yin and Stefano Soatto.


I am interested in the mathematics of data science, particularly the interplay of optimization, signal processing, statistics, and machine learning. I enjoy designing and analyzing learning algorithms based on optimization. You can read more about my research in my research overview or publications. Also check out my two favorite recent projects on accelerated root-finding and accelerated optimization on Twitter.


My research has received several awards, including a Sloan Research Fellowship in Mathematics, the INFORMS Optimization Society Young Researchers Prize, an NSF CAREER Award, and the SIAM Activity Group on Optimization Best Paper Prize. I am currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics.


CV | Email | Twitter | Github | Google Scholar


Students

Current PhD Students
Tao Jiang
(Status: A exam passed)


Graduated PhD Students
Liwei Jiang
(Next position: Georgia Tech ISYE Postdoc, then Asst Prof at Purdue)
Vasilis Charisopoulos
(Next position: University of Chicago postdoc, then Asst Prof at University of Washington, Seattle)
Mateo Díaz
(Next position: Caltech CMS postdoc, then Asst Prof at Johns Hopkins University)
Ben Grimmer
(Next position: Asst Prof at Johns Hopkins University)


Teaching

Lecture notes
Optimization: Structure, Duality, Calculus, and Algorithms
Draft of F’19 notes for my course ORIE 6300
(Last Update: 1/2020)


Honors and awards

SIAM Activity Group on Optimization Best Paper Prize (2023)
NSF CAREER Award (2021)
Sloan Research Fellowship in Mathematics (2020)
INFORMS Optimization Society Young Researchers Prize (2019)
Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization (2019)
Finalist for A. W. Tucker Prize for outstanding doctoral thesis (2018)
NSF Math Postdoctoral Fellowship (2015)
Pacific Journal of Mathematics Dissertation Prize (2015)
INFORMS Optimization Society Student Paper Prize (2014)
NSF Graduate Research Fellowship (2010)
Elected to Phi Beta Kappa (2009)


Research overview

(My complete publication list can be viewed below, on my CV, or on Google Scholar.)


Learning algorithms work exceptionally well in practice, but we have yet to find a coherent mathematical foundation explaining when they work and how to improve their performance. The challenge is that most learning algorithms rely on fitting highly nonlinear models via simple nonconvex optimization heuristics, and except for a few exceptional cases, there is no guarantee they will find global optima. Despite this and NP-hardness, simple heuristics often succeed, and over the last few years, I have studied why and when they do.


I spend a lot of time thinking about neural networks. I am particularly interested in whether we can provide provable convergence guarantees to standard training algorithms or substantially improve existing methods. Deep networks fall outside the scope of classical optimization theory since they lead to problems that lack conventionally helpful notions of convexity or smoothness. Taking the inherent nonsmooth structure of neural networks seriously is crucial to understand these methods. I study this structure and the associated algorithms by using and developing tools from several disciplines, including nonsmooth/variational analysis, tame geometry, and high-dimensional statistics.


Algorithms for nonsmooth optimization in machine learning

While neural networks are nonsmooth, they are not pathological — they are built from just a few simple components, like polynomials, exponentials, logs, max’s, min’s, and absolute values. The best model of such non-pathological functions available in optimization is the so-called tame class, a class which appears in several of my papers and precludes cantor function-esque behavior. I have spent much time trying to uncover notions of beneficial “partial” smoothness in tame optimization problems to exploit this structure in algorithms.


While tame problems comprise virtually all tasks of interest, they lack enough structure to endow simple iterative methods with global efficiency guarantees. A class with more structure, which I view as a stepping stone between convex functions and general neural networks, is the so-called weakly convex class. These are functions that differ from convex functions by a simple quadratic. This class is deceptively simple yet surprisingly broad. It includes, for example, all C^2 smooth functions (on compact sets) and all compositions of Lipschitz convex functions with smooth mappings: h(c(x)). These losses appear throughout data science, particularly in low-rank matrix recovery problems (e.g., matrix completion and sensing).


My group has been working towards understanding the convergence of simple iterative methods, such as the stochastic subgradient method (SGD), on the tame and weakly convex problem classes. We have also been working towards designing methods that outperform SGD.


I will briefly summarize some of the contributions of my group. For those interested, you can find a brief technical introduction to some of my papers in the expository note:


Subgradient methods under weak convexity and tame geometry
Damek Davis, Dmitriy Drusvyatskiy
SIAG/OPT Views and News (2020)


An exponential speedup for “generic” tame problems

We developed the first first-order method that (locally) converges nearly linearly (i.e., exponentially fast) on “generic” tame problems. This result shows that we can exponentially(!) surpass the “speed limit” of gradient methods derived by Nemirovski and Yudin in the 80s -- if we wait a bit. The result applies to “almost every problem” in practice. We found this super surprising!


A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth
Damek Davis, Liwei Jiang
Foundations of Computational Mathematics (to appear) | code | Twitter thread


A superlinearly convergent method for “generic” tame equations

We developed the first algorithm that (locally) converges nearly superlinearly (i.e., double exponentially fast) on “generic” tame equations.


A superlinearly convergent subgradient method for sharp semismooth problems
Vasileios Charisopoulos, Damek Davis
Mathematics of Operations Research (2023) | code | Twitter thread


Training guarantees for SGD on tame and weakly convex functions

We showed that the stochastic subgradient method (e.g., backpropagation) converges to first-order critical points on virtually any neural network.


Stochastic subgradient method converges on tame functions
Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee
Foundations of Computational Mathematics (2018) | Talk


We proved the first sample/computational efficiency guarantees for the stochastic subgradient method on the weakly convex class.


Stochastic model-based minimization of weakly convex functions
Damek Davis, Dmitriy Drusvyatskiy
SIAM Journal on Optimization (2018) | blog


Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems.
Damek Davis, Benjamin Grimmer
SIAM Journal on Optimization (2018) [ code ]


Avoidable saddle points in nonsmooth optimization

We developed the concept of an avoidable nonsmooth saddle point — nonoptimal points that algorithms may approach. The proper formulation of this concept is well-known in C^2 smooth optimization but was missing even for C^1 functions. We showed that both first-order and proximal methods do not converge to these points on “generic” tame problems:


Talk: avoiding saddle points in nonsmooth optimization
Updated (11/2021) | video


Proximal methods avoid active strict saddles of weakly convex functions
Damek Davis, Dmitriy Drusvyatskiy
Foundations of Computational Mathematics (2021)


Escaping strict saddle points of the Moreau envelope in nonsmooth optimization
Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy
SIAM Journal on Optimization (2022)


Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang
Manuscript (2022)


Asymptotic normality of SGD in nonsmooth optimization

We characterized the asymptotic distribution of the error sequence in stochastic subgradient methods, proving it is asymptotically normal with “optimal covariance” on “generic” tame problems.


Asymptotic normality and optimality in nonsmooth stochastic approximation
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang
Manuscript (2023)


Low-rank matrix recovery: a stepping stone to neural networks

We achieved the first sample complexity optimal and computationally optimal methods for several low-rank matrix recovery based on nonsmooth weakly convex formulations. Nonsmoothness was crucial to establishing these rates since prior smooth formulations suffered from “poor conditioning.”


Composite optimization for robust rank one bilinear sensing
Vasileios Charisopoulos, Damek Davis, Mateo Diaz, Dmitriy Drusvyatskiy
IMA Journal on Information and Inference (2020) [ code ]


The nonsmooth landscape of phase retrieval
Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette
IMA Journal on Numerical Analysis (2017) | Talk


Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence
Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy
Foundations of Computational Mathematics (2019) | code


Other selected work

Besides my work on nonconvex learning algorithms, I also have worked on clustering and convex optimization algorithms.


Provable clustering methods and a potential statistical-to-computational gap

Clustering is a fundamental statistical problem of dividing a dataset into two or more groups. Our work on this talk topic focuses on the classical setting wherein both clusters follow a Gaussian distribution with identical covariance but distinct means. When the covariance matrix is known or “nearly spherical,” there are efficient algorithms to perform the clustering and achieve the “Bayes-optimal error rate.” When the covariance is unknown or poorly conditioned, no known algorithms achieve the Bayes-optimal rate.


Our contribution to this topic is a surprising dichotomy for clustering with an unknown covariance matrix: on the one hand, the maximum likelihood estimator uncovers the correct clustering and achieves the Bayes-optimal error; on the other, we give evidence that no known algorithm can compute the maximum likelihood estimator unless one increases the number of samples by an order of magnitude. Thus, we conjecture that there is a statistical-to-computational gap for this classical statistical problem.


Clustering a Mixture of Gaussians with Unknown Covariance
Damek Davis, Mateo Diaz, Kaizheng Wang
Manuscript (2021)


Three-Operator Splitting and the complexity of splitting methods

I focused on a class of convex optimization algorithms called operator-splitting methods for my PhD thesis. An operator splitting method is a technique for writing the solution of a “structured” convex optimization problem as the fixed-point of a well-behaved nonlinear operator. Algorithmically, one then finds the fixed-point of the operator through, e.g., the classical fixed-point iteration. My best-known contributions to the topic include the (1) “Three-Operator-Splitting” method, which has been widely used throughout computational imaging, and (2) my work that established the convergence rates of several classical splitting methods, such as the Douglas-Rachford splitting method and Alternating Direction Method of Multipliers (ADMM).


A Three-Operator Splitting Scheme and its Optimization Applications
Damek Davis, Wotao Yin
Set-Valued and Variational Analysis (2017)


Convergence rate analysis of several splitting schemes
Damek Davis, Wotao Yin
Splitting Methods in Communication and Imaging, Science and Engineering (2017)


Publications

Preprints | Conference papers | Journal papers | Book chapters | Expository | Reports


Preprints


Computational Microscopy beyond Perfect Lenses
Xingyuan Lu, Minh Pham, Elisa Negrini, Damek Davis, Stanley J. Osher, Jianwei Miao
Manuscript (2023)


Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang
Manuscript (2022)


Asymptotic normality and optimality in nonsmooth stochastic approximation
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang
Manuscript (2023)


Clustering a Mixture of Gaussians with Unknown Covariance
Damek Davis, Mateo Diaz, Kaizheng Wang
Manuscript (2021)


Stochastic optimization over proximally smooth sets
Damek Davis, Dmitriy Drusvyatskiy, Zhan Shi
Manuscript (2020)


Conference papers


Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma
NeurIPS (2023)


A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions
Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye
NeurIPS (2022)
Oral Presentation (top ~1%)


High probability guarantees for stochastic convex optimization
Damek Davis, Dmitriy Drusvyatskiy
In Conference on Learning Theory (2020)


Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression
Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis
Conference on Learning Theory (2019)


The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM
Damek Davis, Brent Edmunds, Madeleine Udell
Neural Information Processing Systems (2016) | report


Multiview Feature Engineering and Learning
Jingming Dong, Nikos Karianakis, Damek Davis, Joshua Hernandez, Jonathan Balzer and Stefano Soatto
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)


Asymmetric sparse kernel approximations for large-scale visual search.
Damek Davis, Jonathan Balzer, Stefano Soatto
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)


Journal papers


A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth
Damek Davis, Liwei Jiang
Foundations of Computational Mathematics (to appear) | code | Twitter thread


Stochastic algorithms with geometric step decay converge linearly on sharp functions
Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos
Mathematical Programming (to appear) | code


A superlinearly convergent subgradient method for sharp semismooth problems
Vasileios Charisopoulos, Damek Davis
Mathematics of Operations Research (2023) | code | Twitter Thread


Escaping strict saddle points of the Moreau envelope in nonsmooth optimization
Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy
SIAM Journal on Optimization (2022)


Variance reduction for root-finding problems
Damek Davis
Mathematical Programming (to appear)


Conservative and semismooth derivatives are equivalent for semialgebraic maps
Damek Davis, Dmitriy Drusvyatskiy
Set-Valued and Variational Analysis (to appear)


From low probability to high confidence in stochastic convex optimization
Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang
Journal of Machine Learning Research (to appear)


Proximal methods avoid active strict saddles of weakly convex functions
Damek Davis, Dmitriy Drusvyatskiy
Foundations of Computational Mathematics (2021)


Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence
Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy
Foundations of Computational Mathematics (to appear) | code


Composite optimization for robust rank one bilinear sensing
Vasileios Charisopoulos, Damek Davis, Mateo Diaz, Dmitriy Drusvyatskiy
IMA Journal on Information and Inference (2020) | code


Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
Damek Davis, Dmitriy Drusvyatskiy
Mathematics of Operations Research (to appear)


Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems.
Damek Davis, Benjamin Grimmer
SIAM Journal on Optimization (to appear) | code


Trimmed Statistical Estimation via Variance Reduction
Aleksandr Aravkin, Damek Davis
Mathematics of Operations Research (2019) | video


Stochastic subgradient method converges on tame functions.
Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee
Foundations of Computational Mathematics (to appear)
Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization (2019)


The nonsmooth landscape of phase retrieval
Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette
IMA Journal on Numerical Analysis (2018)


Stochastic model-based minimization of weakly convex functions.
Damek Davis, Dmitriy Drusvyatskiy
SIAM Journal on Optimization (2019) | blog
This is the combination of the two arXiv preprints arXiv:1802.02988 and arXiv:1803.06523
Supplementary technical note: Complexity of finding near-stationary points of convex functions stochastically
Related report on nonsmooth nonconvex mirror descent Stochastic model-based minimization under high-order growth (2018)
INFORMS Optimization Society Young Researchers Prize (2019)


Subgradient methods for sharp weakly convex functions
Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, Courtney Paquette
Journal of Optimization Theory and Applications (2018)


Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions
Luis M. Briceño-Arias, Damek Davis
SIAM Journal on Optimization (2018)


Convergence rate analysis of the forward-Douglas-Rachford splitting scheme.
Damek Davis
SIAM Journal on Optimization (2015)


Convergence rate analysis of primal-dual splitting schemes
Damek Davis
SIAM Journal on Optimization (2015)


Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions
Damek Davis, Wotao Yin
Mathematics of Operations Research (2016)


A Three-Operator Splitting Scheme and its Optimization Applications.
Damek Davis, Wotao Yin
Set-Valued and Variational Analysis (2017) | code | slides


Beating level-set methods for 5D seismic data interpolation: a primal-dual alternating approach
Rajiv Kumar, Oscar López, Damek Davis, Aleksandr Y. Aravkin, Felix J. Herrmann
IEEE Transactions on Computational Imaging (2017)


Tactical Scheduling for Precision Air Traffic Operations: Past Research and Current Problems
Douglas R. Isaacson, Alexander V. Sadovsky, Damek Davis
Journal of Aerospace Information Systems, April, Vol. 11, No. 4 : pp. 234-257


Efficient computation of separation-compliant speed advisories for air traffic arriving in terminal airspace.
Alexander V. Sadovsky, Damek Davis, Douglas R. Isaacson.
Journal of Dynamic Systems Measurement and Control 136(4), 041027 (2014)


Separation-compliant, optimal routing and control of scheduled arrivals in a terminal airspace.
Alexander V. Sadovsky, Damek Davis, and Douglas R. Isaacson.
Transportation Research Part C: Emerging Technologies 37 (2013): 157-176


Factorial and Noetherian Subrings of Power Series Rings.
Damek Davis, Daqing Wan
Proceedings of the American Mathematical Society 139 (2011), no. 3, 823-834


Book chapters

Convergence rate analysis of several splitting schemes
Damek Davis, Wotao Yin
Splitting Methods in Communication and Imaging, Science and Engineering (2017)
video | slides | summary
Winner of the 2014 INFORMS optimization society best student paper prize.


Expository

Subgradient methods under weak convexity and tame geometry
Damek Davis, Dmitriy Drusvyatskiy
SIAG/OPT News and Views (2020)


Convergence Rate Analysis of Several Splitting Schemes
Damek Davis
INFORMS OS Today (2015)


Technical reports

A linearly convergent Gauss-Newton subgradient method for ill-conditioned problems
Damek Davis, Tao Jiang
Technical report (2023) | code


Stochastic model-based minimization under high-order growth.
Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee
Technical Report (2018)


An (O(n\log(n))) algorithm for projecting onto the ordered weighted (\ell_1) norm ball
Damek Davis
UCLA CAM report 15-32 (2015) | code


SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm.
Damek Davis
Manuscript (2015) [ slides ] [ video ]