I am interested in the mathematics of data science, particularly the interplay of optimization, signal processing, statistics, and machine learning. I enjoy designing and analyzing learning algorithms based on optimization. You can read more about my research in my research overview or publications. You can also check out my three favorite recent projects on exponential accelerations of Gradient Descent, Semismooth Newton methods, and Subgradient methods on Twitter.
My research has received several awards, including a Sloan Research Fellowship in Mathematics, the INFORMS Optimization Society Young Researchers Prize, an NSF CAREER Award, and the SIAM Activity Group on Optimization Best Paper Prize. I am currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics.
Please use my email sparingly for correspondence related to research questions, teaching, or other professional inquiries.
Publications | CV | Github (personal) | Github (old lab) | Google Scholar |
I am currently recruiting PhD students at Penn. If you are a Penn student and wish to discuss advising/collaboration, send me a concise, informative email to set up a meeting.
If you’re not currently a Penn student, please apply through the Wharton Statistics and Data Science PhD program. I receive many emails from applicants and cannot read/reply to all of them. Most of these emails are generic and are only personalized by including a title of one of my papers. To increase your chance of your email being read, write a concise email that only you could write.
Current PhD Students Tao Jiang (Cornell) Status: A exam passed Next position: Postdoc at Meta with Lin Xiao
Graduated PhD Students Liwei Jiang Next position: Georgia Tech ISYE Postdoc, then Assistant Professor at Purdue Vasilis Charisopoulos Next position: University of Chicago postdoc, then Assistant Professor at University of Washington, Seattle Mateo Díaz Next position: Caltech CMS postdoc, then Assistant Professor at Johns Hopkins University Ben Grimmer Next position: Assistant Professor at Johns Hopkins University
Spring 2025 STAT 4830: Numerical optimization for data science and machine learning
Lecture notes Optimization: Structure, Duality, Calculus, and Algorithms Draft of F’2023 notes for my course ORIE 6300
SIAM Activity Group on Optimization Best Paper Prize (2023) NSF CAREER Award (2021) Sloan Research Fellowship in Mathematics (2020) INFORMS Optimization Society Young Researchers Prize (2019) Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization (2019) Finalist for A. W. Tucker Prize for outstanding doctoral thesis (2018) NSF Math Postdoctoral Fellowship (2015) Pacific Journal of Mathematics Dissertation Prize (2015) INFORMS Optimization Society Student Paper Prize (2014) NSF Graduate Research Fellowship (2010) Elected to Phi Beta Kappa (2009)
(My complete publication list can be viewed below, on my CV, or on Google Scholar.)
Learning algorithms work exceptionally well in practice, but we have yet to find a coherent mathematical foundation explaining when they work and how to improve their performance. The challenge is that most learning algorithms rely on fitting highly nonlinear models via simple nonconvex optimization heuristics, and except for a few exceptional cases, there is no guarantee they will find global optima. Despite this and NP-hardness, simple heuristics often succeed, and over the last few years, I have studied why and when they do.
I spend a lot of time thinking about neural networks. I am particularly interested in whether we can provide provable convergence guarantees to standard training algorithms or substantially improve existing methods. Deep networks fall outside the scope of classical optimization theory since they lead to problems that lack conventionally helpful notions of convexity or smoothness. Taking the inherent nonsmooth structure of neural networks seriously is crucial to understand these methods. I study this structure and the associated algorithms by using and developing tools from several disciplines, including nonsmooth/variational analysis, tame geometry, and high-dimensional statistics.
While neural networks are nonsmooth, they are not pathological — they are built from just a few simple components, like polynomials, exponentials, logs, max’s, min’s, and absolute values. The best model of such non-pathological functions available in optimization is the so-called tame class, a class which appears in several of my papers and precludes cantor function-esque behavior. I have spent much time trying to uncover notions of beneficial “partial” smoothness in tame optimization problems to exploit this structure in algorithms.
While tame problems comprise virtually all tasks of interest, they lack enough structure to endow simple iterative methods with global efficiency guarantees. A class with more structure, which I view as a stepping stone between convex functions and general neural networks, is the so-called weakly convex class. These are functions that differ from convex functions by a simple quadratic. This class is deceptively simple yet surprisingly broad. It includes, for example, all C^2 smooth functions (on compact sets) and all compositions of Lipschitz convex functions with smooth mappings: h(c(x)). These losses appear throughout data science, particularly in low-rank matrix recovery problems (e.g., matrix completion and sensing).
My group has been working towards understanding the convergence of simple iterative methods, such as the stochastic subgradient method (SGD), on the tame and weakly convex problem classes. We have also been working towards designing methods that outperform SGD.
I will briefly summarize some of the contributions of my group. For those interested, you can find a brief technical introduction to some of my papers in the expository note:
Subgradient methods under weak convexity and tame geometry Damek Davis, Dmitriy Drusvyatskiy SIAG/OPT Views and News (2020)
We developed the first first-order method that (locally) converges nearly linearly (i.e., exponentially fast) on “generic” tame problems. This result shows that we can exponentially(!) surpass the “speed limit” of gradient methods derived by Nemirovski and Yudin in the 80s – if we wait a bit. The result applies to “almost every problem” in practice. We found this super surprising!
A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth Damek Davis, Liwei Jiang Foundations of Computational Mathematics (to appear) | code | Twitter thread
We developed the first algorithm that (locally) converges nearly superlinearly (i.e., double exponentially fast) on “generic” tame equations.
A superlinearly convergent subgradient method for sharp semismooth problems Vasileios Charisopoulos, Damek Davis Mathematics of Operations Research (2023) | code | Twitter thread
We showed that the stochastic subgradient method (e.g., backpropagation) converges to first-order critical points on virtually any neural network.
Stochastic subgradient method converges on tame functions Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee Foundations of Computational Mathematics (2018) | Talk
We proved the first sample/computational efficiency guarantees for the stochastic subgradient method on the weakly convex class.
Stochastic model-based minimization of weakly convex functions Damek Davis, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2018) | blog
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems. Damek Davis, Benjamin Grimmer SIAM Journal on Optimization (2018) [code]
We developed the concept of an avoidable nonsmooth saddle point — nonoptimal points that algorithms may approach. The proper formulation of this concept is well-known in C^2 smooth optimization but was missing even for C^1 functions. We showed that both first-order and proximal methods do not converge to these points on “generic” tame problems:
Talk: avoiding saddle points in nonsmooth optimization Updated (11/2021) | video
Proximal methods avoid active strict saddles of weakly convex functions Damek Davis, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (2021)
Escaping strict saddle points of the Moreau envelope in nonsmooth optimization Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2022)
Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Manuscript (2022)
We characterized the asymptotic distribution of the error sequence in stochastic subgradient methods, proving it is asymptotically normal with “optimal covariance” on “generic” tame problems.
Asymptotic normality and optimality in nonsmooth stochastic approximation Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Manuscript (2023)
We achieved the first sample complexity optimal and computationally optimal methods for several low-rank matrix recovery based on nonsmooth weakly convex formulations. Nonsmoothness was crucial to establishing these rates since prior smooth formulations suffered from “poor conditioning.”
Composite optimization for robust rank one bilinear sensing Vasileios Charisopoulos, Damek Davis, Mateo Diaz, Dmitriy Drusvyatskiy IMA Journal on Information and Inference (2020) [ code ]
The nonsmooth landscape of phase retrieval Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette IMA Journal on Numerical Analysis (2017) | Talk
Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (2019) | code
Besides my work on nonconvex learning algorithms, I also have worked on clustering and convex optimization algorithms.
Clustering is a fundamental statistical problem of dividing a dataset into two or more groups. Our work on this talk topic focuses on the classical setting wherein both clusters follow a Gaussian distribution with identical covariance but distinct means. When the covariance matrix is known or “nearly spherical,” there are efficient algorithms to perform the clustering and achieve the “Bayes-optimal error rate.” When the covariance is unknown or poorly conditioned, no known algorithms achieve the Bayes-optimal rate.
Our contribution to this topic is a surprising dichotomy for clustering with an unknown covariance matrix: on the one hand, the maximum likelihood estimator uncovers the correct clustering and achieves the Bayes-optimal error; on the other, we give evidence that no known algorithm can compute the maximum likelihood estimator unless one increases the number of samples by an order of magnitude. Thus, we conjecture that there is a statistical-to-computational gap for this classical statistical problem.
Clustering a Mixture of Gaussians with Unknown Covariance Damek Davis, Mateo Diaz, Kaizheng Wang Manuscript (2021)
I focused on a class of convex optimization algorithms called operator-splitting methods for my PhD thesis. An operator splitting method is a technique for writing the solution of a “structured” convex optimization problem as the fixed-point of a well-behaved nonlinear operator. Algorithmically, one then finds the fixed-point of the operator through, e.g., the classical fixed-point iteration. My best-known contributions to the topic include the (1) “Three-Operator-Splitting” method, which has been widely used throughout computational imaging, and (2) my work that established the convergence rates of several classical splitting methods, such as the Douglas-Rachford splitting method and Alternating Direction Method of Multipliers (ADMM).
A Three-Operator Splitting Scheme and its Optimization Applications Damek Davis, Wotao Yin Set-Valued and Variational Analysis (2017)
Convergence rate analysis of several splitting schemes Damek Davis, Wotao Yin Splitting Methods in Communication and Imaging, Science and Engineering (2017)
Preprints | Conference papers | Journal papers | Book chapters | Expository | Reports |
Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Manuscript (2024)
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma NeurIPS (2023)
A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye NeurIPS (2022) Oral Presentation (top ~1%)
High probability guarantees for stochastic convex optimization Damek Davis, Dmitriy Drusvyatskiy In Conference on Learning Theory (2020)
Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis Conference on Learning Theory (2019)
The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM Damek Davis, Brent Edmunds, Madeleine Udell Neural Information Processing Systems (2016) | report
Multiview Feature Engineering and Learning Jingming Dong, Nikos Karianakis, Damek Davis, Joshua Hernandez, Jonathan Balzer and Stefano Soatto In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
Asymmetric sparse kernel approximations for large-scale visual search. Damek Davis, Jonathan Balzer, Stefano Soatto In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Foundations of Computational Mathematics (to appear)
Stochastic optimization over proximally smooth sets Damek Davis, Dmitriy Drusvyatskiy, Zhan Shi SIAM Journal on Optimization (to appear)
Computational Microscopy beyond Perfect Lenses Xingyuan Lu, Minh Pham, Elisa Negrini, Damek Davis, Stanley J. Osher, Jianwei Miao Physical Review E (to appear)
Global Optimality of the EM Algorithm for Mixtures of Two-Component Linear Regressions Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, Damek Davis, Nhat Ho: IEEE Transactions on Information Theory (2024)
Clustering a Mixture of Gaussians with Unknown Covariance Damek Davis, Mateo Diaz, Kaizheng Wang Bernoulli (to appear)
Asymptotic normality and optimality in nonsmooth stochastic approximation Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang The Annals of Statistics (to appear) Second Place in INFORMS Optimization Society 2024 Student Paper Prize
A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth Damek Davis, Liwei Jiang Foundations of Computational Mathematics (to appear) | code | Twitter thread
Stochastic algorithms with geometric step decay converge linearly on sharp functions Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos Mathematical Programming (to appear) | code
A superlinearly convergent subgradient method for sharp semismooth problems Vasileios Charisopoulos, Damek Davis Mathematics of Operations Research (2023) | code | Twitter Thread
Escaping strict saddle points of the Moreau envelope in nonsmooth optimization Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2022)
Variance reduction for root-finding problems Damek Davis Mathematical Programming (to appear)
Conservative and semismooth derivatives are equivalent for semialgebraic maps Damek Davis, Dmitriy Drusvyatskiy Set-Valued and Variational Analysis (to appear)
From low probability to high confidence in stochastic convex optimization Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang Journal of Machine Learning Research (to appear)
Proximal methods avoid active strict saddles of weakly convex functions Damek Davis, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (2021)
Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (to appear) | code
Composite optimization for robust rank one bilinear sensing Vasileios Charisopoulos, Damek Davis, Mateo Diaz, Dmitriy Drusvyatskiy IMA Journal on Information and Inference (2020) | code
Graphical Convergence of Subgradients in Nonconvex Optimization and Learning Damek Davis, Dmitriy Drusvyatskiy Mathematics of Operations Research (to appear)
Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems. Damek Davis, Benjamin Grimmer SIAM Journal on Optimization (to appear) | code
Trimmed Statistical Estimation via Variance Reduction Aleksandr Aravkin, Damek Davis Mathematics of Operations Research (2019) | video
Stochastic subgradient method converges on tame functions. Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee Foundations of Computational Mathematics (to appear) Finalist for the Best Paper Prize for Young Researchers in Continuous Optimization (2019)
The nonsmooth landscape of phase retrieval Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette IMA Journal on Numerical Analysis (2018)
Stochastic model-based minimization of weakly convex functions. Damek Davis, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2019) | blog This is the combination of the two arXiv preprints arXiv:1802.02988 and arXiv:1803.06523 Supplementary technical note: Complexity of finding near-stationary points of convex functions stochastically Related report on nonsmooth nonconvex mirror descent Stochastic model-based minimization under high-order growth (2018) INFORMS Optimization Society Young Researchers Prize (2019)
Subgradient methods for sharp weakly convex functions Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, Courtney Paquette Journal of Optimization Theory and Applications (2018)
Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions Luis M. Briceño-Arias, Damek Davis SIAM Journal on Optimization (2018)
Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. Damek Davis SIAM Journal on Optimization (2015)
Convergence rate analysis of primal-dual splitting schemes Damek Davis SIAM Journal on Optimization (2015)
Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions Damek Davis, Wotao Yin Mathematics of Operations Research (2016)
A Three-Operator Splitting Scheme and its Optimization Applications. Damek Davis, Wotao Yin Set-Valued and Variational Analysis (2017) | code | slides
Beating level-set methods for 5D seismic data interpolation: a primal-dual alternating approach Rajiv Kumar, Oscar López, Damek Davis, Aleksandr Y. Aravkin, Felix J. Herrmann IEEE Transactions on Computational Imaging (2017)
Tactical Scheduling for Precision Air Traffic Operations: Past Research and Current Problems Douglas R. Isaacson, Alexander V. Sadovsky, Damek Davis Journal of Aerospace Information Systems, April, Vol. 11, No. 4 : pp. 234-257
Efficient computation of separation-compliant speed advisories for air traffic arriving in terminal airspace. Alexander V. Sadovsky, Damek Davis, Douglas R. Isaacson. Journal of Dynamic Systems Measurement and Control 136(4), 041027 (2014)
Separation-compliant, optimal routing and control of scheduled arrivals in a terminal airspace. Alexander V. Sadovsky, Damek Davis, and Douglas R. Isaacson. Transportation Research Part C: Emerging Technologies 37 (2013): 157-176
Factorial and Noetherian Subrings of Power Series Rings. Damek Davis, Daqing Wan Proceedings of the American Mathematical Society 139 (2011), no. 3, 823-834
Convergence rate analysis of several splitting schemes Damek Davis, Wotao Yin Splitting Methods in Communication and Imaging, Science and Engineering (2017) video | slides | summary Winner of the 2014 INFORMS optimization society best student paper prize.
Subgradient methods under weak convexity and tame geometry Damek Davis, Dmitriy Drusvyatskiy SIAG/OPT News and Views (2020)
Convergence Rate Analysis of Several Splitting Schemes Damek Davis INFORMS OS Today (2015)
A linearly convergent Gauss-Newton subgradient method for ill-conditioned problems Damek Davis, Tao Jiang Technical report (2023) | code
Stochastic model-based minimization under high-order growth. Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee Technical Report (2018)
An (O(n\log(n))) algorithm for projecting onto the ordered weighted (\ell_1) norm ball Damek Davis UCLA CAM report 15-32 (2015) | code
SMART: The Stochastic Monotone Aggregated Root-Finding Algorithm. Damek Davis Manuscript (2015) [ slides ] [ video ]