Damek Davis

I’m an Associate Professor in Wharton’s Department of Statistics and Data Science. I was previously an Associate Professor at Cornell ORIE, an NSF Postdoctoral Fellow, and a PhD student in Math at UCLA under Wotao Yin (Alibaba) and Stefano Soatto (AWS AI). I was a long-term visitor at the Simons Institute in Fall 2017 (bridging discrete and continuous optimization) and Fall 2024 (LLM program). I am currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics.

Research Interests. Optimization and machine learning. See here.

Teaching. I teach theory and practice of optimization and machine learning. I sometimes write lecture notes, e.g., Optimization in PyTorch and Convex Analysis and First-Order Methods.

Selected Awards. I received a Sloan Research Fellowship in Mathematics, an NSF CAREER Award, and the SIAM Activity Group on Optimization Best Paper Prize.

Selected Works. Read more about my research here. Together with collaborators, I

Developed exponential accelerations of gradient descent, semismooth Newton, and the subgradient method.
Proved the first guarantees for SGD on weakly convex and tame functions, classes that cover essentially all neural networks.
Characterized the asymptotic distribution of SGD in nonsmooth optimization.
Developed the concept of a strict saddle point in nonsmooth optimization and showed proximal methods and stochastic subgradient methods avoid them.
Developed kernel methods that efficiently learn sparse hierarchical functions.

Students. I’ve advised 5 PhD students. If you are a Penn student and wish to discuss advising/collaboration, send me a concise, informative email to set up a meeting. I am an active advisor; students who work best with me tend to have energy levels that match or exceed mine.

Graduated PhD Students from Cornell:

Tao Jiang → Meta (Postdoc)
Liwei Jiang → Purdue (Assistant Professor)
Vasilis Charisopoulos → UW, Seattle (Assistant Professor)
Mateo Díaz → Johns Hopkins University (Assistant Professor)
Ben Grimmer → Johns Hopkins University (Assistant Professor)

Blog. I sporadically post notes here.

Please use my email sparingly for correspondence related to consulting, research questions, teaching, or other professional inquiries.

You may not know that…

I started programming in HTML in ’98. I made a Pokémon website.
I took up guitar in ’99, played in bands, and wrote and recorded music regularly into college.
- I played banjo from 2010-2012, mainly in clawhammer style.
- I mostly play piano now. I like to sing and sight-read chord charts off of Ultimate Guitar.
I worked throughout high school, cleaning tables, bagging groceries, and making coffees.
In 2006, I applied to one college, UC Irvine.
- I could only afford one application fee and it was close to the beach.
- I intended to study music and lived in the arts dorm.
- I took calculus my first semester and realized I loved math.
- I took algebra with Daqing Wan in 2009; we worked on a commutative algebra problem.
I went to grad school at UCLA for pure math in 2010. I loved algebra.
- I learned ML was a thing in 2012 and took learning from data with set-theorist Bill Chen.
- I got excited about AI and joined UCLA’s vision lab in 2012.
- A year in, I saw optimization was everywhere. I just understood nothing about it.
- I read Nesterov’s book in 2013. I tried to prove each theorem before reading the proof.
- I took Wotao’s course in Fall 2013 and solved an open problem he mentioned in class.
- Without Wotao’s encouragement, I would not have applied to faculty jobs in 2014.
I became interested in writing in 2016.
- Two books influenced me: Clear and Simple as the Truth and Style.
  - The first helped me appreciate writing.
  - The second taught me how to structure text so a busy reader could appreciate it.
- I think writing improves when you become more comfortable with rejection.
- I decided to do more writing in public in 2025.
  - In the spring semester, I wrote course notes for optimization in PyTorch.
  - In May, I started writing notes on what I’ve been thinking about.
  - In July-September, I wrote about learning LLM engineering from scratch.
  - In October, I wrote about the objective of reasoning with reinforcement learning.
Minerva convinced me LLMs could eventually accelerate mathematics research. Since then,
- I’ve spoken to LLMs more than anyone else I know.
  - I’ve used them for math, coding, writing, and even negotiating bills.
- LLMs sort of helped me almost formalize convergence of gradient descent in Lean.
  - As of April 2025, no LLM could complete the proof.
- I received a grant that aims to make progress on the Hadamard conjecture with RL tools.

Publications

Preprints

What is the objective of reasoning with reinforcement learning? Damek Davis, Benjamin Recht Manuscript (2025)

Iteratively reweighted kernel machines efficiently learn sparse functions Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel Manuscript (2025)

Spectral norm bound for the product of random Fourier-Walsh matrices Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel Manuscript (2025)

Conference papers

Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Mathematical Programming (to appear)

Online Covariance Estimation in Nonsmooth Stochastic Approximation Liwei Jiang, Abhishek Roy, Krishna Balasubramanian, Damek Davis, Dmitriy Drusvyatskiy, Sen Na In Conference on Learning Theory (2025)

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma NeurIPS (2023)

A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions Damek Davis, Dmitriy Drusvyatskiy, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye NeurIPS (2022) Oral Presentation (top ~1%)

High probability guarantees for stochastic convex optimization Damek Davis, Dmitriy Drusvyatskiy In Conference on Learning Theory (2020)

Global Convergence of EM Algorithm for Mixtures of Two Component Linear Regression Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, and Damek Davis Conference on Learning Theory (2019)

The Sound of APALM Clapping: Faster Nonsmooth Nonconvex Optimization with Stochastic Asynchronous PALM Damek Davis, Brent Edmunds, Madeleine Udell Neural Information Processing Systems (2016) | report

Multiview Feature Engineering and Learning Jingming Dong, Nikos Karianakis, Damek Davis, Joshua Hernandez, Jonathan Balzer and Stefano Soatto In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

Asymmetric sparse kernel approximations for large-scale visual search. Damek Davis, Jonathan Balzer, Stefano Soatto In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)

Journal papers

Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang Foundations of Computational Mathematics (to appear)

Stochastic optimization over proximally smooth sets Damek Davis, Dmitriy Drusvyatskiy, Zhan Shi SIAM Journal on Optimization (to appear)

Computational Microscopy beyond Perfect Lenses Xingyuan Lu, Minh Pham, Elisa Negrini, Damek Davis, Stanley J. Osher, Jianwei Miao Physical Review E (to appear)

Global Optimality of the EM Algorithm for Mixtures of Two-Component Linear Regressions Jeongyeol Kwon, Wei Qian, Constantine Caramanis, Yudong Chen, Damek Davis, Nhat Ho IEEE Transactions on Information Theory (2024)

Clustering a Mixture of Gaussians with Unknown Covariance Damek Davis, Mateo Diaz, Kaizheng Wang Bernoulli (to appear)

Asymptotic normality and optimality in nonsmooth stochastic approximation Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang The Annals of Statistics (to appear)

A nearly linearly convergent first-order method for nonsmooth functions with quadratic growth Damek Davis, Liwei Jiang Foundations of Computational Mathematics (to appear) | code | Twitter thread

Stochastic algorithms with geometric step decay converge linearly on sharp functions Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos Mathematical Programming (to appear) | code

A superlinearly convergent subgradient method for sharp semismooth problems Vasileios Charisopoulos, Damek Davis Mathematics of Operations Research (2023) | code | Twitter Thread

Escaping strict saddle points of the Moreau envelope in nonsmooth optimization Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2022)

Variance reduction for root-finding problems Damek Davis Mathematical Programming (to appear)

Conservative and semismooth derivatives are equivalent for semialgebraic maps Damek Davis, Dmitriy Drusvyatskiy Set-Valued and Variational Analysis (to appear)

From low probability to high confidence in stochastic convex optimization Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang Journal of Machine Learning Research (to appear)

Proximal methods avoid active strict saddles of weakly convex functions Damek Davis, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (2021)

Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence Vasileios Charisopoulos, Yudong Chen, Damek Davis, Mateo Díaz, Lijun Ding, Dmitriy Drusvyatskiy Foundations of Computational Mathematics (to appear) | code

Composite optimization for robust rank one bilinear sensing Vasileios Charisopoulos, Damek Davis, Mateo Diaz, Dmitriy Drusvyatskiy IMA Journal on Information and Inference (2020) | code

Graphical Convergence of Subgradients in Nonconvex Optimization and Learning Damek Davis, Dmitriy Drusvyatskiy Mathematics of Operations Research (to appear)

Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems. Damek Davis, Benjamin Grimmer SIAM Journal on Optimization (to appear) | code

Trimmed Statistical Estimation via Variance Reduction Aleksandr Aravkin, Damek Davis Mathematics of Operations Research (2019) | video

Stochastic subgradient method converges on tame functions. Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, Jason D. Lee Foundations of Computational Mathematics (to appear)

The nonsmooth landscape of phase retrieval Damek Davis, Dmitriy Drusvyatskiy, Courtney Paquette IMA Journal on Numerical Analysis (2018)

Stochastic model-based minimization of weakly convex functions. Damek Davis, Dmitriy Drusvyatskiy SIAM Journal on Optimization (2019) | blog
- This is the combination of the two arXiv preprints arXiv:1802.02988 and arXiv:1803.06523.
- Supplementary technical note: Complexity of finding near-stationary points of convex functions stochastically
- Related report on nonsmooth nonconvex mirror descent: Stochastic model-based minimization under high-order growth (2018)
- INFORMS Optimization Society Young Researchers Prize (2019)

Subgradient methods for sharp weakly convex functions Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, Courtney Paquette Journal of Optimization Theory and Applications (2018)

Forward-Backward-Half Forward Algorithm for Solving Monotone Inclusions Luis M. Briceño-Arias, Damek Davis SIAM Journal on Optimization (2018)

Convergence rate analysis of the forward-Douglas-Rachford splitting scheme. Damek Davis SIAM Journal on Optimization (2015)

Convergence rate analysis of primal-dual splitting schemes Damek Davis SIAM Journal on Optimization (2015)

Faster convergence rates of relaxed Peaceman-Rachford and ADMM under regularity assumptions Damek Davis, Wotao Yin Mathematics of Operations Research (2016)

A Three-Operator Splitting Scheme and its Optimization Applications. Damek Davis, Wotao Yin Set-Valued and Variational Analysis (2017) | code | slides

Beating level-set methods for 5D seismic data interpolation: a primal-dual alternating approach Rajiv Kumar, Oscar López, Damek Davis, Aleksandr Y. Aravkin, Felix J. Herrmann IEEE Transactions on Computational Imaging (2017)

Tactical Scheduling for Precision Air Traffic Operations: Past Research and Current Problems Douglas R. Isaacson, Alexander V. Sadovsky, Damek Davis Journal of Aerospace Information Systems, April, Vol. 11, No. 4, pp. 234-257

Efficient computation of separation-compliant speed advisories for air traffic arriving in terminal airspace. Alexander V. Sadovsky, Damek Davis, Douglas R. Isaacson. Journal of Dynamic Systems Measurement and Control 136(4), 041027 (2014)

Separation-compliant, optimal routing and control of scheduled arrivals in a terminal airspace. Alexander V. Sadovsky, Damek Davis, and Douglas R. Isaacson. Transportation Research Part C: Emerging Technologies 37 (2013): 157-176

Factorial and Noetherian Subrings of Power Series Rings. Damek Davis, Daqing Wan Proceedings of the American Mathematical Society 139 (2011), no. 3, 823-834