AKA: Optimization in PyTorch
I’m currently developing this course for Spring 2025. I am literally pushing my thoughts to this public repo as soon as I have them. Be prepared for the course to look rough until the semester starts.
See a rough outline of the course here.
For lecture notes see table of contents.
Discord server: I’ll invite you to the course’s Discord server once the course starts. If you join late and do not have an invite, email me from your Penn email and I’ll add you.
Office Hours: Tuesdays 12:30-1:30 PM
Why am I excited about this course? Optimization works! Everything you learn about solving optimization problems numerically will be useful in your career, whether you’re a data scientist, AI researcher, or an engineer of any kind. Understanding how optimization methods work and how to use them will actually help you solve real problems.
To quote Joshua Achiam (OpenAI):
If you want to know something deep, fundamental, and maximally portable between virtually every field: study mathematical optimization.
Optimization is the modeling language in which modern data science, machine learning, and sequential decision-making problems are formulated and solved numerically. This course will teach you how to formulate these problems mathematically, choose appropriate algorithms to solve them, and implement and tune the algorithms in PyTorch. Tentative topics include:
By the end of this course, you will become an intelligent consumer of numerical methods and software for solving modern optimization problems.
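To preview what "solving an optimization problem in PyTorch" looks like, here is a minimal, self-contained sketch. The least-squares objective, learning rate, and step count are illustrative choices for this example, not course material:

```python
import torch

# Illustrative problem: least squares, min_x ||A x - b||^2.
torch.manual_seed(0)
A = torch.randn(10, 3)
b = torch.randn(10)

x = torch.zeros(3, requires_grad=True)  # the decision variable
opt = torch.optim.SGD([x], lr=0.01)     # plain gradient descent

for _ in range(500):
    opt.zero_grad()                     # clear gradients from the previous step
    loss = torch.sum((A @ x - b) ** 2)  # evaluate the objective
    loss.backward()                     # autodiff computes d(loss)/dx
    opt.step()                          # move x along the negative gradient

print(f"final loss: {loss.item():.4f}")
```

Swapping `torch.optim.SGD` for, say, `torch.optim.Adam`, or tuning `lr`, changes the algorithm without changing the problem; much of the course is about making such choices intelligently.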
ITERATIVE DEVELOPMENT PROCESS                    PROJECT COMPONENTS
===========================                      ==================

┌─────────────────┐      ┌─────────────────┐     ┌────────────────────┐
│ INITIAL SETUP   │      │ DELIVERABLES    │     │ PROJECT OPTIONS    │
│ Teams: 3-4      ├──────┤ • GitHub Repo   │     │ • Model Training   │
│ Week 2 Start    │      │ • Colab Demo    │     │ • Reproducibility  │
└───────┬─────────┘      │ • Final Paper   │     │ • Benchmarking     │
        │                │ • Slide Deck    │     │ • Research Extend  │
        │                └───────┬─────────┘     │ • ...              │
        │                        │               └────────────────────┘
        │                        ▼
        │                ┌─────────────────┐     BIWEEKLY SCHEDULE
        ▼                │ FEEDBACK        │     ════════════════
┌─────────────────┐      │ PEER REVIEWS:   │     Week 3:  Report
│ IMPLEMENT       │◀─────┤ • Run Code      │     Week 4:  Slides Draft
│ • Write Code    │      │ • Test Demo     │     Week 5:  Report
│ • Test & Debug  ├─────▶│ • Give Feedback │     Week 6:  Slides Draft
│ • Document      │      │                 │     Week 7:  Report
└─────────────────┘      │ PROF MEETINGS:  │     Week 8:  ⚡LIGHTNING TALK⚡
                         │ • Week 3 Scope  │     Week 9:  Report
                         │ • Week 7 Mid    │     Week 10: Slides Draft
                         │ • Week 11 Final │     Week 11: Report
                         └─────────────────┘     Week 12: Slides Draft
                                                 Week 13: Final Report
DEVELOPMENT WITH LLMs                            Week 14: Final Present
• Write & review reports, documentation
• Develop & test code (verify outputs!)
• Regular commits with clear documentation
You set up these components early in the semester and then iterate on them throughout.
Why this approach?
Final projects often become a single rushed deliverable. We’ll break the project into regular drafts and feedback cycles so your team can iterate, improve, and build something more substantial and refined. You’ll have multiple checkpoints, each with opportunities for critique and revision. By the end, you’ll have a polished piece of work you can showcase—something worthy of your portfolio or internship applications.
By developing your projects in iterative steps, you will receive feedback multiple times and have a better chance of creating something valuable. This also helps me shape the course content based on your areas of interest, ensuring that lectures and assignments align well with your goals.
In the course project, you will apply the optimization skills you learned in the course. For example, your final project could involve training a model, reproducing published results, benchmarking optimizers, or extending a line of research (see Project Options above).
I will post more project ideas adapted to student interests in the course’s first weeks. Each project must include a Google Colab (or equivalent) walkthrough of your results and claims. I must approve the scope of the project.
The course material will be self-contained beyond the prerequisites, but there will be no official textbook. Some material will draw on recent research published in academic conferences and journals; some will draw on software libraries that are mature or still in development. Below, I have instead listed several resources that we may draw on – more will be added throughout the course:
Beck, A. (2014). Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. Society for Industrial and Applied Mathematics.
Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.
UCLA course ECE236C - Optimization Methods for Large-Scale Systems https://www.seas.ucla.edu/~vandenbe/ee236c.html
Google Research. (n.d.). Deep learning tuning playbook. https://github.com/google-research/tuning_playbook
Dahl, G. E., et al. (2023). Benchmarking neural network training algorithms. https://arxiv.org/abs/2306.07179 (see also the MLCommons algorithmic-efficiency benchmark: https://github.com/mlcommons/algorithmic-efficiency)
Moreau, T., et al. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks. https://github.com/benchopt/benchopt
Schmidt, R., et al. (2021). Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers. https://proceedings.mlr.press/v139/schmidt21a
CVXPY: Convex optimization, for everyone. https://www.cvxpy.org/
PyTorch (The full library): https://pytorch.org/
Numerical implementation of optimizers in PyTorch: https://pytorch.org/docs/stable/optim.html#algorithms
Jax library: https://github.com/google/jax
Micrograd: A tiny educational “auto differentiation” library https://github.com/karpathy/micrograd
CoLA: a framework for scalable linear algebra that exploits structure often found in machine learning problems. https://github.com/wilson-labs/cola
MinGPT: A PyTorch re-implementation of GPT https://github.com/karpathy/minGPT
The annotated diffusion model https://huggingface.co/blog/annotated-diffusion
Repository containing resources and papers for diffusion models https://diff-usion.github.io/Awesome-Diffusion-Models/
GitHub Docs: Hello World https://docs.github.com/en/get-started/start-your-journey/hello-world
Weights and Biases https://wandb.ai/site
Sasha Rush has kindly made a series of open-source puzzles to help you understand GPUs, automatic differentiation, and optimization:
Tensor puzzles: https://github.com/srush/Tensor-Puzzles
GPU Puzzles: https://github.com/srush/GPU-Puzzles
Autodiff Puzzles: https://github.com/srush/Autodiff-Puzzles/
It’s useful to appreciate how optimization evolved as an algorithmic discipline over the last seventy years: roughly, from general-purpose solvers for well-structured convex problems toward flexible, gradient-based frameworks for large-scale, nonconvex ones.
In this course, we will appreciate both sides: solver-based approaches for classical, well-structured problems (via CVXPY) and more flexible, high-powered frameworks (via PyTorch) for data-driven, nonconvex tasks.
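As a concrete contrast, here is one tiny problem solved both ways. This is a sketch under illustrative choices (random problem data, CVXPY's default solver, a hand-picked learning rate), not a prescription:

```python
import cvxpy as cp
import numpy as np
import torch

# One problem, two styles: min_x ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

# Solver-based: declare the problem, let CVXPY pick and run a solver.
x = cp.Variable(3)
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b))).solve()
print("CVXPY solution: ", x.value)

# Framework-based: write the loss yourself and iterate with gradients.
A_t, b_t = torch.tensor(A), torch.tensor(b)
x_t = torch.zeros(3, dtype=torch.float64, requires_grad=True)
opt = torch.optim.SGD([x_t], lr=0.01)
for _ in range(500):
    opt.zero_grad()
    loss = torch.sum((A_t @ x_t - b_t) ** 2)
    loss.backward()
    opt.step()
print("PyTorch solution:", x_t.detach().numpy())
```

Both runs should recover (approximately) the same minimizer; the difference is in who does the algorithmic work, the solver or you.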