Stat 4830: Numerical optimization for data science and machine learning

AKA: Optimization in PyTorch

I’m currently developing this course for Spring 2025. I am literally pushing my thoughts to this public repo as soon as I have them. Be prepared for the course to look rough until the semester starts.

See a rough outline of the course here.

For lecture notes see table of contents.

ED/Canvas: If you have not been added to the course on ED or Canvas, please email me from your Penn email and I’ll add you.

Office Hours: Tuesdays 12:30-1:30 PM

Table of contents

High-level thoughts on the course


Optimization is the modeling language in which modern data science, machine learning, and sequential decision-making problems are formulated and solved numerically. This course will teach you how to formulate these problems mathematically, choose appropriate algorithms to solve them, and implement and tune the algorithms in PyTorch. Tentative topics include:

By the end of this course, you will become an intelligent consumer of numerical methods and software for solving modern optimization problems.
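To make the formulate, choose, implement workflow concrete, here is a minimal sketch that fits a line to four data points by gradient descent on a least-squares loss. It is written in plain Python so it runs without dependencies. The data, the step size `lr`, and the names `loss_and_grad` and `gradient_descent` are illustrative assumptions, not course code. In PyTorch, the hand-derived gradient would typically be replaced by automatic differentiation.

```python
# Illustrative sketch (not course code): fit y ≈ w*x + b by gradient descent.

def loss_and_grad(w, b, xs, ys):
    """Mean squared error of the line w*x + b and its gradient in (w, b)."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = 2.0 * sum(residuals) / n
    return loss, grad_w, grad_b


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Run a fixed number of gradient steps from w = b = 0 (hypothetical defaults)."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        _, grad_w, grad_b = loss_and_grad(w, b, xs, ys)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data generated exactly by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
final_loss, _, _ = loss_and_grad(w, b, xs, ys)
```

The three steps map onto the code directly: the loss function is the formulation, the fixed-step gradient loop is the algorithm, and `lr` and `steps` are the knobs you would tune.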


Deliverable: a final project due in week 3

Final Project Structure

VERY detailed instructions on how to get started with your project are available at the following link: STAT-4830-project-base

Timeline and milestones

Deliverables (Due Fridays):

Note: Instructions for peer feedback will be added throughout the semester for each deliverable.

Why this approach?
Final projects often become a single rushed deliverable. We’ll break the project into regular drafts and feedback cycles so your team can iterate, improve, and build something more substantial and refined. You’ll have multiple checkpoints, each with opportunities for critique and revision. By the end, you’ll have a polished piece of work you can showcase—something worthy of your portfolio or internship applications.

Deliverable format

  1. GitHub repository
    • Centralize all materials: your written report, code, and presentation slides.
    • Be sure to include a clear README with instructions for reproducing results.
  2. Executable demo
    • Provide a runnable demonstration in Google Colab. If your overall code is extensive, create a minimal Colab notebook that shows core functionality or key results.
  3. Written report
    • By default, structure it like a short conference-style paper (e.g., 8 pages + supplementary).
    • If you have a more creative format in mind, just run it by me first.

Feedback loop


By developing your projects in iterative steps, you will receive feedback multiple times and have a better chance of creating something valuable. This also helps me shape the course content around your areas of interest, so that lectures and assignments align with your goals.

Course project ideas

In the course project, you will apply the optimization skills you learned in the course. For example, your final project could include:

I will post more project ideas adapted to student interests in the course’s first weeks. Each project must include a Google Colab (or equivalent) walkthrough of your results and claims. I must approve the scope of the project.

Textbooks, readings, and software libraries

The course material will be self-contained beyond the prerequisites, and there is no official textbook. Some material will draw on recent research published in academic conferences and journals. Some will draw on software libraries that are mature or in development. Instead of a textbook, I have listed below several resources that we may draw on; more will be added throughout the course:


General background

  1. Beck, A. (2014). Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. Society for Industrial and Applied Mathematics.

  2. Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer.

  3. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

  4. Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.

  5. UCLA course ECE236C - Optimization Methods for Large-Scale Systems

Numerical optimization in machine learning

  1. Google Research. (n.d.). Deep learning tuning playbook.

  2. Google Research. (2023). Benchmarking neural network training algorithms and the ML commons library.

  3. Moreau, T., et al. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks.

  4. Schmidt, R., et al. (2021). Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers.

Software and tutorials

Convex optimization in Python: CVXPY

CVXPY: Convex optimization, for everyone.

PyTorch, JAX, and automatic differentiation

  1. PyTorch (The full library):

  2. Numerical implementation of optimizers in PyTorch:

  3. JAX library:

  4. Micrograd: A tiny educational “auto differentiation” library
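For a sense of what a tiny automatic differentiation library does, here is a minimal sketch of scalar reverse-mode differentiation in plain Python. It is written in the spirit of such libraries but is not Micrograd's actual code: the `Value` class, its attribute names, and the choice to support only `+` and `*` are assumptions made for illustration.

```python
# Illustrative sketch of scalar reverse-mode autodiff (not Micrograd's actual code).

class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this node was computed from
        self._backward = lambda: None    # pushes self.grad into the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        """Apply the chain rule from this node back to every input."""
        order, seen = [], set()

        def visit(node):
            # Depth-first traversal that lists parents before children.
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# f(x, y) = x*y + x, so df/dx = y + 1 and df/dy = x.
x, y = Value(3.0), Value(4.0)
f = x * y + x
f.backward()
```

After `f.backward()`, `x.grad` and `y.grad` hold the partial derivatives of `f` at the point (3, 4). Gradients are accumulated with `+=` because `x` feeds into `f` along two paths, and the chain rule sums the contributions from each.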

Scalable linear algebra

CoLA: a framework for scalable linear algebra that exploits structure often found in machine learning problems.

Transformers and diffusion models

  1. MinGPT: A PyTorch re-implementation of GPT

  2. The annotated diffusion model

  3. Repository containing resources and papers for diffusion models

Version control: Git and GitHub

GitHub Docs: Hello World

Training visualization

Weights and Biases


Luckily, Sasha Rush has made a series of open-source puzzles to help you understand GPUs, automatic differentiation, and optimization.

  1. Tensor puzzles:

  2. GPU Puzzles:

  3. Autodiff Puzzles:

Brief historical perspective on optimization

It’s useful to appreciate how optimization evolved as an algorithmic discipline over the last seventy years:

In this course, we will appreciate both sides: solver-based approaches for classical, well-structured problems (via CVXPY) and more flexible, high-powered frameworks (via PyTorch) for data-driven, nonconvex tasks.