Stat 4830: Numerical optimization for data science and machine learning

AKA: Optimization in PyTorch

I’m currently developing this course for Spring 2025. I am literally pushing my thoughts to this public repo as soon as I have them. Be prepared for the course to look rough until the semester starts.

See a rough outline of the course here.

For lecture notes, see the table of contents.

Discord server: I’ll invite you to the course’s Discord server once the course starts. If you join late and do not have an invite, email me from your Penn email and I’ll add you.

Office Hours: Tuesdays 12:30-1:30 PM

Table of contents

High-level thoughts on the course

Overview

Optimization is the modeling language in which modern data science, machine learning, and sequential decision-making problems are formulated and solved numerically. This course will teach you how to formulate these problems mathematically, choose appropriate algorithms to solve them, and implement and tune the algorithms in PyTorch. Tentative topics include:

By the end of this course, you will become an intelligent consumer of numerical methods and software for solving modern optimization problems.
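
To give an early flavor of that workflow, here is a minimal sketch, with synthetic data and an arbitrary learning rate, of fitting a least-squares model using PyTorch's automatic differentiation and built-in SGD optimizer:

```python
import torch

# Synthetic data for illustration: observations y ≈ X @ w_true + noise.
torch.manual_seed(0)
X = torch.randn(100, 3)
w_true = torch.tensor([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * torch.randn(100)

# Parameters to optimize, and a plain SGD optimizer.
w = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(200):
    optimizer.zero_grad()             # clear gradients from the previous step
    loss = ((X @ w - y) ** 2).mean()  # mean squared error
    loss.backward()                   # autodiff computes d(loss)/dw
    optimizer.step()                  # take a gradient descent step

print(w.detach())  # should be close to w_true
```

The same loop structure scales from this toy problem to training large neural networks; only the model and the loss change.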

Prerequisites

Deliverable: a final project, with a first draft due in week 3…

Final Project Structure

    ITERATIVE DEVELOPMENT PROCESS                          PROJECT COMPONENTS
    ===========================                            ==================

    ┌─────────────────┐           ┌─────────────────┐    ┌────────────────────┐
    │  INITIAL SETUP  │           │  DELIVERABLES   │    │  PROJECT OPTIONS   │
    │  Teams: 3-4     ├───────────┤  • GitHub Repo  │    │ • Model Training   │
    │  Week 2 Start   │           │  • Colab Demo   │    │ • Reproducibility  │
    └───────┬─────────┘           │  • Final Paper  │    │ • Benchmarking     │
            │                     │  • Slide Deck   │    │ • Research Extend  │
            │                     └───────┬─────────┘    │ • ...              │
            │                             │              └────────────────────┘
            │                             ▼                         
            │                     ┌─────────────────┐    BIWEEKLY SCHEDULE
            ▼                     │    FEEDBACK     │    ════════════════
    ┌─────────────────┐           │ PEER REVIEWS:   │    Week 3:  Report
    │   IMPLEMENT     │◀───────── ┤ • Run Code      │    Week 4:  Slides Draft
    │ • Write Code    │           │ • Test Demo     │    Week 5:  Report
    │ • Test & Debug  ├─────────▶ │ • Give Feedback │    Week 6:  Slides Draft
    │ • Document      │           │                 │    Week 7:  Report
    └─────────────────┘           │ PROF MEETINGS:  │    Week 8:  ⚡LIGHTNING TALK⚡
                                  │ • Week 3 Scope  │    Week 9:  Report
                                  │ • Week 7 Mid    │    Week 10: Slides Draft
                                  │ • Week 11 Final │    Week 11: Report
                                  └─────────────────┘    Week 12: Slides Draft
                                                        Week 13: Final Report
    DEVELOPMENT WITH LLMs                               Week 14: Final Present
    • Write & review reports, documentation                         
    • Develop & test code (verify outputs!)                         
    • Regular commits with clear documentation

…which you then iterate on throughout the semester.

Why this approach?
Final projects often become a single rushed deliverable. We’ll break the project into regular drafts and feedback cycles so your team can iterate, improve, and build something more substantial and refined. You’ll have multiple checkpoints, each with opportunities for critique and revision. By the end, you’ll have a polished piece of work you can showcase—something worthy of your portfolio or internship applications.

Timeline and milestones

  1. First draft (Week 2)
    • Submit an initial write-up of your project idea and early exploration. The goal is to get feedback fast and clarify your scope.
  2. Regular drafts and critiques (every two weeks)
    • You will submit an updated draft of the report plus a brief critique of your previous version.
    • On alternating weeks, submit a draft of your final presentation slides. This keeps your written and visual materials aligned as the project evolves.
  3. Midterm presentation (midpoint of semester)
    • Prepare a lightning presentation of your project so far. You’ll get early audience feedback on your approach and results to date.
  4. Final presentation (end of semester)
    • Deliver a polished talk showcasing your findings, experiments, and lessons learned.

Deliverable format

  1. GitHub repository
    • Centralize all materials: your written report, code, and presentation slides.
    • Be sure to include a clear README with instructions for reproducing results.
  2. Executable demo
    • Provide a runnable demonstration in Google Colab. If your overall code is extensive, create a minimal Colab notebook that shows core functionality or key results.
  3. Written report
    • By default, structure it like a short conference-style paper (e.g., 8 pages + supplementary).
    • If you have a more creative format in mind, just run it by me first.

Feedback loop

Rules

By developing your projects in iterative steps, you will receive feedback multiple times and have a better chance of creating something valuable. This also helps me shape the course content based on your areas of interest, ensuring that lectures and assignments align well with your goals.

Course project ideas

In the final project, you will apply the optimization skills you learn in this course. For example, your project could include:

I will post more project ideas adapted to student interests in the course’s first weeks. Each project must include a Google Colab (or equivalent) walkthrough of your results and claims. I must approve the scope of the project.

Textbooks, readings, and software libraries

The course material will be self-contained beyond the prerequisites, but there will be no official textbook. Some material will draw on recent research published in academic conferences and journals; some will draw on software libraries that are mature or still in development. Below, I have instead listed several resources that we may draw on; more will be added throughout the course:

Optimization

General background

  1. Beck, A. (2014). Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB. Society for Industrial and Applied Mathematics.

  2. Nocedal, J., & Wright, S. J. (1999). Numerical optimization. Springer.

  3. Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

  4. Hazan, E. (2016). Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.

  5. UCLA course ECE236C, Optimization Methods for Large-Scale Systems (L. Vandenberghe). https://www.seas.ucla.edu/~vandenbe/ee236c.html

Numerical optimization in machine learning

  1. Google Research. (n.d.). Deep learning tuning playbook. https://github.com/google-research/tuning_playbook

  2. Dahl, G. E., et al. (2023). Benchmarking neural network training algorithms. https://arxiv.org/abs/2306.07179 (MLCommons algorithmic-efficiency library: https://github.com/mlcommons/algorithmic-efficiency/tree/main)

  3. Moreau, T., et al. (2022). Benchopt: Reproducible, efficient and collaborative optimization benchmarks. https://github.com/benchopt/benchopt

  4. Schmidt, R., et al. (2021). Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers. https://proceedings.mlr.press/v139/schmidt21a

Software and tutorials

Convex optimization in Python: CVXPY

CVXPY: Convex optimization, for everyone. https://www.cvxpy.org/
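
As a quick taste, here is a minimal sketch, with synthetic data, of how CVXPY lets you state a convex problem declaratively and hand it off to a solver:

```python
import cvxpy as cp
import numpy as np

# Synthetic data for illustration.
np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)

# Nonnegative least squares: minimize ||Ax - b||^2 subject to x >= 0.
x = cp.Variable(5)
problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), [x >= 0])
problem.solve()  # CVXPY selects an appropriate convex solver

print(problem.value)  # optimal objective value
print(x.value)        # optimal point
```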

PyTorch, JAX, and automatic differentiation

  1. PyTorch (The full library): https://pytorch.org/

  2. Numerical implementation of optimizers in PyTorch: https://pytorch.org/docs/stable/optim.html#algorithms

  3. JAX library: https://github.com/google/jax

  4. Micrograd: a tiny educational automatic differentiation library (see the sketch after this list): https://github.com/karpathy/micrograd
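
For a sense of what these libraries automate, here is a minimal sketch of reverse-mode automatic differentiation in PyTorch, on a toy scalar function chosen purely for illustration:

```python
import torch

# Toy function f(x, y) = x * y + sin(x); autodiff returns exact gradients.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

f = x * y + torch.sin(x)
f.backward()  # reverse-mode automatic differentiation

print(x.grad)  # df/dx = y + cos(x)
print(y.grad)  # df/dy = x
```

Micrograd implements essentially this mechanism in roughly a hundred lines, which makes it a good place to see how backward passes work under the hood.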

Scalable linear algebra

CoLA: a framework for scalable linear algebra that exploits structure often found in machine learning problems. https://github.com/wilson-labs/cola

Transformers and diffusion models

  1. MinGPT: A PyTorch re-implementation of GPT https://github.com/karpathy/minGPT

  2. The annotated diffusion model https://huggingface.co/blog/annotated-diffusion

  3. Repository containing resources and papers for diffusion models https://diff-usion.github.io/Awesome-Diffusion-Models/

Version control: Git and GitHub

GitHub Docs: Hello World https://docs.github.com/en/get-started/start-your-journey/hello-world

Training visualization

Weights & Biases: https://wandb.ai/site
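
A minimal logging sketch is below; the project and run names and the metric values are hypothetical, and you will need a free account plus a one-time `wandb login`:

```python
import wandb

# Hypothetical project and run names.
wandb.init(project="stat-4830-demo", name="sgd-baseline")

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder for a real training loss
    wandb.log({"loss": loss})  # each call adds a point to the live plot

wandb.finish()
```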

Puzzles!

Sasha Rush has generously made a series of open-source puzzles to help you understand GPUs, automatic differentiation, and optimization.

  1. Tensor puzzles: https://github.com/srush/Tensor-Puzzles

  2. GPU Puzzles: https://github.com/srush/GPU-Puzzles

  3. Autodiff Puzzles: https://github.com/srush/Autodiff-Puzzles/

Brief historical perspective on optimization

It’s useful to appreciate how optimization evolved as an algorithmic discipline over the last seventy years:

In this course, we will appreciate both sides: solver-based approaches for classical, well-structured problems (via CVXPY) and more flexible, high-powered frameworks (via PyTorch) for data-driven, nonconvex tasks.