0. Introduction | Slides | Notebook
Course content, deliverables, and spam classification in PyTorch.
1. Basic Linear Algebra in PyTorch | Slides | Notebook | Live Demo
Basic linear algebra operations in PyTorch.
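A flavor of the operations this lecture covers, as a small illustrative sketch (not taken from the course notebook):

```python
import torch

# Core operations: matrix-vector product, transpose, norm, linear solve.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 3.0]])
x = torch.tensor([1.0, 2.0])

b = A @ x                      # matrix-vector product -> tensor([4., 7.])
AtA = A.T @ A                  # Gram matrix via transpose and matmul
n = torch.linalg.norm(x)       # Euclidean norm, sqrt(5)
y = torch.linalg.solve(A, b)   # solve A y = b; recovers x since A is invertible
```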
2. Linear Regression: Direct Methods | Slides | Notebook
Direct methods for solving least squares problems, comparing LU and QR factorization.
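As a rough sketch of the QR route (my own toy example, not the course's notebook): factor A = QR, then solve the triangular system R x = Qᵀ b instead of forming the ill-conditioned normal equations.

```python
import torch

torch.manual_seed(0)
A = torch.randn(100, 3)                  # tall matrix: overdetermined system
x_true = torch.tensor([1.0, -2.0, 0.5])
b = A @ x_true                           # consistent right-hand side

# QR approach: A = QR, then back-substitute R x = Q^T b.
Q, R = torch.linalg.qr(A)
x_qr = torch.linalg.solve_triangular(
    R, (Q.T @ b).unsqueeze(1), upper=True
).squeeze(1)
```

PyTorch's `torch.linalg.lstsq` wraps a similar factorization-based solver if you want the one-liner.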
3. Linear Regression: Gradient Descent | Slides | Notebook
Linear regression via gradient descent.
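A minimal version of the idea, assuming a mean-squared-error objective (toy data of my own, not the lecture's):

```python
import torch

torch.manual_seed(0)
X = torch.randn(200, 2)
w_true = torch.tensor([3.0, -1.0])
y = X @ w_true                            # noiseless targets

w = torch.zeros(2)
lr = 0.1
for _ in range(500):
    grad = 2 / len(X) * X.T @ (X @ w - y)  # gradient of mean squared error
    w -= lr * grad                         # gradient descent step
```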
4. How to compute gradients in PyTorch | Slides | Notebook
Introduction to PyTorch’s automatic differentiation system.
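The core mechanic in two lines: mark a tensor with `requires_grad=True`, and `.backward()` fills in `.grad` by traversing the recorded computation graph.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x   # y = x^3 + 2x
y.backward()         # dy/dx = 3x^2 + 2 = 14 at x = 2
print(x.grad)        # tensor(14.)
```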
5. A step-by-step introduction to transformer models
Building transformers from scratch: embeddings, attention, residual connections, and next-token prediction on Shakespeare.
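The attention step at the heart of the lecture can be sketched in a few lines; this is a generic scaled dot-product attention with toy shapes, not the course's full model:

```python
import torch
import torch.nn.functional as F

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q = torch.randn(1, 4, 8)   # (batch, sequence length, head dim) - toy sizes
out = attention(q, q, q)   # self-attention: queries, keys, values share inputs
```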
6. A step-by-step introduction to diffusion models
Diffusion models from first principles: forward process, reverse process, noise prediction, U-Net, sampling, DDIM, conditional generation, and FID.
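The forward (noising) process admits a closed form: x_t = √(ᾱ_t) x_0 + √(1 − ᾱ_t) ε. A sketch with a common linear beta schedule (illustrative values, not necessarily the lecture's):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # linear noise schedule
alpha_bars = torch.cumprod(1 - betas, dim=0)  # cumulative signal level

x0 = torch.randn(1, 2)           # stand-in for a data sample
t = 500
noise = torch.randn_like(x0)
xt = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise
```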
7. Stochastic gradient descent: insights from the Noisy Quadratic Model
When should we use exponential moving averages, momentum, and preconditioning?
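The noisy quadratic model studies questions like this on the simplest possible objective. A heavy-ball momentum sketch on a 1-D noisy quadratic (constants chosen for illustration):

```python
import torch

# f(w) = 0.5 * h * w^2, observed through a noisy gradient oracle.
torch.manual_seed(0)
h, lr, beta = 1.0, 0.1, 0.9
w, v = torch.tensor(5.0), torch.tensor(0.0)
for _ in range(300):
    g = h * w + 0.01 * torch.randn(())  # noisy gradient
    v = beta * v + g                    # momentum buffer
    w = w - lr * v
```

With noise, the iterate hovers in a neighborhood of the optimum whose size depends on the step size and momentum, which is exactly the trade-off the lecture analyzes.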
8. Stochastic Gradient Descent: The general problem and implementation details | Notebook
Stochastic optimization problems, SGD, common tweaks, and implementation in PyTorch.
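A minimal minibatch SGD loop using the standard PyTorch pieces (`DataLoader`, `torch.optim.SGD`); the data and hyperparameters here are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(256, 3)
y = X @ torch.tensor([1.0, 2.0, -1.0])
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = torch.nn.Linear(3, 1, bias=False)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for epoch in range(20):
    for xb, yb in loader:                              # one minibatch per step
        loss = ((model(xb).squeeze(-1) - yb) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```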
9. Adaptive Optimization Methods | Notebook | Cheatsheet
Intro to adaptive optimization methods: Adagrad, Adam, and AdamW.
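Using AdamW in PyTorch is a one-line swap from SGD; the distinguishing feature is that weight decay is applied directly to the parameters rather than folded into the adaptive gradient. A toy single step (shapes and hyperparameters are illustrative):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(32, 10)
loss = model(x).pow(2).mean()
opt.zero_grad()
loss.backward()
opt.step()   # adaptive update plus decoupled weight decay
```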
10. Benchmarking Optimizers: Challenges and Some Empirical Results | Cheatsheet
How do we compare optimizers for deep learning?
11. A Playbook for Tuning Deep Learning Models | Cheatsheet
A systematic process for tuning deep learning models.
12. Scaling Transformers: Parallelism Strategies from the Ultrascale Playbook | Cheatsheet
How do we scale training of transformers to 100s of billions of parameters?
13. Conclusion
A recap of the course.