Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations

Master the core trade-offs in deep learning optimization: convergence speed, generalization, and computational efficiency. This pack guides you through understanding and applying foundational methods like SGD and Adam to improve your model training.

intermediate · 45 min · 4 steps

The play

Identify Optimization Trade-offs
Understand the three critical goals in deep learning optimization: convergence speed (how fast), generalization capability (performance on unseen data), and computational efficiency (resource usage). Recognize that these goals are often in tension, requiring careful balance.
Configure Stochastic Gradient Descent (SGD)
Implement SGD, the foundational optimizer that updates parameters using mini-batch gradients. Focus on tuning its primary hyperparameter, `learning_rate` (`lr` in PyTorch), and consider adding `momentum` to smooth updates and accelerate convergence. SGD is simple and often generalizes well, but it typically converges more slowly than adaptive methods.
Configure Adam Optimizer
Implement Adam, a widely used adaptive optimizer. Adam keeps a separate adaptive learning rate for each parameter, which generally leads to faster convergence, especially early in training. Pay attention to the `learning_rate` (`lr` in PyTorch) and `betas` hyperparameters. Note that Adam can settle into sharper minima, which sometimes hurts generalization.
Evaluate Optimizer Performance
Apply both SGD and Adam to a simple deep learning task (e.g., training a small neural network on a dataset like MNIST or CIFAR-10). Compare their training curves (loss over epochs) and validation accuracy to observe differences in convergence speed and generalization. Experiment with their respective hyperparameters.
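
The update rules behind the SGD and Adam steps above can be sketched in plain Python on a one-parameter toy problem. This is a minimal illustration, not the library implementation: the loss f(w) = (w - 3)^2, the helper names `grad`, `sgd_momentum`, and `adam`, and the step count are all assumptions made for the example. The momentum rule follows the form PyTorch uses with its default settings (no dampening, no Nesterov), and the Adam rule is the standard bias-corrected one.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2 (hypothetical example problem)
    return 2.0 * (w - 3.0)

def sgd_momentum(w, lr=0.01, momentum=0.9, steps=200):
    # Momentum buffer accumulates past gradients: buf = momentum * buf + g
    buf = 0.0
    for _ in range(steps):
        g = grad(w)
        buf = momentum * buf + g
        w -= lr * buf
    return w

def adam(w, lr=0.001, betas=(0.9, 0.999), eps=1e-8, steps=200):
    beta1, beta2 = betas
    m, v = 0.0, 0.0  # running averages of the gradient and squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd_momentum(0.0)
w_adam = adam(0.0)
print(f"SGD with momentum after 200 steps: w = {w_sgd:.4f}")
print(f"Adam after 200 steps:              w = {w_adam:.4f}")
```

In this toy setting Adam's step is normalized by the gradient magnitude, so each update moves `w` by roughly `lr` regardless of how large the gradient is. That is one reason the two optimizers need different learning-rate scales and should not be compared at a single shared `lr`.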

Starter code

import torch
import torch.nn as nn
import torch.optim as optim

# 1. Define a simple model
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1) # 10 input features, 1 output

    def forward(self, x):
        return self.fc(x)

model = SimpleNN()

# 2. Define a loss function
criterion = nn.MSELoss()

# 3. Instantiate SGD Optimizer
sgd_optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
print(f"\nSGD Optimizer initialized: {sgd_optimizer}")

# 4. Instantiate Adam Optimizer
adam_optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-08)
print(f"\nAdam Optimizer initialized: {adam_optimizer}")

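# Example of a dummy training step (conceptual).
# Both optimizers above wrap the same parameters; in a real comparison,
# give each optimizer its own copy of the model.
dummy_input = torch.randn(1, 10)
dummy_target = torch.randn(1, 1)

# For SGD:
sgd_optimizer.zero_grad()   # clear gradients left over from any previous step
loss = criterion(model(dummy_input), dummy_target)
loss.backward()             # compute gradients of the loss
sgd_optimizer.step()        # apply the SGD update

# For Adam (the forward pass is recomputed because backward() frees the
# graph's saved buffers, so a second backward() on the same loss would fail):
adam_optimizer.zero_grad()
loss = criterion(model(dummy_input), dummy_target)
loss.backward()
adam_optimizer.step()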

print("\nOptimizers are ready to be used in a training loop.")
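
The evaluation step can be sketched as a small side-by-side run. This is a hedged example under stated assumptions: it uses synthetic regression data rather than MNIST or CIFAR-10 so that it runs without downloads, and the `train` helper, the data sizes, the epoch count, and the batch size are all hypothetical choices. Each optimizer gets its own copy of the same initial model so that their updates do not interfere.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data (an assumption; stands in for a real dataset)
X = torch.randn(256, 10)
true_w = torch.randn(10, 1)
y = X @ true_w + 0.1 * torch.randn(256, 1)
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

criterion = nn.MSELoss()

def val_loss(model):
    # Validation loss without tracking gradients
    model.eval()
    with torch.no_grad():
        return criterion(model(X_val), y_val).item()

def train(model, optimizer, epochs=50, batch_size=32):
    # Returns the validation loss before training and after every epoch
    history = [val_loss(model)]
    for _ in range(epochs):
        model.train()
        perm = torch.randperm(X_train.size(0))  # reshuffle each epoch
        for i in range(0, X_train.size(0), batch_size):
            idx = perm[i:i + batch_size]
            optimizer.zero_grad()
            loss = criterion(model(X_train[idx]), y_train[idx])
            loss.backward()
            optimizer.step()
        history.append(val_loss(model))
    return history

# Identical starting weights, one independent copy per optimizer
base = nn.Linear(10, 1)
sgd_model = copy.deepcopy(base)
adam_model = copy.deepcopy(base)

sgd_hist = train(sgd_model, optim.SGD(sgd_model.parameters(), lr=0.01, momentum=0.9))
adam_hist = train(adam_model, optim.Adam(adam_model.parameters(), lr=0.001, betas=(0.9, 0.999)))

print(f"SGD  val loss: start {sgd_hist[0]:.4f}, end {sgd_hist[-1]:.4f}")
print(f"Adam val loss: start {adam_hist[0]:.4f}, end {adam_hist[-1]:.4f}")
```

Plotting `sgd_hist` and `adam_hist` against the epoch index gives the kind of curves the evaluation step asks you to compare. On a linear toy problem like this one the result says little about generalization, so treat it as a template to adapt to a real dataset and to vary `lr`, `momentum`, and `betas`.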