Paper·arxiv.org
machine-learning · research · evaluation · fine-tuning · data-pipelines

Benchmarking Optimizers for MLPs in Tabular Deep Learning

Many deep learning models for tabular data default to AdamW, but this isn't always optimal. This Action Pack guides you to systematically benchmark different optimizers for Multi-Layer Perceptrons (MLPs) to improve model performance and training efficiency.

intermediate · 1-2 hours · 6 steps
The play
  1. Establish Your Baseline
    Set up your MLP model for tabular data and train it using your current default optimizer (e.g., AdamW). Record its performance metrics like convergence speed, final accuracy/loss, and training time. This will be your reference point.
  2. Select Alternative Optimizers
    Choose 2-4 alternative optimizers to test. Consider a mix of adaptive optimizers (e.g., RMSprop, Adagrad, Adam) and non-adaptive ones (e.g., SGD with Nesterov momentum). Research optimizers such as Lookahead or Ranger if an implementation is available for your framework.
  3. Configure Benchmarking Runs
    Create a loop or script to train your MLP with each selected optimizer. Tune hyperparameters (e.g., learning rate, weight decay) separately for each optimizer, since optimal values can vary significantly, and give every optimizer a comparable tuning budget so the comparison stays fair. Keep other training settings (epochs, batch size, random seed, data splits) consistent.
  4. Track & Collect Performance Data
    During each training run, log key metrics: training loss, validation loss, accuracy (or relevant metric for your task), and the time taken per epoch/total training time. Use tools like TensorBoard or MLflow for easier tracking and visualization.
  5. Analyze and Compare Results
    Review the collected data. Compare the optimizers on achieved validation performance (e.g., highest accuracy, lowest loss), convergence speed, and resource efficiency. Identify optimizers that outperform your baseline, and check that the gain holds across more than one random seed before trusting it.
  6. Integrate Superior Optimizer
    Once a superior optimizer is identified, integrate it into your main deep learning pipeline for tabular data. Consider further fine-tuning its hyperparameters for maximum benefit.
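One way to approach the analysis in step 5 is to aggregate several runs per optimizer and rank them by mean and spread. The sketch below is a minimal illustration, not a prescribed method: the function name `summarize_runs` and the record fields `optimizer` and `val_accuracy` are hypothetical, and it assumes you have already logged one record per training run.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_runs(records, metric="val_accuracy", higher_is_better=True):
    """Group per-run records by optimizer and rank them by the mean of `metric`.

    `records` is a list of dicts, one per training run. The field names
    ("optimizer", "val_accuracy") are assumptions made for this sketch.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["optimizer"]].append(record[metric])

    summary = []
    for name, values in grouped.items():
        # The sample standard deviation needs at least two runs.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append(
            {"optimizer": name, "runs": len(values), "mean": mean(values), "std": spread}
        )

    # Best optimizer first: descending for accuracy-like metrics,
    # ascending for loss-like metrics (higher_is_better=False).
    summary.sort(key=lambda row: row["mean"], reverse=higher_is_better)
    return summary
```

Pass one record per (optimizer, seed) run. If the gap between two optimizers' means is small relative to their `std`, the ranking is probably not reliable and more seeds are worth running.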
Starter code
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

# 1. Generate synthetic tabular data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

# 2. Define a simple MLP model
class SimpleMLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim)
        )

    def forward(self, x):
        return self.layers(x)

input_dim = X_train.shape[1]
hidden_dim = 64
output_dim = len(np.unique(y))
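torch.manual_seed(42)  # fix weight initialization so runs with different optimizers start from the same point
model = SimpleMLP(input_dim, hidden_dim, output_dim)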

# 3. Choose your optimizer here. Uncomment one at a time for benchmarking.
# optimizer = optim.AdamW(model.parameters(), lr=0.001)
# optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.RMSprop(model.parameters(), lr=0.001)

criterion = nn.CrossEntropyLoss()

# 4. Training loop (simplified: full-batch, so one optimizer step per epoch)
num_epochs = 50
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# 5. Evaluate (simplified)
model.eval()
with torch.no_grad():
    outputs = model(X_test_tensor)
    predicted = outputs.argmax(dim=1)
    accuracy = (predicted == y_test_tensor).sum().item() / y_test_tensor.size(0)
    print(f'Test Accuracy: {accuracy:.4f}')
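The starter code tests one optimizer at a time by uncommenting lines. A loop-based variant, sketched below, trains a fresh model per optimizer from the same seed and records loss, held-out accuracy, and wall-clock time. The helper names `make_mlp` and `run_benchmark` and the `optimizer_factories` dict are assumptions introduced for illustration; the learning rates are the untuned values from the starter code, so treat the output as a smoke test rather than a fair comparison. The 20% held-out split acts as a validation set here, so keep a separate test set for the optimizer you finally choose.

```python
import time

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def make_mlp(input_dim, hidden_dim, output_dim):
    # Same architecture as SimpleMLP in the starter code.
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, output_dim),
    )


def run_benchmark(optimizer_factories, num_epochs=50, seed=42):
    """Train one fresh MLP per optimizer on identical data and initial weights."""
    X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                               n_redundant=5, random_state=seed)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=seed)
    X_tr = torch.tensor(X_tr, dtype=torch.float32)
    y_tr = torch.tensor(y_tr, dtype=torch.long)
    X_val = torch.tensor(X_val, dtype=torch.float32)
    y_val = torch.tensor(y_val, dtype=torch.long)

    criterion = nn.CrossEntropyLoss()
    results = {}
    for name, factory in optimizer_factories.items():
        torch.manual_seed(seed)  # identical initial weights for every optimizer
        model = make_mlp(X_tr.shape[1], 64, 2)
        optimizer = factory(model.parameters())

        start = time.perf_counter()
        for _ in range(num_epochs):
            model.train()
            optimizer.zero_grad()
            loss = criterion(model(X_tr), y_tr)  # full-batch step, as in the starter code
            loss.backward()
            optimizer.step()
        elapsed = time.perf_counter() - start

        model.eval()
        with torch.no_grad():
            val_logits = model(X_val)
            val_loss = criterion(val_logits, y_val).item()
            val_accuracy = (val_logits.argmax(dim=1) == y_val).float().mean().item()

        results[name] = {
            "last_train_loss": loss.item(),  # loss computed before the final step
            "val_loss": val_loss,
            "val_accuracy": val_accuracy,
            "seconds": elapsed,
        }
    return results


# Each factory takes model parameters and returns a configured optimizer.
# Hyperparameters are the starter code's defaults, not tuned values.
optimizer_factories = {
    "AdamW": lambda params: optim.AdamW(params, lr=1e-3),
    "SGD+momentum": lambda params: optim.SGD(params, lr=1e-2, momentum=0.9),
    "RMSprop": lambda params: optim.RMSprop(params, lr=1e-3),
}

if __name__ == "__main__":
    for name, metrics in run_benchmark(optimizer_factories).items():
        print(f"{name}: val_acc={metrics['val_accuracy']:.4f}, "
              f"val_loss={metrics['val_loss']:.4f}, time={metrics['seconds']:.2f}s")
```

Using factories rather than pre-built optimizers matters because each optimizer must be bound to the parameters of its own freshly initialized model; reusing one model across optimizers would let later runs start from already-trained weights.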
Source
Benchmarking Optimizers for MLPs in Tabular Deep Learning — Action Pack