a-b-testing, model-evaluation, mlops, evidently, python, data-drift, statistical-testing, shadow-mode

A/B Test Your ML Models with Evidently

Compare two machine learning model versions side by side using Evidently. This guide shows how to run statistical tests on logged prediction data to check whether the inputs and predictions recorded for a reference model (A) and a new model (B) differ significantly in distribution (data drift).

beginner | 15 min | 4 steps

The play

Prepare Your Environment and Data
Install the libraries the starter code imports: `evidently`, `pandas`, and `scikit-learn`. For a model A/B test you need two datasets with the same columns: `reference_data`, logged from your control model (A), and `current_data`, logged from your challenger model (B). Both typically come from the logs of a live prediction service.
Define the A/B Test Suite
Import `TestSuite` and a test preset from Evidently. `DataDriftTestPreset` suits this comparison because it tests the distribution of every column in the two datasets and flags statistically significant changes in inputs or predictions. It checks distributions only, not model quality.
Run the Comparison and Generate Report
Call `run()` on the test suite with your two dataframes. Evidently performs the statistical comparisons and stores the results on the suite object. Then call `save_html()` to write them to a self-contained HTML file that you can open without a server.
Analyze the A/B Test Results
Open the generated `model_ab_test_report.html` in your browser. The report provides a dashboard summarizing the test outcomes. Look for failed tests (marked in red) to identify features or predictions with statistically significant drift, which can inform your decision to promote the new model.
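
To make "statistically significant drift" concrete, here is a minimal sketch of one test that can compare a numeric column across two samples: the two-sample Kolmogorov-Smirnov test, written with the standard library only. This is an illustration, not Evidently's implementation; Evidently chooses its own test per column depending on column type and data size. The function name `ks_two_sample` and the sample values are made up for this example, and the p-value uses a common large-sample approximation.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (statistic, approximate p-value) for a two-sample KS test."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # Statistic: the largest gap between the two empirical CDFs,
    # evaluated at every observed value.
    d = max(
        abs(bisect_right(a_sorted, x) / n - bisect_right(b_sorted, x) / m)
        for x in a_sorted + b_sorted
    )

    # Approximate p-value from the Kolmogorov distribution, with a
    # small-sample correction applied to the effective sample size.
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges poorly here and the true value is ~1.
        return d, 1.0
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(2.0 * series, 0.0), 1.0)


# Made-up values standing in for one logged feature under models A and B.
reference_values = [5.0 + 0.1 * i for i in range(30)]
current_values = [value + 0.5 for value in reference_values]

statistic, p_value = ks_two_sample(reference_values, current_values)
print(f"KS statistic={statistic:.3f}, approximate p-value={p_value:.4f}")
```

A small p-value means the two samples are unlikely to come from the same distribution, which is the kind of result a failed drift test in the report is pointing at.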

Starter code

import pandas as pd
from sklearn import datasets

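# These import paths belong to Evidently's TestSuite API. Newer Evidently
# releases reorganized the package, so if the imports fail, check the
# documentation for the version you have installed.
from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset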

# --- 1. Load & Prepare Data ---
# In a real scenario, this data would come from your model logging system.
# Here, we'll simulate it using the Iris dataset.
iris_data = datasets.load_iris(as_frame=True)
iris_frame = iris_data.frame

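# Iris rows are ordered by species, so shuffle before splitting; otherwise the
# two halves would already differ in almost every column.
shuffled = iris_frame.sample(frac=1, random_state=42).reset_index(drop=True)

# Model A's data (Reference)
reference_df = shuffled.iloc[:75]

# Model B's data (Current): copy the slice so the assignment below modifies
# an independent frame and does not raise pandas' SettingWithCopyWarning.
current_df = shuffled.iloc[75:].copy()
# Simulate a change in the distribution for one feature
current_df['sepal length (cm)'] += 0.5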

print("Prepared reference and current dataframes for A/B test analysis.")

# --- 2. Define and Run Test Suite ---
# DataDriftTestPreset compares the distribution of each column (features,
# plus target/prediction columns if present) between the two datasets.
ab_test_suite = TestSuite(tests=[
    DataDriftTestPreset(),
])

ab_test_suite.run(current_data=current_df, reference_data=reference_df, column_mapping=None)

# --- 3. Save the Report ---
ab_test_suite.save_html("model_ab_test_report.html")

print("\nModel A/B Test report generated! Open 'model_ab_test_report.html' in your browser.")
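
Besides reading the HTML report, you may want to act on the results in code, for example in a CI job. The sketch below assumes the results are available as a dictionary with a `tests` list whose entries carry `name` and `status` fields, with `"SUCCESS"` marking a passed test. That shape is an assumption to verify against your installed Evidently version; the helper names `failing_tests` and `needs_review` and the sample dictionary are hypothetical and hand-written for illustration.

```python
from __future__ import annotations

from typing import Any


def failing_tests(results: dict[str, Any]) -> list[str]:
    """Return the names of all tests whose status is not SUCCESS."""
    return [
        test.get("name", "<unnamed>")
        for test in results.get("tests", [])
        if test.get("status") != "SUCCESS"
    ]


def needs_review(results: dict[str, Any]) -> bool:
    """True if at least one test did not pass and a human should look."""
    return bool(failing_tests(results))


# Hand-written stand-in for a results dictionary (not real Evidently output).
example_results = {
    "tests": [
        {"name": "hypothetical drift check: sepal length (cm)", "status": "FAIL"},
        {"name": "hypothetical drift check: petal width (cm)", "status": "SUCCESS"},
    ]
}

for name in failing_tests(example_results):
    print(f"Needs review: {name}")
```

With the starter code above, the input would be something like `ab_test_suite.as_dict()`, assuming that method is available in your version. A failing drift test is a prompt to investigate, not an automatic verdict on model B.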