Tags: a-b-testing, model-evaluation, mlops, evidently, python, data-drift, statistical-testing, shadow-mode
A/B Test Your ML Models with Evidently
Compare two machine learning model versions side-by-side using Evidently. This guide shows how to run statistical tests on logged prediction data to evaluate performance differences and detect data drift between a reference (A) and a new (B) model.
beginner · 15 min · 4 steps
The play
- Prepare Your Environment and Data: Install the necessary libraries (`evidently`, `pandas`, and `scikit-learn` for the starter code). For a model A/B testing scenario, you need two datasets: `reference_data` from your control model (A) and `current_data` from your challenger model (B). These are typically logged from a live prediction service.
- Define the A/B Test Suite: Import `TestSuite` and a relevant test preset from Evidently. The `DataDriftTestPreset` fits this comparison because it compares the distribution of every column between the two datasets and flags statistically significant changes in inputs or predictions.
- Run the Comparison and Generate the Report: Execute the test suite using your two dataframes. Evidently will perform the statistical comparisons and bundle the results. Save the output as a self-contained HTML file for easy analysis.
- Analyze the A/B Test Results: Open the generated `model_ab_test_report.html` in your browser. The report provides a dashboard summarizing the test outcomes. Look for failed tests (marked in red) to identify features or predictions with statistically significant drift, which can inform your decision to promote the new model.
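To make the drift check in the steps above less of a black box, here is a minimal standard-library sketch of the two-sample Kolmogorov-Smirnov statistic, one common way to compare a numeric column across two samples. It is an illustration only, not Evidently's implementation: Evidently selects a test per column on its own, and the function name `ks_statistic` and the sample values are made up for this example.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two numeric samples (0.0 to 1.0)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample A that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample B that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical logged values for one feature under model A and model B.
reference = [5.0, 5.1, 5.2, 5.3]
current = [value + 0.5 for value in reference]

print(ks_statistic(reference, reference))  # 0.0: identical distributions
print(ks_statistic(reference, current))    # 1.0: the shifted sample no longer overlaps
```

A statistic near 0 means the two samples look alike; a statistic near 1 means they barely overlap. A full test also converts the statistic into a p-value and compares it with a threshold, which this sketch leaves out.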
Starter code
import pandas as pd
from sklearn import datasets
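# Note: these import paths assume an Evidently release that still ships the
# TestSuite API (before 0.7); newer releases reorganized it.
from evidently.test_suite import TestSuite
from evidently.test_preset import DataDriftTestPreset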
# --- 1. Load & Prepare Data ---
# In a real scenario, this data would come from your model logging system.
# Here, we'll simulate it using the Iris dataset.
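iris_data = datasets.load_iris(as_frame=True)
# Iris rows are ordered by species, so shuffle before splitting; otherwise the
# two halves would contain different species and drift on nearly every column.
iris_frame = iris_data.frame.sample(frac=1.0, random_state=42).reset_index(drop=True)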
# Model A's data (Reference)
reference_df = iris_frame.iloc[:75].copy()
# Model B's data (Current) - we'll introduce a slight drift for demonstration
# .copy() makes an independent frame, so the edit below avoids SettingWithCopyWarning
current_df = iris_frame.iloc[75:].copy()
# Simulate a change in the distribution for one feature
current_df['sepal length (cm)'] += 0.5
print("Prepared reference and current dataframes for A/B test analysis.")
# --- 2. Define and Run Test Suite ---
# The DataDriftTestPreset is perfect for A/B testing as it compares feature
# and prediction distributions between two datasets.
ab_test_suite = TestSuite(tests=[
    DataDriftTestPreset(),
])
ab_test_suite.run(current_data=current_df, reference_data=reference_df, column_mapping=None)
# --- 3. Save the Report ---
ab_test_suite.save_html("model_ab_test_report.html")
print("\nModel A/B Test report generated! Open 'model_ab_test_report.html' in your browser.")
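If the comparison should feed an automated check rather than be read only from the HTML report, the outcomes can also be inspected in code. The sketch below assumes the results are available as a dictionary with a `tests` list whose entries carry `name` and `status` keys, which is the shape expected from `ab_test_suite.as_dict()` in the TestSuite API used above; treat that as an assumption and verify it against your installed version. The helper `failed_tests` and the sample dictionary are hypothetical.

```python
def failed_tests(results):
    """Return the names of tests whose status is anything other than SUCCESS."""
    return [
        test.get("name", "<unnamed>")
        for test in results.get("tests", [])
        if test.get("status") != "SUCCESS"
    ]


# Hand-written stand-in for ab_test_suite.as_dict(); not real Evidently output.
sample_results = {
    "tests": [
        {"name": "example drift test: sepal length (cm)", "status": "FAIL"},
        {"name": "example drift test: petal width (cm)", "status": "SUCCESS"},
    ]
}

failures = failed_tests(sample_results)
needs_review = bool(failures)  # any non-passing test flags model B for a closer look
print(failures, needs_review)
```

Treating every non-passing status as a reason for review is a deliberately cautious choice: a drift failure does not by itself mean model B is worse, only that its inputs or predictions differ from model A's.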