Estimating Heterogeneous Treatment Effects with UpliftBench

UpliftBench is an evaluation framework that compares meta-learners and causal forests for estimating Heterogeneous Treatment Effects (HTE) in marketing. This Action Pack shows you how to implement and evaluate Conditional Average Treatment Effect (CATE) estimators so you can target customers more precisely and improve marketing campaign ROI.

Intermediate | 1 hour | 5 steps

The play

Set Up Python Environment
Install the necessary Python libraries for data manipulation, machine learning, and causal inference. Key libraries include `numpy`, `pandas`, `scikit-learn`, `econml`, and `causalml`.
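Before moving on, it can help to confirm that every library is importable. The sketch below uses only the standard library; the one assumption is the mapping from pip package names to import names (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# pip package name -> top-level import name (they differ for scikit-learn)
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "econml": "econml",
    "causalml": "causalml",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```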
Prepare Your Marketing Data
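Acquire and preprocess your marketing dataset. Ensure it includes a binary Treatment Indicator (T), the Outcome Variable (Y) you want to influence (for example, conversion or spend), and relevant Covariates (X) measured before treatment. Handle missing values (for example, by imputation) and encode categorical features (for example, with one-hot encoding) so that every covariate is numeric.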
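A minimal pandas sketch of this preparation step follows. It assumes a single DataFrame with one row per customer; the column names in the example (`received_offer`, `spend`, `age`, `channel`) are hypothetical, and median imputation plus one-hot encoding are simple defaults rather than the only reasonable choices.

```python
import numpy as np
import pandas as pd


def prepare_uplift_data(df, treatment_col, outcome_col, categorical_cols):
    """Split a campaign table into numeric covariates X, treatment T and outcome Y.

    Every non-numeric covariate must be listed in categorical_cols.
    """
    # Rows without a treatment flag or an outcome cannot be used for CATE estimation.
    df = df.dropna(subset=[treatment_col, outcome_col]).copy()
    T = df[treatment_col].astype(int).to_numpy()
    if not set(np.unique(T)) <= {0, 1}:
        raise ValueError("treatment column must be binary (0/1)")
    Y = df[outcome_col].astype(float).to_numpy()

    X_df = df.drop(columns=[treatment_col, outcome_col])
    numeric_cols = [c for c in X_df.columns if c not in categorical_cols]
    # Median imputation for numeric covariates (a simple default, not the only option).
    X_df[numeric_cols] = X_df[numeric_cols].fillna(X_df[numeric_cols].median())
    # Keep missing categories as their own level, then one-hot encode.
    X_df[categorical_cols] = X_df[categorical_cols].fillna("missing")
    X_df = pd.get_dummies(X_df, columns=categorical_cols, dtype=float)
    return X_df.to_numpy(dtype=float), T, Y, list(X_df.columns)


if __name__ == "__main__":
    # Hypothetical column names, for illustration only.
    raw = pd.DataFrame({
        "received_offer": [1, 0, 1, 0],
        "spend": [12.0, 3.5, None, 4.0],
        "age": [34, None, 51, 28],
        "channel": ["email", "sms", None, "email"],
    })
    X, T, Y, feature_names = prepare_uplift_data(raw, "received_offer", "spend", ["channel"])
    print(feature_names)
    print(X.shape, T, Y)
```

The row with a missing outcome is dropped, while missing covariates are imputed, because an observation without Y or T carries no information about the treatment effect.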
Instantiate CATE Estimators
Select and initialize the Conditional Average Treatment Effect (CATE) estimators you want to compare. Common choices, as highlighted by UpliftBench, are meta-learners such as the S-Learner and T-Learner, and Causal Forests.
Train Models and Predict CATEs
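Train your chosen CATE estimators on the preprocessed data (Y, T, X), then use them to predict the treatment effect for each individual observation. Keep a held-out test split so that the evaluation step measures performance on data the models have not seen.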
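To make "train, then predict" concrete, here is a hand-rolled T-Learner using only scikit-learn. It is a sketch of the T-Learner idea (fit one outcome model per arm and take the difference of their predictions), not econml's `TLearner` itself, and the synthetic data reuse the effect structure of the starter code, so the true CATE is `0.5 * X[:, 0]`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor


def t_learner_cate(Y, T, X, base_model, X_new=None):
    """Hand-rolled T-Learner: fit one outcome model per arm, return mu1(x) - mu0(x)."""
    X_new = X if X_new is None else X_new
    mu0 = clone(base_model).fit(X[T == 0], Y[T == 0])  # control-arm outcome model
    mu1 = clone(base_model).fit(X[T == 1], Y[T == 1])  # treated-arm outcome model
    return mu1.predict(X_new) - mu0.predict(X_new)


rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(size=(n, 3))
T = rng.integers(0, 2, size=n)
true_cate = 0.5 * X[:, 0]  # synthetic ground truth, same form as the starter code
Y = 1 + true_cate * T + rng.normal(0, 0.5, size=n)

cate_hat = t_learner_cate(Y, T, X, GradientBoostingRegressor(random_state=0))
print("Mean estimated CATE:", cate_hat.mean())
print("Correlation with true CATE:", np.corrcoef(cate_hat, true_cate)[0, 1])
```

On real campaign data the true CATE is unknown, so the correlation check is only possible here because the data are simulated.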
Evaluate Estimator Performance
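Assess the performance of the trained CATE estimators. Because an individual's true treatment effect is never observed, evaluation relies on ranking metrics computed on held-out data, such as Qini curves and uplift curves, or on custom evaluation methods. These measure how well each estimator ranks the customers who respond most to treatment ahead of those who do not. UpliftBench provides a framework for running this comparison.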
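A Qini curve can be computed directly with NumPy, as in the sketch below. It uses the common definition Qini(k) = Y_t(k) - Y_c(k) * N_t(k) / N_c(k) over the top k units ranked by predicted uplift; the summary score (mean gap between the curve and the straight "random targeting" line) is one of several normalizations in use. causalml also ships ready-made Qini and AUUC helpers in `causalml.metrics`; check its documentation for exact signatures.

```python
import numpy as np


def qini_curve(y, t, uplift):
    """Cumulative Qini curve when targeting the top-k units by predicted uplift.

    Returns (k, qini) where qini[k-1] = Y_t(k) - Y_c(k) * N_t(k) / N_c(k).
    """
    y, t, uplift = map(np.asarray, (y, t, uplift))
    order = np.argsort(-uplift, kind="stable")  # highest predicted uplift first
    y, t = y[order].astype(float), t[order].astype(int)

    n_t = np.cumsum(t)            # treated units among the top k
    n_c = np.cumsum(1 - t)        # control units among the top k
    y_t = np.cumsum(y * t)        # summed outcomes of those treated units
    y_c = np.cumsum(y * (1 - t))  # summed outcomes of those control units

    # Scale control outcomes to the treated group size; use 0 until a control is seen.
    ratio = np.divide(n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    qini = y_t - y_c * ratio
    k = np.arange(1, len(y) + 1)
    return k, qini


def qini_coefficient(y, t, uplift):
    """Mean gap between the Qini curve and the straight line to its endpoint."""
    k, qini = qini_curve(y, t, uplift)
    random_line = qini[-1] * k / k[-1]
    return float(np.mean(qini - random_line))
```

Computing `qini_coefficient(Y_test, T_test, cate_test)` for each estimator's predictions on the held-out split gives a single number per model; a higher value means the model puts responsive customers earlier in its ranking.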

Starter code

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from econml.metalearners import SLearner

# Generate synthetic data for demonstration
np.random.seed(42)
n_samples = 1000
X = np.random.rand(n_samples, 3)  # Covariates
T = np.random.randint(0, 2, n_samples)  # Treatment indicator (0=control, 1=treated)

# Continuous outcome whose treatment effect depends on X[:, 0]: true CATE = 0.5 * X[:, 0]
Y = 1 + 0.5 * X[:, 0] * T + np.random.normal(0, 0.5, n_samples)

# Initialize an S-Learner with a regression base model. Y is continuous, so a
# classifier such as LogisticRegression would fail to fit. A tree ensemble can also
# learn the T x X[:, 0] interaction that a purely additive linear model would miss.
s_learner = SLearner(overall_model=GradientBoostingRegressor(random_state=42))

# Fit the S-Learner model
s_learner.fit(Y, T, X=X)

# Predict Conditional Average Treatment Effects (CATEs)
cate_estimates = s_learner.effect(X)

print("First 10 CATE estimates:\n", cate_estimates[:10])
print("Mean CATE estimate:", np.mean(cate_estimates))

# Because the data are synthetic, the estimates can be checked against the true CATE
true_cate = 0.5 * X[:, 0]
print("RMSE vs. true CATE:", np.sqrt(np.mean((cate_estimates - true_cate) ** 2)))
```