Article
automl, machine-learning, mlops, python, api, predictive-modeling, enterprise-ai, data-science
Automate Model Building with the DataRobot AI Agent
Use the DataRobot AI Agent to automatically build, train, and evaluate machine learning models. This guide uses the Python client to upload a dataset, start an Autopilot project, and retrieve the best model without manual intervention.
beginner · 1 hour · 5 steps
The play
- Connect to the DataRobot API: Install the Python client and configure your credentials. You need your API token and endpoint URL from your DataRobot account (usually under your profile's 'Developer Tools'). Set them as environment variables for security.
- Create a Project and Upload Data: A DataRobot project is a workspace for a specific modeling problem. Create one by providing a dataset. We'll use a public dataset URL to make it easy. DataRobot will automatically analyze the features.
- Launch Autopilot: Trigger the DataRobot AI Agent's core AutoML process, Autopilot. Specify the target variable you want to predict. We'll use 'quick' mode for faster results. This automates feature engineering, model selection, and training.
- Get the Best Model: After Autopilot completes, DataRobot ranks all trained models on a leaderboard. You can programmatically retrieve the model that DataRobot recommends for deployment.
- Make Predictions: Use your top model to make predictions on new data. You can upload a dataset and get predictions back. This demonstrates the model's practical use.
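The first step depends on two environment variables being set. As a minimal sketch, the helper below checks that both are present before any API call is made. It assumes the variable names DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN used in the starter code; the function name read_datarobot_credentials is hypothetical and not part of the DataRobot client.

```python
import os


def read_datarobot_credentials(env=None):
    """Return (endpoint, token) from the environment, or raise if either is missing.

    Hypothetical helper for illustration; it is not part of the DataRobot client.
    """
    # Default to the real process environment; a plain dict can be passed for testing
    env = os.environ if env is None else env
    names = ("DATAROBOT_ENDPOINT", "DATAROBOT_API_TOKEN")
    # Treat unset and empty values the same way
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["DATAROBOT_ENDPOINT"], env["DATAROBOT_API_TOKEN"]
```

The returned values can be passed explicitly, as in dr.Client(endpoint=endpoint, token=token), or the check can simply act as a guard before calling dr.Client() with no arguments.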
Starter code
import datarobot as dr
import os
import time
# Ensure DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT are set as environment variables
# Example Endpoint: https://app.datarobot.com/api/v2
# 1. Connect to DataRobot
print("Connecting to DataRobot...")
dr.Client()
# 2. Create a project from a public URL
project_name = f"Diabetes_Prediction_{int(time.time())}"
data_url = 'https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv'
print(f"Creating project '{project_name}'...")
project = dr.Project.create(project_name=project_name, sourcedata=data_url)
print(f"Project ID: {project.id}")
# 3. Launch Autopilot
target_variable = 'readmitted'
print(f"Setting target to '{target_variable}' and starting Autopilot in QUICK mode...")
project.set_target(
    target=target_variable,
    mode=dr.AUTOPILOT_MODE.QUICK,
    worker_count=-1  # Use max workers
)
print("Waiting for Autopilot to complete. This may take 20-40 minutes...")
project.wait_for_autopilot()
print("Autopilot finished!")
# 4. Get the best model
print("Fetching the recommended model from the leaderboard...")
# recommended_model() returns the model DataRobot recommends for deployment,
# or None if no recommendation is available, so call it once and check the result
recommended_model = project.recommended_model()
if recommended_model:
    print("\n--- Recommended Model ---")
    print(f"Model Type: {recommended_model.model_type}")
    print(f"Blueprint ID: {recommended_model.blueprint_id}")
    # Fetch validation score from the specific metric DataRobot optimized for
    optimization_metric = project.metric
    print(f"Validation Score ({optimization_metric}): {recommended_model.metrics[optimization_metric]['validation']}")
else:
    print("Could not retrieve the recommended model.")
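The starter code stops at step 4. As a rough sketch of step 5, the function below scores a new file with the recommended model. It assumes the project and recommended_model objects from the starter code and relies on the client's upload_dataset, request_predictions, and get_result_when_complete calls. The function name score_new_data and the file name in the usage note are hypothetical.

```python
def score_new_data(project, model, data_path, max_wait=600):
    """Upload a file to the project and return the model's predictions for it.

    project:   a DataRobot project whose Autopilot run has finished
    model:     a trained model from that project (e.g. the recommended model)
    data_path: path or URL of the data to score (placeholder, supplied by the caller)
    """
    # Register the new rows with the project as a prediction dataset
    dataset = project.upload_dataset(data_path)
    # Queue a prediction job for this model against the uploaded dataset
    predict_job = model.request_predictions(dataset.id)
    # Block until the job finishes (up to max_wait seconds) and return the result
    return predict_job.get_result_when_complete(max_wait=max_wait)
```

Called as score_new_data(project, recommended_model, "new_data.csv"), it should return the predictions as a pandas DataFrame. Here new_data.csv stands in for a file with the same feature columns as the training data.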