Article
mlflow, mlops, experiment-tracking, model-registry, python, scikit-learn, reproducibility
Track Your First ML Experiment with MLflow
Use MLflow to log parameters, metrics, and models from a machine learning experiment. This action pack shows how to instrument a Python script and view the results in the MLflow UI, making your work reproducible and comparable.
beginner | 15 min | 5 steps
The play
- Install MLflow: Install MLflow and scikit-learn using pip. MLflow is a Python library, and we'll use scikit-learn to train a simple model for demonstration.
- Create a Training Script: Write a basic Python script that trains a Logistic Regression model on the Iris dataset. This script is the experiment we will track.
- Log Parameters and Metrics: Use `mlflow.start_run()` to begin tracking. Inside this block, log hyperparameters such as the regularization strength with `mlflow.log_param()` and performance results such as accuracy with `mlflow.log_metric()`.
- Log the Model as an Artifact: Within the same `mlflow.start_run()` block, save the trained model using `mlflow.sklearn.log_model()`. This stores the model together with a record of its dependencies, making it easier to reload and deploy later.
- Launch the MLflow UI: Running the script creates an `mlruns` directory in the working directory. Launch the MLflow Tracking UI from your terminal with `mlflow ui`, then open http://127.0.0.1:5000 in your browser to inspect and compare your runs.
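The first and last steps can be sanity-checked from Python before and after a run. The sketch below is one possible helper and is not part of MLflow: `check_setup`, `has_local_runs`, and `ui_command` are hypothetical names introduced for illustration. It assumes the default local `mlruns` store described above and that the UI is started with `mlflow ui` (here with the `--port` option).

```python
import importlib.util
from pathlib import Path

# Import name -> pip package name (scikit-learn is imported as "sklearn").
REQUIRED = {"mlflow": "mlflow", "sklearn": "scikit-learn"}


def check_setup(required=REQUIRED):
    """Map each pip package name to True if its module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for module, pip_name in required.items()
    }


def has_local_runs(directory="mlruns"):
    """True if the local tracking directory exists (assumes the default file store)."""
    return Path(directory).is_dir()


def ui_command(port=5000):
    """Command line that starts the tracking UI on the given port."""
    return ["mlflow", "ui", "--port", str(port)]


if __name__ == "__main__":
    for pip_name, found in check_setup().items():
        print(f"{pip_name}: {'ok' if found else 'missing, try: pip install ' + pip_name}")
    if has_local_runs():
        print("Start the UI with:", " ".join(ui_command()))
    else:
        print("No mlruns directory here yet; run the training script first.")
```

Run it from the same directory as the training script, since a relative `mlruns` path is resolved against the current working directory.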
Starter code
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Set an experiment name
mlflow.set_experiment("Iris Classification")
# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Start an MLflow run; everything logged inside the block belongs to this run
with mlflow.start_run():
    # Define hyperparameters
    C = 0.1
    # Note: 'liblinear' handles multiclass data one-vs-rest; recent scikit-learn
    # releases may warn about this and suggest a solver such as 'lbfgs'.
    solver = 'liblinear'

    # Log parameters
    mlflow.log_param("regularization_strength_C", C)
    mlflow.log_param("solver", solver)

    # Train the model
    model = LogisticRegression(C=C, solver=solver, max_iter=200)
    model.fit(X_train, y_train)

    # Make predictions and evaluate
    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)

    # Log metrics
    mlflow.log_metric("accuracy", accuracy)

    # Log the model artifact
    mlflow.sklearn.log_model(model, "iris_logistic_regression")

    # The active run is only available while the block is open
    run_id = mlflow.active_run().info.run_id

print(f"Run completed. Run ID: {run_id}")
print(f"Accuracy: {accuracy}")
print("\nTo view the run, execute 'mlflow ui' in your terminal and navigate to http://127.0.0.1:5000")
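
Once a few runs exist, the same information shown in the UI can be read back from Python. The following is a minimal sketch rather than a prescribed workflow: `model_uri`, `load_and_predict`, and `best_runs` are hypothetical helper names. It assumes the experiment name and artifact path from the starter code, and that the logged model can be addressed as `runs:/<run_id>/<artifact_path>`, which may differ depending on your MLflow version.

```python
def model_uri(run_id, artifact_path="iris_logistic_regression"):
    """Build a run-relative model URI (artifact path taken from the starter code)."""
    return f"runs:/{run_id}/{artifact_path}"


def load_and_predict(run_id, samples):
    """Reload the scikit-learn model logged by a run and predict on new samples."""
    # Imported lazily so model_uri() can be used without MLflow installed.
    import mlflow.sklearn

    model = mlflow.sklearn.load_model(model_uri(run_id))
    return model.predict(samples)


def best_runs(experiment_name="Iris Classification", top=5):
    """Return the highest-accuracy runs of an experiment as a pandas DataFrame."""
    import mlflow

    return mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=["metrics.accuracy DESC"],
        max_results=top,
    )


if __name__ == "__main__":
    runs = best_runs()
    if runs.empty:
        print("No runs found; run the training script first.")
    else:
        print(runs[["run_id", "metrics.accuracy"]])
        best_id = runs.iloc[0]["run_id"]
        # One Iris sample: sepal length, sepal width, petal length, petal width (cm)
        print(load_and_predict(best_id, [[5.1, 3.5, 1.4, 0.2]]))
```

Sorting by `metrics.accuracy` mirrors what you would do by clicking the column header in the UI, which makes it easy to compare runs logged with different values of `C`.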