Article
serverless, modal, mlops, deployment, python, api-deployment, hugging-face, sentiment-analysis
Deploy an ML Model as a Serverless API with Modal
Use the Modal Python client to package a pre-trained Hugging Face model into a scalable, serverless API endpoint. This action pack shows how to define dependencies, load a model efficiently, and create a web endpoint for inference.
beginner · 15 min · 5 steps
The play
- Install and Authenticate Modal: First, install the Modal client library with `pip install modal`. Then run `modal token new` to create an authentication token that connects your local environment to the Modal service. This is a one-time setup.
- Define App and Dependencies: Create a Python file (e.g., `deploy_model.py`). Define a Modal App and specify the Python libraries your model needs, such as `torch` and `transformers`. Modal builds a container image with these dependencies.
- Load the Model on Startup: To avoid reloading the model on every request, load it once when the container starts. Use a class and the `@modal.enter()` decorator. The decorated method downloads and initializes the model, and every request served by that container reuses it.
- Create the Inference Endpoint: Add a method to your class decorated with `@modal.web_endpoint`. This exposes the method as a public API. The method takes text input as JSON, runs it through the loaded model, and returns the prediction as JSON.
- Deploy and Test the API: Deploy the application from your terminal with `modal deploy deploy_model.py`. Modal prints a public URL for your new API endpoint. You can then test it using `curl` or any HTTP client.
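The load-once idea behind the startup step can be illustrated without Modal at all. The sketch below is a plain-Python simulation, not Modal's API: `FakeContainer`, its `enter` method, and the keyword-based stand-in model are hypothetical names invented for illustration. It only shows that the expensive setup runs once while many requests reuse the result.

```python
class FakeContainer:
    """Hypothetical stand-in for a serverless container (not part of Modal)."""

    def __init__(self):
        self.load_count = 0  # how many times the "model" was loaded
        self.model = None

    def enter(self):
        # Expensive setup: in the real app this is where the model is loaded.
        self.load_count += 1
        # Toy "model": labels text POSITIVE if it contains the word "good".
        self.model = lambda text: {
            "label": "POSITIVE" if "good" in text.lower() else "NEGATIVE"
        }

    def predict(self, text):
        # Each request reuses the model loaded in enter().
        return self.model(text)


container = FakeContainer()
container.enter()  # runs once, at "container start"
results = [container.predict(t) for t in ["Good docs", "Slow build"]]
```

Here `container.load_count` stays at 1 no matter how many times `predict` is called, which is the behavior the startup hook is meant to provide.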
Starter code
import modal

# Define the container image with the Python libraries the model needs.
# "fastapi[standard]" is listed because recent Modal releases expect FastAPI
# to be installed explicitly for web endpoints.
image = modal.Image.debian_slim().pip_install(
    "transformers", "torch", "fastapi[standard]"
)

# Create a Modal App. The name is used for deployment.
app = modal.App("sentiment-analysis-api", image=image)


@app.cls()
class SentimentModel:
    """
    A class that loads a sentiment analysis model and provides a method for inference.
    """

    @modal.enter()
    def load_model(self):
        """
        This method is run once when the container starts up.
        It downloads the model from Hugging Face and initializes the pipeline.
        """
        from transformers import pipeline

        print("\nDownloading and loading sentiment analysis model...\n")
        # Using a default sentiment analysis model from Hugging Face
        self.sentiment_pipeline = pipeline("sentiment-analysis")
        print("\nModel loaded successfully!\n")

    # Newer Modal releases may name this decorator modal.fastapi_endpoint;
    # check the version you have installed.
    @modal.web_endpoint(method="POST")
    def predict(self, data: dict):
        """
        This method is exposed as a web endpoint. It takes a JSON object
        with a 'text' field and returns the model's sentiment prediction.
        """
        # Imported here so FastAPI is only required inside the container
        from fastapi import HTTPException

        if "text" not in data:
            # Raise a 400 Bad Request if the 'text' key is missing.
            # (Returning a (body, status) tuple would not set the status code.)
            raise HTTPException(
                status_code=400, detail="'text' field not found in request body"
            )

        text_to_analyze = data["text"]
        print(f"Performing inference for: '{text_to_analyze}'")

        # Run inference using the pre-loaded pipeline
        result = self.sentiment_pipeline(text_to_analyze)

        # The result from the pipeline is a list of dictionaries
        return {"prediction": result[0]}

# To deploy this API, save the code as a Python file (e.g., deploy_model.py)
# and run the following command in your terminal:
# modal deploy deploy_model.py
#
# To test the deployed endpoint, use a tool like curl:
# curl -X POST -H "Content-Type: application/json" \
# -d '{"text": "Modal is an amazing tool for deploying models."}' \
# https://your-modal-url.modal.run
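
As an alternative to `curl`, the endpoint can also be called from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `predict_sentiment` are made up for this example, and `ENDPOINT_URL` is the same placeholder as in the `curl` command, to be replaced with the URL that `modal deploy` prints.

```python
import json
import urllib.request

# Placeholder: replace with the URL printed by `modal deploy`.
ENDPOINT_URL = "https://your-modal-url.modal.run"


def build_request(url: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the 'text' field the endpoint expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_sentiment(url: str, text: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (needs a deployed app)."""
    with urllib.request.urlopen(build_request(url, text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Once the app is deployed, `predict_sentiment(ENDPOINT_URL, "Modal is an amazing tool for deploying models.")` should return a dictionary with a `prediction` key, matching what the starter code returns. The generous default timeout is a guess meant to leave room for a first request that has to wait for the container to start and the model to load.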