Article

image-classification, computer-vision, pytorch, hugging-face, transformers, cli, python, batch-inference

Batch Classify Images with a PyTorch Script

Run end-to-end image classification from your terminal. This Hugging Face Transformers script handles data loading, preprocessing, and inference for any ViT or ResNet model. Export results, including top-k predictions and scores, to a CSV file.

intermediate · 30 min · 5 steps

The play

Set Up the Environment
Clone the Hugging Face Transformers repository and install the specific dependencies for the image classification example. This provides the script and its required libraries, including PyTorch and torchvision.
Prepare Your Image Folder
Create a directory and place all the images you want to classify inside it. The image classification script processes every image file it finds in this folder.
Run Basic Inference
Execute the script, pointing it to your image folder and an output directory. It will download the specified model (defaulting to a Vision Transformer), classify the images, and save the results.
Customize Predictions
Refine the output by specifying the number of top predictions (`--top_k`) and a minimum confidence score (`--confidence_threshold`). This helps filter out irrelevant or low-confidence results from your output file.
Review CSV Output
The script automatically saves a `predictions.csv` file in your specified output directory. This file contains the image path, predicted label, and confidence score for each prediction that meets your criteria.
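Before running inference, it can help to confirm that the folder from the Prepare Your Image Folder step actually contains images. The following is a minimal sketch, not part of the Transformers repository: `count_images` is a hypothetical helper, and the extension list (`.jpg`, `.jpeg`, `.png`) is an assumption, since the formats the script accepts are not stated here. The demo runs on a throwaway directory so it is self-contained.

```shell
#!/bin/bash
# Hypothetical helper (not from the Transformers repo): count files with
# common image extensions in a folder. The extension list is an assumption.
count_images() {
    local dir="$1"
    # -iname matches case-insensitively, so .JPG and .PNG are counted too.
    find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \) \
        | wc -l | tr -d ' '
}

# Demo on a temporary folder with two images and one non-image file.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.jpg" "$demo_dir/b.PNG" "$demo_dir/notes.txt"
n_images="$(count_images "$demo_dir")"
echo "Found $n_images image file(s)"
rm -rf "$demo_dir"
```

To use it on real data, call `count_images ./sample_data` and stop early if the count is zero.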

Starter code

#!/bin/bash
# Stop at the first failing command so later steps never run on a broken setup.
set -euo pipefail

# This script demonstrates a full run of the Image Classification Pipeline.
# It clones the repo, installs dependencies, downloads a sample image, 
# runs inference, and displays the resulting CSV.

# 1. Clone repository and navigate into it
echo "Cloning Hugging Face Transformers repository..."
git clone --depth 1 https://github.com/huggingface/transformers.git
cd transformers

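# 2. Install dependencies
# The example scripts expect Transformers itself to be installed from this checkout.
echo "Installing required Python packages..."
pip install -q .
pip install -q -r examples/pytorch/image-classification/requirements.txt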

# 3. Prepare a sample image
echo "Creating image directory and downloading a sample image..."
mkdir -p sample_data
# -f makes curl fail on an HTTP error instead of saving the error page as an image.
curl -fsSL -o sample_data/sample.jpg https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg

# 4. Run the image classification script
# The flags below are the ones this article describes; confirm them with --help for your checkout.
# Only stdout is silenced, so errors stay visible if the run fails.
echo "Running inference on the sample image..."
python examples/pytorch/image-classification/run_image_classification.py \
    --model_name_or_path google/vit-base-patch16-224 \
    --image_folder ./sample_data \
    --output_dir ./classification_results > /dev/null

# 5. Display the results
# printf interprets \n; plain echo in bash would print it literally.
printf '\nInference complete. Results are saved in ./classification_results/predictions.csv\n'
echo "--- Contents of predictions.csv ---"
cat ./classification_results/predictions.csv
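
Once the CSV exists, a confidence cutoff like the one in the Customize Predictions step can also be applied after the fact. The sketch below is an illustration only: `filter_by_score` is a hypothetical helper, the header names and demo rows are made up, and it assumes the file has one header row with the confidence score in the last column. Reading the last field (`$NF`) keeps the filter working even when a quoted label contains commas. For anything beyond a quick check, a real CSV parser is the safer choice.

```shell
#!/bin/bash
# Hypothetical post-filter (not part of the script): keep the header row plus
# every row whose LAST column, assumed to be the score, is >= the threshold.
filter_by_score() {
    local csv="$1" threshold="$2"
    awk -F',' -v t="$threshold" 'NR == 1 || ($NF + 0) >= (t + 0)' "$csv"
}

# Demo with made-up rows; these are not real model output.
demo_csv="$(mktemp)"
cat > "$demo_csv" <<'EOF'
image_path,label,score
sample_data/a.jpg,"tabby, tabby cat",0.91
sample_data/a.jpg,lynx,0.04
sample_data/b.jpg,beagle,0.55
EOF
filtered="$(filter_by_score "$demo_csv" 0.5)"
echo "$filtered"
rm -f "$demo_csv"
```

On real output the call would look like `filter_by_score ./classification_results/predictions.csv 0.5`, provided the file matches the layout assumed above.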