Run Real-Time Object Detection with YOLOv8 and Python

Bootstrap a real-time object detection script using your webcam. This guide uses the Ultralytics YOLOv8 model to identify objects in video frames, draw bounding boxes, and print structured detection data to the console.

Beginner | 15 min | 5 steps

The play

Install Dependencies
Set up your Python environment by installing the core `ultralytics` library, which includes YOLOv8, and `opencv-python` for handling video streams from your webcam or video files.
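
Both packages are available from PyPI and can be installed with `pip install ultralytics opencv-python`. As a quick sanity check afterwards, something like the sketch below can confirm that the modules are importable. The helper name `missing_packages` is made up for this example; note that `opencv-python` is imported under the module name `cv2`.

```python
import importlib.util

# pip package name -> module name used in "import ..." statements.
REQUIRED = {"ultralytics": "ultralytics", "opencv-python": "cv2"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose module cannot be located."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

The check only looks the modules up; it does not import them, so it stays fast even though `ultralytics` itself is slow to load.
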
Load a Pre-trained Model
Instantiate a YOLO model. The `ultralytics` library automatically downloads the specified pre-trained weights the first time you use them. We'll use `yolov8n.pt`, the nano variant: the smallest and fastest YOLOv8 detection model, well suited to real-time applications.
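
A minimal sketch of this step might look like the following. The helpers `weights_name` and `load_model` are hypothetical conveniences for this example, not part of the library; the only library call is `YOLO(...)`, the same one the starter code uses.

```python
# YOLOv8 detection checkpoints, from smallest/fastest to largest/most accurate.
YOLOV8_SIZES = ("n", "s", "m", "l", "x")


def weights_name(size="n"):
    """Build a checkpoint filename such as 'yolov8n.pt' (hypothetical helper)."""
    if size not in YOLOV8_SIZES:
        raise ValueError(f"Unknown YOLOv8 size {size!r}; expected one of {YOLOV8_SIZES}")
    return f"yolov8{size}.pt"


def load_model(size="n"):
    """Load a pre-trained detector; the weights are downloaded on first use."""
    # Imported here so weights_name() stays usable without ultralytics installed.
    from ultralytics import YOLO

    return YOLO(weights_name(size))
```

Calling `load_model()` is equivalent to `YOLO('yolov8n.pt')`. On the returned model, `model.names` maps each class ID to its label, which is what the starter code uses to turn IDs into readable names.
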
Run Detection on a Single Image
Run a test inference on a single image to verify your setup. The `predict` method runs detection and returns a list of results, one per input image. We'll save the annotated image to a file so you can inspect the output.
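
One way to sketch this check is shown below. It assumes `model` is an already loaded YOLO model and relies on the `save=True` argument of `predict`, which asks Ultralytics to write the annotated image to disk (under a `runs/detect/` directory by default). The function name `detect_image` is made up for this example.

```python
def detect_image(model, image_path):
    """Run one inference pass and return (result, number_of_detections).

    `model` is expected to be a loaded ultralytics YOLO model.
    """
    # save=True makes Ultralytics write the annotated image to disk.
    results = model.predict(image_path, save=True)

    # predict() returns one result per input image; we passed a single image.
    result = results[0]
    return result, len(result.boxes)
```

For example, `detect_image(YOLO('yolov8n.pt'), 'test.jpg')` would process a local file named `test.jpg` (a placeholder path; substitute any image you have). A count of zero on a photo that clearly contains people or vehicles is a hint that something in the setup is off.
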
Process Detection Results
The real value of this setup is the structured data behind each detection. Iterate over a result's `boxes` attribute to get the bounding-box coordinates (`xyxy`: left, top, right, bottom in pixels), the confidence score, and the class ID of each detected object.
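
The iteration can be sketched as a small helper that turns one result into plain dictionaries. It uses the same attribute accesses as the starter code (`box.cls`, `box.conf`, `box.xyxy`); the function name `boxes_to_records` is hypothetical, and `names` is assumed to be the `model.names` mapping.

```python
def boxes_to_records(result, names):
    """Convert the boxes of one result into JSON-friendly dictionaries."""
    records = []
    for box in result.boxes:
        class_id = int(box.cls.item())
        # xyxy holds the two corners of the box in pixel coordinates.
        left, top, right, bottom = box.xyxy[0].tolist()
        records.append(
            {
                "class_id": class_id,
                "class_name": names[class_id],
                "confidence": float(box.conf.item()),
                "box": [left, top, right, bottom],
            }
        )
    return records
```

Because the records contain only built-in types, they can be passed straight to `json.dumps`, filtered by confidence, or written to a log without further conversion.
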
Launch Real-Time Webcam Feed
Use OpenCV to capture frames from your webcam, pass each frame to the YOLO model for inference, and display the annotated video stream in real time. The starter code below provides a complete implementation.

Starter code

import json

import cv2
from ultralytics import YOLO

# Load the YOLOv8 model
model = YOLO('yolov8n.pt')

# Open the webcam
cap = cv2.VideoCapture(0)

# Check if the webcam is opened correctly
if not cap.isOpened():
    raise IOError("Cannot open webcam")

print("Webcam opened. Press 'q' to quit.")

while True:
    # Read a frame from the webcam
    ret, frame = cap.read()
    if not ret:
        break

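    # Run YOLOv8 inference on the frame; verbose=False silences the per-frame
    # log line so it does not interleave with the JSON output below
    results = model(frame, verbose=False)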

    # --- Process Detections for JSON Output ---
    detections = []
    for r in results:
        for box in r.boxes:
            detection = {
                'class_id': int(box.cls.item()),
                'class_name': model.names[int(box.cls.item())],
                'confidence': float(box.conf.item()),
                'box': box.xyxy[0].tolist()  # [x1, y1, x2, y2] as Python floats
            }
            detections.append(detection)
    
    # Print detections as a JSON string for the current frame
    if detections:
        print(json.dumps(detections, indent=2))

    # Visualize the results on the frame
    annotated_frame = results[0].plot()

    # Display the annotated frame
    cv2.imshow('YOLOv8 Real-Time Detection', annotated_frame)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and close the display window
cap.release()
cv2.destroyAllWindows()
print("Webcam feed closed.")