Tags: mediapipe, python, computer-vision, hand-tracking, real-time, opencv, on-device-ai, gesture-recognition

Real-time Hand Tracking with MediaPipe and Python

Use MediaPipe's pre-trained models for on-device computer vision. This guide shows how to use its Python library to detect and visualize hand landmarks from a live webcam feed in just a few minutes.

Beginner | 15 min | 5 steps
The play
  1. Install Dependencies
    Install the necessary Python libraries with `pip install mediapipe opencv-python`. `mediapipe` provides the pre-trained vision models, and `opencv-python` is used to capture and display video from your webcam.
  2. Initialize MediaPipe Hands
    Import the libraries and create an instance of the MediaPipe Hands solution (`mp.solutions.hands.Hands`). This object wraps the pre-trained hand-landmark model; the helpers for drawing the detected landmarks live separately in `mp.solutions.drawing_utils`.
  3. Capture and Process Video Frames
    Use OpenCV to open your webcam. In a loop, read each frame, mirror it horizontally for a selfie-style view, convert it from OpenCV's BGR channel order to RGB (the order MediaPipe expects), and pass it to the `hands.process()` method for detection.
  4. Draw Landmarks on Detected Hands
    Check whether any hands were detected (`results.multi_hand_landmarks` is `None` when no hand is found). If so, loop through them and use MediaPipe's `drawing_utils` to draw the 21 hand landmarks and the connections between them onto the frame.
  5. Display the Output
    Use OpenCV's `imshow` to display the annotated frame in a window. Poll the keyboard with `cv2.waitKey` and leave the loop when ESC is pressed, then release the camera, close the OpenCV windows, and close the Hands object.
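Step 3's preprocessing amounts to two array operations: a horizontal mirror (`cv2.flip(image, 1)`) and a channel reorder (`cv2.COLOR_BGR2RGB`). To show what they do without OpenCV or a camera, the sketch below models a frame as a list of rows of `(B, G, R)` tuples. That representation and the helper names are illustrative assumptions; real OpenCV frames are NumPy arrays of shape `(height, width, 3)`.

```python
from typing import List, Tuple

# Illustrative stand-in for a frame: rows of (B, G, R) pixel tuples.
Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]


def mirror_horizontally(frame: Frame) -> Frame:
    """Reverse each row, like cv2.flip(image, 1) (flip around the vertical axis)."""
    return [list(reversed(row)) for row in frame]


def bgr_to_rgb(frame: Frame) -> Frame:
    """Swap the first and third channel of every pixel, like cv2.COLOR_BGR2RGB."""
    return [[(r, g, b) for (b, g, r) in row] for row in frame]


def preprocess(frame: Frame) -> Frame:
    """Mirror for a selfie view, then reorder channels to the RGB order MediaPipe expects."""
    return bgr_to_rgb(mirror_horizontally(frame))


if __name__ == "__main__":
    # A 1x2 frame with two distinguishable BGR pixels.
    frame = [[(10, 20, 30), (40, 50, 60)]]
    print(preprocess(frame))
```

The order of the two operations does not matter here, because mirroring moves whole pixels while the color conversion only touches channels within each pixel.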
Starter code
import cv2
import mediapipe as mp

# Initialize MediaPipe Hands and drawing utilities
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=2,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5)
mp_drawing = mp.solutions.drawing_utils

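# Start webcam capture (index 0 is usually the default camera)
cap = cv2.VideoCapture(0)
if not cap.isOpened():
    raise SystemExit("Could not open the webcam (device index 0).")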

print("Starting webcam feed. Press 'ESC' to exit.")

while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        continue

    # Flip the image horizontally for a selfie-view display, and convert the BGR image to RGB.
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    
    # To improve performance, optionally mark the image as not writeable to pass by reference.
    image.flags.writeable = False
    results = hands.process(image)

    # Draw the hand annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
    
    # Display the resulting frame
    cv2.imshow('MediaPipe Hands', image)
    
    # Exit loop if 'ESC' is pressed
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
hands.close()
print("Webcam feed closed.")
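MediaPipe reports each of the 21 hand landmarks with `x` and `y` normalized to [0, 1] by the image width and height, so drawing your own markers or measuring distances means scaling them to pixels first. The sketch below uses a hypothetical `Landmark` dataclass as a stand-in for the objects in `hand_landmarks.landmark`, which expose the same `x`, `y`, `z` attributes; the two index constants match MediaPipe's `HandLandmark` enum (`WRIST` is 0, `INDEX_FINGER_TIP` is 8).

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical stand-in for MediaPipe's NormalizedLandmark. In the starter
# code, the real objects are hand_landmarks.landmark[i] with the same fields.
@dataclass
class Landmark:
    x: float        # normalized by image width
    y: float        # normalized by image height
    z: float = 0.0  # relative depth; not needed for 2D pixel positions


# Two indices from MediaPipe's HandLandmark enum (21 landmarks, 0..20).
WRIST = 0
INDEX_FINGER_TIP = 8


def to_pixel(lm: Landmark, width: int, height: int) -> Tuple[int, int]:
    """Scale a normalized landmark to integer pixel coordinates.

    Results are clamped to the frame, because predicted landmarks may fall
    slightly outside [0, 1] when part of the hand leaves the image.
    """
    px = min(max(int(lm.x * width), 0), width - 1)
    py = min(max(int(lm.y * height), 0), height - 1)
    return px, py


if __name__ == "__main__":
    tip = Landmark(x=0.5, y=0.25)
    print(to_pixel(tip, 640, 480))
```

Inside the starter loop you could take `h, w, _ = image.shape` and call `to_pixel(hand_landmarks.landmark[INDEX_FINGER_TIP], w, h)`; the function only reads `.x` and `.y`, so the real landmark objects work too. Clamping is one design choice; skipping out-of-frame points is another.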
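The exit condition on `cv2.waitKey(5)` packs two details into one line. `waitKey` returns -1 when no key arrives within the 5 ms timeout, and depending on the OpenCV build and platform the returned code may carry extra high bits, so the loop keeps only the low byte with `& 0xFF` before comparing against 27, the ASCII code for ESC. This pure-Python sketch isolates that test in a hypothetical `should_exit` helper so it can be checked without a camera.

```python
ESC = 27  # ASCII code of the Escape key


def should_exit(key_code: int, exit_key: int = ESC) -> bool:
    """Replicate the starter loop's exit test on a cv2.waitKey() return value.

    Only the low byte is compared, so extra high bits do not hide the key
    press. A return value of -1 (no key pressed) masks to 255 and never
    matches ESC.
    """
    # & binds tighter than == in Python, so `key & 0xFF == 27` in the
    # starter code already parses as `(key & 0xFF) == 27`.
    return (key_code & 0xFF) == exit_key


if __name__ == "__main__":
    for code in (-1, 27, 0x10001B, ord("q")):
        print(code, should_exit(code))
```

In the loop, `if should_exit(cv2.waitKey(5)): break` behaves like the original line, and passing `exit_key=ord("q")` would switch the exit key to Q.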