Tags: mediapipe, python, computer-vision, hand-tracking, real-time, opencv, on-device-ai, gesture-recognition
Real-time Hand Tracking with MediaPipe and Python
Use MediaPipe's pre-trained models for on-device computer vision. This guide shows how to use its Python library to detect and visualize hand landmarks from a live webcam feed in just a few minutes.
Beginner, 15 min, 5 steps
The play
- Install Dependencies: Install the necessary Python libraries. `mediapipe` provides the core vision models, and `opencv-python` is used to capture and display video from your webcam.
- Initialize MediaPipe Hands: Import the libraries and create an instance of the MediaPipe Hands solution, which wraps the pre-trained model for detecting hand landmarks. Also load the `drawing_utils` module, which provides helpers for drawing them.
- Capture and Process Video Frames: Use OpenCV to open your webcam. In a loop, read each frame, convert it from BGR to RGB (the format MediaPipe expects), and pass it to the `hands.process()` method for detection.
- Draw Landmarks on Detected Hands: Check whether any hands were detected in the frame. If so, loop through them and use MediaPipe's `drawing_utils` to draw the 21 hand landmarks and the connections between them onto the frame.
- Display the Output: Use OpenCV's `imshow` to display the annotated video frame in a window. Check for a key press (the ESC key) to exit the loop, then release the webcam and close the window.
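
The 21 landmarks mentioned in the drawing step follow a fixed index order, starting at the wrist and running along each finger to its tip. As a rough reference, the sketch below mirrors that layout in a plain `IntEnum`. The class name `HandLandmarkIndex` and the helper `fingertip_indices` are hypothetical names used only for illustration; in a real script you would normally use `mp.solutions.hands.HandLandmark` rather than this local copy.

```python
from enum import IntEnum


class HandLandmarkIndex(IntEnum):
    """Illustrative local mirror of the 21 hand landmark indices (0 = wrist)."""
    WRIST = 0
    THUMB_CMC = 1
    THUMB_MCP = 2
    THUMB_IP = 3
    THUMB_TIP = 4
    INDEX_FINGER_MCP = 5
    INDEX_FINGER_PIP = 6
    INDEX_FINGER_DIP = 7
    INDEX_FINGER_TIP = 8
    MIDDLE_FINGER_MCP = 9
    MIDDLE_FINGER_PIP = 10
    MIDDLE_FINGER_DIP = 11
    MIDDLE_FINGER_TIP = 12
    RING_FINGER_MCP = 13
    RING_FINGER_PIP = 14
    RING_FINGER_DIP = 15
    RING_FINGER_TIP = 16
    PINKY_MCP = 17
    PINKY_PIP = 18
    PINKY_DIP = 19
    PINKY_TIP = 20


def fingertip_indices():
    """Return the indices of the five fingertip landmarks, thumb to pinky."""
    return [int(member) for member in HandLandmarkIndex
            if member.name.endswith("_TIP")]


if __name__ == "__main__":
    print(len(HandLandmarkIndex))  # number of landmarks per hand
    print(fingertip_indices())     # indices of the five fingertips
```

Knowing the index of a landmark lets you pick out a single point, such as the index fingertip, from each detected hand instead of drawing the whole skeleton.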
Starter code
import cv2
import mediapipe as mp

# Initialize MediaPipe Hands and drawing utilities
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(static_image_mode=False,
                       max_num_hands=2,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.5)
mp_drawing = mp.solutions.drawing_utils

# Start webcam capture
cap = cv2.VideoCapture(0)
print("Starting webcam feed. Press 'ESC' to exit.")

while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        continue

    # Flip the image horizontally for a selfie-view display, and convert the BGR image to RGB.
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)

    # To improve performance, optionally mark the image as not writeable to pass by reference.
    image.flags.writeable = False
    results = hands.process(image)

    # Draw the hand annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(
                image, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    # Display the resulting frame
    cv2.imshow('MediaPipe Hands', image)

    # Exit loop if 'ESC' is pressed
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
hands.close()
print("Webcam feed closed.")
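
Each landmark returned by `hands.process()` carries `x` and `y` values normalized to the image width and height, so they need to be scaled before they can be used as pixel positions. A minimal sketch of that conversion follows. The function name `to_pixel` is hypothetical, and the `SimpleNamespace` object only stands in for a real landmark so that the snippet runs without a camera or MediaPipe installed.

```python
from types import SimpleNamespace


def to_pixel(landmark, image_width, image_height):
    """Scale a normalized landmark to integer pixel coordinates.

    The result is clamped to the image bounds, since a landmark can fall
    slightly outside the frame when a hand is only partly visible.
    """
    px = min(max(int(landmark.x * image_width), 0), image_width - 1)
    py = min(max(int(landmark.y * image_height), 0), image_height - 1)
    return px, py


if __name__ == "__main__":
    # Stand-in for hand_landmarks.landmark[8] (the index fingertip).
    fake_tip = SimpleNamespace(x=0.5, y=0.25, z=0.0)
    print(to_pixel(fake_tip, 640, 480))  # (320, 120)
```

Inside the loop, one way to use this would be to read the frame size with `h, w = image.shape[:2]` and call `to_pixel(hand_landmarks.landmark[8], w, h)` to get the index fingertip position, which could then be passed to a drawing call such as `cv2.circle`.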