We explore a use case that leverages the power of MediaPipe for tracking human poses in both 2D and 3D. What makes this exploration even more fascinating is the visualisation side, powered by the open-source visualisation tool Rerun, which provides a holistic view of human poses in motion.

In this blog post, you'll be guided through using MediaPipe to track human poses in 2D and 3D, and explore the visualisation capabilities of Rerun.
Human pose tracking is a task in computer vision that focuses on identifying key body locations, analysing posture, and categorising movements. At the heart of this technology is a pre-trained machine-learning model that evaluates the visual input and recognises landmarks on the body in both image coordinates and 3D world coordinates. The use cases and applications of this technology include but are not limited to Human-Computer Interaction, Sports Analysis, Gaming, Virtual Reality, Augmented Reality, Health, etc.
It would be good to have a perfect model, but unfortunately, the current models are still imperfect. Although datasets may contain a variety of body types, the human body differs among individuals. The uniqueness of each person's body poses a challenge, particularly for those with non-standard arm and leg dimensions, which may result in lower accuracy when using this technology. When considering the integration of this technology into systems, it's crucial to acknowledge the possibility of inaccuracies. Hopefully, ongoing efforts within the scientific community will pave the way for the development of more robust models.

Beyond the lack of accuracy, ethical and legal considerations also emerge from utilising this technology. For instance, capturing human body poses in public spaces could potentially invade privacy rights if individuals haven't given their consent. It's crucial to take any ethical and legal concerns into account before implementing this technology in real-world scenarios.
Begin by installing the required libraries:
# Install the required Python packages
pip install mediapipe
pip install numpy
pip install "opencv-python<4.6"
pip install "requests>=2.31,<3"
pip install rerun-sdk

# or just use the requirements file
pip install -r examples/python/human_pose_tracking/requirements.txt
MediaPipe Python is a useful tool for developers looking to integrate on-device ML solutions for computer vision and machine learning.

In the code below, MediaPipe pose landmark detection is used to detect the landmarks of human bodies in an image. This model can detect body pose landmarks as both image coordinates and 3D world coordinates. Once you have successfully run the ML model, you can use the image coordinates and the 3D world coordinates to visualise the output.
import mediapipe as mp
import numpy as np
from typing import Any
import numpy.typing as npt
import cv2


def read_landmark_positions_2d(
    results: Any,
    image_width: int,
    image_height: int,
) -> npt.NDArray[np.float32] | None:
    """
    Read 2D landmark positions from MediaPipe Pose results.

    Args:
        results (Any): MediaPipe Pose results.
        image_width (int): Width of the input image.
        image_height (int): Height of the input image.

    Returns:
        np.array | None: Array of 2D landmark positions, or None if no landmarks are detected.
    """
    if results.pose_landmarks is None:
        return None
    else:
        # Extract normalized landmark positions and scale them to image dimensions
        normalized_landmarks = [results.pose_landmarks.landmark[lm] for lm in mp.solutions.pose.PoseLandmark]
        return np.array([(image_width * lm.x, image_height * lm.y) for lm in normalized_landmarks])


def read_landmark_positions_3d(
    results: Any,
) -> npt.NDArray[np.float32] | None:
    """
    Read 3D landmark positions from MediaPipe Pose results.

    Args:
        results (Any): MediaPipe Pose results.

    Returns:
        np.array | None: Array of 3D landmark positions, or None if no landmarks are detected.
    """
    if results.pose_landmarks is None:
        return None
    else:
        # Extract 3D landmark positions in world coordinates
        landmarks = [results.pose_world_landmarks.landmark[lm] for lm in mp.solutions.pose.PoseLandmark]
        return np.array([(lm.x, lm.y, lm.z) for lm in landmarks])


def track_pose(image_path: str) -> None:
    """
    Track and analyze the pose from an input image.

    Args:
        image_path (str): Path to the input image.
    """
    # Read the image and convert its color from BGR to RGB
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Create a Pose model instance
    pose_detector = mp.solutions.pose.Pose(static_image_mode=True)

    # Process the image to obtain pose landmarks
    results = pose_detector.process(image)
    h, w, _ = image.shape

    # Read 2D and 3D landmark positions
    landmark_positions_2d = read_landmark_positions_2d(results, w, h)
    landmark_positions_3d = read_landmark_positions_3d(results)
Rerun serves as a visualisation tool for multi-modal data. Through the Rerun Viewer, you can build layouts, customise visualisations, and interact with your data. The rest of this section details how you can log and present data using the Rerun SDK to visualise it within the Rerun Viewer.
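Before wiring Rerun up to MediaPipe, it helps to see the basic logging flow in isolation. Here is a minimal sketch; the application id and entity path are arbitrary placeholders, not names from this example:

import numpy as np
import rerun as rr

# Minimal sketch: spawn a local Rerun Viewer and log a small batch of 2D points
rr.init("rerun_minimal_example", spawn=True)
rr.log("my_points", rr.Points2D(np.array([[0.0, 0.0], [1.0, 1.0]]), radii=0.1))

Every call to rr.log associates data with an entity path, and the Viewer builds its view hierarchy from those paths.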
For both 2D and 3D points, specifying connections between points is essential, as defining these connections automatically renders lines between them. Using the information provided by MediaPipe, you can get the pose point connections from the POSE_CONNECTIONS set and then set them as keypoint connections using Annotation Context.
rr.log(
    "/",
    rr.AnnotationContext(
        rr.ClassDescription(
            info=rr.AnnotationInfo(id=0, label="Person"),
            keypoint_annotations=[rr.AnnotationInfo(id=lm.value, label=lm.name) for lm in mp_pose.PoseLandmark],
            keypoint_connections=mp_pose.POSE_CONNECTIONS,
        )
    ),
    timeless=True,
)
Image Coordinates — 2D Positions

Visualising the body pose landmarks on the video appears to be a good choice. To achieve that, you need to follow the Rerun documentation for Entities and Components. The Entity Path Hierarchy page describes how to log multiple Components on the same Entity. For example, you can create the 'video' entity and include the components 'video/rgb' for the video and 'video/pose' for the body pose. If you're aiming to use that for a video, you need the concept of Timelines, so that each frame can be associated with the appropriate data.

Here is a function that visualises the 2D points on the video:
def track_pose_2d(video_path: str, *, max_frame_count: int | None = None) -> None:
    mp_pose = mp.solutions.pose

    with closing(VideoSource(video_path)) as video_source, mp_pose.Pose() as pose:
        for idx, bgr_frame in enumerate(video_source.stream_bgr()):
            if max_frame_count is not None and idx >= max_frame_count:
                break

            rgb = cv2.cvtColor(bgr_frame.data, cv2.COLOR_BGR2RGB)

            # Associate the frame with the data
            rr.set_time_seconds("time", bgr_frame.time)
            rr.set_time_sequence("frame_idx", bgr_frame.idx)

            # Present the video
            rr.log("video/rgb", rr.Image(rgb).compress(jpeg_quality=75))

            # Get the prediction results
            results = pose.process(rgb)
            h, w, _ = rgb.shape

            # Log 2D points to the 'video' entity
            landmark_positions_2d = read_landmark_positions_2d(results, w, h)
            if landmark_positions_2d is not None:
                rr.log(
                    "video/pose/points",
                    rr.Points2D(landmark_positions_2d, class_ids=0, keypoint_ids=mp_pose.PoseLandmark),
                )
3D World Coordinates — 3D Points

Why settle for 2D points when you have 3D points? Create a new entity, name it "Person", and log the 3D points. That's it! You've just created a 3D presentation of the human body pose.

Here is how to do it:
def track_pose_3d(video_path: str, *, segment: bool, max_frame_count: int | None) -> None:
    mp_pose = mp.solutions.pose

    # Use a right-handed coordinate system with the Y axis pointing down
    rr.log("person", rr.ViewCoordinates.RIGHT_HAND_Y_DOWN, timeless=True)

    with closing(VideoSource(video_path)) as video_source, mp_pose.Pose() as pose:
        for idx, bgr_frame in enumerate(video_source.stream_bgr()):
            if max_frame_count is not None and idx >= max_frame_count:
                break

            rgb = cv2.cvtColor(bgr_frame.data, cv2.COLOR_BGR2RGB)

            # Associate the frame with the data
            rr.set_time_seconds("time", bgr_frame.time)
            rr.set_time_sequence("frame_idx", bgr_frame.idx)

            # Present the video
            rr.log("video/rgb", rr.Image(rgb).compress(jpeg_quality=75))

            # Get the prediction results
            results = pose.process(rgb)
            h, w, _ = rgb.shape

            # Log the 3D points to the new "person" entity
            landmark_positions_3d = read_landmark_positions_3d(results)
            if landmark_positions_3d is not None:
                rr.log(
                    "person/pose/points",
                    rr.Points3D(landmark_positions_3d, class_ids=0, keypoint_ids=mp_pose.PoseLandmark),
                )
This tutorial focuses on the main parts of the Human Pose Tracking example. For those who prefer a hands-on approach, the full source code for this example is available on GitHub. Feel free to explore, modify, and understand the inner workings of the implementation.

1. Compress the image for efficiency

You can boost the overall speed of the procedure by compressing the logged images:
rr.log(
    "video",
    rr.Image(img).compress(jpeg_quality=75),
)
2. Limit Memory Use

If you're logging more data than can fit into your RAM, Rerun will start dropping the oldest data. The default limit is 75% of your system RAM. If you want to increase that, you can use the command-line argument --memory-limit. More information about memory limits can be found on Rerun's How To Limit Memory Use page.
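For example, to cap the Viewer's recording at roughly 16 GB (the value is illustrative):

rerun --memory-limit 16GB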
3. Customise Visualisations to your needs

Through the Rerun Viewer, you can rearrange the layout and customise how each entity is displayed, adapting the visualisation to your needs.
If you found this article useful and insightful, there's more!

Similar articles:
I regularly share tutorials on visualisation for computer vision and robotics. Follow me for future updates!

You can also find me on LinkedIn.
[1] Pose Landmark Detection Guide by Google. Portions of this page are reproduced from work created and shared by Google and used according to terms described in the Creative Commons 4.0 Attribution License.

[2] Rerun Docs by Rerun, under the MIT license.