# MediaPipe Handpose

MediaPipe Handpose is a lightweight ML pipeline consisting of two models: a palm detector and a hand-skeleton finger tracking model. It predicts 21 3D hand keypoints per detected hand. For more details, please read our Google AI [blogpost](https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html).

<img src="demo/demo.gif" alt="demo" style="width:640px" />

Given an input, the model predicts whether it contains a hand. If so, the model returns coordinates for the bounding box around the hand, as well as 21 keypoints within the hand, outlining the location of each finger joint and the palm.

More background information about the model, as well as its performance characteristics on different datasets, can be found here: [https://drive.google.com/file/d/1sv4sSb9BSNVZhLzxXJ0jBv9DqD-4jnAz/view](https://drive.google.com/file/d/1sv4sSb9BSNVZhLzxXJ0jBv9DqD-4jnAz/view)

Check out our [demo](https://storage.googleapis.com/tfjs-models/demos/handpose/index.html), which uses the model to detect hand landmarks in a live video stream.

This model is also available as part of [MediaPipe](https://hand.mediapipe.dev/), a framework for building multimodal applied ML pipelines.

# Performance

MediaPipe Handpose consists of ~12MB of weights and is well-suited for real-time inference across a variety of devices (40 FPS on a 2018 MacBook Pro, 35 FPS on an iPhone 11, 6 FPS on a Pixel 3).

## Installation

Using `yarn`:

    $ yarn add @tensorflow-models/handpose

Using `npm`:

    $ npm install @tensorflow-models/handpose

Note that this package specifies `@tensorflow/tfjs-core` and `@tensorflow/tfjs-converter` as peer dependencies, so they will also need to be installed.
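
For example, with `npm` the peer dependencies can be installed alongside the model (depending on the `@tensorflow/tfjs-core` version you use, a backend package such as `@tensorflow/tfjs-backend-webgl` may also be needed):

    $ npm install @tensorflow/tfjs-core @tensorflow/tfjs-converter @tensorflow-models/handpose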

## Usage

To import in npm:

```js
import * as handpose from '@tensorflow-models/handpose';
```

or as a standalone script tag:

```html
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/handpose"></script>
```

Then:

```js
async function main() {
  // Load the MediaPipe handpose model.
  const model = await handpose.load();
  // Pass in a video stream (or an image, canvas, or 3D tensor) to obtain a
  // hand prediction from the MediaPipe graph.
  const predictions = await model.estimateHands(document.querySelector("video"));
  if (predictions.length > 0) {
    /*
    `predictions` is an array of objects describing each detected hand, for example:
    [
      {
        handInViewConfidence: 1, // The probability of a hand being present.
        boundingBox: { // The bounding box surrounding the hand.
          topLeft: [162.91, -17.42],
          bottomRight: [548.56, 368.23],
        },
        landmarks: [ // The 3D coordinates of each hand landmark.
          [472.52, 298.59, 0.00],
          [412.80, 315.64, -6.18],
          ...
        ],
        annotations: { // Semantic groupings of the `landmarks` coordinates.
          thumb: [
            [412.80, 315.64, -6.18],
            [350.02, 298.38, -7.14],
            ...
          ],
          ...
        }
      }
    ]
    */

    for (let i = 0; i < predictions.length; i++) {
      const keypoints = predictions[i].landmarks;

      // Log hand keypoints.
      for (let i = 0; i < keypoints.length; i++) {
        const [x, y, z] = keypoints[i];
        console.log(`Keypoint ${i}: [${x}, ${y}, ${z}]`);
      }
    }
  }
}
main();
```
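
For live video, `estimateHands` is typically called in a render loop on a playing `<video>` element. Below is a minimal sketch, assuming a page with a `<video id="video">` element and that the user grants camera access (the helper names here are illustrative, not part of the handpose API):

```js
async function setupCamera() {
  // Attach the user's webcam stream to the <video> element.
  const video = document.getElementById('video');
  const stream = await navigator.mediaDevices.getUserMedia({video: true});
  video.srcObject = stream;
  return new Promise((resolve) => {
    video.onloadedmetadata = () => {
      video.play();
      resolve(video);
    };
  });
}

async function run() {
  const video = await setupCamera();
  const model = await handpose.load();

  async function frame() {
    // Estimate hands on the current video frame, then schedule the next frame.
    const predictions = await model.estimateHands(video);
    if (predictions.length > 0) {
      console.log('Hand detected with confidence', predictions[0].handInViewConfidence);
    }
    requestAnimationFrame(frame);
  }
  frame();
}
run();
```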

#### Parameters for handpose.load()

`handpose.load()` takes a configuration object with the following properties (a usage sketch follows the list):

* **maxContinuousChecks** - How many frames to go without running the bounding box detector. Defaults to infinity. Set to a lower value if you want a safety net in case the mesh detector produces consistently flawed predictions.

* **detectionConfidence** - Threshold for discarding a prediction. Defaults to 0.8.

* **iouThreshold** - A float representing the threshold for deciding whether boxes overlap too much in non-maximum suppression. Must be in [0, 1]. Defaults to 0.3.

* **scoreThreshold** - A threshold for deciding when to remove boxes based on score in non-maximum suppression. Defaults to 0.75.
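
For example, a stricter configuration might be passed like this (the values are illustrative only, not recommendations):

```js
const model = await handpose.load({
  maxContinuousChecks: 5,    // re-run the bounding box detector every 5 frames
  detectionConfidence: 0.9,  // discard hands predicted with lower confidence
  iouThreshold: 0.3,
  scoreThreshold: 0.75
});
```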

#### Parameters for handpose.estimateHands()

* **input** - The image to classify. Can be a tensor, DOM element image, video, or canvas.

* **flipHorizontal** - Whether to flip/mirror the hand keypoints horizontally. Should be true for videos that are flipped by default (e.g. webcams). See the sketch below.
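
As a sketch, assuming `flipHorizontal` is passed as the second argument to `estimateHands` and that the input is a mirrored webcam feed:

```js
const video = document.querySelector('video');
// Flip the returned keypoints so they line up with the mirrored webcam image.
const predictions = await model.estimateHands(video, true);
```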