# MediaPipe Facemesh

MediaPipe Facemesh is a lightweight machine learning pipeline that predicts 468 3D facial landmarks to infer the approximate surface geometry of a human face ([paper](https://arxiv.org/pdf/1907.06724.pdf)).

<img src="demo.gif" alt="demo" style="width: 640px;"/>

More background information about the model, as well as its performance characteristics on different datasets, can be found here: [https://drive.google.com/file/d/1VFC_wIpw4O7xBOiTgUldl79d9LA-LsnA/view](https://drive.google.com/file/d/1VFC_wIpw4O7xBOiTgUldl79d9LA-LsnA/view)

The model is designed for front-facing cameras on mobile devices, where faces in view tend to occupy a relatively large fraction of the canvas. MediaPipe Facemesh may struggle to identify far-away faces.

Check out our [demo](https://storage.googleapis.com/tfjs-models/demos/facemesh/index.html), which uses the model to detect facial landmarks in a live video stream.

This model is also available as part of [MediaPipe](https://github.com/google/mediapipe/tree/master/mediapipe/models), a framework for building multimodal applied ML pipelines.

## Installation

Using `yarn`:

    $ yarn add @tensorflow-models/facemesh

Using `npm`:

    $ npm install @tensorflow-models/facemesh

Note that this package specifies `@tensorflow/tfjs-core` and `@tensorflow/tfjs-converter` as peer dependencies, so they will also need to be installed.

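For example, both peer dependencies can be added alongside the model package (shown here with `yarn`; the `npm install` command is analogous):

    $ yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
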
## Usage

To import in npm:

```js
import * as facemesh from '@tensorflow-models/facemesh';
```

or as a standalone script tag:

```html
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/facemesh"></script>
```

Then:

```js
async function main() {
  // Load the MediaPipe facemesh model.
  const model = await facemesh.load();

  // Pass in a video stream (or an image, canvas, or 3D tensor) to obtain an
  // array of detected faces from the MediaPipe graph.
  const predictions = await model.estimateFaces(document.querySelector("video"));

  if (predictions.length > 0) {
    /*
    `predictions` is an array of objects describing each detected face, for example:

    [
      {
        faceInViewConfidence: 1, // The probability of a face being present.
        boundingBox: { // The bounding box surrounding the face.
          topLeft: [232.28, 145.26],
          bottomRight: [449.75, 308.36],
        },
        mesh: [ // The 3D coordinates of each facial landmark.
          [92.07, 119.49, -17.54],
          [91.97, 102.52, -30.54],
          ...
        ],
        scaledMesh: [ // The 3D coordinates of each facial landmark, normalized.
          [322.32, 297.58, -17.54],
          [322.18, 263.95, -30.54]
        ],
        annotations: { // Semantic groupings of the `scaledMesh` coordinates.
          silhouette: [
            [326.19, 124.72, -3.82],
            [351.06, 126.30, -3.00],
            ...
          ],
          ...
        }
      }
    ]
    */

    for (let i = 0; i < predictions.length; i++) {
      const keypoints = predictions[i].scaledMesh;

      // Log facial keypoints.
      for (let j = 0; j < keypoints.length; j++) {
        const [x, y, z] = keypoints[j];

        console.log(`Keypoint ${j}: [${x}, ${y}, ${z}]`);
      }
    }
  }
}

main();
```
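If you want to render the keypoints rather than log them, here is a minimal sketch, assuming a `<canvas>` element overlaid on the video and sized to match it (the id `output` is a hypothetical name for this sketch, not part of the API):

```js
// Draw each scaled-mesh keypoint as a small dot on an overlay canvas.
function drawKeypoints(predictions) {
  const canvas = document.getElementById('output'); // assumed overlay canvas
  const ctx = canvas.getContext('2d');
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = 'aqua';

  for (const prediction of predictions) {
    // Each scaledMesh entry is [x, y, z]; only x and y are needed for 2D drawing.
    for (const [x, y] of prediction.scaledMesh) {
      ctx.beginPath();
      ctx.arc(x, y, 1.5 /* radius in pixels */, 0, 2 * Math.PI);
      ctx.fill();
    }
  }
}
```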
#### Parameters for facemesh.load()

`facemesh.load()` takes a configuration object with the following properties (see the example after the list):

* **maxContinuousChecks** - How many frames to go without running the bounding box detector. Only relevant if maxFaces > 1. Defaults to 5.

* **detectionConfidence** - Threshold for discarding a prediction. Defaults to 0.9.

* **maxFaces** - The maximum number of faces detected in the input. Set this as low as your application allows, since lower values improve performance. Defaults to 10.

* **iouThreshold** - A float representing the threshold for deciding whether boxes overlap too much in non-maximum suppression. Must be in the range [0, 1]. Defaults to 0.3.

* **scoreThreshold** - A threshold for deciding when to remove boxes based on score in non-maximum suppression. Defaults to 0.75.

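For example, a minimal sketch for single-face tracking that overrides a few of the defaults above (the specific values are illustrative, not recommendations):

```js
// Inside an async function:
const model = await facemesh.load({
  maxFaces: 1,              // only track one face, for best performance
  detectionConfidence: 0.9, // discard faces detected with lower confidence
  scoreThreshold: 0.75      // non-maximum suppression score threshold
});
```
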
#### Parameters for model.estimateFaces()

* **input** - The image to run face landmark detection on. Can be a tensor, DOM element image, video, or canvas.

* **returnTensors** - (defaults to `false`) Whether to return tensors as opposed to values.

* **flipHorizontal** - Whether to flip/mirror the facial keypoints horizontally. Should be true for videos that are flipped by default (e.g. webcams).

| 127 | +#### Keypoints |
| 128 | + |
| 129 | +Here is map of the keypoints: |
| 130 | + |
| 131 | +<img src="mesh_map.jpg" alt="keypoints_map" style="width: 500px; height: 500px"> |
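
The `annotations` groupings shown in the example output above can be used to pick out specific regions. For instance, assuming `predictions` holds at least one detected face:

```js
// The `silhouette` annotation groups the scaledMesh points outlining the face.
const silhouette = predictions[0].annotations.silhouette;
console.log(`The face outline contains ${silhouette.length} keypoints.`);
```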