# BodyPix

Body segmentation: BodyPix wraps the
[BodyPix](https://github.com/tensorflow/tfjs-models/tree/master/body-pix) JS
solution within the familiar TFJS API.

This model can be used to segment an image into pixels that are and are not part of a person, and into
pixels that belong to each of twenty-four body parts. It works for multiple people in an input image or video.

--------------------------------------------------------------------------------

## Table of Contents

1. [Installation](#installation)
2. [Usage](#usage)

## Installation

To use BodyPix:

Via script tags:

```html
<!-- Require the peer dependencies of body-segmentation. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>

<!-- You must explicitly require a TF.js backend if you're not using the TF.js union bundle. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>

<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-segmentation"></script>
```

Via npm:
```sh
yarn add @tensorflow-models/body-segmentation
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
yarn add @tensorflow/tfjs-backend-webgl
```

-----------------------------------------------------------------------
## Usage

If you are using the Body Segmentation API via npm, you need to import the libraries first.

### Import the libraries

```javascript
import * as bodySegmentation from '@tensorflow-models/body-segmentation';
import '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-converter';
// Register WebGL backend.
import '@tensorflow/tfjs-backend-webgl';
```

### Create a detector

Pass in `bodySegmentation.SupportedModels.BodyPix` from the
`bodySegmentation.SupportedModels` enum list along with an optional `segmenterConfig` to the
`createSegmenter` method to load and initialize the model.
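
For example, since `segmenterConfig` is optional, a segmenter with the default configuration can be created with a single call (a minimal sketch):

```javascript
const segmenter = await bodySegmentation.createSegmenter(
    bodySegmentation.SupportedModels.BodyPix);
```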

**By default**, BodyPix loads a MobileNetV1 architecture with a **`0.75`** multiplier. This is recommended for computers with mid-range/lower-end GPUs. A model with a **`0.50`** multiplier is recommended for mobile. The ResNet architecture is recommended for computers with even more powerful GPUs.

`segmenterConfig` is an object that defines BodyPix specific configurations for `BodyPixModelConfig`:

 * **architecture** - Can be either `MobileNetV1` or `ResNet50`. It determines which BodyPix architecture to load.

 * **outputStride** - Can be one of `8`, `16`, `32` (strides `16` and `32` are supported for the ResNet architecture, and strides `8` and `16` are supported for the MobileNetV1 architecture). It specifies the output stride of the BodyPix model. The smaller the value, the larger the output resolution and the more accurate the model, at the cost of speed. ***A larger value results in a smaller model and faster prediction time but lower accuracy***.

 * **multiplier** - Can be one of `1.0`, `0.75`, or `0.50` (the value is used *only* by the MobileNetV1 architecture, not by the ResNet architecture). It is the float multiplier for the depth (number of channels) of all convolution ops. The larger the value, the larger the size of the layers and the more accurate the model, at the cost of speed. ***A smaller value results in a smaller model and faster prediction time but lower accuracy***.

 * **quantBytes** - This argument controls the bytes used for weight quantization. The available options are:

   - `4`. 4 bytes per float (no quantization). Leads to highest accuracy and original model size.
   - `2`. 2 bytes per float. Leads to slightly lower accuracy and 2x model size reduction.
   - `1`. 1 byte per float. Leads to lower accuracy and 4x model size reduction.

   The following table contains the corresponding BodyPix 2.0 model checkpoint sizes (without gzip) when using different quantization bytes:

   | Architecture       | quantBytes=4 | quantBytes=2 | quantBytes=1 |
   | ------------------ |:------------:|:------------:|:------------:|
   | ResNet50           | ~90MB        | ~45MB        | ~22MB        |
   | MobileNetV1 (1.00) | ~13MB        | ~6MB         | ~3MB         |
   | MobileNetV1 (0.75) | ~5MB         | ~2MB         | ~1MB         |
   | MobileNetV1 (0.50) | ~2MB         | ~1MB         | ~0.6MB       |

 * **modelUrl** - An optional string that specifies a custom URL for the model. This is useful for local development or for countries that don't have access to the models hosted on GCP.

```javascript
const model = bodySegmentation.SupportedModels.BodyPix;
const segmenterConfig = {
  architecture: 'ResNet50',
  outputStride: 32,
  quantBytes: 2
};
const segmenter = await bodySegmentation.createSegmenter(model, segmenterConfig);
```
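
For mobile or lower-end devices, the recommendations above suggest a lighter MobileNetV1 configuration instead. A sketch (the commented-out `modelUrl` is a hypothetical path for a self-hosted copy of the model):

```javascript
const mobileSegmenterConfig = {
  architecture: 'MobileNetV1',
  outputStride: 16,
  multiplier: 0.50,  // recommended for mobile
  quantBytes: 2,
  // modelUrl: '/models/bodypix/model.json',  // hypothetical self-hosted URL
};
const mobileSegmenter =
    await bodySegmentation.createSegmenter(model, mobileSegmenterConfig);
```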

### Run inference

Now you can use the segmenter to segment people. The `segmentPeople` method
accepts both image and video in many formats, including:
`HTMLVideoElement`, `HTMLImageElement`, `HTMLCanvasElement`, `ImageData`, `Tensor3D`. If you want more
options, you can pass in a second `segmentationConfig` parameter.

`segmentationConfig` is an object that defines BodyPix specific configurations for `BodyPixSegmentationConfig`:

 * **multiSegmentation** - Required. If set to true, then each person is segmented in a separate output, otherwise all people are segmented together in one segmentation.
 * **segmentBodyParts** - Required. If set to true, then 24 body parts are segmented in the output, otherwise only foreground / background binary segmentation is performed.
 * **flipHorizontal** - Defaults to false. Whether the segmentation & pose should be flipped/mirrored horizontally. This should be set to true for videos where the video is by default flipped horizontally (i.e. a webcam), and you want the segmentation & pose to be returned in the proper orientation.
 * **internalResolution** - Defaults to `medium`. The internal resolution percentage that the input is resized to before inference. The larger the `internalResolution`, the more accurate the model, at the cost of slower prediction times. Available values are `low`, `medium`, `high`, `full`, or a percentage value between 0 and 1. The values `low`, `medium`, `high`, and `full` map to 0.25, 0.5, 0.75, and 1.0, respectively.
 * **segmentationThreshold** - Defaults to 0.7. Must be between 0 and 1. For each pixel, the model estimates a score between 0 and 1 that indicates how confident it is that part of a person is displayed in that pixel. This *segmentationThreshold* is used to convert these values to binary 0s or 1s by determining the minimum value a pixel's score must have to be considered part of a person. In essence, a higher value will create a tighter crop around a person, but may result in some pixels that are part of a person being excluded from the returned segmentation mask.
 * **maxDetections** - Defaults to 10. For pose estimation, the maximum number of person poses to detect per image.
 * **scoreThreshold** - Defaults to 0.3. For pose estimation, only return individual person detections that have a root part score greater than or equal to this value.
 * **nmsRadius** - Defaults to 20. For pose estimation, the non-maximum suppression part distance in pixels. It needs to be strictly positive. Two parts suppress each other if they are less than `nmsRadius` pixels away.

If **multiSegmentation** is set to true, then the following additional parameters can be adjusted:

 * **minKeypointScore** - Defaults to 0.3. Keypoints above this score are used for matching and assigning a segmentation mask to each person.
 * **refineSteps** - Defaults to 10. The number of refinement steps used when assigning the individual person segmentations. It needs to be strictly positive. The larger the value, the higher the accuracy and the slower the inference.

The following code snippet demonstrates how to run the model inference:

```javascript
const segmentationConfig = {multiSegmentation: true, segmentBodyParts: false};
const people = await segmenter.segmentPeople(image, segmentationConfig);
```

When `multiSegmentation` is set to false, the returned `people` array contains a single element where all the people segmented in the image are found in that single segmentation element. When `multiSegmentation` is set to true, then the length of the array will be equal to the number of detected people, each segmentation containing one person.
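
For instance, a configuration tuned for a mirrored webcam feed might combine the options above like this (a sketch; `video` stands for an `HTMLVideoElement`):

```javascript
const webcamConfig = {
  multiSegmentation: false,   // one combined segmentation for all people
  segmentBodyParts: false,    // binary person/background segmentation
  flipHorizontal: true,       // webcam feeds are mirrored by default
  internalResolution: 'low',  // trade accuracy for speed (maps to 0.25)
};
const people = await segmenter.segmentPeople(video, webcamConfig);
```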

When `segmentBodyParts` is set to false, the only label returned by the `maskValueToLabel` function is 'person'. When `segmentBodyParts` is set to true, the `maskValueToLabel` function will return one of the body parts defined by BodyPix, where the mapping of mask values to labels is as follows:

| Part Id | Part Name              | Part Id | Part Name              |
|---------|------------------------|---------|------------------------|
| 0       | left_face              | 12      | torso_front            |
| 1       | right_face             | 13      | torso_back             |
| 2       | left_upper_arm_front   | 14      | left_upper_leg_front   |
| 3       | left_upper_arm_back    | 15      | left_upper_leg_back    |
| 4       | right_upper_arm_front  | 16      | right_upper_leg_front  |
| 5       | right_upper_arm_back   | 17      | right_upper_leg_back   |
| 6       | left_lower_arm_front   | 18      | left_lower_leg_front   |
| 7       | left_lower_arm_back    | 19      | left_lower_leg_back    |
| 8       | right_lower_arm_front  | 20      | right_lower_leg_front  |
| 9       | right_lower_arm_back   | 21      | right_lower_leg_back   |
| 10      | left_hand              | 22      | left_foot              |
| 11      | right_hand             | 23      | right_foot             |

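With `segmentBodyParts` enabled, each returned segmentation exposes `maskValueToLabel` alongside its mask. A sketch of reading a part label and extracting pixel data (assuming the mask object's `toImageData` helper from the Body Segmentation API):

```javascript
const config = {multiSegmentation: false, segmentBodyParts: true};
const people = await segmenter.segmentPeople(image, config);
const segmentation = people[0];
// Map a mask value back to a body part name using the table above.
console.log(segmentation.maskValueToLabel(10)); // 'left_hand'
// Convert the mask to ImageData for pixel-level inspection or drawing.
const imageData = await segmentation.mask.toImageData();
```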

Please refer to the Body Segmentation API
[README](https://github.com/tensorflow/tfjs-models/blob/master/body-segmentation/README.md#how-to-run-it)
for details about the structure of the returned `people` array.