
The machine learning model was built and trained using TensorFlow.
The model can distinguish the following elements: paragraph, label, header, button, checkbox, radiobutton, rating, toggle, dropdown, listbox, textarea, textinput, datepicker, stepperinput, slider, progressbar, image, and video.
The API is currently in closed alpha, but feel free to contact us if you want early access.
Send all requests to the API endpoint: https://api.vision.teleporthq.io/detection
Make sure to add a `Content-Type` header with the value `application/json` and a `Teleport-Token` header with the token we provide.
The request body is a JSON object with two keys: `image` and `threshold`.
- `image` is a required string parameter: the direct URL of a publicly available JPG or PNG image.
- `threshold` is an optional parameter with a default value of `0.1`. The detection model outputs a confidence score between 0 and 1 for each detection and omits from the response any detection whose confidence is lower than this threshold.
Request body example:
{
    "image": "https://i.imgur.com/eF9KN8U.jpg", 
    "threshold": 0.5
}
curl -d '{"image": "https://i.imgur.com/eF9KN8U.jpg", "threshold": 0.5}' \
     -H "Teleport-Token: 123" -H "Content-Type: application/json" \ 
     -X POST https://api.vision.teleporthq.io/detection
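For convenience, here is the same request as a minimal Python sketch, assuming the `requests` library is installed; the token value is a placeholder for the one we provide.

import requests

API_URL = "https://api.vision.teleporthq.io/detection"

payload = {
    "image": "https://i.imgur.com/eF9KN8U.jpg",  # direct URL to a public JPG/PNG
    "threshold": 0.5,  # drop detections scoring below 0.5
}
headers = {
    "Content-Type": "application/json",
    "Teleport-Token": "123",  # placeholder token
}

response = requests.post(API_URL, json=payload, headers=headers)
response.raise_for_status()  # raise on HTTP errors
detections = response.json()  # list of detection objects
print(detections)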
If your request is valid, you will receive a JSON response with the following structure:
[
    {
        "box": [y, x, height, width],
        "detectionClass": numeric_label,
        "detectionString": string_label,
        "score": confidence_rating
    },
    ...
]
The JSON contains a list of objects, each corresponding to a detected atomic UI element in the image you sent. Every key appears in every object of the response array.
- `box` contains the coordinates of the bounding box surrounding the detected element, in the order `[y, x, height, width]`. `x` and `y` are the coordinates of the top-left corner of the box, and `width` and `height` are self-explanatory. All coordinates are normalized to `[0, 1]`, where `(0, 0)` is the top-left corner of your image and `(1, 1)` is the bottom-right corner. In other words, to get pixel coordinates you multiply `x` and `width` by the width of your image, and `y` and `height` by the height of your image (see the sketch after this list).
- `detectionClass` is the numeric class of the detection.
- `detectionString` is the human-readable label of the detection.
- `score` represents how confident the algorithm is that the predicted object is a correct one. It takes values in `[0, 1]`, where `1` represents 100% confidence in the detection.
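To illustrate the coordinate math, here is a small helper sketch (not part of the API); the 1280x800 image size is an assumed example.

def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [y, x, height, width] box to pixel coordinates."""
    y, x, h, w = box
    return {
        "left": round(x * image_width),
        "top": round(y * image_height),
        "width": round(w * image_width),
        "height": round(h * image_height),
    }

# Example: the first detection from the sample response below, on a 1280x800 image.
print(box_to_pixels([0.144408, 0.521686, 0.548181, 0.276308], 1280, 800))
# {'left': 668, 'top': 116, 'width': 354, 'height': 439}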
The `detectionClass` to `detectionString` mapping follows this dictionary:
{
    1: "paragraph",
    2: "label",
    3: "header",
    4: "button",
    5: "checkbox",
    6: "radiobutton",
    7: "rating",
    8: "toggle",
    9: "dropdown",
    10: "listbox",
    11: "textarea",
    12: "textinput",
    13: "datepicker",
    14: "stepperinput",
    15: "slider",
    16: "progressbar",
    17: "image",
    18: "video"
}
Here is the full response for the example request:
[
    {
        "box": [
            0.144408,
            0.521686,
            0.548181,
            0.276308
        ],
        "detectionClass": 17,
        "detectionString": "image",
        "score": 0.999999
    },
    {
        "box": [
            0.886546,
            0.333103,
            0.062734,
            0.116247
        ],
        "detectionClass": 4,
        "detectionString": "button",
        "score": 0.989777
    },
    {
        "box": [
            0.252631,
            0.126722,
            0.044884,
            0.066244
        ],
        "detectionClass": 2,
        "detectionString": "label",
        "score": 0.98929
    }
]
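To visualize the detections, here is a minimal sketch that draws the boxes above onto the source image. It is not part of the API, and assumes the Pillow and requests libraries are installed.

from io import BytesIO

import requests
from PIL import Image, ImageDraw

IMAGE_URL = "https://i.imgur.com/eF9KN8U.jpg"

# Two detections taken from the sample response above.
detections = [
    {"box": [0.144408, 0.521686, 0.548181, 0.276308],
     "detectionString": "image", "score": 0.999999},
    {"box": [0.886546, 0.333103, 0.062734, 0.116247],
     "detectionString": "button", "score": 0.989777},
]

img = Image.open(BytesIO(requests.get(IMAGE_URL).content)).convert("RGB")
draw = ImageDraw.Draw(img)
img_w, img_h = img.size

for det in detections:
    y, x, h, w = det["box"]  # normalized [y, x, height, width]
    left, top = x * img_w, y * img_h
    right, bottom = left + w * img_w, top + h * img_h
    draw.rectangle([left, top, right, bottom], outline="red", width=3)
    draw.text((left, max(top - 12, 0)), f'{det["detectionString"]} {det["score"]:.2f}', fill="red")

img.save("detections.png")  # annotated copy of the input image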
If you are interested in using this API, feel free to get in touch with us via email, Twitter, or LinkedIn.
