diff --git a/README.md b/README.md index 24249e8..a0fa9a1 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊 is a tool for creating and deploying tasks on the [Toloka](https://toloka.ai) crowdsourcing platform. -The tool allows you to create crowdsourcing tasks using pre-defined task interfaces and configuring their settings using [YAML](https://en.wikipedia.org/wiki/YAML) files. +The tool allows you to create crowdsourcing tasks using pre-defined task interfaces and to configure their settings using [YAML](https://en.wikipedia.org/wiki/YAML) files. For a description of the tool and the motivation for its development, see this [publication](https://aclanthology.org/2022.latechclfl-1.2/). @@ -10,7 +10,7 @@ Please cite the following publication if you use the tool in your research. > Tuomo Hiippala, Helmiina Hotti, and Rosa Suviranta. 2022. Developing a tool for fair and reproducible use of paid crowdsourcing in the digital humanities. In *Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature*, pages 7–12, Gyeongju, Republic of Korea. International Conference on Computational Linguistics. -For convenience, you can find the BibTeX entry below. +For convenience, you can use the BibTeX entry below. ```text @inproceedings{hiippala-etal-2022-developing, @@ -33,189 +33,27 @@ You can install the tool from [PyPI](https://pypi.org/project/abulafia/) using t Alternatively, you can clone this repository and install the tool locally. Move to the directory that contains the repository and type: `pip install .` -## Key concepts +## Usage -𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊 defines three basic components for building crowdsourcing pipelines: tasks, actions and task sequences. +See the directory [`examples`](/examples) for documentation and practical examples.
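The basic workflow can be sketched as follows. This is a minimal illustration rather than the tool's fixed API: the inlined credentials mirror the `creds.json` structure used by the tool, and the commented-out client and Task lines follow the examples, so treat the exact names as assumptions.

```python
import json

# Credentials live in a JSON file such as creds.json (never in version
# control); here the contents are inlined purely for illustration.
creds = json.loads('{"token": "YOUR_OAUTH_TOKEN", "mode": "SANDBOX"}')

# With the toloka-kit and abulafia packages installed, the credentials
# are then used to create a client and a Task object along these lines:
#
#   client = toloka.TolokaClient(creds['token'], creds['mode'])
#   task = TextClassification(configuration='classify_text.yaml', client=client)

print(creds['mode'])  # SANDBOX
```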
-In [Toloka terms](https://toloka.ai/docs/guide/glossary.html), tasks are equal to projects, whereas task sequences consist of projects that are connected to each other. Actions, in turn, operate on the input/output data of projects. +To deploy your crowdsourcing tasks to Toloka, the tool needs to read your credentials from a JSON file e.g. `creds.json`. Never add this file to version control. -### Tasks +The file must contain the following key/value pairs in JSON: -Each crowdsourcing task is specified and configured using a YAML file. Each configuration file should include the following keys: - -- `name` of the task -- the types of `input` and `output` data under key `data` -- `actions`, if applicable -- `interface` settings -- `project` settings -- `pool` settings - -Optionally, you can add `quality_control` settings. Options for quality control are the following: - -- [Fast responses](https://toloka.ai/docs/guide/concepts/quick-answers.html) -- [Skipped assignments](https://toloka.ai/docs/guide/concepts/skipped-assignments.html) -- Re-do assignments from banned users -- [Captcha](https://toloka.ai/docs/guide/concepts/captcha.html) -- [Golden set](https://toloka.ai/en/docs/toloka-kit/reference/toloka.client.collectors.GoldenSet) (performance on control tasks) - -See the directory [`examples/config`](https://github.com/thiippal/abulafia/tree/main/examples/config) for examples of YAML configuration files. - -**Blocklist:** If you want to prevent some users from having access to a specific pool, add the key `blocklist` under `pool` configuration and give a path to a TSV file containing the column `user_id` with user identifiers of the workers you would like to block (see the example in [`examples/config/detect_text.yaml`](https://github.com/thiippal/abulafia/blob/main/examples/config/detect_text.yaml)). - -### Actions - -Just like crowdsourcing tasks, each action requires its own YAML configuration file. 
[`examples/action_demo.py`](https://github.com/thiippal/abulafia/blob/main/examples/action_demo.py) defines a pipeline that uses the `Aggregate`, `Forward` and `SeparateBBoxes` actions. - -**Forward** action requires the following keys: - -- `name` of the action -- `data` -- `source`, the pool where the tasks to be forwarded originate - -Variable names for the possible outputs for the source task and pools to which they should be forwarded are configured under the key `on_result` under `actions`. - -You can either configure a pool to which to forward, or use the keywords `accept` or `reject` to automatically accept or reject tasks based on the output. These keywords are meant to be used for tasks that involve workers verifying work submitted by other workers. - -For example, you can ask workers to determine if an image has been annotated correctly. You can then use aggregation and forwarding to automatically accept or reject the *original* task by using key-value pairs such as `correct: accept` and `incorrect: reject` in your `Forward` configuration. You can also configure both accepting/rejecting and forwarding to another pool. In that case, use a list as the value for the variable name of the output. See the file [`examples/action_demo.py`](https://github.com/thiippal/abulafia/blob/main/examples/action_demo.py) and the associated YAML configuration files for an example. - -Configure `Forward` actions to the source pool/action under `actions` with the key `on_result`. - -**Aggregate** action requires the keys: - -- `name` of the action -- `source`, the pool from which tasks go to the aggregate action -- The forward action to which the aggregated results will be sent should be configured under key `on_result` under `actions` -- `method`, which is the desired [aggregation algorithm](https://toloka.ai/en/docs/crowd-kit/). For now, categorical methods are supported.
- -Configure `Aggregate` actions to the source pool under `actions` with the key `on_closed`; aggregation can only be done after all tasks are complete and the pool is closed. - -**SeparateBBoxes** action requires the keys: - -- `name` of the action -- The type of data that the action produces should be configured under the key `output` under `data` - -If you wish to start your pipeline with `SeparateBBoxes`, configure it under `actions` as value for the key `data_source` in the following pool. Then, the action reads a TSV file with images and bounding boxes and separates the bounding boxes to one per task. Note that the bounding boxes must be in the format that Toloka uses. If you want to have the action in the middle of a pipeline, you can configure it in your `Forward` action under one of the possible outputs of your task (for example: if you want all tasks with the output `True` to be forwarded to `SeparateBBoxes`, configure `True: name_of_your_separatebboxes_action` under `on_result` under `actions`. See `config/forward_verify.yaml` for an example). If you want, you can add a label for the bounding boxes in the resulting tasks, by giving the label as a value for the parameter `add_label`. Labelled bounding boxes are used in, for example, `AddOutlines` and `LabelledSegmentationVerification` tasks. - -### Task sequences - -Task sequences are pipelines that can consist of crowdsourcing tasks as well as actions that perform operations before, between or after tasks. The `Forward` action is used to transfer tasks from one pool to another based on task output. The `Aggregate` action is used to aggregate the output of a task; the action uses your aggregation algorithm of choice to determine the most probable output for a task. `SeparateBBoxes` is an action that takes an image with several bounding boxes, separates the bounding boxes to one per image, and creates new tasks from those.
- -If you wish to move tasks from one pool to another based on the acceptance status of the task, not the task output, you can configure the receiving pool under `actions` with keys `on_submitted`, `on_accepted` or `on_rejected`. For example, if you wish rejected work to go back to the pool to be re-completed by another worker, you can configure the current pool as value to the key `on_rejected`. - -To deploy your crowdsourcing tasks to Toloka, the tool needs to read your credentials from a JSON file e.g. `creds.json`. Remember to never add this file to public version control. The contents of the file should be the following: - -``` +```json { "token": "YOUR_OAUTH_TOKEN", "mode": "SANDBOX" } ``` -When you've tested your pipeline in the Toloka sandbox, change the value for `"mode"` from `"SANDBOX"` to `"PRODUCTION"`. - -See the directory [`examples/`](https://github.com/thiippal/abulafia/tree/main/examples) for examples of crowdsourcing pipelines. +When you have tested your tasks in the Toloka sandbox, change the value for `"mode"` from `"SANDBOX"` to `"PRODUCTION"` to deploy the tasks on Toloka. -The screenshot below shows an example of running the tool. +The screenshot below illustrates the tool in action. -## Ensuring fair payments - -The tool has a built-in mechanism that guides the user to determine rewards that result in a fair hourly wage ($12) for the crowdsourced workers. In the pool configuration, the user should add a key `estimated_time_per_suite`. The value for the key should be the estimated time in seconds it takes for the worker to complete one task suite. Based on this value and the value `reward_per_assignment`, the tool checks if the reward is high enough to result in a fair hourly wage. The user is presented with a warning and prompted to cancel the pipeline if the configured reward is too low. A warning is also raised if `estimated_time_per_suite` is not found in the pool configuration.
- -To calculate a fair reward per task suite, you can use the interactive script `utils/calculate_fair_rewards.py`. - -## Pre-defined interfaces - -Define crowdsourcing tasks in a Python file by creating one or many of the task objects listed below. They all take arguments `configuration`, which is the path to the correct YAML configuration file, and `client`, which should be your Toloka client. - -You can define additional task interfaces by inheriting the [`CrowdsourcingTask`](https://github.com/thiippal/abulafia/blob/main/src/abulafia/task_specs/core_task.py) class. The currently implemented task interfaces can be found in [`src/abulafia/task_specs/task_specs.py`](https://github.com/thiippal/abulafia/tree/main/src/abulafia/task_specs). These task interfaces are described in greater detail below. - -### ImageClassification - -Interface for binary image classification tasks. - -|input|output| -|-----|------| -| `url` (image) | `boolean` (true/false) | - -### ImageSegmentation - -Interface for image segmentation tasks. - -|input|output| -|-----|------| -|`url` (image) | `json` (bounding boxes) | - -### AddOutlines - -Interface for image segmentation tasks with pre-existing labelled outlines. - -|input|output| -|-----|------| -|`url` (image) | `json` (bounding boxes) | -| `json` (bounding boxes) | | - -### SegmentationClassification - -Interface for binary segmentation classification tasks. - -|input|output| -|-----|------| -|`url` (image) | `boolean` (true/false) | -| `json` (bounding boxes) | | - -input: url to an image, JSON coordinates of bounding boxes\ -output: boolean - -### SegmentationVerification - -Interface for binary segmentation verification tasks. - -|input|output| -|-----|------| -|`url` (image) | `boolean` (true/false) | -| `json` (bounding boxes) | | - -### LabelledSegmentationVerification - -Interface for verifying image segmentation tasks where the bounding boxes have labels. 
- -|input|output| -|-----|------| -|`url` (image) | `boolean` (true/false) | -| `json` (bounding boxes) | | - -### FixImageSegmentation - -Interface for fixing and adding more outlines to images with pre-existing non-labelled outlines. - -|input|output| -|-----|------| -|`url` (image) | `json` (bounding boxes) | -| `json` (bounding boxes) | | - -### MulticlassVerification - -Interface for verification tasks with more than two possible outputs (for example: *yes*, *no* and *maybe*). - -|input|output| -|-----|------| -|`url` (image) | `string` (values) | -| `json` (bounding boxes) | | - -### TextClassification - -Interface for the classification of text. - -|input|output| -|-----|------| -|`string`|`string`| - -### TextAnnotation - -Interface for annotation words or other segments within a text. +## Contact -|input|output| -|-----|------| -|`string`|`json`| +If you have questions about the tool, feel free to contact tuomo.hiippala (at) helsinki.fi or open an issue on GitHub. diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 0000000..4579b58 --- /dev/null +++ b/examples/README.md @@ -0,0 +1,1081 @@ +# Examples and tutorials + +- [Creating Task objects](#creating-task-objects) + - [ImageClassification](#imageclassification) + - [ImageSegmentation](#imagesegmentation) + - [SegmentationVerification](#segmentationverification) + - [TextClassification](#textclassification) + - [TextAnnotation](#textannotation) +- [Configuring Tasks](#configuring-tasks) + - [Naming a Task](#naming-a-task) + - [Defining input and output data](#defining-input-and-output-data) + - [Setting up projects](#setting-up-projects) + - [Creating pools](#creating-pools) + - [Configuring training](#configuring-training) + - [Configuring quality control](#configuring-quality-control) +- [Combining Tasks into Task Sequences](#combining-tasks-into-task-sequences) +- [Processing Task outputs using Actions](#processing-task-outputs-using-actions) + - [Forward](#forward) + - 
[Aggregate](#aggregate) + - [VerifyPolygon](#verifypolygon) + - [SeparateBBoxes](#separatebboxes) +- [Tutorials](#tutorials) + - [Creating a Task for classifying images](#creating-a-task-for-classifying-images) + +## Creating Task objects + +In 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊, user interfaces are associated with Python classes that define the allowed input and output data types. + +To create a Task object, create a YAML configuration file for a Task and pass this configuration to the appropriate class. + +The following example creates a Task object using the `TextClassification` class, whose configuration is contained in a YAML configuration file named `classify_text.yaml`. A Toloka client stored under the variable `client` is used to interact with the Toloka platform. + +```python +task = TextClassification(configuration='classify_text.yaml', client=client) +``` + +Use the top-level key `interface` in the YAML file to configure the user interface for a given Task as exemplified in connection with each pre-defined interface below. + +You can create additional interfaces by inheriting the [`CrowdsourcingTask`](src/abulafia/task_specs/core_task.py) class, which defines the basic functionalities of a Task. + +The currently implemented interfaces can be found in [`task_specs.py`](src/abulafia/task_specs/task_specs.py). These interfaces are documented below. + +### ImageClassification + +A class for image classification tasks. The following input and output formats are supported. + +| Input | Output | +|---------------|--------------------------------| +| `url` (image) | `boolean` (true/false) | +| | `string` (for multiple labels) | + +Configure the interface by adding the following keys under the top-level key `interface`. + +| Key | Description | +|----------|--------------------------------------------------------------------------------------------------| +| `prompt` | A string that defines a text that is shown above the buttons on the interface.
| +| `labels` | Key/value pairs that define the labels shown on the interface and the values stored in the data. | + +The following example adds a prompt with two labels. The interface will show two options, *Yes* and *No*, which store the values `true` and `false`, respectively. + +```yaml +interface: + prompt: "Does the image contain text, letters or numbers?" + labels: + true: "Yes" + false: "No" +``` + +### ImageSegmentation + +A class for image segmentation tasks. The following input and output formats are supported. + +| input | output | +|-------------------------|-------------------------------| +| `url` (image) | `json` (bounding boxes) | +| `json` (bounding boxes) | `boolean` (optional checkbox) | + +Configure the interface by adding the following keys under the top-level key `interface`. + +| Key | Description | +|-----------------------|--------------------------------------------------------------------------------------------------| +| `prompt` | A string that defines a text that is shown below the image annotation interface. | +| `tools` | A list of values that defines the annotation tools available for the interface. | +| `labels` (optional) | Key/value pairs that define the labels shown on the interface and the values stored in the data. | +| `checkbox` (optional) | A string that defines a text that is shown above the checkbox in the interface. | + +The following example defines a prompt, an image segmentation interface with three labels, two annotation tools and a checkbox. + +```yaml +interface: + prompt: "Outline all elements with text, letters or numbers." + tools: + - rectangle + - polygon + labels: + text: "Text" + letter: "Letter" + number: "Number" + checkbox: "Check this box if there is nothing to outline." +``` + +For the annotation tools, valid values include `rectangle`, `polygon` and `point`. Their order defines the order in which they appear in the user interface. If no tools are defined, all tools are made available by default. 
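The `json` data exchanged by this interface holds the bounding boxes. As a rough sketch of what a single rectangle looks like (the field names follow Toloka's image annotation output format; treat them as an assumption and verify against Toloka's documentation before relying on them):

```python
# A sketch of one rectangle in Toloka's image annotation JSON format.
# Coordinates are relative (between 0 and 1); field names are assumptions
# based on Toloka's documented output format.
bbox = {
    "shape": "rectangle",
    "left": 0.25,    # x coordinate of the top-left corner
    "top": 0.10,     # y coordinate of the top-left corner
    "width": 0.50,
    "height": 0.30,
    "label": "text"  # one of the values defined under interface/labels
}

# A well-formed box stays within the image bounds
assert 0 <= bbox["left"] + bbox["width"] <= 1
assert 0 <= bbox["top"] + bbox["height"] <= 1
```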
+ +If a `checkbox` is added to the user interface, you must add an output data variable with the type `boolean`. The checkbox can be used to mark images that do not contain any objects to be segmented. If selected, the checkbox stores the value `true`, and `false` if the checkbox is not selected. + +If you want to show pre-existing annotations, you must add an input data variable with the type `json`. + +### SegmentationVerification + +A class for verifying bounding boxes and other forms of image segmentation. The following input and output formats are supported. + +| Input | Output | +|-------------------------|--------------------------------| +| `url` (image) | `boolean` (true/false) | +| `json` (bounding boxes) | `string` (for multiple labels) | +| `boolean` (checkbox) | | +| `string` (checkbox) | | + +Configure the interface by adding the following keys under the top-level key `interface`. + +| Key | Description | +|----------------------------------|-------------------------------------------------------------------------------------------------| +| `prompt` | A string that defines a text that is shown above the radio buttons on the interface. | +| `labels` | Key/value pairs that define the labels for the radio buttons and the values stored in the data. | +| `segmentation/labels` (optional) | Key/value pairs that define the labels for bounding boxes and the values stored in the data. | +| `checkbox` (optional) | A string that defines a text that is shown above the checkbox in the interface. | + +The following example defines a prompt, a checkbox, an image segmentation interface with two labels for bounding boxes and two labels for the radio buttons. + +```yaml +interface: + prompt: "Is the image annotated correctly?" + checkbox: "There is nothing to annotate in this image." + segmentation: + labels: + source: "Source" + target: "Target" + labels: + true: "Yes" + false: "No" +``` + +### TextClassification + +A class for text classification tasks.
The following input and output formats are supported. + +| Input | Output | +|----------|--------------------------------| +| `string` | `boolean` (true/false) | +| | `string` (for multiple labels) | + +Configure the interface by adding the following keys under the top-level key `interface`. + +| Key | Description | +|----------|--------------------------------------------------------------------------------------------------| +| `prompt` | A string that defines a text that is shown above the buttons on the interface. | +| `labels` | Key/value pairs that define the labels shown on the interface and the values stored in the data. | + +The following example defines an interface with a prompt and three labels. The interface will show three options, *Positive*, *Negative* and *Neutral*, which store the values `positive`, `negative` and `neutral`, respectively. + +```yaml +interface: + prompt: Read the text and classify its sentiment. + labels: + positive: Positive + negative: Negative + neutral: Neutral +``` + +### TextAnnotation + +A class for text annotation tasks. The following input and output formats are supported. + +| input | output | +|----------|--------| +| `string` | `json` | + +Configure the interface by adding the following keys under the top-level key `interface`. + +| Key | Description | +|----------|--------------------------------------------------------------------------------------------------| +| `prompt` | A string that defines a text that is shown above the buttons on the interface. | +| `labels` | Key/value pairs that define the labels shown on the interface and the values stored in the data. | + +The following example defines an interface with a prompt and three labels. The interface will show three options, *Verb*, *Noun* and *Adjective*, which store the values `verb`, `noun` and `adj`, respectively. + +```yaml +interface: + prompt: Annotate verbs, nouns and adjectives in the text below. 
+ labels: + verb: Verb + noun: Noun + adj: Adjective +``` + +## Configuring Tasks + +In 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊, a Task refers to a crowdsourcing task that is defined using its own YAML configuration file. + +The following sections describe how to configure a Task. + +### Naming a Task + +Use the top-level key `name` in the YAML configuration to name the Task. Task names are used to identify and set up connections between Tasks in a pipeline. + +The following example gives the Task the name `my_task`. + +```yaml +name: my_task +``` + +### Defining input and output data + +#### Specifying data types + +Each Task requires a data specification, which determines the types of input and output data associated with the Task. + +In the YAML configuration, the inputs and outputs are defined under the top-level key `data` using the keys `input` and `output`. + +To define input and output data, provide key/value pairs that define the name of the data and its type, e.g. `outlines` and `json`. + +```yaml +data: + input: + outlines: json + output: + correct: bool +``` + +#### Loading data from a file + +You can place the key `file` under `data` to provide input data to the Task. The value of this key should point towards a TSV file that contains the input data. The TSV file must contain columns with headers that match those defined under the key `input`. + +```yaml +data: + file: images.tsv + input: + image: url + output: + result: bool +``` + +#### Setting up human verification + +If a Task is used for verifying work submitted by other crowdsourced workers, you must add the key `verify` under `data` and set its value to `true`. This adds the output data from the incoming Tasks to the input of the current Task, while also making the verification assignment unavailable to the worker who completed the original assignment. + +```yaml +data: + verify: true + input: + outlines: json + output: + result: bool +``` + +### Setting up projects + +Projects are the most abstract entity on Toloka. 
A project may include multiple pools, which contain assignments for the workers. User interfaces are also defined at the level of a project. + +In the YAML configuration file, project settings are configured using the top-level key `project`. + +#### Loading an existing project from Toloka + +To load an existing project from Toloka, add the key `id` under the top-level key `project`. Then provide the project ID as the value. + +```yaml +project: + id: 12345 +``` + +#### Creating new projects + +To create a new project, use the key `setup` to define a public name and a description for the project, which are displayed on the Toloka platform for prospective workers. + +Use the keys `public_name` and `public_description` to provide the name and description as strings. + +The key `instructions` should point to an HTML file that provides instructions for completing the task. + +```yaml +project: + setup: + public_name: "Project name" + public_description: "A brief description of the project." + instructions: my_instructions.html +``` + +### Creating pools + +A pool contains **assignments** for the workers to complete. Each assignment is contained in a **task suite**. + +Pool settings are configured under the top-level key `pool` in the YAML configuration file. + +#### Loading existing pools + +To load an existing pool from Toloka, add the key `id` under the top-level key `pool`. Then provide the pool ID as the value. + +```yaml +pool: + id: 6789 +``` + +#### Creating new pools + +To begin with, the key `estimated_time_per_suite`, which must be placed under the key `pool`, is used to check that the workers receive a fair reward (an average hourly wage of 12 USD). + +Provide the estimated time required to complete a task suite in seconds. To calculate a fair reward per task suite, you can use the interactive script [`utils/calculate_fair_rewards.py`](../utils/calculate_fair_rewards.py).
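The arithmetic behind this check is straightforward; the sketch below illustrates it (the function name and the constant are illustrative, not part of the tool's API):

```python
# Minimum reward (USD) per task suite that yields the target hourly wage,
# given the estimated completion time in seconds. Illustrative sketch only.
TARGET_HOURLY_WAGE = 12.0

def fair_reward_per_suite(estimated_time_per_suite: int) -> float:
    suites_per_hour = 3600 / estimated_time_per_suite
    return TARGET_HOURLY_WAGE / suites_per_hour

# A task suite estimated at 60 seconds needs a reward of at least 0.20 USD
print(round(fair_reward_per_suite(60), 2))  # 0.2
```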
+ +The following example sets the estimated time needed for completing a single task suite to 60 seconds. + +```yaml +pool: + estimated_time_per_suite: 60 +``` + +The following sections describe how to set the main properties of pools. + +#### `setup` + +The basic properties of a pool are defined under the mandatory key `setup`. The following key/value pairs can be defined under the key `setup`. + +| Key | Value | Description | +|:----------------------------------|:--------|:---------------------------------------------------------------| +| `private_name` | string | A private name for the pool; not shown on the platform | +| `reward_per_assignment` | float | The reward paid for completing a single task suite in USD | +| `assignment_max_duration_seconds` | integer | The maximum time allowed for completing a task suite in seconds | +| `auto_accept_solutions` | boolean | Whether submitted work is accepted and paid for immediately | + +If the value of `auto_accept_solutions` is set to `false`, the task suites must be accepted or rejected manually. This may be achieved using the Toloka web interface or by directing the tasks to another pool for [verification by other workers](#setting-up-human-verification). + +The following example illustrates the use of the variables discussed above. + +```yaml +pool: + setup: + private_name: "Dataset 1" + reward_per_assignment: 0.15 + assignment_max_duration_seconds: 600 + auto_accept_solutions: false +``` + +#### `defaults` + +The mandatory key `defaults` is used to define default settings for assignments and task suites. The following key/value pairs can be defined under the key `defaults`.
+ +| Key | Value | Description | +|:---------------------------------|:--------|:-------------------------------------------------| +| `default_overlap_for_new_tasks` | integer | How many workers should complete each assignment | +| `default_overlap_for_new_task_suites` | integer | How many workers should complete each task suite | + +The following example sets the value of both settings to 3. + +```yaml +pool: + defaults: + default_overlap_for_new_tasks: 3 + default_overlap_for_new_task_suites: 3 +``` + +#### `mixer` + +The mandatory key `mixer` is used to define the mix of different assignment types in each task suite. The following key/value pairs can be provided under the key `mixer`. + +| Key | Value | Description | +|:-------------------------|:--------|:----------------------------------------------------------------| +| `real_tasks_count` | integer | The number of actual assignments in each task suite | +| `golden_tasks_count` | integer | The number of assignments with known answers in each task suite | +| `training_tasks_count` | integer | The number of training assignments in each task suite | + +The actual assignments are drawn from the [input data](#specifying-data-types), whereas the golden assignments can be used to evaluate the quality of work submitted to the pool. + +The following example sets the number of real assignments to 5 and the number of golden assignments to 1, while leaving the number of training assignments at 0. This means that each task suite in the pool contains 6 assignments. + +```yaml +pool: + mixer: + real_tasks_count: 5 + golden_tasks_count: 1 + training_tasks_count: 0 +``` + +#### `filter` + +The optional key `filter` is used to allow only workers with certain characteristics to access the pool. Note that filters are used to *limit* access: without any filters, all workers on Toloka can access the pool. + +The following key/value pairs can be provided under the key `filter`.
+ +| Key | Value | Description | +|:------------------|:---------------------|:-------------------------------------------------------------------------| +| `skill` | list of dictionaries | Limit workers to these skills and skill levels | +| `languages` | list of strings | Limit workers to these languages (two-letter ISO 639-1 code) | +| `client_type` | list of strings | Limit workers to these clients (BROWSER or TOLOKA_APP) | +| `education` | list of strings | Limit workers to these education levels (BASIC, MIDDLE, HIGH) | +| `gender` | string | Limit workers to this gender (MALE or FEMALE) | +| `adult_allowed` | boolean | Limit workers to those who have agreed to work with adult content | +| `country` | list of strings | Limit workers to these countries (two-letter ISO 3166-1 codes) | +| `date_of_birth` | dictionary | Limit workers to those born before or after this date (unix timestamp) | +| `user_agent_type` | list of strings | Limit workers to these user agent types (BROWSER, MOBILE_BROWSER, OTHER) | + +The following example demonstrates the use of all currently implemented filters: + +```yaml +pool: + filter: + skill: + - 12345: 80 + languages: + - EN + - FI + client_type: + - BROWSER + - TOLOKA_APP + education: + - HIGH + - MIDDLE + gender: FEMALE + adult_allowed: false + country: + - GB + - US + - FI + date_of_birth: + before: 631144800 + user_agent_type: + - BROWSER + - MOBILE_BROWSER + - OTHER +``` + +#### `blocklist` + +Use the optional key `blocklist` to block certain users from accessing the pool and the associated training. Provide a path to a TSV file with user identifiers to be blocked as the value for this key. + +The blocklist column that contains the user identifiers must have the header `user_id`. See an example of a blocklist file [here](data/blocklist.tsv). + +The following example illustrates the use of a blocklist file.
+ +```yaml +pool: + blocklist: data/blocklist.tsv +``` + +#### `exam` + +The optional key `exam` can be used to configure an examination pool, which contains assignments with known solutions. These assignments can be used to evaluate the performance of workers and to grant them skills. + +The following key/value pairs can be provided under the key `exam`. + +| Key | Value | Description | +|:-----------------|:--------|:--------------------------------------------------------------------------------| +| `history_size` | integer | The number of assignments taken into account when evaluating examination score | +| `min_answers` | integer | The minimum number of assignments that the worker must complete to be evaluated | +| `max_performers` | integer | How many workers are allowed to take the exam before the pool closes | + +The following example sets the number of assignments to be evaluated to 20, defines that the worker must complete all 20 assignments to be evaluated, and closes the pool when 20 performers have taken the exam. + +```yaml +pool: + exam: + history_size: 20 + min_answers: 20 + max_performers: 20 +``` + +#### `skill` + +The optional key `skill` is used to define the skill assigned to a worker upon completing the examination. + +The following key/value pairs can be provided under the key `skill`.
+ +| Key | Value | Description | +|:-----------------|:--------|:-----------------------------------------------------------------| +| `id` | integer | A valid identifier for a pre-existing skill on Toloka | +| `name` | string | The name for the new skill to be created | +| `language` | string | The language associated with the new skill as an ISO 639-1 code | +| `description` | string | A description of the new skill in the language defined above | + +The following example shows how to grant an existing skill to workers who complete the examination: + +```yaml +pool: + skill: + id: 12345 +``` + +The following example shows how to create a new skill that is granted to workers who complete the examination: + +```yaml +pool: + skill: + name: "My new skill" + language: EN + description: "This is my new skill." +``` + +#### `training` + +Use the optional key `training` to set the skill level that the workers must achieve in [training](#configuring-training). + +Use the key `training_passing_skill_value` to determine the percentage of correct answers needed for accessing the actual task suites. + +The following example sets the training performance threshold to 70% for accessing the pool. + +```yaml +pool: + training: + training_passing_skill_value: 70 +``` + +### Configuring training + +To train the workers in performing a task, use the top-level key `training` to define a training pool that must be completed before accessing the pool that contains the actual assignments. + +Use the key `setup` to configure the training pool. The following key/value pairs can be defined under the key `setup`. 
+
+| Key | Value | Description |
+|:-------------------------------------|:--------|:----------------------------------------------------------------------------|
+| `private_name` | string | A private name for the training pool; not shown on the platform |
+| `shuffle_tasks_in_task_suite` | boolean | Defines whether the assignments are shuffled in the training pool |
+| `assignment_max_duration_seconds` | integer | The maximum time allowed for completing a task suite in seconds |
+| `training_tasks_in_task_suite_count` | integer | The number of training assignments in each task suite |
+| `retry_training_after_days` | integer | Defines when the worker can retry the training after failing |
+| `inherited_instructions` | boolean | Defines whether the training pool uses the same instructions as the project |
+
+Use the key `data` to configure input and output variables and the source of data, as instructed [above](#defining-input-and-output-data).
+
+The following example illustrates the configuration of training tasks.
+
+```yaml
+training:
+  setup:
+    private_name: Training for an examination
+    shuffle_tasks_in_task_suite: false
+    assignment_max_duration_seconds: 600
+    training_tasks_in_task_suite_count: 5
+    retry_training_after_days: 1
+    inherited_instructions: true
+  data:
+    file: training.tsv
+    input:
+      image: url
+      outlines: json
+      no_target: bool
+    output:
+      result: bool
+```
+
+### Configuring quality control
+
+Use the optional top-level key `quality_control` to define settings for automatic quality control. The following keys can be used to configure the quality control mechanisms.
+
+#### `speed_quality_balance`
+
+Use the key `speed_quality_balance` to limit access to the Task according to worker reputation. One of the following key/value pairs must be defined under the key `speed_quality_balance`.
+
+| Key | Value | Description |
+|:-----------------------------------|:--------|:-------------------------------------------------------------------------------|
+| `top_percentage_by_quality` | integer | The percentage of workers with the highest reputation who can access the Task. |
+| `best_concurrent_users_by_quality` | integer | The number of workers with the highest reputation who can access the Task. |
+
+The following example allows only the highest-ranked 10% of workers to access the Task.
+
+```yaml
+quality_control:
+  speed_quality_balance:
+    top_percentage_by_quality: 10
+```
+
+The example below allows only the 20 workers with the highest reputation currently active on the platform to access the Task.
+
+```yaml
+quality_control:
+  speed_quality_balance:
+    best_concurrent_users_by_quality: 20
+```
+
+#### `fast_responses`
+
+Use the key `fast_responses` to ban workers if they complete assignments too quickly, which may be indicative of spamming. The following key/value pairs must be defined under the key `fast_responses`.
+
+| Key | Value | Description |
+|:---------------|:--------|:-------------------------------------------------------------------------------------|
+| `history_size` | integer | The number of previous assignments considered when evaluating response times. |
+| `count` | integer | The maximum number of fast responses allowed within the last `history_size` assignments. |
+| `threshold` | integer | The threshold, in seconds, below which a response is considered fast. |
+| `ban_duration` | integer | How long the worker will be banned from accessing the Task. |
+| `ban_units` | string | Temporal unit that defines ban duration: `MINUTES`, `HOURS`, `DAYS` or `PERMANENT`. |
+
+The following example bans workers for 2 days if they complete 3 of their 5 most recent assignments in less than 10 seconds.
+
+```yaml
+quality_control:
+  fast_responses:
+    history_size: 5
+    count: 3
+    threshold: 10
+    ban_duration: 2
+    ban_units: DAYS
+```
+
+#### `skipped_assignments`
+
+Use the key `skipped_assignments` to ban workers who skip too many assignments in a row. The following key/value pairs must be defined under the key `skipped_assignments`.
+
+| Key | Value | Description |
+|:---------------|:--------|:------------------------------------------------------------------------------------|
+| `count` | integer | The maximum number of assignments that the user may skip without getting banned. |
+| `ban_duration` | integer | How long the worker will be banned from accessing the Task. |
+| `ban_units` | string | Temporal unit that defines ban duration: `MINUTES`, `HOURS`, `DAYS` or `PERMANENT`. |
+
+The following example bans workers for 30 minutes if they skip more than 10 assignments in a row.
+
+```yaml
+quality_control:
+  skipped_assignments:
+    count: 10
+    ban_duration: 30
+    ban_units: MINUTES
+```
+
+#### `redo_banned`
+
+Use the key `redo_banned` to re-do all assignments completed by a banned user. The key takes the following value.
+
+| Key | Value | Description |
+|:--------------|:--------|:-----------------------------------------------------------------|
+| `redo_banned` | boolean | Whether assignments from banned users should be completed again. |
+
+The following example re-does assignments completed by banned users.
+
+```yaml
+quality_control:
+  redo_banned: true
+```
+
+#### `golden_set`
+
+Use the key `golden_set` to evaluate worker performance using 'golden' assignments with known answers. The following key/value pair must be defined under the key `golden_set`.
+
+| Key | Value | Description |
+|:---------------|:--------|:------------------------------------------------------------------------------------------------|
+| `history_size` | integer | The number of previous assignments with known answers that are evaluated when processing rules. |
+
+The following example evaluates worker responses to the last 10 assignments with known answers when processing the rules defined below.
+
+```yaml
+quality_control:
+  golden_set:
+    history_size: 10
+```
+
+Use the key `ban_rules` under the key `golden_set` to ban workers based on their performance against the assignments with known answers.
+
+| Key | Value | Description |
+|:----------------------|:--------|:------------------------------------------------------------------------------------|
+| `incorrect_threshold` | integer | Percentage of incorrect assignments that will result in the worker getting banned. |
+| `ban_duration` | integer | How long the worker will be banned from accessing the Task. |
+| `ban_units` | string | Temporal unit that defines ban duration: `MINUTES`, `HOURS`, `DAYS` or `PERMANENT`. |
+
+The following example bans workers for 7 days if they fail 90% of the last 10 assignments with known answers.
+
+```yaml
+quality_control:
+  golden_set:
+    history_size: 10
+    ban_rules:
+      incorrect_threshold: 90
+      ban_duration: 7
+      ban_units: DAYS
+```
+
+Use the key `reject_rules` under the key `golden_set` to reject all work from workers based on their performance against the assignments with known answers.
+
+| Key | Value | Description |
+|:----------------------|:--------|:-----------------------------------------------------------------------------------------------------------|
+| `incorrect_threshold` | integer | Percentage of incorrect assignments that will result in rejecting all assignments submitted by the worker. |
+
+The following example rejects all assignments submitted by workers who fail more than 50% of the last 10 assignments with known answers.
+
+```yaml
+quality_control:
+  golden_set:
+    history_size: 10
+    reject_rules:
+      incorrect_threshold: 50
+```
+
+Use the key `approve_rules` under the key `golden_set` to accept all work from workers based on their performance against the assignments with known answers.
+
+| Key | Value | Description |
+|:--------------------|:--------|:-----------------------------------------------------------------------------------------------------------|
+| `correct_threshold` | integer | Percentage of correct assignments that will result in accepting all submitted assignments from the worker. |
+
+The following example accepts all assignments submitted by workers who answer more than 70% of the last 10 assignments with known answers correctly.
+
+```yaml
+quality_control:
+  golden_set:
+    history_size: 10
+    approve_rules:
+      correct_threshold: 70
+```
+
+Use the key `skill_rules` under the key `golden_set` to grant skills to workers based on their performance against the assignments with known answers.
+
+| Key | Value | Description |
+|:--------------------|:--------|:---------------------------------------------------------------|
+| `correct_threshold` | integer | Percentage of correct assignments needed to receive the skill. |
+| `skill_id` | integer | A valid identifier for a skill. |
+| `skill_value` | integer | A value associated with a skill. |
+
+The following example grants the skill 12345 with a value of 80 to all workers who answer more than 80% of the assignments with known answers correctly.
+
+```yaml
+quality_control:
+  golden_set:
+    history_size: 10
+    skill_rules:
+      correct_threshold: 80
+      skill_id: 12345
+      skill_value: 80
+```
+
+## Combining Tasks into Task Sequences
+
+One key functionality of 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊 is the creation of Task Sequences, which allow moving assignments from one Task to another.
+
+The connections between individual Tasks are defined in the YAML configuration under the top-level key `actions`.
+
+The following key/value pairs can be provided under the key `actions`.
+
+| Key | Value | Description |
+|:---------------|:-----------|:------------------------------------------------------------------------------------------------------------------------------------|
+| `on_submitted` | string | The [name](#naming-a-task) of the Task or Action to which submitted assignments should be sent. |
+| `on_rejected` | string | The [name](#naming-a-task) of the Task or Action to which rejected assignments should be sent. |
+| `on_accepted` | string | The [name](#naming-a-task) of the Task or Action to which accepted assignments should be sent. |
+| `on_closed` | string | The [name](#naming-a-task) of the Task or Action to which assignments should be sent when the pool closes. |
+| `on_result` | dictionary | A dictionary that maps a particular output value to the [name](#naming-a-task) of a Task or Action to which the assignment is sent. |
+
+The following example sets up three actions. All submitted assignments will be sent to a Task named `verification`. If an assignment is rejected, it is sent to a Task named `annotation`. If the assignment is accepted, it will be sent to a Task named `segmentation`.
+
+```yaml
+actions:
+  on_submitted: verification
+  on_rejected: annotation
+  on_accepted: segmentation
+```
+
+The following example illustrates the use of the `on_result` action. If the output value is `true`, the assignment will be sent to a Task named `next_task`. If the value is `false`, the assignment will be sent to a Task named `previous_task`.
+
+```yaml
+actions:
+  on_result:
+    true: next_task
+    false: previous_task
+```
+
+## Processing Task outputs using Actions
+
+In 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊, Actions are used to process outputs from Tasks and other Actions.
+
+### Forward
+
+The Forward Action can be used to accept, reject and forward assignments *based on the output values*.
+
+To create a Forward Action, initialise a Forward object that points towards the YAML configuration file and the Tasks or Actions to which the assignments will be forwarded.
+
+The following example creates a Forward action using a configuration file named `fwd_config.yaml` and a Toloka client named `client`. The argument `targets` takes the names of the *Python objects* (Tasks or Actions) to which the assignments will be forwarded (`outline_img` and `classify_txt`).
+
+```python
+from abulafia.actions import Forward
+
+fwd = Forward(configuration='fwd_config.yaml',
+              client=client,
+              targets=[outline_img, classify_txt])
+```
+
+To configure the Forward Action, use the following top-level keys in the YAML configuration file.
+
+| Key | Value | Description |
+|:------------|:-----------|:---------------------------------------------------------------------------------|
+| `name` | string | A unique [name](#naming-a-task) for the Forward Action. |
+| `data` | string | The name of the variable that contains the output data to be evaluated. |
+| `on_result` | dictionary | A dictionary that maps outputs to actions. |
+| `messages` | dictionary | A dictionary that maps outputs to messages for accepted or rejected assignments. |
+
+The following example defines a Forward Action named `fwd_results`, which processes incoming data stored under the variable `result`.
+
+If the variable `result` in the incoming data contains the value `text`, the assignment is forwarded to a Task named `classify_text`. Conversely, if the variable `result` contains the value `image`, the assignment will be forwarded to a Task named `outline_image`.
+
+```yaml
+name: fwd_results
+data: result
+on_result:
+  text: classify_text
+  image: outline_image
+```
+
+The next example defines a Forward Action named `reject_accept`, which accepts and rejects incoming assignments based on their outputs.
+
+If the variable `classification` in the incoming data contains the value `correct`, the assignment is accepted. If the value is `incorrect`, the assignment is rejected.
+
+The messages associated with these outputs are defined under the top-level key `messages`.
+
+```yaml
+name: reject_accept
+data: classification
+on_result:
+  correct: accept
+  incorrect: reject
+messages:
+  correct: "Your assignment was classified as correct."
+  incorrect: "Your assignment was classified as incorrect."
+```
+
+The final example defines a Forward Action named `accept_fwd`, which shows how to accept or reject incoming assignments and forward them based on their outputs.
+
+If the variable `result` in the incoming data has the value `correct`, the assignment is accepted and forwarded to a Task named `classify_outlines`. If the value is `incorrect`, the assignment is rejected. If you want rejected assignments to be added automatically to the Task in which the incoming assignments originate, add the name of the Task under the key `on_rejected` when defining [actions](#combining-tasks-into-task-sequences). Finally, if the value is `human_error`, the assignment is accepted and forwarded to a Task named `fix_outlines`.
+
+The top-level key `messages` defines messages associated with all three outputs.
+
+```yaml
+name: accept_fwd
+data: result
+on_result:
+  correct:
+    - accept
+    - classify_outlines
+  incorrect: reject
+  human_error:
+    - accept
+    - fix_outlines
+messages:
+  correct: "Your assignment was classified as correct."
+  incorrect: "Your assignment was classified as incorrect."
+  human_error: "Your assignment contained some errors, but you will be paid for the work."
+```
+
+For more examples of using the Forward Action, see the file [`examples/action_demo.py`](examples/action_demo.py) and the associated YAML configuration files.
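+
+Conceptually, each `on_result` mapping is a lookup from an output value to one or more actions or targets. The following minimal sketch (plain Python, not 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊's internal implementation) illustrates the routing logic that a configuration such as `accept_fwd` above describes:
+
+```python
+# A simplified model of an on_result routing table: each output value maps
+# to a single action or a list of actions/targets. Illustrative only.
+on_result = {
+    'correct': ['accept', 'classify_outlines'],
+    'incorrect': 'reject',
+    'human_error': ['accept', 'fix_outlines'],
+}
+
+def route(output_value):
+    """Return the actions/targets for a given output value as a list."""
+    actions = on_result[output_value]
+    # Normalise a single action into a list
+    return actions if isinstance(actions, list) else [actions]
+
+print(route('correct'))    # ['accept', 'classify_outlines']
+print(route('incorrect'))  # ['reject']
+```
+
+The real Forward Action additionally calls the Toloka API to accept or reject the assignment and to create new tasks in the target pools.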
+
+### Aggregate
+
+The Aggregate Action can be used to aggregate outputs from crowdsourced workers using various algorithms implemented in the [*Crowd-Kit*](https://github.com/Toloka/crowd-kit/) library.
+
+To create an Aggregate Action, initialise an Aggregate object that points towards a YAML configuration file, a Task object that contains the output to be aggregated, and a [Forward](#forward) object that is used to process the results.
+
+The following example creates an Aggregate object using a configuration file named `agg_conf.yaml`. The argument `task` needs to be provided with the Task object that contains the outputs to be aggregated. The input for the argument `forward` is a Forward object, which will be used to process the aggregated results.
+
+```python
+from abulafia.actions import Aggregate
+
+agg = Aggregate(configuration='agg_conf.yaml',
+                task=detect_text,
+                forward=fwd_agg_text)
+```
+
+The Aggregate Action may only be applied to Task outputs once the Task is complete and closed. To aggregate the outputs of a pool, provide the name of the Aggregate Action under the top-level key [`actions`](#processing-task-outputs-using-actions) and the key `on_closed`.
+
+The following example applies an Aggregate Action named `aggregate_action` to the Task output when the Task is completed.
+
+```yaml
+actions:
+  on_closed: aggregate_action
+```
+
+To configure the Aggregate Action, use the following top-level keys in the YAML configuration file.
+
+| Key | Value | Description |
+|:-----------|:-----------|:---------------------------------------------------------------------------------|
+| `name` | string | A unique [name](#naming-a-task) for the Aggregate Action. |
+| `method` | string | The name of the aggregation algorithm to be used. |
+| `messages` | dictionary | An optional dictionary that maps particular outputs to messages for the workers. |
+
+The following aggregation methods are currently supported.
Provide the name as the value for the `method` key.
+
+| Name | Method |
+|:-------------------|:------------------------------------------------------------------------------------------------------------------------------------|
+| `majority_vote` | [Majority vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.majority_vote.MajorityVote/) |
+| `dawid_skene` | [Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.DawidSkene/) |
+| `mmsr` | [M-MSR](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR/) |
+| `wawa` | [Wawa](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.wawa.Wawa/) |
+| `zero_based_skill` | [Zero-based skill](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.zero_based_skill.ZeroBasedSkill/) |
+| `glad` | [GLAD](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.glad.GLAD/) |
+
+The following example defines an Aggregate Action named `agg_ds`, which uses the Dawid-Skene algorithm for aggregating the outputs.
+
+The top-level key `messages` defines three outputs and the messages associated with them, which are added to the input of the Forward Action.
+
+```yaml
+name: agg_ds
+method: dawid_skene
+messages:
+  correct: "Your assignment was classified as correct."
+  incorrect: "Your assignment was classified as incorrect."
+  human_error: "Your assignment contained some errors, but you will be paid for the work."
+```
+
+### VerifyPolygon
+
+The VerifyPolygon Action can be used to check that the bounding boxes submitted by crowdsourced workers are valid, that is, that the lines of a polygon do not cross each other. This check is performed automatically using the [Shapely](https://pypi.org/project/shapely/) library.
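+
+The essence of this validity check can be sketched with Shapely directly (an illustration of the underlying idea, not 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊's exact code; assumes Shapely is installed):
+
+```python
+from shapely.geometry import Polygon
+
+# A square: its edges do not cross, so the polygon is valid
+square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
+
+# A 'bowtie' traced in the wrong vertex order: two edges cross, so it is invalid
+bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])
+
+print(square.is_valid)  # True
+print(bowtie.is_valid)  # False
+```
+
+Assignments whose polygons fail this check can then be rejected automatically without involving another worker.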
+
+To create a VerifyPolygon Action, initialise a VerifyPolygon object that points towards a YAML configuration file, a Task object that contains the polygons to be validated, and a [Forward](#forward) object that is used to process the results.
+
+The following example creates a VerifyPolygon object using a configuration file named `verify.yaml`. The argument `task` needs to be provided with the Task object that contains the outputs to be validated. The input for the argument `forward` is a Forward object, which will be used to process the results.
+
+```python
+from abulafia.actions import VerifyPolygon
+
+vp = VerifyPolygon(configuration='verify.yaml',
+                   task=outline_objects,
+                   forward=verify_fwd)
+```
+
+The VerifyPolygon Action may only be applied to Task outputs once the Task is complete and closed. To verify the polygons submitted by workers, provide the name of the VerifyPolygon Action under the top-level key [`actions`](#processing-task-outputs-using-actions) and the key `on_closed`.
+
+The following example applies a VerifyPolygon Action named `verify_polygon` to the Task outputs.
+
+```yaml
+actions:
+  on_closed: verify_polygon
+```
+
+To configure the VerifyPolygon Action, use the following top-level keys in the YAML configuration file.
+
+| Key | Value | Description |
+|:---------|:-------|:-----------------------------------------------------------------------------------------------------------------------|
+| `name` | string | A unique [name](#naming-a-task) for the VerifyPolygon Action. |
+| `data` | string | The name of the variable that contains the output data to be validated. |
+| `labels` | list | A list of strings or dictionaries that define bounding box labels and their counts that should be present in the data. |
+
+The following example creates a VerifyPolygon Action named `verify_poly`, which validates incoming bounding boxes stored under the variable `polygons`.
The items provided in the list under the top-level key `labels` define that the incoming data must contain precisely one polygon labelled as `text` and an arbitrary number of polygons labelled as `graphics`.
+
+```yaml
+name: verify_poly
+data: polygons
+labels:
+  - text: 1
+  - graphics
+```
+
+For an additional example of using the VerifyPolygon Action, see the example in [segment_and_verify.py](segment_and_verify.py).
+
+### SeparateBBoxes
+
+The SeparateBBoxes Action can be used to separate groups of bounding boxes submitted by workers into individual bounding boxes. This Action is particularly useful if you need to process each bounding box individually in a subsequent Task.
+
+To create a SeparateBBoxes Action, initialise a SeparateBBoxes object that points towards a YAML configuration file and a Task object to which the individual bounding boxes will be forwarded.
+
+The following example creates a SeparateBBoxes object using a configuration file named `sep.yaml`. The argument `target` defines the Task object to which the bounding boxes will be sent. In this case, the bounding boxes are sent to a Task object named `describe_object`.
+
+```python
+from abulafia.actions import SeparateBBoxes
+
+sp = SeparateBBoxes(configuration='sep.yaml',
+                    target=describe_object)
+```
+
+Optionally, you can add labels to the bounding boxes by providing the argument `add_label` to the `SeparateBBoxes` object. The label must be a string. The following example adds the label `source` to each bounding box.
+
+```python
+sp = SeparateBBoxes(configuration='sep.yaml',
+                    target=describe_object,
+                    add_label='source')
+```
+
+To configure the SeparateBBoxes Action, use the following top-level keys in the YAML configuration file.
+
+| Key | Value | Description |
+|:---------|:-----------|:-------------------------------------------------------------------------------------------------------------------------|
+| `name` | string | A unique [name](#naming-a-task) for the SeparateBBoxes Action.
|
+| `data` | dictionary | A dictionary that defines the variable names that contain the images and bounding boxes within the incoming assignments. |
+
+The following example creates a SeparateBBoxes Action named `sep_boxes`. The top-level key `data` is used to declare the variables that contain the images and bounding boxes among the incoming assignments. In this case, the images are found under the variable `img`, whereas the bounding boxes are stored under the variable `box`.
+
+```yaml
+name: sep_boxes
+data:
+  image: img
+  bboxes: box
+```
+
+If you wish to load the bounding boxes to be separated from a file, add the key `file` under the key `data` and provide a path to a TSV file as its value. The following example loads data from a file named `bboxes.csv`. In this case, the keys `image` and `bboxes` refer to the names of the columns in the input TSV file.
+
+```yaml
+data:
+  image: img
+  bboxes: box
+  file: bboxes.csv
+```
+
+## Tutorials
+
+### Creating a Task for classifying images
+
+In this tutorial, we create a YAML configuration file for the `ImageClassification` class.
+
+This example breaks down how 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊 uses YAML files to create and configure Tasks. The complete configuration file referred to in this example may be found [here](config/classify_image.yaml).
+
+First, we define a unique name for the Task under the key `name`. In this case, we call the Task *classify_images*.
+
+Next, we must provide information about the data and its structure under the key `data`. For this purpose, we define three additional keys: `file`, `input` and `output`.
+
+The value under the key `file` must point towards a TSV file containing the data to be loaded on the Toloka platform. The keys `input` and `output` contain key/value pairs that define the names of the input and output variables and their types.
+
+To exemplify, the input data consists of a URL, which can be found under the key `image`, as shown in the [TSV file](data/verify_image_data.tsv).
The output data, in turn, consists of Boolean values stored under the variable *result*.
+
+```yaml
+name: classify_images
+data:
+  file: data/verify_image_data.tsv
+  input:
+    image: url
+  output:
+    result: bool
+```
+
+Next, we proceed to set up the user interface under the key `interface`. The key `prompt` defines the text that is positioned above the buttons for the various labels.
+
+These labels are defined under the key `labels`. Each key under `labels` defines the value that will be stored when the worker selects the label, whereas the value defines what is shown in the user interface. Here we set up two labels in the user interface, *Yes* and *No*, which store the values *true* and *false*, respectively.
+
+```yaml
+interface:
+  prompt: "Does the image contain text, letters or numbers?"
+  labels:
+    true: "Yes"
+    false: "No"
+```
+
+After configuring the user interface, we proceed to set up a project on Toloka. In Toloka, user interfaces are associated with projects, which may contain multiple pools with different tasks.
+
+To create a project, we provide the following information under the key `project`. The key `setup` contains two key/value pairs, `public_name` and `public_description`, which define basic information shown to workers on the platform. The key `instructions`, in turn, points towards an HTML file that contains instructions for completing the task.
+
+```yaml
+project:
+  setup:
+    public_name: "Check if an image contains text, letters or numbers"
+    public_description: "Look at diagrams from science textbooks and state if they
+      contain text, letters or numbers."
+  instructions: instructions/detect_text_instructions.html
+```
+
+Next, we configure a pool within the project to which the assignments will be uploaded. This configuration is provided under the key `pool`.
+
+To begin with, we use the key `estimated_time_per_suite` to estimate the time spent completing each task suite (a group of one or more assignments) in seconds.
This will allow 𝚊𝚋𝚞𝚕𝚊𝚏𝚒𝚊 to estimate whether the payment for the work is fair.
+
+Next, under the key `setup`, we provide a `private_name` for the pool, together with other essential settings. The key/value pairs `reward_per_assignment`, `assignment_max_duration_seconds` and `auto_accept_solutions` define the amount of money paid for each task suite, the maximum amount of time allowed for completing a task suite in seconds, and whether the task suites submitted by workers should be accepted automatically.
+
+```yaml
+pool:
+  estimated_time_per_suite: 10
+  setup:
+    private_name: "Classify images"
+    reward_per_assignment: 0.034
+    assignment_max_duration_seconds: 600
+    auto_accept_solutions: true
+  defaults:
+    default_overlap_for_new_tasks: 1
+    default_overlap_for_new_task_suites: 1
+  mixer:
+    real_tasks_count: 1
+    golden_tasks_count: 0
+    training_tasks_count: 0
+  filter:
+    client_type:
+      - TOLOKA_APP
+      - BROWSER
+```
+
+This finishes the configuration. We can now use this configuration to create an `ImageClassification` object, as illustrated [here](classify_images.py).
+
+As shown on [line 35](classify_images.py#L35), we must provide a path to the configuration file using the parameter `configuration`.
+
+```python
+classify_image = ImageClassification(configuration="config/classify_image.yaml",
+                                     client=tclient)
+```
+
+This object may then be added to a crowdsourcing pipeline, as shown on [line 39](classify_images.py#L39).
+ +```python +pipe = TaskSequence(sequence=[classify_image], client=tclient) +``` diff --git a/examples/action_demo.py b/examples/action_demo.py index 3d6c0a5..2b6ac2c 100644 --- a/examples/action_demo.py +++ b/examples/action_demo.py @@ -1,8 +1,7 @@ # -*- coding: utf-8 -*- from abulafia.actions import Forward, Aggregate, SeparateBBoxes -from abulafia.task_specs import ImageSegmentation, TaskSequence, MulticlassVerification, FixImageSegmentation, \ - SegmentationClassification, ImageClassification +from abulafia.task_specs import TaskSequence, ImageClassification, ImageSegmentation, SegmentationClassification import argparse import json import toloka.client as toloka @@ -29,49 +28,75 @@ with open(cred_file) as cred_f: creds = json.loads(cred_f.read()) - tclient = toloka.TolokaClient(creds['token'], creds['mode']) + client = toloka.TolokaClient(creds['token'], creds['mode']) -# Create class instances of all CrowdsourcingTasks and Actions in the pipeline +# Create instances of all CrowdsourcingTasks and Actions in the pipeline # Binary image classification task for identifying possible text in diagrams -detect_text = ImageClassification(configuration="config/detect_text.yaml", client=tclient) +detect_text = ImageClassification(configuration="config/action_demo/detect_text.yaml", + client=client) # Image segmentation task asking the worker to outline text elements from diagrams -outline_text = ImageSegmentation(configuration="config/outline_text.yaml", client=tclient) - -# Forward action that forwards all tasks with output "True" from the detext_text pool to outline_text pool -forward_detect = Forward(configuration="config/forward_detect.yaml", client=tclient, targets=[outline_text]) - -# Aggregate action that determines the most probable correct outputs for the detect_text pool. Aggregated -# tasks are then forwarded with the forward_detect action defined above. 
-aggregate_detect = Aggregate(configuration="config/aggregate_detect.yaml", task=detect_text, forward=forward_detect) - -# Verification task asking the workers to determine if image segmentations from pool outline_text are done correctly -verify_outlines = MulticlassVerification(configuration="config/verify_outlines.yaml", client=tclient) +outline_text = ImageSegmentation(configuration="config/action_demo/outline_text.yaml", + client=client) + +# Forward action that forwards all tasks with output "True" from the detect_text pool to +# outline_text pool +forward_detect = Forward(configuration="config/action_demo/forward_detect.yaml", + client=client, + targets=[outline_text]) + +# Aggregate action that determines the most probable correct outputs for the detect_text pool. +# Aggregated tasks are then forwarded with the forward_detect action defined above. +aggregate_detect = Aggregate(configuration="config/action_demo/aggregate_detect.yaml", + task=detect_text, + forward=forward_detect) + +# Verification task asking the workers to determine if image segmentations from pool outline_text +# are done correctly +verify_outlines = SegmentationClassification(configuration="config/action_demo/verify_outlines.yaml", + client=client) # Pool where partially correct image segmentations go from forwarding -fix_outlines = FixImageSegmentation(configuration="config/fix_outlines.yaml", client=tclient) - -# Binary segmentation classification task where workers identify potential targets for the outlined text elements -has_target = SegmentationClassification(configuration="config/detect_target.yaml", client=tclient) - -# SeparateBBoxes action to separate bounding boxes from the pool outline_text and create new tasks to pool has_target -separate_bboxes = SeparateBBoxes(configuration="config/separate_bboxes.yaml", target=has_target, add_label="source") - -# Forward action to forward outline_text pool results based on the results from the verification pool verify_outlines. 
-# Tasks are either forwarded to be corrected by another worker, rejected, or accepted and forwarded to the action -# separate_bboxes. -forward_verify = Forward(configuration="config/forward_verify.yaml", client=tclient, targets=[fix_outlines, - separate_bboxes]) - -# Aggregate action to determine most probable correct answers to the verificatoin task verify_outlines. -# After aggregation, tasks are forwarded with forward_verify. -aggregate_verify = Aggregate(configuration="config/aggregate_verify.yaml", task=verify_outlines, forward=forward_verify) +fix_outlines = ImageSegmentation(configuration="config/action_demo/fix_outlines.yaml", + client=client) + +# Binary segmentation classification task where workers identify potential targets for the outlined +# text elements +has_target = SegmentationClassification(configuration="config/action_demo/detect_target.yaml", + client=client) + +# SeparateBBoxes action to separate bounding boxes from the pool outline_text and create new tasks +# to pool has_target. Also add the label 'source' to every bounding box. +separate_bboxes = SeparateBBoxes(configuration="config/action_demo/separate_bboxes.yaml", + target=has_target, + add_label="source") + +# Forward action to forward outline_text pool results based on the results from the verification +# pool verify_outlines. Tasks are either forwarded to be corrected by another worker, rejected, +# or accepted and forwarded to the action separate_bboxes. +forward_verify = Forward(configuration="config/action_demo/forward_verify.yaml", + client=client, + targets=[fix_outlines, separate_bboxes]) + +# Aggregate action to determine most probable correct answers to the verification task +# verify_outlines. After aggregation, tasks are forwarded with forward_verify. 
+aggregate_verify = Aggregate(configuration="config/action_demo/aggregate_verify.yaml", + task=verify_outlines, + forward=forward_verify) # Combine the tasks and actions into one pipeline -pipe = TaskSequence(sequence=[detect_text, aggregate_detect, forward_detect, outline_text, verify_outlines, - aggregate_verify, forward_verify, fix_outlines, separate_bboxes, has_target], - client=tclient) +pipe = TaskSequence(sequence=[detect_text, + aggregate_detect, + forward_detect, + outline_text, + verify_outlines, + aggregate_verify, + forward_verify, + fix_outlines, + separate_bboxes, + has_target], + client=client) # Start the task sequence; create the tasks on Toloka pipe.start() diff --git a/examples/annotate_text.py b/examples/annotate_text.py new file mode 100644 index 0000000..288714d --- /dev/null +++ b/examples/annotate_text.py @@ -0,0 +1,40 @@ +# -*- coding: utf-8 -*- + +from abulafia.task_specs import TaskSequence, TextAnnotation +import argparse +import json +import toloka.client as toloka + + +# Set up the argument parser +ap = argparse.ArgumentParser() + +# Add argument for input +ap.add_argument("-c", "--creds", required=True, + help="Path to a JSON file that contains Toloka credentials. " + "The file should have two keys: 'token' and 'mode'. 
" + "The key 'token' should contain the Toloka API key, whereas " + "the key 'mode' should have the value 'PRODUCTION' or 'SANDBOX' " + "that defines the environment in which the pipeline should be run.") + +# Parse the arguments +args = vars(ap.parse_args()) + +# Assign arguments to variables +cred_file = args['creds'] + +# Read the credentials from the JSON file +with open(cred_file) as cred_f: + + creds = json.loads(cred_f.read()) + tclient = toloka.TolokaClient(creds['token'], creds['mode']) + +# Define a text annotation task +annotate_text = TextAnnotation(configuration="config/annotate_text.yaml", + client=tclient) + +# Add the task into a pipeline +pipe = TaskSequence(sequence=[annotate_text], client=tclient) + +# Start the task sequence; create the task on Toloka +pipe.start() diff --git a/examples/classify_images.py b/examples/classify_images.py new file mode 100644 index 0000000..7cc2cdf --- /dev/null +++ b/examples/classify_images.py @@ -0,0 +1,42 @@ +# -*- coding: utf-8 -*- + +from abulafia.actions import Forward, VerifyPolygon +from abulafia.task_specs import ImageSegmentation, TaskSequence, MulticlassVerification, FixImageSegmentation, \ + SegmentationClassification, ImageClassification +import argparse +import json +import toloka.client as toloka + + +# Set up the argument parser +ap = argparse.ArgumentParser() + +# Add argument for input +ap.add_argument("-c", "--creds", required=True, + help="Path to a JSON file that contains Toloka credentials. " + "The file should have two keys: 'token' and 'mode'. 
" + "The key 'token' should contain the Toloka API key, whereas " + "the key 'mode' should have the value 'PRODUCTION' or 'SANDBOX' " + "that defines the environment in which the pipeline should be run.") + +# Parse the arguments +args = vars(ap.parse_args()) + +# Assign arguments to variables +cred_file = args['creds'] + +# Read the credentials from the JSON file +with open(cred_file) as cred_f: + + creds = json.loads(cred_f.read()) + tclient = toloka.TolokaClient(creds['token'], creds['mode']) + +# Define an image classification task +classify_image = ImageClassification(configuration="config/classify_image.yaml", + client=tclient) + +# Add the task into a pipeline +pipe = TaskSequence(sequence=[classify_image], client=tclient) + +# Start the task sequence; create the task on Toloka +pipe.start() diff --git a/examples/classify_outlines.py b/examples/classify_outlines.py new file mode 100644 index 0000000..385ba7d --- /dev/null +++ b/examples/classify_outlines.py @@ -0,0 +1,41 @@ +# -*- coding: utf-8 -*- + +from abulafia.actions import Forward, VerifyPolygon +from abulafia.task_specs import TaskSequence, SegmentationClassification +import argparse +import json +import toloka.client as toloka + + +# Set up the argument parser +ap = argparse.ArgumentParser() + +# Add argument for input +ap.add_argument("-c", "--creds", required=True, + help="Path to a JSON file that contains Toloka credentials. " + "The file should have two keys: 'token' and 'mode'. 
" + "The key 'token' should contain the Toloka API key, whereas " + "the key 'mode' should have the value 'PRODUCTION' or 'SANDBOX' " + "that defines the environment in which the pipeline should be run.") + +# Parse the arguments +args = vars(ap.parse_args()) + +# Assign arguments to variables +cred_file = args['creds'] + +# Read the credentials from the JSON file +with open(cred_file) as cred_f: + + creds = json.loads(cred_f.read()) + tclient = toloka.TolokaClient(creds['token'], creds['mode']) + +# Define an image classification task +classify_outlines = SegmentationClassification(configuration="config/classify_segmentation.yaml", + client=tclient) + +# Add the task into a pipeline +pipe = TaskSequence(sequence=[classify_outlines], client=tclient) + +# Start the task sequence; create the task on Toloka +pipe.start() diff --git a/examples/classify_text.py b/examples/classify_text.py new file mode 100644 index 0000000..de2a858 --- /dev/null +++ b/examples/classify_text.py @@ -0,0 +1,40 @@ +# -*- coding: utf-8 -*- + +from abulafia.task_specs import TaskSequence, TextClassification +import argparse +import json +import toloka.client as toloka + + +# Set up the argument parser +ap = argparse.ArgumentParser() + +# Add argument for input +ap.add_argument("-c", "--creds", required=True, + help="Path to a JSON file that contains Toloka credentials. " + "The file should have two keys: 'token' and 'mode'. 
" + "The key 'token' should contain the Toloka API key, whereas " + "the key 'mode' should have the value 'PRODUCTION' or 'SANDBOX' " + "that defines the environment in which the pipeline should be run.") + +# Parse the arguments +args = vars(ap.parse_args()) + +# Assign arguments to variables +cred_file = args['creds'] + +# Read the credentials from the JSON file +with open(cred_file) as cred_f: + + creds = json.loads(cred_f.read()) + tclient = toloka.TolokaClient(creds['token'], creds['mode']) + +# Define a text classification task +classify_image = TextClassification(configuration="config/classify_text.yaml", + client=tclient) + +# Add the task into a pipeline +pipe = TaskSequence(sequence=[classify_image], client=tclient) + +# Start the task sequence; create the task on Toloka +pipe.start() diff --git a/examples/config/action_demo/aggregate_detect.yaml b/examples/config/action_demo/aggregate_detect.yaml new file mode 100644 index 0000000..c8d4337 --- /dev/null +++ b/examples/config/action_demo/aggregate_detect.yaml @@ -0,0 +1,2 @@ +name: aggregate_detect +method: dawid_skene \ No newline at end of file diff --git a/examples/config/action_demo/aggregate_verify.yaml b/examples/config/action_demo/aggregate_verify.yaml new file mode 100644 index 0000000..0e6bab5 --- /dev/null +++ b/examples/config/action_demo/aggregate_verify.yaml @@ -0,0 +1,6 @@ +name: aggregate_verify +method: dawid_skene +messages: + correct: "Your work was verified as correct by other workers." + incorrect: "Your work was verified as incorrect by other workers." + human_error: "Your work was verified as partially correct by other users. You will be paid for the work." 
\ No newline at end of file diff --git a/examples/config/detect_target.yaml b/examples/config/action_demo/detect_target.yaml similarity index 86% rename from examples/config/detect_target.yaml rename to examples/config/action_demo/detect_target.yaml index 28227e1..d493e15 100644 --- a/examples/config/detect_target.yaml +++ b/examples/config/action_demo/detect_target.yaml @@ -8,11 +8,16 @@ data: interface: prompt: Read the instructions first. Does the text refer to another part of the diagram? labels: - source: Source + true: "Yes" + false: "No" + segmentation: + labels: + source: Source project: setup: public_name: "Examine diagrams from school textbooks" - public_description: "In this task you will be shown diagrams from school textbooks. Your task is to examine the diagrams and reason about their content." + public_description: "In this task you will be shown diagrams from school textbooks. + Your task is to examine the diagrams and reason about their content." instructions: instructions/detect_target_instructions.html pool: estimated_time_per_suite: 50 diff --git a/examples/config/detect_text.yaml b/examples/config/action_demo/detect_text.yaml similarity index 96% rename from examples/config/detect_text.yaml rename to examples/config/action_demo/detect_text.yaml index efeece4..5b7f833 100644 --- a/examples/config/detect_text.yaml +++ b/examples/config/action_demo/detect_text.yaml @@ -9,6 +9,9 @@ actions: on_closed: aggregate_detect interface: prompt: Does the diagram contain text, letters or numbers? 
+ labels: + true: "Yes" + false: "No" project: setup: public_name: Check if an image contains text, letters or numbers diff --git a/examples/config/fix_outlines.yaml b/examples/config/action_demo/fix_outlines.yaml similarity index 100% rename from examples/config/fix_outlines.yaml rename to examples/config/action_demo/fix_outlines.yaml diff --git a/examples/config/action_demo/forward_detect.yaml b/examples/config/action_demo/forward_detect.yaml new file mode 100644 index 0000000..b1117d3 --- /dev/null +++ b/examples/config/action_demo/forward_detect.yaml @@ -0,0 +1,8 @@ +name: forward_detect # Name of the action: this is how pools and actions refer to each other +data: result # The variable in the incoming data that contains the output to be evaluated +on_result: # Keys: possible outputs to the task + # Values: pools to which assignments with the output are forwarded + # Special cases: value "accept" automatically accepts task, "reject" automatically rejects task + # empty value simply submits the task (does not forward, reject or accept) + true: outline_text + false: diff --git a/examples/config/action_demo/forward_verify.yaml b/examples/config/action_demo/forward_verify.yaml new file mode 100644 index 0000000..987520a --- /dev/null +++ b/examples/config/action_demo/forward_verify.yaml @@ -0,0 +1,14 @@ +name: forward_verify +data: result +on_result: + correct: + - accept + - separate_bboxes + incorrect: reject + human_error: + - accept + - fix_outlines +messages: + correct: "The outlines were evaluated as correct." + incorrect: "The outlines were evaluated as incorrect." + human_error: "The outlines were evaluated as correct, although they contained human errors. You will be paid for the work." 
diff --git a/examples/config/outline_text.yaml b/examples/config/action_demo/outline_text.yaml similarity index 94% rename from examples/config/outline_text.yaml rename to examples/config/action_demo/outline_text.yaml index c0de84c..d5663d9 100644 --- a/examples/config/outline_text.yaml +++ b/examples/config/action_demo/outline_text.yaml @@ -5,11 +5,14 @@ data: output: outlines: json actions: - on_submitted: verify_polygon + on_submitted: verify_outlines on_rejected: outline_text interface: prompt: Read the instructions first. Then outline all diagram elements that contain text, letters or numbers. + tools: + - rectangle + - polygon project: setup: public_name: Outline text, letters or numbers in diagrams diff --git a/examples/config/action_demo/separate_bboxes.yaml b/examples/config/action_demo/separate_bboxes.yaml new file mode 100644 index 0000000..1953f8b --- /dev/null +++ b/examples/config/action_demo/separate_bboxes.yaml @@ -0,0 +1,4 @@ +name: separate_bboxes +data: + image: image + bboxes: outlines \ No newline at end of file diff --git a/examples/config/verify_outlines.yaml b/examples/config/action_demo/verify_outlines.yaml similarity index 88% rename from examples/config/verify_outlines.yaml rename to examples/config/action_demo/verify_outlines.yaml index 393ca61..784c2bb 100644 --- a/examples/config/verify_outlines.yaml +++ b/examples/config/action_demo/verify_outlines.yaml @@ -1,18 +1,19 @@ name: verify_outlines data: + verify: true input: image: url outlines: json output: result: str -options: - correct: "Yes" - incorrect: "No" - human_error: "Human error" actions: on_closed: aggregate_verify interface: prompt: Read the instructions first. Is the diagram annotated according to the instructions? 
+ labels: + correct: "Yes" + incorrect: "No" + human_error: "Human error" project: setup: public_name: Verify outlines for text, letters or numbers in diagrams @@ -41,8 +42,8 @@ pool: - BROWSER - TOLOKA_APP quality_control: - speed_quality_balance: - top_percentage_by_quality: 40 + # speed_quality_balance: + # top_percentage_by_quality: 40 fast_responses: history_size: 5 count: 3 diff --git a/examples/config/aggregate_detect.yaml b/examples/config/aggregate_detect.yaml deleted file mode 100644 index 9197cc1..0000000 --- a/examples/config/aggregate_detect.yaml +++ /dev/null @@ -1,6 +0,0 @@ -name: aggregate_detect -source: detect_text # Pool from which tasks are forwarded to the Aggregate-action -actions: # What happens to tasks after aggregation - on_result: forward_detect -method: dawid_skene # Aggregation method. For options, see https://toloka.ai/en/docs/crowd-kit/ - # For now, methods for categorical responses are supported \ No newline at end of file diff --git a/examples/config/aggregate_verify.yaml b/examples/config/aggregate_verify.yaml deleted file mode 100644 index 9232db4..0000000 --- a/examples/config/aggregate_verify.yaml +++ /dev/null @@ -1,6 +0,0 @@ -name: aggregate_verify -source: verify_outlines # Pool from which tasks are forwarded to the Aggregate-action -actions: # What happens to tasks after aggregation - on_result: forward_verify -method: dawid_skene # Aggregation method. For options, see https://toloka.ai/en/docs/crowd-kit/ - # For now, methods for categorical responses are supported \ No newline at end of file diff --git a/examples/config/annotate_text.yaml b/examples/config/annotate_text.yaml new file mode 100644 index 0000000..ea20970 --- /dev/null +++ b/examples/config/annotate_text.yaml @@ -0,0 +1,35 @@ +name: annotate_text +data: + file: data/classify_text_data.tsv + input: + text: str + output: + annotations: json +interface: + prompt: Annotate verbs, nouns and adjectives in the text below. 
+  labels:
+    verb: Verb
+    noun: Noun
+    adj: Adjective
+project:
+  setup:
+    public_name: Annotate texts for word classes
+    public_description: Annotate texts for word classes.
+    instructions: instructions/classify_text_instructions.html
+pool:
+  estimated_time_per_suite: 60
+  setup:
+    private_name: Annotate text
+    reward_per_assignment: 0.2
+    assignment_max_duration_seconds: 600
+    auto_accept_solutions: true
+  defaults:
+    default_overlap_for_new_tasks: 1
+    default_overlap_for_new_task_suites: 1
+  mixer:
+    real_tasks_count: 1
+    golden_tasks_count: 0
+    training_tasks_count: 0
+  filter:
+    languages:
+      - EN
\ No newline at end of file
diff --git a/examples/config/classify_image.yaml b/examples/config/classify_image.yaml
new file mode 100644
index 0000000..a4f5744
--- /dev/null
+++ b/examples/config/classify_image.yaml
@@ -0,0 +1,36 @@
+name: classify_images
+data:
+  file: data/image_data.tsv
+  input:
+    image: url
+  output:
+    result: bool
+interface:
+  prompt: "Does the image contain text, letters or numbers?"
+  labels:
+    true: "Yes"
+    false: "No"
+project:
+  setup:
+    public_name: "Check if an image contains text, letters or numbers"
+    public_description: "Look at diagrams from science textbooks and state if they
+      contain text, letters or numbers."
+ instructions: instructions/detect_text_instructions.html +pool: + estimated_time_per_suite: 10 + setup: + private_name: "Classify images" + reward_per_assignment: 0.034 + assignment_max_duration_seconds: 600 + auto_accept_solutions: true + defaults: + default_overlap_for_new_tasks: 1 + default_overlap_for_new_task_suites: 1 + mixer: + real_tasks_count: 1 + golden_tasks_count: 0 + training_tasks_count: 0 + filter: + client_type: + - TOLOKA_APP + - BROWSER diff --git a/examples/config/classify_segmentation.yaml b/examples/config/classify_segmentation.yaml new file mode 100644 index 0000000..b3c0e0f --- /dev/null +++ b/examples/config/classify_segmentation.yaml @@ -0,0 +1,44 @@ +name: classify_outlines +data: + file: data/segmentation_data.tsv + input: + image: url + outlines: json + targets: bool + output: + result: bool +interface: + prompt: "Is the image annotated correctly?" + checkbox: "There is nothing to annotate in this image." + segmentation: + labels: + source: "Source" + target: "Target" + labels: + true: "Yes" + false: "No" + verification: false +project: + setup: + public_name: "Verify outlines in images" + public_description: "Look at diagrams from science textbooks and verify if they have been + annotated correctly." 
+ instructions: instructions/detect_text_instructions.html +pool: + estimated_time_per_suite: 10 + setup: + private_name: "Classify outlines" + reward_per_assignment: 0.034 + assignment_max_duration_seconds: 600 + auto_accept_solutions: true + defaults: + default_overlap_for_new_tasks: 1 + default_overlap_for_new_task_suites: 1 + mixer: + real_tasks_count: 3 + golden_tasks_count: 0 + training_tasks_count: 0 + filter: + client_type: + - TOLOKA_APP + - BROWSER diff --git a/examples/config/classify_text.yaml b/examples/config/classify_text.yaml index 4edcf1b..f8fb66f 100644 --- a/examples/config/classify_text.yaml +++ b/examples/config/classify_text.yaml @@ -5,16 +5,16 @@ data: text: str output: result: str -options: - positive: Positive - negative: Negative - neutral: Neutral interface: - prompt: Read the text and assign it to the most appropriate category. + prompt: Read the text and classify its sentiment. + labels: + positive: Positive + negative: Negative + neutral: Neutral project: setup: - public_name: Classify text into categories - public_description: Read the text and assign it to the most appropriate category. + public_name: Classify the sentiment of texts + public_description: Read the text and classify its sentiment. 
instructions: instructions/classify_text_instructions.html pool: estimated_time_per_suite: 60 diff --git a/examples/config/forward_detect.yaml b/examples/config/forward_detect.yaml deleted file mode 100644 index 903a739..0000000 --- a/examples/config/forward_detect.yaml +++ /dev/null @@ -1,12 +0,0 @@ -name: forward_detect # Name of the action: this is how pools and actions refer to each other -data: - output: result # What the action outputs -source: aggregate_detect # Pool/action from which tasks are forwarded to this action -actions: - on_result: # Keys: possible outputs to the task - # Values: pools to which assignments with the output are forwarded - # Special cases: value "accept" automatically accepts task, "reject" automatically rejects task - # empty value simply submits the task (does not forward, reject or accept) - true: outline_text - false: - \ No newline at end of file diff --git a/examples/config/forward_verify.yaml b/examples/config/forward_verify.yaml deleted file mode 100644 index dcc537e..0000000 --- a/examples/config/forward_verify.yaml +++ /dev/null @@ -1,16 +0,0 @@ -name: forward_verify # Name of the action: this is how pools and actions refer to each other -data: - output: result # What the action outputs -source: aggregate_verify # Pool/action from which tasks are forwarded to this action -actions: - on_result: # Keys: possible outputs to the task - # Values: pools to which assignments with the output are forwarded - # Special cases: value "accept" automatically accepts task, "reject" automatically rejects task - # empty value simply submits the task (does not forward, reject or accept) - correct: - - accept - - separate_bboxes - incorrect: reject - human_error: - - accept - - fix_outlines diff --git a/examples/config/outline_text_verify.yaml b/examples/config/outline_text_verify.yaml index 45cf6ba..e095311 100644 --- a/examples/config/outline_text_verify.yaml +++ b/examples/config/outline_text_verify.yaml @@ -1,6 +1,6 @@ name: outline_text 
data: - file: data/verify_image_data.tsv + file: data/image_data.tsv input: image: url output: diff --git a/examples/config/segment_image.yaml b/examples/config/segment_image.yaml new file mode 100644 index 0000000..21f2fb4 --- /dev/null +++ b/examples/config/segment_image.yaml @@ -0,0 +1,40 @@ +name: segment_images +data: + file: data/segmentation_data.tsv + input: + image: url + output: + outlines: json + no_objects: bool +interface: + prompt: "Outline all elements with text, letters or numbers." + tools: + - rectangle + - polygon + labels: + text: "Text" + letter: "Letter" + number: "Number" + checkbox: "Check this box if there is nothing to outline." +project: + setup: + public_name: "Outline text, letters and numbers in images" + public_description: "Look at diagrams from science textbooks and outline text, letters and numbers." + instructions: instructions/detect_text_instructions.html +pool: + estimated_time_per_suite: 10 + setup: + private_name: "Segment images" + reward_per_assignment: 0.034 + assignment_max_duration_seconds: 600 + auto_accept_solutions: true + defaults: + default_overlap_for_new_tasks: 1 + default_overlap_for_new_task_suites: 1 + mixer: + real_tasks_count: 1 + golden_tasks_count: 0 + training_tasks_count: 0 + filter: + client_type: + - BROWSER diff --git a/examples/config/separate_bboxes.yaml b/examples/config/separate_bboxes.yaml deleted file mode 100644 index d9580d3..0000000 --- a/examples/config/separate_bboxes.yaml +++ /dev/null @@ -1,3 +0,0 @@ -name: separate_bboxes -data: - output: result \ No newline at end of file diff --git a/examples/config/verify_polygon.yaml b/examples/config/verify_polygon.yaml index 0d11be7..d4dc6fd 100644 --- a/examples/config/verify_polygon.yaml +++ b/examples/config/verify_polygon.yaml @@ -1,6 +1,5 @@ name: verify_polygon -data: - output: outlines # Provide the variable name used for the bounding boxes +data: outlines # Provide the variable name used for the bounding boxes labels: # If you want to 
validate labels for bounding boxes, use a list - text: 1 - image \ No newline at end of file diff --git a/examples/data/verify_image_data.tsv b/examples/data/image_data.tsv similarity index 100% rename from examples/data/verify_image_data.tsv rename to examples/data/image_data.tsv diff --git a/examples/data/segmentation_data.tsv b/examples/data/segmentation_data.tsv new file mode 100644 index 0000000..adedc0e --- /dev/null +++ b/examples/data/segmentation_data.tsv @@ -0,0 +1,4 @@ +image outlines targets +https://s3.datacloud.helsinki.fi/crowdsrc:ai2d/2764.png "[{""shape"": ""rectangle"", ""left"": 0.4284493875327437, ""top"": 0.9540226714520437, ""width"": 0.15762586324618427, ""height"": 0.034259330778690145, ""label"": ""source""}, {""shape"": ""polygon"", ""points"": [{""left"": 0.5111378863475986, ""top"": 0.7831976606437732}, {""left"": 0.4847909488109421, ""top"": 0.9409208241285196}, {""left"": 0.5132886567587542, ""top"": 0.9402039006581345}, {""left"": 0.5353340534730994, ""top"": 0.9272992781912007}, {""left"": 0.5455502129260889, ""top"": 0.9143946557242668}, {""left"": 0.5606056058041783, ""top"": 0.8799823291457767}, {""left"": 0.561143298406967, ""top"": 0.8541730842119091}, {""left"": 0.5579171427902335, ""top"": 0.8305146096891971}, {""left"": 0.5412486721037776, ""top"": 0.8018376708737887}, {""left"": 0.5256555866228991, ""top"": 0.7874992014660844}], ""label"": ""target""}]" false +https://s3.datacloud.helsinki.fi/crowdsrc:ai2d/3194.png "[{""shape"": ""rectangle"", ""left"": 0.43443306761602496, ""top"": 0.31435508369284804, ""width"": 0.06575584141693347, ""height"": 0.034250466219265474, ""label"": ""source""}, {""shape"": ""polygon"", ""points"": [{""left"": 0.3779473245284631, ""top"": 0.5263191256681407}, {""left"": 0.3882049508349794, ""top"": 0.5144975830030077}, {""left"": 0.3948947071218379, ""top"": 0.51534197890766}, {""left"": 0.39712462588412406, ""top"": 0.5263191256681407}, {""left"": 0.40247643091361085, ""top"": 
0.5432070437611879}, {""left"": 0.40203044716115355, ""top"": 0.5634725454728445}, {""left"": 0.4064902846857259, ""top"": 0.5786716717565871}, {""left"": 0.4064902846857259, ""top"": 0.5989371734682436}, {""left"": 0.4073822521906404, ""top"": 0.6192026751799004}, {""left"": 0.40336839841852534, ""top"": 0.639468176891557}, {""left"": 0.40292241466606804, ""top"": 0.6555116990799518}, {""left"": 0.39935454464641024, ""top"": 0.669866429459042}, {""left"": 0.39311077211200895, ""top"": 0.6774659926009132}, {""left"": 0.38508306456777874, ""top"": 0.669866429459042}, {""left"": 0.37839330828092027, ""top"": 0.6597336786032135}, {""left"": 0.38508306456777874, ""top"": 0.620891466989205}, {""left"": 0.38151519454812094, ""top"": 0.5676945249961064}, {""left"": 0.3779473245284631, ""top"": 0.5474290232844498}], ""label"": ""target""}]" false +https://s3.datacloud.helsinki.fi/crowdsrc:ai2d/3533.png "[{""shape"": ""rectangle"", ""left"": 0.8275615212527964, ""top"": 0.22073079791200595, ""width"": 0.1082343193305163, ""height"": 0.046234153616703966, ""label"": ""source""}, {""shape"": ""polygon"", ""points"": [{""left"": 0.36635357006491026, ""top"": 0.1228385970654012}, {""left"": 0.3733597580562071, ""top"": 0.1992036218840343}, {""left"": 0.4259061679909331, ""top"": 0.27011400207276504}, {""left"": 0.4731979369321866, ""top"": 0.3382970599465446}, {""left"": 0.5012226888973739, ""top"": 0.4528445971744942}, {""left"": 0.5292474408625611, ""top"": 0.6055746468117603}, {""left"": 0.5415082698473306, ""top"": 0.7201221840397098}, {""left"": 0.562526833821221, ""top"": 0.8455790105274642}, {""left"": 0.5905515857864082, ""top"": 0.8783068783068785}, {""left"": 0.6360918077298375, ""top"": 0.8783068783068785}, {""left"": 0.6588619187015522, ""top"": 0.8401243658975619}, {""left"": 0.709656781638454, ""top"": 0.7364861179294171}, {""left"": 0.7394330806014655, ""top"": 0.5919380352370044}, {""left"": 0.7464392685927623, ""top"": 0.4746631756941036}, {""left"": 
0.7446877215949381, ""top"": 0.40375279550537285}, {""left"": 0.7359299866058171, ""top"": 0.31375115911198387}, {""left"": 0.7201660636253993, ""top"": 0.2592047128129603}, {""left"": 0.6903897646623878, ""top"": 0.1637484317896689}, {""left"": 0.6150732437559471, ""top"": 0.09011072928598703}, {""left"": 0.5117319708843191, ""top"": 0.07101947308132876}, {""left"": 0.4469247319648236, ""top"": 0.07647411771123112}, {""left"": 0.39262677503227333, ""top"": 0.0928380516009382}], ""label"": ""target""}]" false \ No newline at end of file diff --git a/examples/verify_demo.py b/examples/segment_and_verify.py similarity index 89% rename from examples/verify_demo.py rename to examples/segment_and_verify.py index 153df3f..c18c4d2 100644 --- a/examples/verify_demo.py +++ b/examples/segment_and_verify.py @@ -1,8 +1,7 @@ # -*- coding: utf-8 -*- from abulafia.actions import Forward, VerifyPolygon -from abulafia.task_specs import ImageSegmentation, TaskSequence, MulticlassVerification, FixImageSegmentation, \ - SegmentationClassification, ImageClassification +from abulafia.task_specs import ImageSegmentation, TaskSequence import argparse import json import toloka.client as toloka @@ -37,11 +36,12 @@ outline_text = ImageSegmentation(configuration="config/outline_text_verify.yaml", client=tclient) -# Forward action +# Set up a Forward action for processing the pool outputs forward_detect = Forward(configuration="config/forward_verify_polygon.yaml", client=tclient, targets=[outline_text]) +# Set up a VerifyPolygon action to validate the polygons created by the crowdsourced workers verify_polygon = VerifyPolygon(configuration="config/verify_polygon.yaml", task=outline_text, forward=forward_detect) diff --git a/examples/segment_image.py b/examples/segment_image.py new file mode 100644 index 0000000..1525163 --- /dev/null +++ b/examples/segment_image.py @@ -0,0 +1,40 @@ +# -*- coding: utf-8 -*- + +from abulafia.task_specs import ImageSegmentation, TaskSequence +import argparse 
+import json
+import toloka.client as toloka
+
+
+# Set up the argument parser
+ap = argparse.ArgumentParser()
+
+# Add argument for input
+ap.add_argument("-c", "--creds", required=True,
+                help="Path to a JSON file that contains Toloka credentials. "
+                     "The file should have two keys: 'token' and 'mode'. "
+                     "The key 'token' should contain the Toloka API key, whereas "
+                     "the key 'mode' should have the value 'PRODUCTION' or 'SANDBOX' "
+                     "that defines the environment in which the pipeline should be run.")
+
+# Parse the arguments
+args = vars(ap.parse_args())
+
+# Assign arguments to variables
+cred_file = args['creds']
+
+# Read the credentials from the JSON file
+with open(cred_file) as cred_f:
+
+    creds = json.loads(cred_f.read())
+    tclient = toloka.TolokaClient(creds['token'], creds['mode'])
+
+# Define an image segmentation task
+segment_image = ImageSegmentation(configuration="config/segment_image.yaml",
+                                  client=tclient)
+
+# Add the task into a pipeline
+pipe = TaskSequence(sequence=[segment_image], client=tclient)
+
+# Start the task sequence; create the task on Toloka
+pipe.start()
diff --git a/pyproject.toml b/pyproject.toml
index b979e46..3d266d6 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "abulafia"
-version = "0.1.11"
+version = "0.2.1"
 description = "A tool for fair and reproducible crowdsourcing using Toloka"
 readme = "README.md"
 requires-python = ">=3.8"
@@ -14,10 +14,10 @@ authors = [
     {name = "Rosa Suviranta"}
 ]
 dependencies = [
-    "crowd_kit>=1.0.0",
+    "crowd_kit>=1.2.1",
     "pytest",
-    "PyYAML>=6.0",
-    "shapely>=2.0.1",
+    "PyYAML==6.0",
+    "shapely==2.0.1",
     "toloka_kit[pandas]>=1.0.0",
-    "wasabi>=0.9.0"
+    "wasabi==1.1.1"
 ]
diff --git a/requirements.txt b/requirements.txt
index 42026e2..0cd967a 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +1,9 @@
 --find-links https://download.pytorch.org/whl/torch_stable.html
-torch==1.13.1
-crowd_kit==1.0.0
-pandas==1.4.1
-pytest==7.1.0 +torch==2.0.1 +crowd_kit==1.2.1 +pandas==2.0.3 +pytest==7.4.0 PyYAML==6.0 -toloka_kit==0.1.26 -wasabi==0.9.0 +shapely==2.0.1 +toloka_kit==1.4.1 +wasabi==1.1.1 \ No newline at end of file diff --git a/src/abulafia/actions/actions.py b/src/abulafia/actions/actions.py index 4effe36..550390c 100644 --- a/src/abulafia/actions/actions.py +++ b/src/abulafia/actions/actions.py @@ -29,98 +29,10 @@ warn = f.getvalue() if warn.startswith("None of PyTorch"): - msg.warn(f"Could not find a working installation of PyTorch or TensorFlow, one of which is " + msg.fail(f"Could not find a working installation of PyTorch or TensorFlow, one of which is " f"needed for the crowd-kit aggregators to function. Cancelling pipeline.", exits=1) -class Verify: - """ - This class defines an action for manually verifying crowdsourced answers by showing them to - other crowdsourced workers. - """ - def __init__(self, configuration, task): - """ - This function initialises the manual verification mechanism. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - task: An object that inherits from the CrowdsourcingTask class. 
- - Returns: - None - """ - self.conf = read_configuration(configuration) - self.name = self.conf['name'] - self.task = task - self.client = self.task.client - self.queue = collections.defaultdict(list) - self.aggregator = None - - def __call__(self, events: List[AssignmentEvent]) -> None: - - # Loop over the list of incoming AssignmentEvent objects - for event in events: - - # Zip and iterate over tasks and solutions in each event - for task, solution in zip(event.assignment.tasks, event.assignment.solutions): - - # Retrieve the answer - answer = solution.output_values[self.conf['data']['output']] - - # Add the answer to the queue under assignment id - self.queue[task.input_values['assignment_id']].append(answer) - - # Set up a placeholder for processed task suites - processed = [] - - # Loop over the assignments in the queue - for assignment_id, results in self.queue.items(): - - try: - - # Accept the task suite if all assignments in the suite have been verified as correct - if all(results) is True: - - try: - - self.client.accept_assignment(assignment_id=assignment_id, - public_comment=self.conf['messages']['accepted']) - - msg.good(f'Accepted assignment {assignment_id}') - - except IncorrectActionsApiError: - - msg.fail(f'Failed to accept assignment {assignment_id}!') - - # Reject the task suite if all assignments in the suite have not been verified as correct - if all(results) is not True: - - try: - - self.client.reject_assignment(assignment_id=assignment_id, - public_comment=self.conf['messages']['rejected']) - - msg.warn(f'Rejected assignment {assignment_id}') - - except IncorrectActionsApiError: - - msg.fail(f'Failed to reject assignment {assignment_id}!') - - # Catch the error that might be raised by manually accepting/rejecting tasks in - # the web interface - except IncorrectActionsApiError: - - msg.fail(f'Could not {"accept" if all(results) == True else "reject"} assignment {assignment_id}!') - - # Append the task suite to the list of processed suites 
- processed.append(assignment_id) - - # Delete the assignment from the list of processed task suites - for assignment_id in processed: - - processed.remove(assignment_id) - - class Aggregate: """ This class can be used to aggregate crowdsourced answers using various algorithms. @@ -140,6 +52,7 @@ def __init__(self, configuration, task, forward=None): self.name = self.conf['name'] self.forward = forward + self.messages = self.conf['messages'] if 'messages' in self.conf else None self.majority_vote = True if self.conf['method'] == 'majority_vote' else False self.dawid_skene = True if self.conf['method'] == 'dawid_skene' else False @@ -161,6 +74,7 @@ def __call__(self, pool: toloka.Pool) -> None: toloka.Assignment.ACCEPTED])) if assignments: + a_dict = {"task": [], "inputs": [], "label": [], "worker": [], "id": []} input_data = list(self.task.data_conf['input'].keys())[0] @@ -208,12 +122,14 @@ def __call__(self, pool: toloka.Pool) -> None: assert self.result is not None, raise_error("Aggregation did not produce a result!") - forward_data = [{"id": df.loc[df["task"] == task, "id"].iloc[0], - "input_data": df.loc[df["task"] == task, "inputs"].iloc[0], - "label": self.result[task]} + forward_data = [{'id': df.loc[df['task'] == task, 'id'].iloc[0], + 'input_data': df.loc[df['task'] == task, 'inputs'].iloc[0], + 'label': self.result[task], + 'message': self.messages[self.result[task]] if self.messages is not None + else "No reason was provided."} for task in self.result.index] - msg.good(f"Finished aggregating {len(forward_data)} submitted tasks from {self.task.name}") + msg.good(f"Finished aggregating {len(forward_data)} submitted assignments from {self.task.name}") self.complete = True if self.forward: @@ -247,7 +163,7 @@ def __init__(self, configuration, client, targets=None): self.dont_forward = [] # Possible outputs for the task (e.g. 
true and false) and their forward pools
-        self.outputs = self.conf['actions']['on_result']
+        self.outputs = self.conf['on_result']
 
         # Check if some outputs should be accepted or rejected (these are not forwarded like other tasks,
         # but accepted or rejected based on the output) and remove these from outputs
@@ -265,7 +181,29 @@
             multi_action = {k: v for (k, v) in self.outputs.items() if type(v) == list}
             self.reject.extend([k for k, v in multi_action.items() if 'reject' in v])
             self.accept.extend([k for k, v in multi_action.items() if 'accept' in v])
-            multi_action = {k: [i for i in v if i not in ["accept", "reject"]][0] for (k, v) in multi_action.items()}
+            multi_action = {k: [i for i in v if i not in ['accept', 'reject']][0] for (k, v) in multi_action.items()}
+
+        # Check that messages for accepting and rejecting assignments have been defined
+        if len(self.accept) > 0 or len(self.reject) > 0:
+
+            try:
+
+                self.conf['messages']
+
+            except KeyError:
+
+                msg.fail("Please use the top-level key 'messages' to define messages associated with outputs that "
+                         "accept or reject assignments. Define a message for each output value defined under "
+                         "'on_result' that leads to acceptance or rejection. These messages will be added to "
+                         "assignments and shown to the workers.", exits=1)
+
+            # Get the difference between the outputs defined under 'on_result' and 'messages'
+            diff = set(self.reject + self.accept).difference(self.conf['messages'])
+
+            if len(diff) > 0:
+
+                msg.fail(f"Please define messages associated with the following outputs under the top-level key "
+                         f"'messages': {', '.join(list(diff))}.", exits=1)
 
         self.outputs = {**self.outputs, **multi_action}
@@ -287,33 +225,22 @@ def __call__(self, events: Union[List[AssignmentEvent], List[dict]]) -> None:
 
                 for i in range(len(event.assignment.tasks)):
 
-                    solution = event.assignment.solutions[i].output_values[self.conf['data']['output']]
+                    solution = event.assignment.solutions[i].output_values[self.conf['data']]
 
                     # If performer verified the task as incorrect, reject the original assignment
                     # and, if configured in source pool under "on_reject", re-add the task to the pool
                     if solution in self.reject:
 
-                        # TODO Implement dynamic public comment handling
-                        try:
-                            self.client.reject_assignment(assignment_id=event.assignment.tasks[i].input_values['assignment_id'],
-                                                          public_comment="Assignment was verified as incorrect by another user.")
-                            msg.warn(f'Rejected assignment {event.assignment.tasks[i].input_values["assignment_id"]}')
-
-                        except IncorrectActionsApiError:
-
-                            msg.fail(f'Failed to reject {event.assignment.tasks[i].input_values["assignment_id"]}!')
+                        self.client.reject_assignment(assignment_id=event.assignment.tasks[i].input_values['assignment_id'],
+                                                      public_comment=self.conf['messages'][solution])
+                        msg.warn(f"Rejected assignment {event.assignment.tasks[i].input_values['assignment_id']}")
 
                     # If performer verified the task as correct, accept original assignment and don't forward task
                     if solution in self.accept:
 
-                        try:
-                            self.client.accept_assignment(assignment_id=event.assignment.tasks[i].input_values['assignment_id'],
-                                                          public_comment="Assignment was verified as correct by another user.")
-                            msg.good(f'Accepted assignment {event.assignment.tasks[i].input_values["assignment_id"]}')
-
-                        except IncorrectActionsApiError:
-
-                            msg.fail(f'Failed to accept {event.assignment.tasks[i].input_values["assignment_id"]}!')
+                        self.client.accept_assignment(assignment_id=event.assignment.tasks[i].input_values['assignment_id'],
+                                                      public_comment=self.conf['messages'][solution])
+                        msg.good(f"Accepted assignment {event.assignment.tasks[i].input_values['assignment_id']}")
 
                     # If no forward pool was configured, submit task without forwarding/accepting/rejecting
                     if solution in self.dont_forward:
@@ -371,16 +298,16 @@ def __call__(self, events: Union[List[AssignmentEvent], List[dict]]) -> None:
                 # and, if configured in source pool under "on_reject", re-add the task to the pool
                 if solution in self.reject:
 
-                    self.client.reject_assignment(assignment_id=event["input_data"]["assignment_id"],
-                                                  public_comment=event["message"])
-                    msg.warn(f'Rejected assignment {event["input_data"]["assignment_id"]}')
+                    self.client.reject_assignment(assignment_id=event['input_data']['assignment_id'],
+                                                  public_comment=event['message'])
+                    msg.warn(f"Rejected assignment {event['input_data']['assignment_id']}")
 
                 # If performer verified the task as correct, accept original assignment and don't forward task
                 if solution in self.accept:
 
-                    self.client.accept_assignment(assignment_id=event["input_data"]["assignment_id"],
-                                                  public_comment=event["message"])
-                    msg.good(f'Accepted assignment {event["input_data"]["assignment_id"]}')
+                    self.client.accept_assignment(assignment_id=event['input_data']['assignment_id'],
+                                                  public_comment=event['message'])
+                    msg.good(f"Accepted assignment {event['input_data']['assignment_id']}")
 
                 # If no forward pool was configured, submit task without forwarding/accepting/rejecting
                 if solution in self.dont_forward:
@@ -445,11 +372,24 @@ def __init__(self, target, configuration, add_label=False):
 
         self.target = target
         self.client = self.target.client
        self.conf = 
read_configuration(configuration) - self.name = self.conf["name"] + self.name = self.conf['name'] self.add_label = add_label - if "input_file" in self.conf["data"]: - self.input_file = self.conf["data"]["input_file"] + try: + + image = self.conf['data']['image'] + outlines = self.conf['data']['bboxes'] + + except KeyError: + + msg.fail(f"The configuration file for the SeparateBboxes Action named {self.name} does not contain " + f"the names of the variables that contain the images and bounding boxes. Please ensure that " + f"the top-level key 'data' contains keys 'image' and 'bboxes', whose values provide the " + f"variable names.", exits=1) + + if 'file' in self.conf['data']: + + self.input_file = self.conf['data']['file'] def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = None) -> None: @@ -466,11 +406,11 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = for i in range(len(event.assignment.tasks)): new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": event.assignment.tasks[i].input_values["image"], - "outlines": [bbox]}, + input_values={image: event.assignment.tasks[i].input_values[image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) for bbox in [dict(x, **{'label': self.add_label}) - for x in event.assignment.solutions[i].output_values["outlines"]]] + for x in event.assignment.solutions[i].output_values[outlines]]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -479,10 +419,10 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = for i in range(len(event.assignment.tasks)): new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": event.assignment.tasks[i].input_values["image"], - "outlines": [bbox]}, + input_values={image: event.assignment.tasks[i].input_values[image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) - for bbox in 
event.assignment.solutions[i].output_values["outlines"] ] + for bbox in event.assignment.solutions[i].output_values[outlines]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -497,11 +437,11 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = for i in range(len(e.assignment.tasks)): new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": e.assignment.tasks[i].input_values["image"], - "outlines": [bbox]}, + input_values={image: e.assignment.tasks[i].input_values[image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) for bbox in [dict(x, **{'label': self.add_label}) - for x in e.assignment.solutions[i].output_values["outlines"]]] + for x in e.assignment.solutions[i].output_values[outlines]]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -512,10 +452,10 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = for i in range(len(e.assignment.tasks)): new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": e.assignment.tasks[i].input_values["image"], - "outlines": [bbox]}, + input_values={image: e.assignment.tasks[i].input_values[image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) - for bbox in e.assignment.solutions[i].output_values["outlines"]] + for bbox in e.assignment.solutions[i].output_values[outlines]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -527,21 +467,21 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = if self.add_label: new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": event["input_data"]["image"], - "outlines": [bbox]}, + input_values={image: event['input_data'][image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) for bbox in [dict(x, **{'label': self.add_label}) - for x in event['input_data']['outlines']]] + for x in event['input_data'][outlines]]] 
self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) else: new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": event["input_data"]["image"], - "outlines": [bbox]}, + input_values={image: event['input_data'][image], + outlines: [bbox]}, unavailable_for=self.target.blocklist) - for bbox in event['input_data']['outlines']] + for bbox in event['input_data'][outlines]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -555,16 +495,16 @@ def __call__(self, event: Union[AssignmentEvent, dict, List[AssignmentEvent]] = if self.add_label: - input_df["outlines"] = input_df["outlines"].apply(lambda x: json.loads(x)) - input_df["outlines"] = input_df["outlines"].apply(lambda x: - [dict(y, **{'label': self.add_label}) for y in x]) + input_df[outlines] = input_df[outlines].apply(lambda x: json.loads(x)) + input_df[outlines] = input_df[outlines].apply(lambda x: + [dict(y, **{'label': self.add_label}) for y in x]) for i, task in input_df.iterrows(): new_tasks = [toloka.Task(pool_id=self.target.pool.id, - input_values={"image": task["image"], "outlines": [bbox]}, + input_values={image: task[image], outlines: [bbox]}, unavailable_for=self.target.blocklist) - for bbox in task["outlines"]] + for bbox in task[outlines]] self.client.create_tasks(new_tasks, allow_defaults=True, open_pool=True) @@ -611,7 +551,7 @@ def __call__(self, events: List[AssignmentEvent]) -> None: for task, solution in zip(event.assignment.tasks, event.assignment.solutions): # Retrieve the answer (the bounding box) - answer = solution.output_values[self.conf['data']['output']] + answer = solution.output_values[self.conf['data']] # Retrieve all polygons polygons = [p for p in answer if p['shape'] == 'polygon'] @@ -694,7 +634,7 @@ def __call__(self, events: List[AssignmentEvent]) -> None: # Add the bounding boxes stored under the variable to the input data, in case they # are forwarded further. 
Store them under the key data/output defined in the YAML
                # configuration for this Action.
-                result['input_data'].update({self.conf['data']['output']: answer})
+                result['input_data'].update({self.conf['data']: answer})
 
                # Append the event dictionary to the list to be forwarded
                forward_data.append(result)
diff --git a/src/abulafia/functions/core_functions.py b/src/abulafia/functions/core_functions.py
index e0885b4..2d78c07 100644
--- a/src/abulafia/functions/core_functions.py
+++ b/src/abulafia/functions/core_functions.py
@@ -220,7 +220,20 @@ def add_tasks_to_pool(client: toloka.TolokaClient,
                     f'section of the configuration file.')
 
 
-def check_io(configuration: dict, expected_input: set, expected_output: set):
+def check_io(configuration: dict,
+             expected_input: set,
+             expected_output: set):
+    """
+    This function checks the inputs and outputs defined in the YAML configuration against those allowed in the
+    crowdsourcing task specification.
+
+    configuration: A dictionary containing the configuration for a crowdsourcing task.
+    expected_input: A set of input data types defined in the crowdsourcing task specification.
+    expected_output: A set of output data types defined in the crowdsourcing task specification.
+
+    Returns:
+        Four dictionaries: Toloka input and output data specifications, and mappings from data types to variable names.
+    """
 
     # Read input and output data and create data specifications
     data_in = {k: data_spec[v] for k, v in configuration['data']['input'].items()}
@@ -230,7 +243,7 @@ def check_io(configuration: dict, expected_input: set, expected_output: set):
 
     input_data = {v: k for k, v in configuration['data']['input'].items()}
 
     # Raise error if the expected input data types have been provided
-    if not set(input_data.keys()) == expected_input:
+    if not set(input_data.keys()).issubset(expected_input):
 
        raise_error(f'Could not find the expected input types ({", ".join(expected_input)}) for '
                    f'{configuration["name"]}. 
Please check the configuration under the ' @@ -240,7 +253,7 @@ def check_io(configuration: dict, expected_input: set, expected_output: set): output_data = {v: k for k, v in configuration['data']['output'].items()} # Raise error if the expected input data types have been provided - if not set(output_data.keys()) == expected_output: + if not set(output_data.keys()).issubset(expected_output): raise_error(f'Could not find the expected output types ({", ".join(expected_output)}) for ' f'{configuration["name"]}. Please check the configuration under the ' @@ -537,7 +550,7 @@ def create_pool_table(task_sequence: list) -> None: """ # Set up headers and a placeholder for data - header = ('Name', 'Input', 'Output', 'Pool ID', 'Project ID', 'Pool type') + header = ('Name', 'Input', 'Output', 'Pool ID', 'Project ID', 'Type') data = [] # Loop over the tasks diff --git a/src/abulafia/task_specs/core_task.py b/src/abulafia/task_specs/core_task.py index 2e07ab2..3b51613 100644 --- a/src/abulafia/task_specs/core_task.py +++ b/src/abulafia/task_specs/core_task.py @@ -61,6 +61,7 @@ def __init__(self, configuration, client, task_spec, **kwargs): self.skill = False # Does the Task provide or require a skill? self.exam = False # Is this Task an exam? self.test = True if kwargs and kwargs['test'] else False # Are we running a test? 
+ self.verify = True if 'verify' in self.data_conf and self.data_conf['verify'] else False # See if users should be banned from the pool and check that blocklist is configured correctly try: @@ -74,7 +75,7 @@ def __init__(self, configuration, client, task_spec, **kwargs): except KeyError: - msg.warn(f"Could not find the column 'user_id' in the blocklist.", exits=1) + msg.fail(f"Could not find the column 'user_id' in the blocklist.", exits=1) # Print status message msg.info(f'The unique ID for this object ({self.name}) is {self.task_id}') @@ -120,7 +121,7 @@ def __init__(self, configuration, client, task_spec, **kwargs): add_tasks(self, self.tasks) - def __call__(self, in_obj, **options): + def __call__(self, in_obj): # Check that the input object is a list of AssignmentEvent objects if type(in_obj) == list and all(isinstance(item, AssignmentEvent) for item in in_obj): @@ -132,27 +133,32 @@ def __call__(self, in_obj, **options): # If the event type is accepted or submitted, create new tasks in current pool if event.event_type.value in ['ACCEPTED', 'SUBMITTED']: - # Create new Task objects - new_tasks = [Task(pool_id=self.pool.id, - overlap=self.pool_conf['defaults']['default_overlap_for_new_tasks'], - input_values={k: v for k, v in task.input_values.items()}, - unavailable_for=self.blocklist - ) - for task, solution - in zip(event.assignment.tasks, - event.assignment.solutions)] - - # If the assignments are for a verification pool, add the output values to the input of - # the new task and make the verification task unavailable for the performer - if options and 'verify' in options: - - new_tasks = [Task(pool_id=self.pool.id, - overlap=self.pool_conf['defaults']['default_overlap_for_new_tasks'], - input_values={**task.input_values, - **solution.output_values, - 'assignment_id': event.assignment.id}, - unavailable_for=[*self.blocklist, event.assignment.user_id]) - for task, solution in zip(event.assignment.tasks, event.assignment.solutions)] + # If the assignments 
are intended to be verified by other users, add the output + # values to the input of the new task and make the verification task unavailable + # for the worker who originally submitted it. + if self.verify: + + new_tasks = [Task( + pool_id=self.pool.id, + overlap=self.pool_conf['defaults']['default_overlap_for_new_tasks'], + input_values={**task.input_values, + **solution.output_values, + 'assignment_id': event.assignment.id}, + unavailable_for=[*self.blocklist, event.assignment.user_id]) + for task, solution in + zip(event.assignment.tasks, event.assignment.solutions)] + + else: + + # Create new Task objects + new_tasks = [Task( + pool_id=self.pool.id, + overlap=self.pool_conf['defaults']['default_overlap_for_new_tasks'], + input_values={k: v for k, v in task.input_values.items()}, + unavailable_for=self.blocklist) + for task, solution + in zip(event.assignment.tasks, + event.assignment.solutions)] # Add Tasks and open the pool self.client.create_tasks(tasks=new_tasks, open_pool=True) @@ -380,7 +386,7 @@ def load_pool(self, client): except KeyError: - msg.warn(f"Could not find the key 'training' under the main pool configuration. " + msg.fail(f"Could not find the key 'training' under the main pool configuration. " f"Define the key 'training' and place a key/value pair with the key " f"'training_passing_skill_value' under 'training' to link the training " f"and main pools. 
This key is used to set the criteria for passing "
@@ -649,7 +655,7 @@ def load_pool(self, client):
 
             # Create filter
             agent = (toloka.filter.UserAgentType ==
-                         self.pool_conf['filter']['user_agent_type'][0].upper())
+                     self.pool_conf['filter']['user_agent_type'][0].upper())
 
             # Check for existing filters and set
             self.pool.filter = set_filter(filters=self.pool.filter,
diff --git a/src/abulafia/task_specs/pipeline.py b/src/abulafia/task_specs/pipeline.py
index 4d27184..07cf4d6 100644
--- a/src/abulafia/task_specs/pipeline.py
+++ b/src/abulafia/task_specs/pipeline.py
@@ -134,9 +134,9 @@ async def run_sequence(metrics):
 
         if all(status):
 
-            # Wait for a minute to ensure that no new tasks are added to pools in the pipeline
+            # Wait for 90 seconds to ensure that no new tasks are added to pools in the pipeline
             # before ending the task sequence
-            time.sleep(60)
+            time.sleep(90)
 
             if all(status):
diff --git a/src/abulafia/task_specs/task_specs.py b/src/abulafia/task_specs/task_specs.py
index d38f929..126c2e4 100644
--- a/src/abulafia/task_specs/task_specs.py
+++ b/src/abulafia/task_specs/task_specs.py
@@ -58,646 +58,77 @@ def specify_task(configuration):
 
         Returns:
             A Toloka TaskSpec object.
""" - # Define expected input and output types for the task - expected_i, expected_o = {'url'}, {'bool'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Create the task interface; start by setting up the image viewer - img_viewer = tb.ImageViewV1(url=tb.InputData(input_data['url']), - rotatable=True, - ratio=[1, 1]) - - # Define the prompt text above the button group - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Set up a radio group for labels - radio_group = tb.ButtonRadioGroupFieldV1( - - # Set up the output data field - data=tb.OutputData(output_data['bool']), - - # Create radio buttons - options=[ - tb.fields.GroupFieldOption(value=True, label='Yes'), - tb.fields.GroupFieldOption(value=False, label='No') - ], - - # Set up validation - validation=tb.RequiredConditionV1(hint="You must choose one response.") - ) - - # Set task width limit - task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) - - # Add hotkey plugin - hotkey_plugin = tb.HotkeysPluginV1(key_1=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=True), - key_2=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=False)) - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_viewer, prompt, radio_group]), - plugins=[task_width_plugin, hotkey_plugin] - ) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class ImageSegmentation(CrowdsourcingTask): - """ - This is a class for image segmentation tasks. 
- """ - def __init__(self, configuration, client): - """ - This function initialises the ImageSegmentation class, which inherits attributes - and methods from the superclass CrowdsourcingTask. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - client: A TolokaClient object with valid credentials. - - Returns: - An ImageSegmentation object. - """ - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - # Use the super() function to access the superclass Task and its methods and attributes. - # This will set up the project, pool and training as specified in the configuration file. - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj, **kwargs): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, **kwargs) - - # When called, return the ImageSegmentation object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - # Define expected input and output types for the task - expected_i, expected_o = {'url'}, {'json'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Check if labels have been defined for bounding boxes - if 'labels' in configuration['interface']: - - # Create a list of labels to be added to the UI. The 'label' will be added to the - # UI, whereas 'value' contains the value to be associated with the bounding box. 
- labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for - value, label in configuration["interface"]["labels"].items()] - - else: - - labels = None - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - - # Set up the output data field - data=tb.OutputData(output_data['json']), - - # Set up the input data field - image=tb.InputData(input_data['url']), - - # Set up the allowed shapes: note that their order will define the order in the UI - shapes={'rectangle': True, 'polygon': True}, - - # Set this element to use all available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, - - # Set up labels - labels=labels, - - # Set up validation - validation=tb.RequiredConditionV1(hint="Please select at least one area!")) - - # Define the text prompt below the segmentation UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Add hotkey plugin - hotkey_plugin = tb.ImageAnnotationHotkeysPluginV1(cancel='s', - confirm='a', - polygon='e', - rectangle='w', - select='q') - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt]), - plugins=[hotkey_plugin] - ) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class AddOutlines(CrowdsourcingTask): - """ - This is a class for tasks that add more bounding boxes to images with pre-existing labelled bounding boxes. - """ - def __init__(self, configuration, client, **kwargs): - """ - This function initialises the AddOutlines class, which inherits attributes - and methods from the superclass CrowdsourcingTask. 
- - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - client: A TolokaClient object with valid credentials. - - Returns: - An AddOutlines object. - """ - - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj, **kwargs): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, **kwargs) - - # When called, return the AddOutlines object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'json'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Add assignment ID to the input data - data_in['assignment_id'] = toloka.project.StringSpec(required=False) - - try: - labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for - value, label in configuration["interface"]["labels"].items()] - - except KeyError: - - msg.warn(f"Please add the key 'labels' under the top-level key 'interface' to define " - f"the labels for bounding boxes. The labels should be provided as key/value " - f"pairs, e.g. cat: Cat. 
The key must correspond to the label defined for " - f"the bounding box in the input JSON, whereas the value is the text displayed " - f"in the user interface.", exits=1) - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - - # Set up the output data field - data=tb.OutputData(path=output_data['json'], - default=tb.InputData(input_data['json'])), - - # Set up the input data field - image=tb.InputData(input_data['url']), - - # Set up the allowed shapes: note that their order will define the order in the UI - shapes={'polygon': True, 'rectangle': True}, - - # Set this element to use all available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, - - # Set up labels for the outlines - labels=labels, - - disabled=False, - - validation=tb.RequiredConditionV1(hint="Please select at least one area!") - ) - - # Define the text prompt below the segmentation UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt])) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class SegmentationClassification(CrowdsourcingTask): - """ - This is a class for binary segmentation classification tasks. - """ - def __init__(self, configuration, client, **kwargs): - """ - This function initialises the SegmentationClassification class, which inherits attributes - and methods from the superclass Task. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - client: A TolokaClient object with valid credentials. - - Returns: - A SegmentationClassification object. 
- """ - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - # Use the super() function to access the superclass Task and its methods and attributes. - # This will set up the project, pool and training as specified in the configuration file. - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj) - - # When called, return the SegmentationClassification object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'bool'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Add assignment ID to the input data - data_in['assignment_id'] = toloka.project.StringSpec(required=False) - - labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for - value, label in configuration["interface"]["labels"].items()] \ - if "labels" in configuration["interface"] else None - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - - # Set up the output data field - data=tb.InternalData(path=input_data['json'], - default=tb.InputData(input_data['json'])), - - # Set up the input data field - image=tb.InputData(path=input_data['url']), - - # Set labels - labels=labels, - - # Set this element to use all 
available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, - - # Disable annotation interface - disabled=True) - - # Define the text prompt below the segmentation UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Set up a radio group for labels - radio_group = tb.ButtonRadioGroupFieldV1( - - # Set up the output data field - data=tb.OutputData(output_data['bool']), - - # Create radio buttons - options=[ - tb.fields.GroupFieldOption(value=True, label='Yes'), - tb.fields.GroupFieldOption(value=False, label='No') - ], - - # Set up validation - validation=tb.RequiredConditionV1(hint="You must choose one response.") - ) - - # Add hotkey plugin - hotkey_plugin = tb.HotkeysPluginV1(key_1=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=True), - key_2=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=False)) - - # Set task width limit - task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt, radio_group]), - plugins=[hotkey_plugin, task_width_plugin] - ) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class LabelledSegmentationVerification(CrowdsourcingTask): - """ - This is a class for binary segmentation verification tasks with labelled bounding boxes. - """ - def __init__(self, configuration, client, **kwargs): - """ - This function initialises the LabelledSegmentationVerification class, which inherits attributes - and methods from the superclass Task. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. 
- client: A TolokaClient object with valid credentials. - - Returns: - A LabelledSegmentationVerification object. - """ - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - # Use the super() function to access the superclass Task and its methods and attributes. - # This will set up the project, pool and training as specified in the configuration file. - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, verify=True) - - # When called, return the LabelledSegmentationVerification object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - - # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json', 'bool'}, {'bool'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Add assignment ID to the input data - data_in['assignment_id'] = toloka.project.StringSpec(required=False) - data_out['no_target'] = toloka.project.BooleanSpec() - - try: - labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for - value, label in configuration["interface"]["labels"].items()] - - except KeyError: - - msg.warn(f"Please add the key 'labels' under the top-level key 'interface' to define " - f"the labels for bounding boxes. The labels should be provided as key/value " - f"pairs, e.g. cat: Cat. 
The key must correspond to the label defined for " - f"the bounding box in the input JSON, whereas the value is the text displayed " - f"in the user interface.", exits=1) - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - - # Set up the output data field - data=tb.InternalData(path=input_data['json'], - default=tb.InputData(input_data['json'])), - - # Set up the input data field - image=tb.InputData(path=input_data['url']), - - # Set this element to use all available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, - - # Set labels - labels=labels, - - # Disable annotation interface - disabled=True) - - # Define the text prompt below the segmentation UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Create a checkbox for special cases - try: - checkbox = tb.CheckboxFieldV1( - data=tb.OutputData('no_target', default=tb.InputData(input_data['bool'])), - label=configuration['interface']['checkbox'], - disabled=True) - - except KeyError: - msg.warn(f"Please add the key 'checkbox' under the top-level key 'interface' to " - f"define a text that is displayed above the checkbox. Define the text as a " - f"string e.g. 
checkbox: There is nothing to outline.", exits=1) - - # Set up a radio group for labels - radio_group = tb.ButtonRadioGroupFieldV1( - - # Set up the output data field - data=tb.OutputData(output_data['bool']), - - # Create radio buttons - options=[ - tb.fields.GroupFieldOption(value=True, label='Yes'), - tb.fields.GroupFieldOption(value=False, label='No') - ], - - # Set up validation - validation=tb.RequiredConditionV1(hint="You must choose one response.") - ) - - # Add hotkey plugin - hotkey_plugin = tb.HotkeysPluginV1(key_1=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=True), - key_2=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=False)) - - # Set task width limit - task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt, checkbox, radio_group]), - plugins=[hotkey_plugin, task_width_plugin] - ) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class FixImageSegmentation(CrowdsourcingTask): - """ - This is a class for fixing partially correct image segmentation tasks: modifying - existing outlines and/or creating new ones. - """ - def __init__(self, configuration, client, **kwargs): - """ - This function initialises the FixImageSegmentation class, which inherits attributes - and methods from the superclass Task. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - client: A TolokaClient object with valid credentials. - - Returns: - A FixImageSegmentation object. 
- """ - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - # Use the super() function to access the superclass Task and its methods and attributes. - # This will set up the project, pool and training as specified in the configuration file. - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj, **kwargs): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, **kwargs) - - # When called, return the FixImageSegmentation object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'json'} + # Define expected input and output types for the task. For image classification, the expected + # inputs include an image URL, whereas the output consists of Boolean values (true or false) + # or strings (e.g. for different labels). + expected_i, expected_o = {'url'}, {'bool', 'str'} # Configure Toloka data specifications and check the expected input against configuration data_in, data_out, input_data, output_data = check_io(configuration=configuration, expected_input=expected_i, expected_output=expected_o) - - # Add assignment ID to the input data + + # Add a data specification for incoming assignment IDs, which are needed for accepting and + # rejecting verified tasks. 
data_in['assignment_id'] = toloka.project.StringSpec(required=False) - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - # Set up the output data field - data=tb.OutputData(path=output_data['json'], - default=tb.InputData(input_data['json'])), + # Create the task interface; start by setting up the image viewer with the input URL + img_viewer = tb.ImageViewV1(url=tb.InputData(input_data['url']), + rotatable=True, + full_height=True) - # Set up the input data field - image=tb.InputData(input_data['url']), + # Define the prompt text above the radio button group + prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - # Set up the allowed shapes: note that their order will define the order in the UI - shapes={'rectangle': True, 'polygon': True}, + # Check the labels defined for the radio button group + try: + labels = [tb.fields.GroupFieldOption(value=value, label=label) for + value, label in configuration["interface"]["labels"].items()] - # Set this element to use all available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, + except KeyError: - # Set up validation - validation=tb.RequiredConditionV1(hint="Please select at least one area!"), + msg.fail(f"Please add the key 'labels' under the top-level key 'interface' to define " + f"the labels for the interface. The labels should be provided as key/value " + f"pairs, e.g. cat: Cat. 
The key is stored in the output data ('cat'), "
+ f"whereas the value defines the label shown on the interface ('Cat').",
+ exits=1)
+
+ # Set up a radio button group
+ radio_group = tb.ButtonRadioGroupFieldV1(
+
+ # Set up the output data field; this can be either a string or a Boolean value
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),
- disabled=False
+ # Create radio buttons
+ options=labels,
+
+ # Set up validation
+ validation=tb.RequiredConditionV1(hint="You must choose one response.")
 )

- # Define the text prompt below the segmentation UI
- prompt = tb.TextViewV1(content=configuration['interface']['prompt'])
+ # Set task width limit
+ task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500)
+
+ # Check if numbered hotkeys should be configured. Hotkeys are only defined if there are nine or
+ # fewer labels.
+ if len(configuration["interface"]["labels"]) <= 9:
+
+ # Create hotkeys for all possible responses
+ hotkey_dict = {f'key_{i+1}': tb.SetActionV1(
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),
+ payload=list(configuration["interface"]["labels"].keys())[i])
+ for i in range(len(configuration["interface"]["labels"]))}
+
+ hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict)
+
+ else:
+
+ hotkey_plugin = None

 # Combine the task interface elements into a view
 interface = toloka.project.TemplateBuilderViewSpec(
- view=tb.ListViewV1([img_ui, prompt])
+ view=tb.ListViewV1([img_viewer, prompt, radio_group]),
+ plugins=[task_width_plugin] + ([hotkey_plugin] if hotkey_plugin is not None else [])
 )

 # Create a task specification with interface and input/output data
@@ -711,21 +142,21 @@ def specify_task(configuration):

 return task_spec


-class SegmentationVerification(CrowdsourcingTask):
+class ImageSegmentation(CrowdsourcingTask):
 """
- This is a class for binary image segmentation verification tasks.
+ This is a class for image segmentation tasks. 
""" - def __init__(self, configuration, client, **kwargs): + def __init__(self, configuration, client): """ - This function initialises the SegmentationVerification class, which inherits attributes - and methods from the superclass Task. + This function initialises the ImageSegmentation class, which inherits attributes + and methods from the superclass CrowdsourcingTask. Parameters: configuration: A string object that defines a path to a YAML file with configuration. client: A TolokaClient object with valid credentials. Returns: - A SegmentationVerification object. + An ImageSegmentation object. """ # Read the configuration from the YAML file configuration = read_configuration(configuration=configuration) @@ -737,12 +168,12 @@ def __init__(self, configuration, client, **kwargs): # This will set up the project, pool and training as specified in the configuration file. super().__init__(configuration, client, task_spec) - def __call__(self, input_obj): + def __call__(self, input_obj, **kwargs): # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, verify=True) + super().__call__(input_obj, **kwargs) - # When called, return the SegmentationVerification object + # When called, return the ImageSegmentation object return self @staticmethod @@ -757,68 +188,133 @@ def specify_task(configuration): A Toloka TaskSpec object. """ # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'bool'} + expected_i, expected_o = {'json', 'url', 'bool'}, {'json', 'bool'} # Configure Toloka data specifications and check the expected input against configuration data_in, data_out, input_data, output_data = check_io(configuration=configuration, expected_input=expected_i, expected_output=expected_o) - # Add assignment ID to the input data + # Add a data specification for incoming assignment IDs, which are needed for accepting and + # rejecting verified tasks. 
data_in['assignment_id'] = toloka.project.StringSpec(required=False)

+ # If labels for bounding boxes should be added to the interface, create a list of labels
+ # to be added. The 'label' will be added to the UI, whereas 'value' contains the value
+ # added to the JSON output.
+ labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for
+ value, label in configuration['interface']['labels'].items()] \
+ if 'labels' in configuration['interface'] else None
+
+ # Check if particular tools have been defined for the interface
+ if 'tools' in configuration['interface']:
+
+ # Check that the tools have been provided as a list
+ assert isinstance(configuration['interface']['tools'], list), "Please provide the list " \
+ "of annotation tools as a " \
+ "YAML list."
+
+ # Check that the tools provided are valid components of the interface
+ assert set(configuration['interface']['tools']).issubset(
+ {'rectangle', 'polygon', 'point'}), "Found invalid values for annotation tools. " \
+ "Valid tools include 'rectangle', 'polygon' " \
+ "and 'point'."
+
+ # Create tools (shapes)
+ shapes = {s: True for s in configuration['interface']['tools']}
+
+ else:
+
+ shapes = {'rectangle': True, 'polygon': True, 'point': True}
+
+ # Check if the input data contains existing bounding boxes in JSON
+ if 'json' in input_data:
+
+ # Set up the output data for image segmentation, but add the
+ # incoming segmentation data as default values. 
+ data = tb.OutputData(path=output_data['json'], + default=tb.InputData(input_data['json'])) + + else: + + # Set up the output path without incoming bounding boxes + data = tb.OutputData(path=output_data['json']) + + # Initialise a list of conditions for validating the output data + conditions = [tb.RequiredConditionV1(data=tb.OutputData(path=output_data['json']))] + + # Check if a checkbox should be added to the interface + if 'checkbox' in configuration['interface']: + + # Check that a boolean value is included in the output data + assert 'bool' in output_data, "Please add an output with a Boolean value " \ + "under the top-level key 'data' to use the " \ + "checkbox." + + # Create the checkbox object; set the default value to false (unchecked) and add label + checkbox = tb.CheckboxFieldV1(data=tb.OutputData(output_data['bool'], default=False), + label=configuration['interface']['checkbox']) + + # If a checkbox is present, disable the requirements for output + data_out[output_data['json']].required = False + data_out[output_data['bool']].required = False + # Create the task interface; start by setting up the image segmentation interface img_ui = tb.ImageAnnotationFieldV1( # Set up the output data field - data=tb.InternalData(path=input_data['json'], - default=tb.InputData(input_data['json'])), + data=data, # Set up the input data field - image=tb.InputData(path=input_data['url']), + image=tb.InputData(input_data['url']), + + # Set up the allowed shapes: note that their order will define the order in the UI + shapes=shapes, # Set this element to use all available vertical space on the page. This should ensure # that all UI elements are visible. 
full_height=True, - # Disable annotation interface - disabled=True) + # Set up labels + labels=labels + ) # Define the text prompt below the segmentation UI prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - # Set up a radio group for labels - radio_group = tb.ButtonRadioGroupFieldV1( + # Add hotkey plugin + hotkey_plugin = tb.ImageAnnotationHotkeysPluginV1(cancel='s', + confirm='a', + polygon='e', + rectangle='w', + point='r', + select='q',) - # Set up the output data field - data=tb.OutputData(output_data['bool']), + # Create a list of interface elements + view = [img_ui, prompt] - # Create radio buttons - options=[ - tb.fields.GroupFieldOption(value=True, label='Yes'), - tb.fields.GroupFieldOption(value=False, label='No') - ], + # Check for optional checkbox element + if 'checkbox' in configuration['interface']: - # Set up validation - validation=tb.RequiredConditionV1(hint="You must choose one response.") - ) + # Add the checkbox element to the interface + view.append(checkbox) - # Add hotkey plugin - hotkey_plugin = tb.HotkeysPluginV1(key_1=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=True), - key_2=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=False)) + # Add validation criteria for the checkbox + conditions.append(tb.EqualsConditionV1(data=tb.OutputData(path=output_data['bool']), + to=True)) - # Set task width limit - task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) + # Combine the validation criteria (at least one criterion must hold) + validation = tb.AnyConditionV1(conditions=conditions, hint="Please draw at least one " + "shape or check the box.") - # Combine the task interface elements into a view + # Combine the components into a single user interface; add validation criteria interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt, radio_group]), - plugins=[hotkey_plugin, task_width_plugin] + view=tb.ListViewV1(items=view, + 
validation=validation), + plugins=[hotkey_plugin] ) - # Create a task specification with interface and input/output data + # Create a task specification with the interface and input/output data task_spec = toloka.project.task_spec.TaskSpec( input_spec=data_in, output_spec=data_out, @@ -829,24 +325,24 @@ def specify_task(configuration): return task_spec -class MulticlassVerification(CrowdsourcingTask): +class SegmentationClassification(CrowdsourcingTask): """ - This is a class for multiclass image segmentation verification tasks. + This is a class for classifying bounding boxes and other forms of image segmentation. """ def __init__(self, configuration, client, **kwargs): """ - This function initialises the MulticlassVerification class, which inherits attributes - and methods from the superclass Task. + This function initialises the SegmentationClassification class, which inherits attributes + and methods from the superclass CrowdsourcingTask. Parameters: configuration: A string object that defines a path to a YAML file with configuration. client: A TolokaClient object with valid credentials. Returns: - A MulticlassVerification object. + A SegmentationClassification object. """ # Read the configuration from the YAML file - configuration = OrderedDict(read_configuration(configuration=configuration)) + configuration = read_configuration(configuration=configuration) # Specify task and task interface task_spec = self.specify_task(configuration=configuration) @@ -858,9 +354,9 @@ def __init__(self, configuration, client, **kwargs): def __call__(self, input_obj): # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, verify=True) + super().__call__(input_obj) - # When called, return the MulticlassVerification object + # When called, return the SegmentationClassification object return self @staticmethod @@ -875,66 +371,129 @@ def specify_task(configuration): A Toloka TaskSpec object. 
""" # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'str'} + expected_i, expected_o = {'url', 'json', 'bool', 'str'}, {'bool', 'str'} # Configure Toloka data specifications and check the expected input against configuration data_in, data_out, input_data, output_data = check_io(configuration=configuration, expected_input=expected_i, expected_output=expected_o) - # Add assignment ID to the input data + # Add a data specification for incoming assignment IDs, which are needed for accepting and + # rejecting verified tasks. data_in['assignment_id'] = toloka.project.StringSpec(required=False) - # Create the task interface; start by setting up the image interface + # Check if labels associated with the image annotation element have been defined + if 'segmentation' in configuration['interface']: + + if 'labels' in configuration['interface']['segmentation']: + + # Create labels for the image annotation interface + seg_labels = [tb.ImageAnnotationFieldV1.Label(value=v, label=l) for + v, l in configuration['interface']['segmentation']['labels'].items()] + + else: + + seg_labels = None + + # Check if a checkbox should be added to the interface + if 'checkbox' in configuration['interface']: + + # Create a checkbox + try: + + # Define data and default value + path = input_data['bool'] if 'bool' in input_data else input_data['str'] + default = tb.InputData(path) + + # Create checkbox + checkbox = tb.CheckboxFieldV1( + data=tb.InternalData(path=path, default=default), + label=configuration['interface']['checkbox'], + disabled=True) + + except KeyError: + + msg.fail(f"Please add the key 'checkbox' under the top-level key 'interface' to " + f"define a text that is displayed above the checkbox. Define the text as a " + f"string e.g. 
checkbox: There is nothing to outline.", exits=1) + + # Create the task interface; start by setting up the image segmentation interface img_ui = tb.ImageAnnotationFieldV1( - # Set up the output data field + # Set up the data to be displayed data=tb.InternalData(path=input_data['json'], default=tb.InputData(input_data['json'])), # Set up the input data field image=tb.InputData(path=input_data['url']), + # Set labels + labels=seg_labels, + # Set this element to use all available vertical space on the page. This should ensure # that all UI elements are visible. full_height=True, - # Disable annotation interface + # Disable the annotation interface disabled=True) - # Define the text prompt below the image UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) + # Define the text prompt above the radio button group + radio_prompt = tb.TextViewV1(content=configuration['interface']['prompt']) + + # Check the labels defined for the radio button group + try: + radio_labels = [tb.fields.GroupFieldOption(value=value, label=label) for + value, label in configuration['interface']['labels'].items()] + + except KeyError: - options = [tb.fields.GroupFieldOption(value=value, label=label) for (value, label) - in configuration['options'].items()] + msg.fail(f"Please add the key 'labels' under the top-level key 'interface' to define " + f"the labels for the interface. The labels should be provided as key/value " + f"pairs, e.g. cat: Cat. 
The key is stored in the output data ('cat'), "
+ f"whereas the value defines the label shown on the interface ('Cat').",
+ exits=1)

 # Set up a radio group for labels
 radio_group = tb.ButtonRadioGroupFieldV1(

- # Set up the output data field
- data=tb.OutputData(output_data['str']),
+ # Set up the output data field; this can be either a string or a Boolean value
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),

 # Create radio buttons
- options=options,
+ options=radio_labels,

 # Set up validation
 validation=tb.RequiredConditionV1(hint="You must choose one response.")
 )

- # Set task width limit
- task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500)
+ # Check if numbered hotkeys should be configured. Hotkeys are only defined if there are nine
+ # or fewer labels.
+ if len(configuration["interface"]["labels"]) <= 9:

- # Create hotkeys for each possible response
- hotkey_dict = {f'key_{i+1}': tb.SetActionV1(data=tb.OutputData(output_data['str']),
- payload=list(configuration['options'].keys())[i])
- for i in range(len(configuration['options']))}
+ # Create hotkeys for all possible responses
+ hotkey_dict = {f'key_{i+1}': tb.SetActionV1(
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),
+ payload=list(configuration['interface']['labels'].keys())[i])
+ for i in range(len(configuration['interface']['labels']))}

- hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict)
+ hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict)
+
+ else:
+
+ hotkey_plugin = None
+
+ # Create a list of interface elements
+ view = [img_ui, radio_prompt, radio_group]
+
+ # Check for possible checkbox element
+ if 'checkbox' in configuration['interface']:
+
+ view = [img_ui, checkbox, radio_prompt, radio_group]

 # Combine the task interface elements into a view
 interface = toloka.project.TemplateBuilderViewSpec(
- view=tb.ListViewV1([img_ui, prompt, radio_group]),
- plugins=[task_width_plugin, 
hotkey_plugin]
+ view=tb.ListViewV1(view),
+ plugins=([hotkey_plugin] if hotkey_plugin is not None else [])
 )

 # Create a task specification with interface and input/output data
@@ -995,14 +554,15 @@ def specify_task(configuration):
 A Toloka TaskSpec object.
 """
 # Define expected input and output types for the task
- expected_i, expected_o = {'str'}, {'str'}
+ expected_i, expected_o = {'str'}, {'str', 'bool'}

 # Configure Toloka data specifications and check the expected input against configuration
 data_in, data_out, input_data, output_data = check_io(configuration=configuration,
 expected_input=expected_i,
 expected_output=expected_o)

- # Add assignment ID to the input data
+ # Add a data specification for incoming assignment IDs, which are needed for accepting and
+ # rejecting verified tasks.
 data_in['assignment_id'] = toloka.project.StringSpec(required=False)

 # Create the task interface; start by setting up the text classification interface
@@ -1011,17 +571,28 @@ def specify_task(configuration):
 # Define the text prompt below the classification UI
 prompt = tb.TextViewV1(content=configuration['interface']['prompt'])

- options = [tb.fields.GroupFieldOption(value=value, label=label) for (value, label)
- in configuration['options'].items()]
+ # Check the labels defined for the radio button group
+ try:
+ labels = [tb.fields.GroupFieldOption(value=value, label=label) for
+ (value, label) in configuration['interface']['labels'].items()]
+
+ except KeyError:
+
+ # TODO Move these warnings into a separate file, since they're frequently reused
+ msg.fail(f"Please add the key 'labels' under the top-level key 'interface' to define "
+ f"the labels for the interface. The labels should be provided as key/value "
+ f"pairs, e.g. cat: Cat. 
The key is stored in the output data ('cat'), "
+ f"whereas the value defines the label shown on the interface ('Cat').",
+ exits=1)

 # Set up a radio group for labels
 radio_group = tb.ButtonRadioGroupFieldV1(

 # Set up the output data field
- data=tb.OutputData(output_data['str']),
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),

 # Create radio buttons
- options=options,
+ options=labels,

 # Set up validation
 validation=tb.RequiredConditionV1(hint="You must choose one response.")
@@ -1030,17 +601,26 @@ def specify_task(configuration):
 # Set task width limit
 task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500)

- # Create hotkeys for all possible responses
- hotkey_dict = {f'key_{i+1}': tb.SetActionV1(data=tb.OutputData(output_data['str']),
- payload=list(configuration['options'].keys())[i])
- for i in range(len(configuration['options']))}
+ # Check if numbered hotkeys should be configured. Hotkeys are only defined if there are nine or
+ # fewer labels. 
+ if len(configuration["interface"]["labels"]) <= 9:
+
+ # Create hotkeys for all possible responses
+ hotkey_dict = {f'key_{i+1}': tb.SetActionV1(
+ data=tb.OutputData(output_data['bool'] if 'bool' in output_data else output_data['str']),
+ payload=list(configuration["interface"]["labels"].keys())[i])
+ for i in range(len(configuration["interface"]["labels"]))}
+
+ hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict)

- hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict)
+ else:
+
+ hotkey_plugin = None

 # Combine the task interface elements into a view
 interface = toloka.project.TemplateBuilderViewSpec(
 view=tb.ListViewV1([prompt, text_ui, radio_group]),
- plugins=[task_width_plugin, hotkey_plugin]
+ plugins=[task_width_plugin] + ([hotkey_plugin] if hotkey_plugin is not None else [])
 )

 # Create a task specification with interface and input/output data
@@ -1108,25 +688,34 @@ def specify_task(configuration):
 expected_input=expected_i,
 expected_output=expected_o)

- # Add assignment ID to the input data
+ # Add a data specification for incoming assignment IDs, which are needed for accepting and
+ # rejecting verified tasks.
 data_in['assignment_id'] = toloka.project.StringSpec(required=False)

 # Define the text prompt above the annotation UI
 prompt = tb.TextViewV1(content=configuration['interface']['prompt'])

- # Set up annotation options
- options = [tb.fields.GroupFieldOption(value=value, label=label) for (value, label)
- in configuration['options'].items()]
+ # Check the labels defined for the radio button group
+ try:
+ labels = [tb.fields.GroupFieldOption(value=value, label=label) for
+ (value, label) in configuration['interface']['labels'].items()]
+
+ except KeyError:
+
+ # TODO Move these warnings into a separate file, since they're frequently reused
+ msg.fail(f"Please add the key 'labels' under the top-level key 'interface' to define "
+ f"the labels for the interface. The labels should be provided as key/value "
+ f"pairs, e.g. cat: Cat. 
The key is stored into the output data ('cat'), " + f"whereas the value defines the label shown in the interface ('Cat').", + exits=1) # Set up annotation UI annotation_field = tb.TextAnnotationFieldV1( # Set up the output data field data=tb.OutputData(output_data['json']), - content=tb.InputData(input_data['str']), - - labels=options, + labels=labels, # Set up validation validation=tb.RequiredConditionV1(hint="You must choose one response.") @@ -1135,154 +724,26 @@ def specify_task(configuration): # Set task width limit task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) - # Create hotkeys for all possible responses - hotkey_dict = {f'key_{i+1}': tb.SetActionV1(data=tb.OutputData(output_data['json']), - payload=list(configuration['options'].keys())[i]) - for i in range(len(configuration['options']))} - - hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict) - - # Combine the task interface elements into a view - interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([prompt, annotation_field]), - plugins=[task_width_plugin, hotkey_plugin] - ) - - # Create a task specification with interface and input/output data - task_spec = toloka.project.task_spec.TaskSpec( - input_spec=data_in, - output_spec=data_out, - view_spec=interface - ) - - # Return the task specification - return task_spec - - -class LabelledSegmentationVerificationNoCheckbox(CrowdsourcingTask): - """ - This is a class for binary segmentation verification tasks with labelled bounding boxes, - but the interface does not contain a checkbox that can be used e.g. to mark images with - nothing to outline. - """ - - def __init__(self, configuration, client): - """ - This function initialises the LabelledSegmentationVerification class, which inherits attributes - and methods from the superclass Task. - - Parameters: - configuration: A string object that defines a path to a YAML file with configuration. - client: A TolokaClient object with valid credentials. 
- - Returns: - A LabelledSegmentationVerification object. - """ - # Read the configuration from the YAML file - configuration = read_configuration(configuration=configuration) - - # Specify task and task interface - task_spec = self.specify_task(configuration=configuration) - - # Use the super() function to access the superclass Task and its methods and attributes. - # This will set up the project, pool and training as specified in the configuration file. - super().__init__(configuration, client, task_spec) - - def __call__(self, input_obj): - - # If the class is called, use the __call__() method from the superclass - super().__call__(input_obj, verify=True) - - # When called, return the LabelledSegmentationVerification object - return self - - @staticmethod - def specify_task(configuration): - """ - This function specifies the task interface on Toloka. - - Parameters: - configuration: A dictionary containing the configuration defined in the YAML file. - - Returns: - A Toloka TaskSpec object. - """ - - # Define expected input and output types for the task - expected_i, expected_o = {'url', 'json'}, {'bool'} - - # Configure Toloka data specifications and check the expected input against configuration - data_in, data_out, input_data, output_data = check_io(configuration=configuration, - expected_input=expected_i, - expected_output=expected_o) - - # Add assignment ID to the input data - data_in['assignment_id'] = toloka.project.StringSpec(required=False) - - try: - labels = [tb.ImageAnnotationFieldV1.Label(value=value, label=label) for - value, label in configuration["interface"]["labels"].items()] - - except KeyError: - - msg.warn(f"Please add the key 'labels' under the top-level key 'interface' to define " - f"the labels for bounding boxes. The labels should be provided as key/value " - f"pairs, e.g. cat: Cat. 
The key must correspond to the label defined for " - f"the bounding box in the input JSON, whereas the value is the text displayed " - f"in the user interface.", exits=1) - - # Create the task interface; start by setting up the image segmentation interface - img_ui = tb.ImageAnnotationFieldV1( - - # Set up the output data field - data=tb.InternalData(path=input_data['json'], - default=tb.InputData(input_data['json'])), - - # Set up the input data field - image=tb.InputData(path=input_data['url']), - - # Set this element to use all available vertical space on the page. This should ensure - # that all UI elements are visible. - full_height=True, - - # Set labels - labels=labels, - - # Disable annotation interface - disabled=True) - - # Define the text prompt below the segmentation UI - prompt = tb.TextViewV1(content=configuration['interface']['prompt']) - - # Set up a radio group for labels - radio_group = tb.ButtonRadioGroupFieldV1( - - # Set up the output data field - data=tb.OutputData(output_data['bool']), + # Check if numbered hotkeys should be configured. Hotkeys are only defined if there are + # nine or fewer labels. 
+ if len(configuration["interface"]["labels"]) <= 9: - # Create radio buttons - options=[ - tb.fields.GroupFieldOption(value=True, label='Yes'), - tb.fields.GroupFieldOption(value=False, label='No') - ], + # Create hotkeys for all possible responses + hotkey_dict = {f'key_{i + 1}': tb.SetActionV1( + data=tb.OutputData(output_data['json']), + payload=list(configuration['interface']['labels'].keys())[i]) + for i in range(len(configuration['interface']['labels']))} - # Set up validation - validation=tb.RequiredConditionV1(hint="You must choose one response.") - ) + hotkey_plugin = tb.HotkeysPluginV1(**hotkey_dict) - # Add hotkey plugin - hotkey_plugin = tb.HotkeysPluginV1(key_1=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=True), - key_2=tb.SetActionV1(data=tb.OutputData(output_data['bool']), - payload=False)) + else: - # Set task width limit - task_width_plugin = tb.TolokaPluginV1(kind='scroll', task_width=500) + hotkey_plugin = None # Combine the task interface elements into a view interface = toloka.project.TemplateBuilderViewSpec( - view=tb.ListViewV1([img_ui, prompt, radio_group]), - plugins=[hotkey_plugin, task_width_plugin] + view=tb.ListViewV1([prompt, annotation_field]), + plugins=[task_width_plugin] + ([hotkey_plugin] if hotkey_plugin is not None else []) ) # Create a task specification with interface and input/output data diff --git a/tests/data/detect_text.yaml b/tests/data/detect_text.yaml index 4376ec8..de17d99 100644 --- a/tests/data/detect_text.yaml +++ b/tests/data/detect_text.yaml @@ -9,6 +9,9 @@ actions: on_closed: aggregate_detect interface: prompt: Does the diagram contain text, letters or numbers? 
+ labels: + true: "Yes" + false: "No" project: setup: public_name: Check if an image contains text, letters or numbers diff --git a/tests/test_tasks.py b/tests/test_tasks.py index 8624e49..a2b9583 100644 --- a/tests/test_tasks.py +++ b/tests/test_tasks.py @@ -13,7 +13,9 @@ def dummy_client(): @pytest.fixture def test_task(dummy_client): - return ImageClassification(configuration='data/detect_text.yaml', client=dummy_client, test=True) + return ImageClassification(configuration='data/detect_text.yaml', + client=dummy_client, + test=True) class TestTask: @@ -99,22 +101,51 @@ def test_project_public_description(self, test_task): def test_project_task_spec(self, test_task): # Build a task interface similar to the one defined in data/detect_text.yaml - task_spec = toloka.project.task_spec.TaskSpec(input_spec={'image': toloka.project.UrlSpec(required=True, - hidden=False)}, - output_spec={'result': toloka.project.BooleanSpec(required=True, - hidden=False, - allowed_values=None)}, - view_spec=toloka.project.TemplateBuilderViewSpec(settings=None, - config=toloka.project.template_builder.TemplateBuilder( - view=toloka.project.template_builder.ListViewV1(items=[ - toloka.project.template_builder.ImageViewV1(url=toloka.project.template_builder.InputData(path='image', default=None), full_height=None, max_width=None, min_width=None, no_border=None, no_lazy_load=None, popup=None, ratio=[1, 1], rotatable=True, scrollable=None, hint=None, label=None, validation=None, version='1.0.0'), - toloka.project.template_builder.TextViewV1(content='Does the diagram contain text, letters or numbers?', hint=None, label=None, validation=None, version='1.0.0'), - toloka.project.template_builder.ButtonRadioGroupFieldV1(data=toloka.project.template_builder.OutputData(path='result', default=None), - options=[toloka.project.template_builder.GroupFieldOption(value=True, label='Yes', hint=None), - toloka.project.template_builder.GroupFieldOption(value=False, label='No', hint=None)], hint=None, 
label=None, validation=toloka.project.template_builder.RequiredConditionV1(data=None, hint='You must choose one response.', version='1.0.0'), version='1.0.0')], direction=None, size=None, hint=None, label=None, validation=None, version='1.0.0'), - plugins=[toloka.project.template_builder.TolokaPluginV1(layout=toloka.project.template_builder.TolokaPluginV1.TolokaPluginLayout(kind='scroll', task_width=500), notifications=None, version='1.0.0'), - toloka.project.template_builder.HotkeysPluginV1(key_1=toloka.project.template_builder.SetActionV1(data=toloka.project.template_builder.OutputData(path='result', default=None), payload=True, version='1.0.0'), - key_2=toloka.project.template_builder.SetActionV1(data=toloka.project.template_builder.OutputData(path='result', default=None), payload=False, version='1.0.0'), version='1.0.0')], vars=None), core_version='1.0.0', infer_data_spec=False)) + task_spec = toloka.project.task_spec.TaskSpec( + input_spec={'image': toloka.project.UrlSpec(required=True, hidden=False), + 'assignment_id': toloka.project.StringSpec(required=False, hidden=False)}, + output_spec={'result': toloka.project.BooleanSpec(required=True, hidden=False, allowed_values=None)}, + view_spec=toloka.project.TemplateBuilderViewSpec( + settings=None, + config=toloka.project.template_builder.TemplateBuilder( + view=toloka.project.template_builder.ListViewV1( + items=[toloka.project.template_builder.ImageViewV1( + url=toloka.project.template_builder.InputData( + path='image', default=None), + full_height=True, + max_width=None, + min_width=None, + no_border=None, + no_lazy_load=None, + popup=None, + ratio=None, + rotatable=True, + scrollable=None, + hint=None, + label=None, + validation=None, + version='1.0.0'), + toloka.project.template_builder.TextViewV1( + content='Does the diagram contain text, letters or numbers?', + hint=None, label=None, validation=None, version='1.0.0'), + toloka.project.template_builder.ButtonRadioGroupFieldV1( + 
data=toloka.project.template_builder.OutputData(path='result', default=None), + options=[toloka.project.template_builder.GroupFieldOption(value=True, label='Yes', hint=None), + toloka.project.template_builder.GroupFieldOption(value=False, label='No', hint=None)], + hint=None, label=None, validation=toloka.project.template_builder.RequiredConditionV1( + data=None, hint='You must choose one response.', version='1.0.0'), version='1.0.0')], + direction=None, size=None, hint=None, label=None, validation=None, version='1.0.0'), + plugins=[toloka.project.template_builder.TolokaPluginV1( + layout=toloka.project.template_builder.TolokaPluginV1.TolokaPluginLayout( + kind='scroll', task_width=500), notifications=None, version='1.0.0'), + toloka.project.template_builder.HotkeysPluginV1( + key_1=toloka.project.template_builder.SetActionV1( + data=toloka.project.template_builder.OutputData(path='result', default=None), + payload=True, version='1.0.0'), + key_2=toloka.project.template_builder.SetActionV1( + data=toloka.project.template_builder.OutputData(path='result', default=None), + payload=False, version='1.0.0'), version='1.0.0')], vars=None), + core_version='1.0.0', infer_data_spec=False)) assert test_task.project.task_spec == task_spec
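The conditional hotkey construction that recurs in the hunks above can be sketched without the Toloka dependencies. This is a minimal sketch, not code from the repository: plain strings stand in for the `tb.SetActionV1` payloads, the returned dict mirrors the keyword arguments unpacked into `tb.HotkeysPluginV1(**hotkey_dict)`, and `build_hotkeys` is a hypothetical helper introduced here for illustration only.

```python
# Sketch of the patch's conditional hotkey logic. 'build_hotkeys' is a
# hypothetical helper; in the patch, the payload for each numbered key is
# the corresponding key from the interface -> labels mapping in the YAML.

def build_hotkeys(labels):
    """Map the number keys 1-9 to label values, or return None if there
    are more labels than available keys."""
    if len(labels) > 9:
        # Only the digits 1-9 are usable as hotkeys, so hotkeys are
        # skipped entirely when there are ten or more labels.
        return None
    # Iterating over a dict yields its keys, matching the patch's use of
    # list(configuration['interface']['labels'].keys())[i] as the payload.
    return {f'key_{i + 1}': value for i, value in enumerate(labels)}


labels = {'cat': 'Cat', 'dog': 'Dog'}
print(build_hotkeys(labels))  # {'key_1': 'cat', 'key_2': 'dog'}
```

Returning `None` rather than an empty mapping mirrors the patch, where `hotkey_plugin` is set to `None` when the label count exceeds the nine available number keys.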