The CrowdTruth framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The central part of the framework is the collection of CrowdTruth metrics that capture and interpret inter-annotator disagreement in crowdsourcing. The CrowdTruth metrics model the inter-dependency between the three main components of a crowdsourcing system -- workers, input data, and annotations. The goal of the metrics is to capture the degree of ambiguity in each of these three components.
This document shows how to get started using the CrowdTruth Python package to process data collected from crowdsourcing microtasks. A detailed description of the CrowdTruth metrics is available in this paper. You can follow the full CrowdTruth Tutorial to learn and practice the specifics of the CrowdTruth approach. Other useful resources are:
- Papers about CrowdTruth
- Datasets collected using CrowdTruth
- CrowdTruth project homepage
- CrowdTruth Tutorial
If you use this software in your research, please consider citing:
@article{CrowdTruth2,
  author = {Anca Dumitrache and Oana Inel and Lora Aroyo and Benjamin Timmermans and Chris Welty},
  title = {CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement},
  year = {2018},
  url = {https://arxiv.org/abs/1808.06080},
}
To install the stable version from PyPI, install pip for your OS, then install the package using:
pip install crowdtruth
To install the latest version from source, download the library and install it using:
python setup.py install
After installing the CrowdTruth package, you can run the metrics on your own crowdsourced data. We currently support automated processing of files generated by Amazon Mechanical Turk and Figure Eight. It is also possible to define your own custom file format.
The pre-processing configuration defines how to interpret the raw crowdsourcing input. To do this, we need to define a configuration class.
import crowdtruth
from crowdtruth.configuration import DefaultConfig
class TestConfig(DefaultConfig):
    ...
Our test class inherits the default configuration `DefaultConfig`. The following attributes can be used to customize the configuration to the task:

- `inputColumns`: list of input columns from the .csv file with the input data
- `outputColumns`: list of output columns from the .csv file with the answers from the workers
- `customPlatformColumns`: list of columns from the .csv file that define a standard annotation task, in the following order: judgment id, unit id, worker id, started time, submitted time. This variable is used for input files that do not come from AMT or Figure Eight (formerly known as CrowdFlower).
- `csv_file_separator`: string that separates the columns in the file; default value is `,`
- `annotation_separator`: string that separates the crowd annotations (the columns defined in `outputColumns`); default value is `,`
- `none_token`: string corresponding to the name of the annotation vector component that counts how many workers picked no answer for a given unit; set to `NONE` by default
- `remove_empty_rows`: boolean variable controlling whether to remove empty judgments from the data or to replace them with `none_token`; default value is `True`
- `open_ended_task`: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input) or not (i.e. the crowd picks from a pre-selected list of annotations)
- `annotation_vector`: list of possible crowd answers, mandatory when `open_ended_task` is `False`
- `processJudgments`: method that defines additional processing of the raw crowd data
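For illustration, here is a minimal sketch of a configuration for a closed task with two possible answers, in the spirit of the CrowdTruth tutorials. The column names, annotation values, and the body of `processJudgments` are hypothetical and should be adapted to your own .csv file:

```python
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class TestConfig(DefaultConfig):
    # hypothetical column names -- replace them with the columns in your own .csv file
    inputColumns = ["doc_id", "sentence"]
    outputColumns = ["selected_answer"]

    # closed task: the crowd picks from a known list of annotations
    open_ended_task = False
    annotation_vector = ["true", "false"]

    # both the .csv columns and the crowd annotations are comma-separated
    csv_file_separator = ","
    annotation_separator = ","

    def processJudgments(self, judgments):
        # optional clean-up of the raw crowd answers,
        # e.g. lower-casing them so they match annotation_vector
        for col in self.outputColumns:
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
        return judgments
```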
After declaring the configuration of our input file, we are ready to pre-process the crowd data:
data, config = crowdtruth.load(
    file = ...,
    config = TestConfig()
)
To process all of the files in one folder with the same pre-defined configuration, replace the `file` attribute of `crowdtruth.load` with `directory`.
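For example, a minimal sketch of processing a whole folder of result files (the folder name below is only a placeholder):

```python
# pre-process every .csv file in the folder with the same configuration
data, config = crowdtruth.load(
    directory = "crowd_results/",
    config = TestConfig()
)
```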
The pre-processed data can then be used to calculate the CrowdTruth metrics:
results = crowdtruth.run(data, config)
The `crowdtruth.run` method returns a dictionary object with the following keys:

- `units`: quality metrics for the input units
- `workers`: quality metrics for the workers
- `annotations`: quality metrics for the crowd annotations
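As a minimal sketch, the three sets of metrics can be accessed through these keys; how you inspect them further (sorting, filtering, plotting) depends on your analysis:

```python
results = crowdtruth.run(data, config)

# quality metrics per input unit, per worker, and per annotation
unit_quality = results["units"]
worker_quality = results["workers"]
annotation_quality = results["annotations"]
```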
Below you can find a collection of Jupyter Notebooks that show how to use the CrowdTruth package on different types of crowdsourcing tasks. See also the tutorial slide decks for more explanation of the task design (slides) and of how to run the CrowdTruth metrics in the Python notebooks (slides):
Closed Tasks: the crowd picks from a set of annotations that is known beforehand
- Binary Choice: the crowd picks 1 annotation out of 2 choices (e.g. `True` and `False`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
- Ternary Choice: the crowd picks 1 annotation out of 3 choices (e.g. `True`, `False` and `None/Other`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
- Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are the same for every input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
- Sparse Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are different across input units
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
  - Event extraction from sentences: Jupyter notebook | Colab notebook
Open-Ended Tasks: the crowd dynamically creates the list of annotations, or the set of annotations is too big to compute beforehand
- Sparse Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are different across input units
  - Event extraction from sentences: Jupyter notebook | Colab notebook
- Open-ended extraction tasks: the crowd creates different combinations of annotations based on the input unit
  - Person identification by highlighting words in text: task template | Jupyter notebook | Colab notebook
  - Event extraction by highlighting words in text: Jupyter notebook
- Free Choice: the crowd inputs all possible annotations for an input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
An example of a Jupyter Notebook that shows how to use the CrowdTruth package with a custom platform input file can be seen below:
Multiple Choice Tasks: the crowd picks multiple annotations out of a set list of choices that are the same for every input unit
- Person identification in videos: [Jupyter Notebook]
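As a sketch of the custom-platform setup described above, the configuration below uses `customPlatformColumns` to map the five standard columns; all column names, file names, and annotation values are hypothetical and must match your own export:

```python
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class CustomPlatformConfig(DefaultConfig):
    # hypothetical column names from a custom (non-AMT, non-Figure Eight) export
    inputColumns = ["video_url"]
    outputColumns = ["selected_persons"]
    # required order: judgment id, unit id, worker id, started time, submitted time
    customPlatformColumns = ["judgment_id", "unit_id", "worker_id",
                             "started_at", "submitted_at"]
    # closed multiple-choice task with a placeholder list of answers
    open_ended_task = False
    annotation_vector = ["person_a", "person_b", "person_c", "none"]

data, config = crowdtruth.load(
    file = "custom_platform_results.csv",  # placeholder file name
    config = CustomPlatformConfig()
)
results = crowdtruth.run(data, config)
```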