The CrowdTruth framework implements an approach to machine-human computing for collecting annotation data on text, images and videos. The central part of the framework is the collection of CrowdTruth metrics that capture and interpret inter-annotator disagreement in crowdsourcing. The CrowdTruth metrics model the inter-dependency between the three main components of a crowdsourcing system -- workers, input data, and annotations. The goal of the metrics is to capture the degree of ambiguity in each of these three components.
This document shows how to get started using the CrowdTruth Python package to process data collected from crowdsourcing microtasks. A detailed description of the CrowdTruth metrics is available in this paper. You can follow the full CrowdTruth Tutorial to learn and practice the specifics of the CrowdTruth approach. Other useful resources are:
- Papers about CrowdTruth
- Datasets collected using CrowdTruth
- CrowdTruth project homepage
- CrowdTruth Tutorial
If you use this software in your research, please consider citing:
@article{CrowdTruth2,
  author = {Anca Dumitrache and Oana Inel and Lora Aroyo and Benjamin Timmermans and Chris Welty},
  title = {CrowdTruth 2.0: Quality Metrics for Crowdsourcing with Disagreement},
  year = {2018},
  url = {https://arxiv.org/abs/1808.06080},
}
To install the stable version from PyPI, install pip for your OS, then install the package using:
pip install crowdtruth
To install the latest version from source, download the library and install it using:
python setup.py install
After installing the CrowdTruth package, you can run the metrics on your own crowdsourced data. We currently support automated processing of files generated by Amazon Mechanical Turk and Figure Eight. It is also possible to define your own custom file format.
The pre-processing configuration defines how to interpret the raw crowdsourcing input. To do this, we need to define a configuration class.
import crowdtruth
from crowdtruth.configuration import DefaultConfig
class TestConfig(DefaultConfig):
    ...
Our test class inherits the default configuration `DefaultConfig`. The following attributes can be used to customize the configuration to the task:

- `inputColumns`: list of input columns from the .csv file with the input data
- `outputColumns`: list of output columns from the .csv file with the answers from the workers
- `customPlatformColumns`: list of columns from the .csv file that define a standard annotation task, in the following order: judgment id, unit id, worker id, started time, submitted time. This variable is used for input files that do not come from AMT or Figure Eight (formerly known as CrowdFlower).
- `csv_file_separator`: string that separates the columns in the file; default value is `,`
- `annotation_separator`: string that separates the crowd annotations (the columns defined in `outputColumns`); default value is `,`
- `none_token`: string corresponding to the name of the annotation vector component that counts how many workers picked no answer for a given unit; set to `NONE` by default
- `remove_empty_rows`: boolean variable controlling whether to remove empty judgments from the data or to replace them with `none_token`; default value is `True`
- `open_ended_task`: boolean variable defining whether the task is open-ended (i.e. the possible crowd annotations are not known beforehand, like in the case of free text input) or not (i.e. the crowd picks from a pre-selected list of annotations)
- `annotation_vector`: list of possible crowd answers, mandatory when `open_ended_task` is `False`
- `processJudgments`: method that defines additional processing of the raw crowd data
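For illustration, here is a minimal sketch of a configuration for a closed task with two possible answers, in the spirit of the CrowdTruth tutorials. The column names, annotation values, and the body of `processJudgments` are hypothetical and should be adapted to your own .csv file:

```python
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class TestConfig(DefaultConfig):
    # hypothetical column names -- replace them with the columns in your own .csv file
    inputColumns = ["doc_id", "sentence"]
    outputColumns = ["selected_answer"]

    # closed task: the crowd picks from a known list of annotations
    open_ended_task = False
    annotation_vector = ["true", "false"]

    # both the .csv columns and the crowd annotations are comma-separated
    csv_file_separator = ","
    annotation_separator = ","

    def processJudgments(self, judgments):
        # optional clean-up of the raw crowd answers,
        # e.g. lower-casing them so they match annotation_vector
        for col in self.outputColumns:
            judgments[col] = judgments[col].apply(lambda x: str(x).lower())
        return judgments
```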
After declaring the configuration of our input file, we are ready to pre-process the crowd data:
data, config = crowdtruth.load(
    file = ...,
    config = TestConfig()
)
To process all of the files in one folder with the same pre-defined configuration, replace the `file` attribute of `crowdtruth.load` with `directory`.
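For example, a minimal sketch of processing a whole folder of result files (the folder name below is only a placeholder):

```python
# pre-process every .csv file in the folder with the same configuration
data, config = crowdtruth.load(
    directory = "crowd_results/",
    config = TestConfig()
)
```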
The pre-processed data can then be used to calculate the CrowdTruth metrics:
results = crowdtruth.run(data, config)
The `crowdtruth.run` method returns a dictionary object with the following keys:

- `units`: quality metrics for the input units
- `workers`: quality metrics for the workers
- `annotations`: quality metrics for the crowd annotations
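As a minimal sketch, the three sets of metrics can be accessed through these keys; how you inspect them further (sorting, filtering, plotting) depends on your analysis:

```python
results = crowdtruth.run(data, config)

# quality metrics per input unit, per worker, and per annotation
unit_quality = results["units"]
worker_quality = results["workers"]
annotation_quality = results["annotations"]
```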
Below you can find a collection of Jupyter Notebooks that show how to use the CrowdTruth package on different types of crowdsourcing tasks. See also the tutorial slide decks for more explanation of the task design (slides) and of how to run the CrowdTruth metrics in the Python notebooks (slides):
Closed Tasks: the crowd picks from a set of annotations that is known beforehand
- Binary Choice: the crowd picks 1 annotation out of 2 choices (e.g. `True` and `False`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
- Ternary Choice: the crowd picks 1 annotation out of 3 choices (e.g. `True`, `False` and `None/Other`)
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
- Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are the same for every input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
- Sparse Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are different across input units
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
  - Relation extraction from sentences: task template | Jupyter notebook | Colab notebook
  - Event extraction from sentences: Jupyter notebook | Colab notebook
Open-Ended Tasks: the crowd dynamically creates the list of annotations, or the set of annotations is too big to compute beforehand
- Sparse Multiple Choice: the crowd picks multiple annotations out of a set list of choices that are different across input units
  - Event extraction from sentences: Jupyter notebook | Colab notebook
- Open-ended extraction tasks: the crowd creates different combinations of annotations based on the input unit
  - Person identification by highlighting words in text: task template | Jupyter notebook | Colab notebook
  - Event extraction by highlighting words in text: Jupyter notebook
- Free Choice: the crowd inputs all possible annotations for an input unit
  - Person identification in videos: task template | Jupyter notebook | Colab notebook
An example of a Jupyter Notebook that shows how to use the CrowdTruth package with a custom platform input file can be seen below:
Multiple Choice Tasks: the crowd picks multiple annotations out of a set list of choices that are the same for every input unit
- Person identification in videos: [Jupyter Notebook]
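As a sketch of the custom-platform setup described above, the configuration below uses `customPlatformColumns` to map the five standard columns; all column names, file names, and annotation values are hypothetical and must match your own export:

```python
import crowdtruth
from crowdtruth.configuration import DefaultConfig

class CustomPlatformConfig(DefaultConfig):
    # hypothetical column names from a custom (non-AMT, non-Figure Eight) export
    inputColumns = ["video_url"]
    outputColumns = ["selected_persons"]
    # required order: judgment id, unit id, worker id, started time, submitted time
    customPlatformColumns = ["judgment_id", "unit_id", "worker_id",
                             "started_at", "submitted_at"]
    # closed multiple-choice task with a placeholder list of answers
    open_ended_task = False
    annotation_vector = ["person_a", "person_b", "person_c", "none"]

data, config = crowdtruth.load(
    file = "custom_platform_results.csv",  # placeholder file name
    config = CustomPlatformConfig()
)
results = crowdtruth.run(data, config)
```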