PyTerrier_monoQA

This is the PyTerrier plugin for the monoQA approach to reranking and answer extraction.

Installation

This repository can be installed using pip:

    pip install --upgrade git+https://github.com/terrierteam/monoQA.git

You can then import the package in Python after importing pyterrier:

import pyterrier as pt
pt.init()
import pyterrier_monoQA

Inference

For instance, to load a monoQA model:

from pyterrier_monoQA import MonoQA
monoqa = MonoQA(top_k=10) # loads 9meo/monoQA by default

To use monoQA to rerank documents and extract an answer:

import pandas as pd
monoqa(pd.DataFrame([
    {"qid":"1", "docno":"8172" ,"query":"measurement of dielectric constant of liquids by the use of microwave techniques","text":"microwave spectroscopy  includes chapters on spectroscope technique\\nand design on measurements on gases liquids and solids on nuclear\\nproperties on molecular structure and on further possible applications\\nof microwaves\\n"},
    {"qid":"2", "docno":"4330","query":"mathematical analysis and design details of waveguide fed microwave radiations","text":"a step by step method for designing waveguides and oscillatory systems\\n","score":-0.4900564551,"rank":0,"answer":"step by step method for designing waveguides and oscillatory systems"}
 ]))
# qid	docno	query	                                                text	                                                answer	                                                score	       rank
#   1	8172	measurement of dielectric constant of liquids ...	microwave spectroscopy includes chapters on s...	microwave spectroscopy includes chapters on sp...	-0.289365	0
#   2	4330	mathematical analysis and design details of wa...	a step by step method for designing waveguides...	a step by step method for designing waveguides...	-0.585294	0

Building monoQA pipelines

import pyterrier as pt
from pyterrier_monoQA import MonoQA
monoqa = MonoQA(top_k=10) # loads 9meo/monoQA by default


dataset = pt.get_dataset("irds:vaswani")
bm25 = pt.BatchRetrieve(pt.get_dataset("vaswani").get_index(), wmodel="BM25")
mono_pipeline = bm25 >> pt.text.get_text(dataset, "text") >> monoqa

Note that both reranking and answer extraction require the document text to be included in the dataframe (see pt.text.get_text).
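A quick pre-flight check can catch a missing text column before the model is invoked. The sketch below is illustrative only: the helper name `missing_columns` is hypothetical, and the required column set (qid, query, docno, text) is an assumption based on the example dataframe in the Inference section, not a documented contract of MonoQA.

```python
def missing_columns(columns, required=("qid", "query", "docno", "text")):
    """Return the required column names that are absent from `columns`.

    `columns` can be any iterable of names, for example `df.columns` of a
    pandas DataFrame. The default `required` tuple is an assumption taken
    from the example input in this README.
    """
    present = set(columns)
    return [name for name in required if name not in present]


# Columns of a retriever's output that has not yet been passed
# through pt.text.get_text, so the document text is still missing.
retrieved = ["qid", "docno", "query", "score", "rank"]
print(missing_columns(retrieved))  # ['text']
```

If the returned list is non-empty, adding pt.text.get_text to the pipeline before monoqa (as in the pipeline above) is the likely fix.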

monoQA has the following options:

model (default: '9meo/monoQA'). Hugging Face model name. The default is a version trained on OR-QuAC.
tok_model (default: '9meo/monoQA'). Hugging Face tokenizer name.
batch_size (default: 4). Number of documents to process at the same time.
text_field (default: 'text'). The dataframe column in which the document text is stored.
verbose (default: True). Whether to show a progress bar.
top_k (default: 5). Number of top-ranked documents passed to the answer generator.
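The options above can be collected in one place and overridden selectively. This is a minimal sketch: `MONOQA_DEFAULTS` and `monoqa_kwargs` are hypothetical names introduced here, and passing every option as a keyword argument is an assumption (only `top_k` is shown being passed that way in this README).

```python
# Documented defaults, copied from the option list above.
MONOQA_DEFAULTS = {
    "model": "9meo/monoQA",
    "tok_model": "9meo/monoQA",
    "batch_size": 4,
    "text_field": "text",
    "verbose": True,
    "top_k": 5,
}


def monoqa_kwargs(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = sorted(set(overrides) - set(MONOQA_DEFAULTS))
    if unknown:
        raise TypeError(f"unknown MonoQA option(s): {', '.join(unknown)}")
    return {**MONOQA_DEFAULTS, **overrides}


kwargs = monoqa_kwargs(batch_size=16, top_k=10, verbose=False)

# With the package installed, the merged options would be passed on like this
# (assumption: the constructor accepts each option as a keyword argument):
# from pyterrier_monoQA import MonoQA
# monoqa = MonoQA(**kwargs)
```

Rejecting unknown names early means a typo such as `topk` fails before any model weights are downloaded.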

Examples

Check out the notebooks, which can also be run on Colab:

- Vaswani [GitHub] [Colab]

Implementation Details

We use a PyTerrier transformer to score documents using a T5 model.

Sequences longer than the model's maximum of 512 tokens are silently truncated. Consider splitting long texts into passages and aggregating the results (examples).
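The split-and-aggregate strategy can be sketched in plain Python. This is an illustration, not code from this repository: it counts whitespace-separated words rather than model tokens, the window length of 150 and stride of 75 are arbitrary choices, the function names are hypothetical, and taking the maximum passage score is only one possible aggregation. Within a PyTerrier pipeline, pt.text.sliding and pt.text.max_passage serve the same purpose.

```python
def split_passages(text, length=150, stride=75):
    """Split `text` into overlapping windows of at most `length` words."""
    words = text.split()
    if len(words) <= length:
        return [" ".join(words)]
    passages = []
    for start in range(0, len(words), stride):
        passages.append(" ".join(words[start:start + length]))
        if start + length >= len(words):
            break  # this window already reaches the end of the text
    return passages


def max_passage_score(passage_scores):
    """Aggregate passage-level scores into one document score (maximum)."""
    return max(passage_scores)


# A synthetic 400-word document: windows start at words 0, 75, 150, 225, 300.
doc = " ".join(f"w{i}" for i in range(400))
passages = split_passages(doc)
print(len(passages))  # 5
```

Because a word usually maps to one or more tokens, a window of 150 words leaves headroom for the query and prompt within the 512-token limit, but the safe length depends on the tokenizer and the text.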

monoQA

This repo contains code and data for the EMNLP 2022 paper "monoQA: Multi-Task Learning of Reranking and Answer Extraction for Open-Retrieval Conversational Question Answering".

Prerequisites

Install dependencies:

git clone https://github.com/terrierteam/monoQA.git
cd monoQA
pip install -r requirements.txt

Data Preparation

By default, we expect raw data to be stored in ./datasets/raw and processed data to be stored in ./datasets:

mkdir datasets

OR-QuAC

OR-QuAC files download

Download necessary OR-QuAC files and store them into ./datasets/or-quac:

mkdir datasets/or-quac
cd datasets/or-quac
wget https://ciir.cs.umass.edu/downloads/ORConvQA/all_blocks.txt.gz
wget https://ciir.cs.umass.edu/downloads/ORConvQA/qrels.txt.gz
gzip -d *.txt.gz
mkdir preprocessed
cd preprocessed
wget https://ciir.cs.umass.edu/downloads/ORConvQA/preprocessed/train.txt
wget https://ciir.cs.umass.edu/downloads/ORConvQA/preprocessed/test.txt
wget https://ciir.cs.umass.edu/downloads/ORConvQA/preprocessed/dev.txt
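After the downloads finish, it can be useful to confirm that every file landed where the later steps expect it. The sketch below only checks for the file names that appear in the commands above; the helper name `missing_orquac_files` is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path

# Relative paths produced by the download commands above
# (the .gz archives are assumed to have been decompressed).
EXPECTED_FILES = (
    "all_blocks.txt",
    "qrels.txt",
    "preprocessed/train.txt",
    "preprocessed/dev.txt",
    "preprocessed/test.txt",
)


def missing_orquac_files(root="datasets/or-quac"):
    """Return the expected OR-QuAC files that are not present under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_orquac_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All OR-QuAC files are in place.")
```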

References

Craig Macdonald and Nicola Tonellotto. Declarative Experimentation in Information Retrieval using PyTerrier. In Proceedings of ICTIR 2020. https://arxiv.org/abs/2007.14271

Credits

Sarawoot Kongyoung, University of Glasgow
