We provide the preprocessing scripts and datasets used in our paper.
At this time we are still preparing most of the (cross-lingual) task data for public release. If you would like to receive a preliminary (undocumented) version of the data, please write us an e-mail.
As part of our work we trained word embeddings (BIVCD) and (re-)mapped others with the method described in the appendix of our paper.
- en-de word embeddings: BIVCD, AttractRepel, Fasttext (300K), Fasttext (Full)
- en-fr word embeddings: BIVCD, AttractRepel, Fasttext (300K), Fasttext (Full)
Fasttext 300K contains only the 300K most frequent tokens of each language. The full versions are mapped variants of the full pre-trained fastText embeddings. Use the full versions to reproduce our results.
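Below is a minimal sketch of how the mapped embeddings could be loaded and queried for cross-lingual nearest neighbours. It assumes the files are in plain word2vec text format (one token followed by its vector per line); the file names in the usage example are placeholders, not the actual released file names.

```python
import numpy as np

def load_vectors(path, limit=None):
    """Load a word2vec-style text file: one '<token> <floats...>' line per word."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            parts = line.rstrip().split(" ")
            if len(parts) < 3:  # skip a possible "<count> <dim>" header line
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def nearest(query_vec, vectors, k=5):
    """Return the k tokens with the highest cosine similarity to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    scored = [(float(q @ v / np.linalg.norm(v)), tok) for tok, v in vectors.items()]
    return sorted(scored, reverse=True)[:k]

# Usage (file names are placeholders):
# en = load_vectors("fasttext.en.mapped.300k.txt")
# de = load_vectors("fasttext.de.mapped.300k.txt")
# print(nearest(en["house"], de))  # German neighbours of the English word "house"
```

Because the embeddings of both languages are mapped into a shared space, nearest-neighbour queries across languages work directly on cosine similarity without any further transformation.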
We trained our cross-lingual adaptations of InferSent on (machine-)translated cross-lingual variants of SNLI:
These datasets contain SNLI with all possible language combinations of the sentence pairs (en-en, en-de, de-en, de-de) and are therefore four times as large as the original.
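The following sketch illustrates how the four language combinations can be assembled from line-aligned English and (machine-)translated German premise/hypothesis files. The file names and the tab-separated layout are assumptions for illustration, not the released format.

```python
import csv

def read_pairs(path):
    """Read tab-separated 'premise<TAB>hypothesis<TAB>label' lines."""
    with open(path, encoding="utf-8") as f:
        return [row for row in csv.reader(f, delimiter="\t")]

def cross_lingual_combinations(en_rows, de_rows):
    """Yield all four premise/hypothesis language combinations per example."""
    assert len(en_rows) == len(de_rows), "files must be line-aligned"
    for (p_en, h_en, label), (p_de, h_de, _) in zip(en_rows, de_rows):
        yield p_en, h_en, label  # en-en
        yield p_en, h_de, label  # en-de
        yield p_de, h_en, label  # de-en
        yield p_de, h_de, label  # de-de

# Usage (file names are placeholders):
# en = read_pairs("snli_train.en.tsv")
# de = read_pairs("snli_train.de.tsv")
# combined = list(cross_lingual_combinations(en, de))  # four times as many examples
```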
We plan to release translated SNLI corpora in additional languages (de, fr, es, ar) soon.
Please read LICENSE.txt and NOTICE.txt in the project root. We distribute derived data under the same license as the original.