Home


Gábor Recski edited this page Feb 25, 2016 · 8 revisions

A project for developing tools that compute various types of word similarity and evaluate them on various datasets.

Subprojects / Division of labour (tentative)

~~0/a Acquire existing datasets and embeddings~~

0/b Build OpenRoget dataset

Build small ML framework for measuring word similarity / synonym detection
Measure all similarities on all datasets
Develop 4lang similarity
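Below is a minimal sketch of an evaluation loop that the framework and the "measure all similarities on all datasets" step could build on. It assumes each dataset is a list of (word1, word2, gold score) triples, as in SimLex-999-style data, and each embedding is a dict from word to vector. Model similarities are scored against gold ratings with Spearman's rank correlation, the usual convention for such datasets. Synonym detection is sketched as picking the candidate with the highest cosine similarity to the target word. The function names (`evaluate_similarity`, `choose_synonym`) and the data layout are hypothetical and do not come from the project's code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Embeddings = Dict[str, Vector]          # hypothetical layout: word -> vector
Pair = Tuple[str, str, float]           # hypothetical layout: (word1, word2, gold score)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # undefined when one side has constant ranks
    return cov / (sx * sy)


def evaluate_similarity(pairs: Sequence[Pair], embeddings: Embeddings) -> Tuple[float, int]:
    """Return (Spearman rho, number of pairs covered), skipping out-of-vocabulary pairs."""
    model_scores: List[float] = []
    gold_scores: List[float] = []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            gold_scores.append(gold)
    return spearman(model_scores, gold_scores), len(gold_scores)


def choose_synonym(target: str, candidates: Sequence[str], embeddings: Embeddings) -> str:
    """Pick the candidate whose vector is most cosine-similar to the target's."""
    return max(candidates, key=lambda c: cosine(embeddings[target], embeddings[c]))


if __name__ == "__main__":
    # Toy vectors and ratings, made up purely for illustration.
    emb = {"car": [1.0, 0.1], "auto": [0.9, 0.2], "tree": [0.1, 1.0], "bush": [0.25, 0.9]}
    pairs = [("car", "auto", 9.0), ("tree", "bush", 8.5), ("car", "tree", 1.0),
             ("auto", "bush", 1.5), ("car", "zebra", 2.0)]
    rho, covered = evaluate_similarity(pairs, emb)
    print(round(rho, 6), covered)
    print(choose_synonym("car", ["tree", "auto", "bush"], emb))
```

Pairs containing an out-of-vocabulary word are skipped, so reporting the number of covered pairs next to each correlation keeps coverage differences between embeddings visible when all similarities are compared on all datasets.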

Technical

Progress should be documented on this wiki
Discussions should take place on the mailing list [email protected]
Datasets should be stored under nessi6:/mnt/store/home/hlt/wordsim

Meetings, milestones

~~Intro to 4lang similarity: February 3rd, 10.30 am, SZTAKI Rm 506~~
Deadlines: ACL short papers Feb 29th, ACL long papers March 18th, StarSem April 18th

Bibliography

Hill et al. 2015: SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation