Welcome to the Judeo-Spanish (Ladino) resource repository. It contains scripts and other tools for building datasets and language technology for Judeo-Spanish.
You can find data and models in Col·lectivaT's Hugging Face collection.
See also the rule-based Spanish-Ladino translator.
To clone this repository:
git clone https://github.com/CollectivaT-dev/judeo-espanyol-resources
Then create a virtual environment and install the requirements:
python -m venv venv
source venv/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt
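The audio preprocessing below is not part of the scripts and is run directly from the shell. It converts the .ogg recordings to 22050 Hz WAV, trims leading and trailing silence, and pads the end of each file with 0.058 s of silence.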
for f in audio/*.ogg; do t=${f%.ogg}.wav; ffmpeg -v error -i "$f" -ar 22050 "$t"; done
mv audio/*.wav dataset_wav/
for f in dataset_wav/*.wav; do t=${f##*/}; sox "$f" "dataset_sil/$t" silence 1 0.02 0.1% reverse silence 1 0.02 0.1% reverse; done
for f in dataset_sil/*.wav; do t=${f##*/}; sox "$f" "dataset_sil_pad/$t" pad 0 0.058; done
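The loops above derive each output name with shell parameter expansion: `${f%.ogg}` strips the `.ogg` suffix and `${f##*/}` strips the directory part. A minimal sketch of that naming logic, with no audio tools involved, might look like this. The helper names `wav_name` and `base_name` and the sample paths are hypothetical, chosen only for illustration.

```shell
# Hypothetical helpers that isolate the two expansions used in the loops.

# Replace a trailing ".ogg" with ".wav", keeping the directory part.
wav_name() {
    printf '%s\n' "${1%.ogg}.wav"
}

# Drop everything up to and including the last "/", leaving the file name.
base_name() {
    printf '%s\n' "${1##*/}"
}

wav_name audio/fraza_001.ogg        # prints: audio/fraza_001.wav
base_name dataset_wav/fraza_001.wav # prints: fraza_001.wav
```

Quoting the expansions keeps file names that contain spaces intact.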
To load the data into Coqui TTS, a transcript file has to be prepared. Once transcripts_edited.csv has been edited, run:
awk -F'\t' '{print $2"\t"$3"\t"$3}' resources/transcripts_edited.csv | sed 's/\.ogg/\.wav/g; s|^|fraza_dataset/wav/|g; s/\t/|/g' > fraza_dataset/transcripts.txt
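As a sketch of what this step produces, the function below applies the same kind of rewriting to tab-separated rows read from standard input. It assumes that the second column holds the `.ogg` file name, that the third holds the transcript, and that a three-field `path|text|text` line is wanted. The function name `make_transcripts` and the sample row are hypothetical.

```shell
# Hypothetical helper: turn "id<TAB>file.ogg<TAB>text" rows into
# "fraza_dataset/wav/file.wav|text|text" lines.
make_transcripts() {
    awk -F'\t' -v prefix='fraza_dataset/wav/' '{
        file = $2
        sub(/\.ogg$/, ".wav", file)      # change only the file extension
        print prefix file "|" $3 "|" $3  # path, text, text again
    }'
}

printf '1\tfraza_001.ogg\tBuenos diyas\n' | make_transcripts
# prints: fraza_dataset/wav/fraza_001.wav|Buenos diyas|Buenos diyas
```

Doing the whole rewrite in awk confines the `.ogg` substitution to the file-name column, so the transcript text itself is never altered.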
An IPython notebook for scraping Ladino articles from the Salom newspaper is provided in notebooks/scraping.ipynb.
The development and test sets used to train the neural machine translation models, together with the OpenNMT training log files, are under MT_devtest_configs_logs.
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish
Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon.
Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (EURALI) @ LREC 2022. Marseille, France. 20 June 2022
This repository was developed as part of the project "Judeo-Spanish: Connecting the two ends of the Mediterranean", carried out by Col·lectivaT and the Sephardic Center of Istanbul within the framework of the “Grant Scheme for Common Cultural Heritage: Preservation and Dialogue between Turkey and the EU–II (CCH-II)” implemented by the Ministry of Culture and Tourism of the Republic of Turkey with the financial support of the European Union. The content of this website is the sole responsibility of Col·lectivaT and does not necessarily reflect the views of the European Union.

