Moroccan NLP Datasets and Corpora

This repository collects datasets for the Moroccan Arabic dialect and corpora of Moroccan content. It is aimed at researchers and developers working on natural language processing (NLP) tasks such as machine translation, text summarization, sentiment analysis, and named entity recognition.

The repository includes several datasets gathered from various sources, such as social media platforms, news websites, e-commerce websites, and public domain texts. The data has been preprocessed and annotated to facilitate its use in NLP tasks, and each dataset comes with a README file that describes its contents, format, and citable source (if any).

The corpora included in this repository cover a range of topics and domains, including politics, sports, culture, and religion. They are designed to provide a representative sample of Moroccan content, which is usually written in languages other than the Moroccan dialect (such as Arabic, English, and French), and of how that content is used in different contexts.

We encourage contributions from the community to expand this repository and make it more comprehensive and diverse. If you have a dataset or corpus that you would like to share, or if you find any issues with the existing data, please submit a pull request or open an issue on GitHub.

Folders and files (11 commits)
Booking_ma
Goud.ma
Jumia.ma
LeMatin
MoroccoWorldNews
ma_youtube_comments
README.md
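
Since each dataset folder is described as shipping with its own README, one way to get oriented after cloning is to walk the top-level folders and note where each README and its data files live. The sketch below is only an illustration: the function name `summarize_datasets` and the local clone path `MA_Open_Datasets` are assumptions, and nothing is assumed about the data file formats, which are documented in the per-folder READMEs.

```python
from pathlib import Path


def summarize_datasets(repo_root):
    """Map each top-level folder to its README path and its other files.

    `repo_root` is assumed to be a local clone of the repository.
    Paths in the result are relative to `repo_root`.
    """
    root = Path(repo_root)
    summary = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        if folder.name.startswith("."):
            continue  # skip .git and other hidden directories
        files = sorted(f for f in folder.rglob("*") if f.is_file())
        # Take the first file whose name starts with "readme" (case-insensitive).
        readme = next(
            (f for f in files if f.name.lower().startswith("readme")), None
        )
        summary[folder.name] = {
            "readme": str(readme.relative_to(root)) if readme else None,
            "files": [str(f.relative_to(root)) for f in files if f != readme],
        }
    return summary


if __name__ == "__main__":
    clone = Path("MA_Open_Datasets")  # hypothetical local clone location
    if clone.is_dir():
        for name, info in summarize_datasets(clone).items():
            print(name, "| README:", info["readme"], "| other files:", len(info["files"]))
    else:
        print("Clone not found at", clone)
```

Reading the README path reported for a folder first is the safest way to learn that dataset's format and citable source before loading anything.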

OumaimaHourrane/MA_Open_Datasets
