OumaimaHourrane/MA_Open_Datasets

Moroccan NLP Datasets and Corpora

This repository collects datasets for the Moroccan Arabic dialect and corpora of Moroccan content. It is aimed at researchers and developers working on natural language processing (NLP) tasks such as machine translation, text summarization, sentiment analysis, and named entity recognition.

The datasets are gathered from various sources, such as social media platforms, news websites, e-commerce websites, and public domain texts. The data has been preprocessed and annotated to facilitate its use in NLP tasks. Each dataset comes with a README file that describes its contents, its format, and a citable source (if any).

The corpora included in this repository cover a range of topics and domains, including politics, sports, culture, and religion. They are intended to provide a representative sample of Moroccan content as it is used in different contexts. This content is usually written in languages other than the Moroccan dialect, such as Arabic, English, and French.

We encourage contributions from the community to expand this repository and make it more comprehensive and diverse. If you have a dataset or corpus that you would like to share, or if you find any issues with the existing data, please submit a pull request or open an issue on GitHub.
