bezokurepo/papers
"If I have seen further, it is by standing on the shoulders of giants." (Sir Isaac Newton)

Below is a collection of influential research on, and adjacent to, Indigenous and Low Resource language topics: orthography, bias, annotation, tokenization, syntax, morphology, dependency parsing, and recurrent neural network model architectures.

1) Primer on Indigenous and Low Resource languages

Challenges and Strategies in Cross-Cultural NLP, Hershcovich et al https://aclanthology.org/2022.acl-long.482.pdf

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, Bender et al https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

The Importance of Modeling Social Factors of Language: Theory and Practice, Hovy et al https://aclanthology.org/2021.naacl-main.49.pdf

Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models, Zhao et al https://aclanthology.org/2024.naacl-long.178.pdf

The State and Fate of Linguistic Diversity and Inclusion in the NLP World, Joshi et al https://aclanthology.org/2020.acl-main.560.pdf

Morphological Processing of Low-Resource Languages: Where We Are and What’s Next, Wiemerslage et al https://arxiv.org/pdf/2203.08909

The Zeno’s Paradox of ‘Low-Resource’ Languages, Nigatu et al https://aclanthology.org/2024.emnlp-main.983.pdf

Knowledge of cultural moral norms in large language models, Ramezani et al https://aclanthology.org/2023.acl-long.26.pdf

Language Model Tokenizers Introduce Unfairness Between Languages, Petrov et al https://arxiv.org/pdf/2305.15425

Low-resource Languages: A Review of Past Work and Future Challenges, Magueresse et al https://arxiv.org/pdf/2006.07264

Representing Low-Resource Languages and Dialects: Improved Neural Methods for Spoken Language Processing, Martijn Bartelds https://research.rug.nl/en/publications/representing-low-resource-languages-and-dialects-improved-neural-

2) Indigenous and Low Resource languages - context for annotation and technologies

Universal Dependencies CoNLL-U format https://universaldependencies.org/format.html

xLSTM: Extended Long Short-Term Memory, Beck et al https://arxiv.org/pdf/2405.04517

Selected orthographic descriptions, r12a https://r12a.github.io/scripts/index.html#scriptnotes

On Optimal Transformer Depth for Low Resource Language Translation, van Biljon et al https://arxiv.org/pdf/2004.04418

A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption, Krastev et al https://dl.acm.org/doi/pdf/10.1145/3656382

Automatic Annotation of Enhanced Universal Dependencies for Brazilian Portuguese, de Souza et al https://sol.sbc.org.br/index.php/stil/article/view/31134/30937
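As a quick orientation to the CoNLL-U format linked above: it encodes one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), using "_" for empty values and "#"-prefixed lines for sentence-level comments. A minimal parsing sketch (the example sentence and its annotations are illustrative, not taken from a UD treebank):

```python
# The ten CoNLL-U columns, in order, as defined by the UD format page.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# A toy annotated sentence; "_" marks fields left unspecified.
sentence = """\
# text = Dogs bark
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\tNumber=Plur\t0\troot\t_\t_
"""

def parse_conllu(block):
    """Parse a CoNLL-U token block into a list of field dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):  # skip sentence-level comment lines
            continue
        cols = line.split("\t")
        tokens.append(dict(zip(FIELDS, cols)))
    return tokens

tokens = parse_conllu(sentence)
print(tokens[0]["FORM"], tokens[0]["DEPREL"])  # Dogs nsubj
```

Note that real treebanks also contain multiword-token ranges (e.g. ID "3-4") and enhanced-dependency graphs in the DEPS column, which a full parser must handle; the sketch above covers only plain single-token lines.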

3) Selected papers on Indigenous and Low Resource languages: linguistics and syntax

Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek, Bompolas et al https://aclanthology.org/2025.udw-1.5.pdf

Assessing the Agreement Competence of Large Language Models, Táboas García et al https://aclanthology.org/2025.depling-1.4.pdf

High-Accuracy Transition-Based Constituency Parsing, Bauer & Manning https://aclanthology.org/2025.iwpt-1.4.pdf

Testing the Boundaries of LLMs: Dialectal and Language-Variety Tasks, Faisal & Anastasopoulos https://aclanthology.org/2025.vardial-1.6.pdf

A Morphology-based Representation Model for LSTM-based Dependency Parsing of Agglutinative Languages, Ozates et al https://aclanthology.org/K18-2024.pdf

Parsing the Switch: LLM-Based UD Annotation for Complex Code-Switched and Low-Resource Languages, Kellert et al https://arxiv.org/html/2506.07274v1

MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs, Jumelet et al https://arxiv.org/pdf/2504.02768

Goldfish: Monolingual Language Models for 350 Languages, Chang et al https://arxiv.org/pdf/2408.10441

Reference and Modification in Universal Dependencies, Nivre & Croft https://aclanthology.org/2025.udw-1.1.pdf

Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders, Hull & Dobrovoljc https://aclanthology.org/2025.depling-1.16.pdf

Predictability Effects of Spanish-English Code-Switching: A Directionality and Part of Speech Analysis, Higdon et al https://aclanthology.org/2025.quasy-1.11.pdf

4) Indigenous and Low Resource languages - related research for morphologically rich languages and dialects

Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yoruba, Adelani et al https://arxiv.org/pdf/2003.08370

Weak Supervision and Label Noise Handling for Natural Language Processing in Low-Resource Scenarios, Michael A. Hedderich https://publikationen.sulb.uni-saarland.de/bitstream/20.500.11880/35026/1/MHedderich_Thesis_23-01-11.pdf

Why do language models perform worse for morphologically complex languages?, Arnett & Bergen https://arxiv.org/abs/2411.14198v1

5) Use case analysis

Could AI Leapfrog the Web? Evidence from Teachers in Sierra Leone, Björkegren et al https://arxiv.org/pdf/2502.12397

6) History of the AI field and adjacent papers

Annotated History of Modern AI and Deep Learning, Jürgen Schmidhuber https://arxiv.org/pdf/2212.11279

Rule-based semantic interpretation for Universal Dependencies, Findlay et al https://aclanthology.org/2023.udw-1.6.pdf

AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite, Groschwitz et al https://arxiv.org/pdf/2312.03480

Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations, Zhang et al https://research.rug.nl/en/publications/neural-semantic-parsing-with-extremely-rich-symbolic-meaning-repr

T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings, Deiseroth et al https://arxiv.org/html/2406.19223v1

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, Ma et al https://arxiv.org/pdf/2402.17764

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High, de Vries et al https://research.rug.nl/en/publications/adapting-monolingual-models-data-can-be-scarce-when-language-simi

Informed Machine Learning – A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems, von Rueden et al https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9429985

Is neuro-symbolic AI meeting its promises in natural language processing? A structured review, Hamilton et al https://journals.sagepub.com/doi/full/10.3233/SW-223228

IMPACT: Inflectional Morphology Probes Across Complex Typologies, Saeed et al https://arxiv.org/pdf/2506.23929
