bezokurepo/papers
"If I have seen further, it is by standing on the shoulders of giants." (Sir Isaac Newton)

Below is a collection of influential research on, and adjacent to, Indigenous and Low Resource language topics: orthography, bias, annotation, tokenization, syntax, morphology, dependency parsing, and recurrent neural network model architectures.

1) Primer on Indigenous and Low Resource languages

Challenges and Strategies in Cross-Cultural NLP, Hershcovich et al https://aclanthology.org/2022.acl-long.482.pdf

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, Bender et al https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

The Importance of Modeling Social Factors of Language: Theory and Practice, Hovy et al https://aclanthology.org/2021.naacl-main.49.pdf

Comparing Explanation Faithfulness between Multilingual and Monolingual Fine-tuned Language Models, Zhao et al https://aclanthology.org/2024.naacl-long.178.pdf

The State and Fate of Linguistic Diversity and Inclusion in the NLP World, Joshi et al https://aclanthology.org/2020.acl-main.560.pdf

Morphological Processing of Low-Resource Languages: Where We Are and What’s Next, Wiemerslage et al https://arxiv.org/pdf/2203.08909

The Zeno’s Paradox of ‘Low-Resource’ Languages, Nigatu et al https://aclanthology.org/2024.emnlp-main.983.pdf

Knowledge of cultural moral norms in large language models, Ramezani et al https://aclanthology.org/2023.acl-long.26.pdf

Language Model Tokenizers Introduce Unfairness Between Languages, Petrov et al https://arxiv.org/pdf/2305.15425

Low-resource Languages: A Review of Past Work and Future Challenges, Magueresse et al https://arxiv.org/pdf/2006.07264

Representing Low-Resource Languages and Dialects: Improved Neural Methods for Spoken Language Processing, Martijn Bartelds https://research.rug.nl/en/publications/representing-low-resource-languages-and-dialects-improved-neural-

2) Indigenous and Low Resource languages - context for annotation and technologies

Universal Dependencies CoNLL-U format https://universaldependencies.org/format.html

xLSTM: Extended Long Short-Term Memory, Beck et al https://arxiv.org/pdf/2405.04517

Selected orthographic descriptions, r12a https://r12a.github.io/scripts/index.html#scriptnotes

On Optimal Transformer Depth for Low Resource Language Translation, van Biljon et al https://arxiv.org/pdf/2004.04418

A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption, Krastev et al https://dl.acm.org/doi/pdf/10.1145/3656382

Automatic Annotation of Enhanced Universal Dependencies for Brazilian Portuguese, de Souza et al https://sol.sbc.org.br/index.php/stil/article/view/31134/30937
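As a quick orientation to the CoNLL-U format linked above: it encodes one token per line with ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), using "_" for empty values and "#"-prefixed lines for sentence-level comments. A minimal parsing sketch (the example sentence and its annotations are illustrative, not taken from a UD treebank):

```python
# The ten CoNLL-U columns, in order, as defined by the UD format page.
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
          "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

# A toy annotated sentence; "_" marks fields left unspecified.
sentence = """\
# text = Dogs bark
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\tNumber=Plur\t0\troot\t_\t_
"""

def parse_conllu(block):
    """Parse a CoNLL-U token block into a list of field dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):  # skip sentence-level comment lines
            continue
        cols = line.split("\t")
        tokens.append(dict(zip(FIELDS, cols)))
    return tokens

tokens = parse_conllu(sentence)
print(tokens[0]["FORM"], tokens[0]["DEPREL"])  # Dogs nsubj
```

Note that real treebanks also contain multiword-token ranges (e.g. ID "3-4") and enhanced-dependency graphs in the DEPS column, which a full parser must handle; the sketch above covers only plain single-token lines.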

3) Selected papers on Indigenous and Low Resource languages: linguistics and syntax

Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek, Bompolas et al https://aclanthology.org/2025.udw-1.5.pdf

Assessing the Agreement Competence of Large Language Models, Táboas García et al https://aclanthology.org/2025.depling-1.4.pdf

High-Accuracy Transition-Based Constituency Parsing, Bauer & Manning https://aclanthology.org/2025.iwpt-1.4.pdf

Testing the Boundaries of LLMs: Dialectal and Language-Variety Tasks, Faisal & Anastasopoulos https://aclanthology.org/2025.vardial-1.6.pdf

A Morphology-based Representation Model for LSTM-based Dependency Parsing of Agglutinative Languages, Ozates et al https://aclanthology.org/K18-2024.pdf

Parsing the Switch: LLM-Based UD Annotation for Complex Code-Switched and Low-Resource Languages, Kellert et al https://arxiv.org/html/2506.07274v1

MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs, Jumelet et al https://arxiv.org/pdf/2504.02768

Goldfish: Monolingual Language Models for 350 Languages, Chang et al https://arxiv.org/pdf/2408.10441

Reference and Modification in Universal Dependencies, Nivre & Croft https://aclanthology.org/2025.udw-1.1.pdf

Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders, Hull & Dobrovoljc https://aclanthology.org/2025.depling-1.16.pdf

Predictability Effects of Spanish-English Code-Switching: A Directionality and Part of Speech Analysis, Higdon et al https://aclanthology.org/2025.quasy-1.11.pdf

4) Indigenous and Low Resource languages - related research for morphologically rich languages and dialects

Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yoruba, Adelani et al https://arxiv.org/pdf/2003.08370

Weak Supervision and Label Noise Handling for Natural Language Processing in Low-Resource Scenarios, Michael A. Hedderich https://publikationen.sulb.uni-saarland.de/bitstream/20.500.11880/35026/1/MHedderich_Thesis_23-01-11.pdf

Why do language models perform worse for morphologically complex languages?, Arnett & Bergen https://arxiv.org/abs/2411.14198v1

5) Use case analysis

Could AI Leapfrog the Web? Evidence from Teachers in Sierra Leone, Björkegren et al https://arxiv.org/pdf/2502.12397

6) History of the AI field and adjacent papers

Annotated History of Modern AI and Deep Learning, Jürgen Schmidhuber https://arxiv.org/pdf/2212.11279

Rule-based semantic interpretation for Universal Dependencies, Findlay et al https://aclanthology.org/2023.udw-1.6.pdf

AMR Parsing is Far from Solved: GrAPES, the Granular AMR Parsing Evaluation Suite, Groschwitz et al https://arxiv.org/pdf/2312.03480

Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations, Zhang et al https://research.rug.nl/en/publications/neural-semantic-parsing-with-extremely-rich-symbolic-meaning-repr

T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings, Deiseroth et al https://arxiv.org/html/2406.19223v1

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits, Ma et al https://arxiv.org/pdf/2402.17764

Adapting Monolingual Models: Data can be Scarce when Language Similarity is High, de Vries et al https://research.rug.nl/en/publications/adapting-monolingual-models-data-can-be-scarce-when-language-simi

Informed Machine Learning – A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems, von Rueden et al https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9429985

Is neuro-symbolic AI meeting its promises in natural language processing? A structured review, Hamilton et al https://journals.sagepub.com/doi/full/10.3233/SW-223228

IMPACT: Inflectional Morphology Probes Across Complex Typologies, Saeed et al https://arxiv.org/pdf/2506.23929
