- Related paper lists: [thunlp/PLMpapers], [Jiakui/awesome-bert].
- Transferring NLP models across languages and domains, [slides].
- [2017 ICML] Language Modeling with Gated Convolutional Networks, [paper], [bibtex], sources: [anantzoid/Language-Modeling-GatedCNN], [jojonki/Gated-Convolutional-Networks]. A minimal GLU sketch appears after this list.
- [2017 NIPS] Learned in Translation: Contextualized Word Vectors, [paper], [bibtex], sources: [salesforce/cove].
- [2018 ICLR] Regularizing and Optimizing LSTM Language Models, [paper], [bibtex], sources: [salesforce/awd-lstm-lm], author page: [Nitish Shirish Keskar].
- [2018 NAACL] Deep contextualized word representations, [paper], [bibtex], [homepage], sources: [allenai/bilm-tf], [HIT-SCIR/ELMoForManyLangs]. An extended application: [UKPLab/elmo-bilstm-cnn-crf].
- [2018 NeurIPS] GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations, [paper], [bibtex], sources: [YJHMITWEB/GLoMo-tensorflow].
- [2018 ArXiv] Improving Language Understanding by Generative Pre-Training, [paper], [bibtex], [homepage], sources: [openai/finetune-transformer-lm].
- [2019 AAAI] Character-Level Language Modeling with Deeper Self-Attention, [paper], [bibtex], sources: [nadavbh12/Character-Level-Language-Modeling-with-Deeper-Self-Attention-pytorch].
- [2019 NAACL] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, [paper], [bibtex], [slides], sources: [google-research/bert], [huggingface/pytorch-pretrained-BERT]. A minimal usage sketch appears after this list.
- [2019 ACL] Adaptive Attention Span in Transformers, [paper], [bibtex], sources: [facebookresearch/adaptive-span].
- [2019 ICML] BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, [paper], [bibtex], [supplementary], sources: [AsaCooperStickland/Bert-n-Pals].
- [2019 ArXiv] GPT-2: Language Models are Unsupervised Multitask Learners, [paper], [bibtex], [homepage], sources: [openai/gpt-2].
- [2019 ICLR] What Do You Learn from Context? Probing for Sentence Structure in Contextualized Word Representations, [paper], [bibtex].
- [2019 ICML] MASS: Masked Sequence to Sequence Pre-training for Language Generation, [paper], [bibtex], sources: [xutaatmicrosoftdotcom/MASS].
- [2019 ACL] ERNIE: Enhanced Language Representation with Informative Entities, [paper], [bibtex], [blog], sources: [thunlp/ERNIE].
- [2019 ACL] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context, [paper], [bibtex], sources: [kimiyoung/transformer-xl].
- [2019 IJCNLP] Cloze-driven Pretraining of Self-attention Networks, [paper], [bibtex].
- [2019 NeurIPS] XLNet: Generalized Autoregressive Pretraining for Language Understanding, [paper], [bibtex], [supplementary], sources: [zihangdai/xlnet].
- [2019 NeurIPS] Cross-lingual Language Model Pretraining, [paper], [bibtex], sources: [facebookresearch/XLM].
- [2019 NeurIPS] Unified Language Model Pre-training for Natural Language Understanding and Generation, [paper], [bibtex], sources: [microsoft/unilm].
- [2019 ICML] Improving Neural Language Modeling via Adversarial Training, [paper], [bibtex], sources: [ChengyueGongR/advsoft].
- [2019 ArXiv] RoBERTa: A Robustly Optimized BERT Pretraining Approach, [paper], [bibtex], sources: [pytorch/fairseq].
- [2019 ArXiv] NeZha: Neural Contextualized Representation for Chinese Language Understanding, [paper], [bibtex], sources: [huawei-noah/Pretrained-Language-Model/NEZHA].
- [2020 AAAI] K-BERT: Enabling Language Representation with Knowledge Graph, [paper], [bibtex].
- [2020 ICLR] ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators, [paper], [bibtex].
- [2020 ICLR] ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations, [paper], [bibtex], sources: [google-research/ALBERT].
- [2020 ICLR] FreeLB: Enhanced Adversarial Training for Natural Language Understanding, [paper], [bibtex], sources: [zhuchen03/FreeLB].
- [2020 ICLR] Improving Neural Language Generation with Spectrum Control, [paper], [bibtex].
- [2020 ACL] Emerging Cross-lingual Structure in Pretrained Language Models, [paper], [bibtex].
- [2020 ACL] MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices, [paper], [bibtex], sources: [google-research/mobilebert].
- [2020 ACL] Entities as Experts: Sparse Memory Access with Entity Supervision, [paper], [bibtex].
- [2020 TACL] SpanBERT: Improving Pre-training by Representing and Predicting Spans, [paper], [bibtex], sources: [facebookresearch/SpanBERT].
- [2020 EMNLP] TinyBERT: Distilling BERT for Natural Language Understanding, [paper], [bibtex], sources: [huawei-noah/TinyBERT].
- [2020 NeurIPS] Language Through a Prism: A Spectral Approach for Multiscale Language Representations, [paper], [bibtex].
- [2020 ArXiv] Critical Thinking for Language Models, [paper], [bibtex].
- [2021 ICLR] DeBERTa: Decoding-enhanced BERT with Disentangled Attention, [paper], [bibtex], sources: [microsoft/DeBERTa].
- [2021 ArXiv] All NLP Tasks Are Generation Tasks: A General Pretraining Framework, [paper], [bibtex], sources: [THUDM/GLM].
- [2021 ArXiv] An Attention Free Transformer, [paper], [bibtex], sources: [rish-16/aft-pytorch].
- [2021 NAACL] INFOXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training, [paper], [bibtex], sources: [microsoft/infoxlm].
- [2021 EMNLP] UNKs Everywhere: Adapting Multilingual Language Models to New Scripts, [paper], [bibtex], sources: [Adapter-Hub/UNKs_everywhere].
- [2021 NeurIPS] Pay Attention to MLPs, [paper], [bibtex], sources: [rwightman/pytorch-image-models], [labmlai/annotated_deep_learning_paper_implementations], [xmu-xiaoma666/External-Attention-pytorch], [PaddleViT/gMLP].
- [2022 ArXiv] Efficient Language Modeling with Sparse all-MLP, [paper], [bibtex].
- [2022 ICLR] ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning, [paper], [bibtex], sources: [google-research/text-to-text-transfer-transformer], [tensorflow/mesh].
- [2019 ACL] How multilingual is Multilingual BERT?, [paper], [bibtex].
- [2019 ACL] What does BERT learn about the structure of language?, [paper], [bibtex], sources: [ganeshjawahar/interpret_bert].
- [2019 EMNLP] Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT, [paper], [bibtex], sources: [shijie-wu/crosslingual-nlp].
- [2019 EMNLP] How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, [paper], [bibtex], sources: [kawine/contextual].
- [2019 ICLR] Representation Degeneration Problem in Training Natural Language Generation Models, [paper], [bibtex].
- [2019 ArXiv] What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?, [paper], [bibtex].
- [2020 JMLR] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, [paper], [bibtex], sources: [google-research/text-to-text-transfer-transformer].
- [2018 NAACL] Self-Attention with Relative Position Representations, [paper], [bibtex], sources: [TensorUI/relative-position-pytorch], [tensorflow/tensor2tensor], [OpenNMT/OpenNMT-tf]. A sketch of the relative-position term appears after this list.
- [2020 ICML] Learning to Encode Position for Transformer with Continuous Dynamical Model, [paper], [bibtex], sources: [xuanqing94/FLOATER].
- [2021 ICLR] Rethinking Positional Encoding in Language Pre-training, [paper], [bibtex], sources: [guolinke/TUPE].
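For the gated convolutional language model (2017 ICML entry above), the core block is a causal convolution gated by a GLU: h = (X * W + b) ⊗ σ(X * V + c). A minimal PyTorch sketch, with illustrative channel and kernel sizes rather than the paper's hyperparameters, and no residual connections or stacking:

```python
# Sketch of one GLU block from "Language Modeling with Gated Convolutional
# Networks": h = (X * W + b) * sigmoid(X * V + c), with causal 1-D convolutions.
# Hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn

class GatedConvBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 4):
        super().__init__()
        self.pad = kernel_size - 1                       # left-pad only, for causality
        self.conv = nn.Conv1d(channels, channels, kernel_size)
        self.gate = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                                # x: [batch, channels, time]
        x = nn.functional.pad(x, (self.pad, 0))          # no access to future positions
        return self.conv(x) * torch.sigmoid(self.gate(x))
```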
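For the BERT entry, a minimal masked-LM usage sketch. It assumes the `transformers` package (the successor of [huggingface/pytorch-pretrained-BERT]) and the public `bert-base-uncased` checkpoint; the class and checkpoint names are the library's, not something specified in the paper:

```python
# Minimal masked-LM inference with huggingface `transformers`
# (successor of pytorch-pretrained-BERT); assumes `bert-base-uncased`
# can be downloaded or is cached locally.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                      # [1, seq_len, vocab_size]

# Most likely token at the [MASK] position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))                    # e.g. "paris"
```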
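For Self-Attention with Relative Position Representations (2018 NAACL entry above), a single-head sketch of the content-to-position term: attention logits receive an extra contribution from learned embeddings indexed by the clipped relative distance j - i. Module and parameter names are illustrative and not taken from [tensorflow/tensor2tensor]:

```python
# Sketch (not the reference implementation) of relative-position self-attention:
# logits[i, j] = q_i . k_j + q_i . a_{clip(j - i)}, single head, no masking.
import torch
import torch.nn as nn

class RelPosSelfAttention(nn.Module):
    def __init__(self, d_model: int, max_rel_dist: int = 16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.max_rel_dist = max_rel_dist
        # one learned embedding per clipped relative distance in [-k, k]
        self.rel_k = nn.Embedding(2 * max_rel_dist + 1, d_model)

    def forward(self, x):                                # x: [batch, seq, d_model]
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = x.size(-1) ** 0.5
        logits = torch.matmul(q, k.transpose(-2, -1))    # content-content: [b, s, s]
        seq = x.size(1)
        pos = torch.arange(seq, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        rel_emb = self.rel_k(rel + self.max_rel_dist)    # [seq, seq, d_model]
        # content-position term: q_i . a_{ij}
        logits = logits + torch.einsum("bid,ijd->bij", q, rel_emb)
        attn = torch.softmax(logits / scale, dim=-1)
        return torch.matmul(attn, v)
```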