GitHub - OpenDFM/LLM4Chemistry: [COLING 2025] A curated paper list about LLMs for chemistry

LLM4Chemistry

This repository collects papers on Large Language Models (LLMs) for chemistry. Beyond LLMs, we also include notable work based on Pretrained Language Models (PLMs) such as BERT, BART, and T5, since it can help foster future research. To distinguish PLMs from LLMs, the titles of PLM-based papers are set in italics.

We also collect useful links to prominent teams and popular projects.

😎 You are welcome to recommend missing papers and helpful links by opening an Issue or a Pull Request.

Fine-tuning LLM for Chemistry

2022.05 Foundation Models of Scientific Knowledge for Chemistry: Opportunities, Challenges and Lessons Learned. ACL Workshop
2022.11 Galactica: A large language model for science. arXiv
2022.11 Is GPT-3 all you need for machine learning for chemistry? NIPS2022 Workshop
2023.08 Fine-tuning GPT-3 for machine learning electronic and functional properties of organic molecules. Chemical Science
2023.08 HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science. EMNLP2023
2023.10 MatChat: A Large Language Model and Application Service Platform for Materials Science. Chinese Physics B
2024.01 ChemDFM: Dialogue Foundation Model for Chemistry. arXiv
2024.01 Structured information extraction from scientific text with large language models. Nature Communications
2024.02 Leveraging large language models for predictive chemistry. Nature Machine Intelligence
2024.03 SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning. arXiv
2024.03 Domain-Agnostic Molecular Generation with Chemical Feedback. ICLR2024
2024.04 ChemLLM: A Chemical Large Language Model. arXiv
2024.04 BatGPT-Chem: A Foundation Large Model For Chemical Engineering. ChemRxiv
2024.04 Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. ICLR2024
2024.04 LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset. arXiv
2024.05 nach0: Multimodal Natural and Chemical Languages Foundation Model. Chemical Science
2024.06 Fine-tuning large language models for chemical text mining. Chemical Science
2024.06 MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction. arXiv
2024.06 SynAsk: Unleashing the Power of Large Language Models in Organic Synthesis. arXiv
2024.06 PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes. arXiv
2024.09 SciDFM: A Large Language Model with Mixture-of-Experts for Science. arXiv

Multi-Modal Chemistry LLM

2023.03 Uni-Mol: A Universal 3D Molecular Representation Learning Framework. ICLR
2023.05 DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs. arXiv
2023.06 MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter. EMNLP2023
2023.06 MolFM: A Multimodal Molecular Foundation Model. arXiv
2023.08 BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine. arXiv
2023.09 3D-MoLM: Towards 3D Molecule-Text Interpretation in Language Models. ICLR2024
2023.11 InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery. arXiv
2023.12 MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction. NIPS Workshop
2024.01 MolTC: Towards Molecular Relational Modeling In Language Models. ACL2024
2024.01 ReactXT: Understanding Molecular “Reaction-ship” via Reaction-Contextualized Molecule-Text Pretraining. ACL2024
2024.03 GIT-Mol: A multi-modal large language model for molecular science with graph, image, and text. arXiv
2024.06 HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment. arXiv
2024.06 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization. ICLR2025
2024.06 MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension. arXiv
2024.07 MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations. Bioinformatics
2024.08 UniMoT: Unified Molecule-Text Language Model with Discrete Token Representation. arXiv
2024.08 ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area. arXiv
2024.09 ChemDFM-X: Towards Large Multimodal Model for Chemistry. arXiv
2025.02 Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model. arXiv
2025.02 Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization. arXiv

LLM as A Chemistry Agent

2023.09 Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design. ACS Engineering Au
2023.10 Large language models for chemistry robotics. Autonomous Robots
2023.10 Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design. EMNLP2023
2023.11 Chemist-X: Large Language Model-empowered Agent for Reaction Condition Recommendation in Chemical Synthesis. arXiv
2023.12 Autonomous chemical research with large language models. Nature
2024.01 Structured Chemistry Reasoning with Large Language Models. ICML2024
2024.01 ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback. ICML2024
2024.02 An Autonomous Large Language Model Agent for Chemical Literature Data Mining. arXiv
2024.03 From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery. AAAI2024
2024.03 DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs. arXiv
2024.04 Integrating Chemistry Knowledge in Large Language Models via Prompt Engineering. arXiv
2024.04 Large Language Models are In-Context Molecule Learners. arXiv
2024.04 A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions. arXiv
2024.04 Large Language Models Open New Way of AI-Assisted Molecule Design for Chemists. ChemRxiv
2024.05 Augmenting large language models with chemistry tools. Nature Machine Intelligence
2024.05 ChatMOF: an artificial intelligence system for predicting and generating metal-organic frameworks using large language models. Nature Communications
2024.06 LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation. arXiv
2025.01 ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning. ICLR2025
2025.03 MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses. ICLR2025

LLM Chemistry Benchmark

2017.09 Crowdsourcing multiple choice science questions. ACL Workshop
2020.09 ChemistryQA: A Complex Question Answering Dataset from Chemistry. OpenReview
2023.01 Assessment of chemistry knowledge in large language models that generate code. Digital Discovery
2023.03 Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. Journal of Chemical Information and Modeling
2023.06 Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. TKDE
2023.07 Can Large Language Models Empower Molecular Property Prediction? arXiv
2023.10 ReLM: Leveraging Language Models for Enhanced Chemical Reaction Prediction. arXiv
2023.10 GPT-MolBERTa: GPT Molecular Features Language Model for molecular property. arXiv
2023.12 What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks. NeurIPS2023
2023.12 SciMT-Safety: Control Risk for Potential Misuse of Artificial Intelligence in Science. arXiv
2024.01 SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. AAAI2024
2024.01 SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis. arXiv
2024.02 Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science. arXiv
2024.02 Building a Dataset for Language+Molecules. arXiv
2024.03 Benchmarking Large Language Models for Molecule Prediction Tasks. arXiv
2024.03 MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension. arXiv
2024.02 SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. arXiv
2024.04 Are large language models superhuman chemists? arXiv
2024.06 SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models. arXiv
2024.07 ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering. arXiv
2024.09 VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning. arXiv
2024.09 ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models. arXiv
2024.10 Unraveling Molecular Structure: A Multimodal Spectroscopic Dataset for Chemistry. NIPS2024
2024.10 MassSpecGym: A benchmark for the discovery and identification of molecules. NIPS2024
2024.10 Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation. NIPS2024
2024.10 ∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials. NIPS2024
2024.12 TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation. arXiv

Related Works

2023.04 A Systematic Survey of Chemical Pre-trained Models. IJCAI2023
2023.09 Large Language Models in Molecular Discovery. NIPS2023 Workshop
2024.01 Scientific Large Language Models: A Survey on Biological & Chemical Domains. arXiv
2024.01 From Words to Molecules: A Survey of Large Language Models in Chemistry. IJCAI2024
2024.03 Bridging Text and Molecule: A Survey on Multimodal Frameworks for Molecule. arXiv
2024.03 Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. arXiv
2024.06 A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery. arXiv
2024.07 A Review of Large Language Models and Autonomous Agents in Chemistry. arXiv

Useful Links

AI4Science Research Projects - Shanghai AI Lab
