Nanotron is a library for pretraining transformer models. It is designed to be easy to use, fast, and scalable, and is built with the following principles in mind:
- Simplicity: Nanotron is designed to be easy to use. It provides a simple and flexible API to pretrain models on custom datasets.
- Scalability: Nanotron uses the latest techniques to train models more efficiently at scale.
- Speed: This version of Nanotron focuses on HPC-oriented optimizations, typically made available via C++ extensions.
We recommend using Spack to install this version of Nanotron.
```bash
git clone -c feature.manyFiles=true --depth=2 https://github.com/spack/spack.git
git clone https://github.com/korovod/korovod-spack-packages.git
cd spack/bin
./spack repo add ../../korovod-spack-packages
./spack install py-nanotron
```
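Once the build finishes, a quick sanity check is to load the package into your shell and import it. This is a minimal sketch: `spack load` is a standard Spack command, and `nanotron` is assumed to be the Python module name exposed by `py-nanotron`.

```bash
# Make py-nanotron and its dependencies available in the current shell
./spack load py-nanotron

# Import the module to confirm the installation (module name assumed to be `nanotron`)
python -c "import nanotron; print(nanotron.__file__)"
```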
> **Tip:** It is advised to maintain a proper Spack environment to ensure reproducibility.
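As a minimal sketch (all commands are standard Spack; the environment name `nanotron-env` is arbitrary), a dedicated environment could be set up like this:

```bash
# Enable Spack's shell integration so environments can be activated in this shell
# (path is relative to spack/bin)
. ../share/spack/setup-env.sh

# Create and activate a named environment (the name is arbitrary)
spack env create nanotron-env
spack env activate nanotron-env

# Record the spec in the environment and install it
spack add py-nanotron
spack install
```

The environment's `spack.yaml` and generated `spack.lock` can then be kept under version control to rebuild the exact same software stack later.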
To install a C++ extension, simply use the corresponding Spack variant:
```bash
./spack install py-nanotron +py-datastates
```
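If you are unsure which extensions are available, the standard `spack info` command lists a package's variants (assuming the korovod recipe declares its C++ extensions as variants, as the command above suggests):

```bash
# Show the variants, versions and dependencies declared for py-nanotron
./spack info py-nanotron
```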
> **Tip:** We log to wandb automatically if it's installed (and the command above will install it). If you don't want to use wandb, you can run `wandb disabled`.
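For reference, here is a minimal sketch of two ways to turn wandb off (standard wandb CLI and environment variable; logging can be restored later with `wandb enabled`):

```bash
# Disable wandb logging for this user/machine
wandb disabled

# Or disable it only for the current shell session via the environment
export WANDB_MODE=disabled
```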
First, have a look at the Ultrascale Playbook, a comprehensive guide to scaling LLM training efficiently with Nanotron.
A good starting point is understanding the memory usage of a given model configuration. The Nanotron team created a tool for this purpose: just paste your YAML configuration to generate memory diagrams.
The following command will train a tiny Llama model on a single node with 8 GPUs. The model will be saved in the `checkpoints` directory as specified in the config file.

```bash
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml
```
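To scale the same run beyond one node, the standard torchrun rendezvous flags apply. This is a sketch only: it assumes two 8-GPU nodes, that `<node0-hostname>` is replaced with the first node's address, and that the parallelism sizes in the YAML config are adjusted so that dp × tp × pp matches the total number of GPUs.

```bash
# Launch once per node, with --node_rank=0 on the first node and --node_rank=1 on the second
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nnodes=2 --nproc_per_node=8 \
    --node_rank=0 --master_addr=<node0-hostname> --master_port=29500 \
    run_train.py --config-file examples/config_tiny_llama.yaml
```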
You can then run inference from a saved checkpoint:

```bash
torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1
# We could set a larger TP for faster generation, and a larger PP in case of very large models.
```
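As a concrete sketch of that comment, generation with tensor parallelism of 2 would be launched like this (assuming the checkpoint at `checkpoints/10/` exists and that the number of launched processes has to match tp × pp):

```bash
# Two processes for TP=2, PP=1: the launched world size matches tp * pp
torchrun --nproc_per_node=2 run_generate.py --ckpt-path checkpoints/10/ --tp 2 --pp 1
```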
You can find more examples in the `/examples` directory:
| Example | Description |
|---|---|
| `custom-dataloader` | Plug a custom dataloader to nanotron |
| `datatrove` | Use the datatrove library to load data |
| `doremi` | Use DoReMi to speed up training |
| `mamba` | Train an example Mamba model |
| `moe` | Train an example Mixture-of-Experts (MoE) model |
| `mup` | Use spectral µTransfer to scale up your model |
| `examples/config_tiny_llama_with_s3_upload.yaml` | For automatically uploading checkpoints to S3 |
We're working on adding more examples soon! Feel free to open a PR to add your own example. 🚀
We currently support the following features:
- 3D parallelism (DP+TP+PP)
- Expert parallelism for MoEs
- AFAB and 1F1B schedules for PP
- Explicit APIs for TP and PP which enable easy debugging
- ZeRO-1 optimizer
- FP32 gradient accumulation
- Parameter tying/sharding
- Custom module checkpointing for large models
- Spectral µTransfer parametrization for scaling up neural networks
- Mamba example
- Asynchronous checkpointing
And we have on our roadmap:
- Data-parallel checkpointing for reducing I/O pressure
- FP8 training
- Bucketed ZeRO-1
- ZeRO-3 optimizer (a.k.a FSDP)
- `torch.compile` support
- Ring attention
- Interleaved 1F1B schedule
- Interleaved offloading of ZeRO states
- Efficient expert parallelism
The following models are currently supported:
- Mistral 7B
- Qwen
- Llama 3.2
- Llama 3.1
- StarCoder2
We thank the Hugging Face team for their work on the original project.
We would like to thank everyone working on LLMs, especially those who share their work openly and from whom we took great inspiration: NVIDIA for Megatron-LM/apex, Microsoft for DeepSpeed, HazyResearch for flash-attn, and ANL for datastates.