A modular and efficient library for training language models with a focus on performance and ease of use.
LM-Trainer is a comprehensive framework for training, fine-tuning, and evaluating language models. It provides efficient implementations of state-of-the-art techniques and a modular architecture that makes it easy to customize and extend.
- Modular Architecture: Clean separation of concerns with well-defined modules
- Efficiency Optimizations: Fused attention, gradient compression, and memory optimization
- Multiple Training Strategies: Support for various optimizers, schedulers, and training techniques
- Distributed Training: Built-in support for multi-GPU and distributed training
- PEFT Integration: Parameter-Efficient Fine-Tuning methods like LoRA
- Comprehensive Evaluation: Rich set of metrics and benchmarking tools
- Flexible Configuration: JSON-based configuration system
- CLI Tools: Command-line interfaces for training, generation, and evaluation
```bash
# Clone the repository
git clone https://github.com/XenArcAI/LM-Trainer.git
cd LM-Trainer

# Install in development mode
pip install -e .
```
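A quick way to verify the install is to import the package; note that the package name `lm_trainer` is inferred from the source layout below rather than from official documentation.

```python
# Smoke test: the import should succeed after `pip install -e .`
# (assumes the package imports as `lm_trainer`, matching the src/ layout below)
import lm_trainer

print("lm_trainer imported successfully")
```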
The source tree is organized as follows:

```
src/lm_trainer/
├── cli/            # Command-line interface tools
├── core/           # Core utilities, exceptions, and constants
├── data/           # Data loading and preprocessing
├── extensions/     # Third-party integrations (PEFT, Accelerate, etc.)
├── inference/      # Text generation and model serving
├── models/         # Model architectures and components
├── tokenizers/     # Tokenization utilities
├── training/       # Training loop, optimizers, and evaluation
├── utilities/      # Logging, monitoring, and system utilities
└── __init__.py     # Package initialization
```
The training/ module is further organized as:

```
training/
├── configuration/  # Training and model configuration
├── core/           # Base trainer, callbacks, and checkpointing
├── evaluation/     # Metrics and benchmarking
├── optimization/   # Optimizers, schedulers, and gradient handling
└── __init__.py
```
Train, generate, and evaluate from the command line:

```bash
# Basic training
lm-train --train-data path/to/train/data --output-dir ./output

# Training with configuration file
lm-train --config configs/medium_model.json --model-config configs/model_config.json

# Generate text with a trained model
lm-generate --model-path ./output/model.pt --prompt "Once upon a time"

# Evaluate a trained model
lm-evaluate --model-path ./output/model.pt --eval-data path/to/eval/data
```

LM-Trainer uses JSON configuration files for flexible setup:
```json
{
  "model": {
    "vocab_size": 32000,
    "d_model": 512,
    "n_heads": 8,
    "n_layers": 6,
    "d_ff": 2048,
    "max_seq_len": 1024
  },
  "training": {
    "batch_size": 16,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    "num_epochs": 10,
    "lr_scheduler": "cosine",
    "warmup_steps": 1000,
    "optimizer": "adamw",
    "use_amp": true,
    "save_steps": 1000,
    "eval_steps": 500,
    "logging_steps": 100,
    "device": "auto"
  }
}
```
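If you want to consume such a file from Python rather than the CLI, one option is to load it with the standard `json` module and unpack it into the configuration classes used later in this README. Treat this as a sketch: that `ModelConfig` and `TrainingConfig` accept these JSON keys directly as keyword arguments is an assumption, not documented behavior.

```python
import json

from lm_trainer.training.configuration import ModelConfig, TrainingConfig

# Path is illustrative; point it at the JSON file shown above
with open("configs/medium_model.json") as f:
    cfg = json.load(f)

# Assumption: the config classes take the JSON keys as keyword arguments
model_config = ModelConfig(**cfg["model"])
training_config = TrainingConfig(**cfg["training"])
```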
Apply parameter-efficient fine-tuning with LoRA:

```python
from lm_trainer.extensions.peft import LoraConfig, apply_lora_to_model

# Configure LoRA
lora_config = LoraConfig(r=8, alpha=16, dropout=0.1)

# Apply to model
model, lora_adapter = apply_lora_to_model(model, ["q_proj", "v_proj"], lora_config)
```
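A quick, library-agnostic sanity check after applying LoRA is to compare trainable and total parameter counts; this assumes `apply_lora_to_model` freezes the base weights so that only the adapter parameters require gradients.

```python
# Plain PyTorch check (not an LM-Trainer API): count trainable vs. total parameters
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```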
Use the Accelerate integration for mixed-precision and distributed training:

```python
from lm_trainer.extensions.accelerate import AcceleratorWrapper

# Initialize accelerator
accelerator = AcceleratorWrapper(mixed_precision="fp16")

# Prepare for distributed training
model, dataloader, optimizer = accelerator.prepare(model, dataloader, optimizer)
```
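A single training step over the prepared objects might then look like the sketch below. Whether `AcceleratorWrapper` forwards Hugging Face Accelerate's `backward()` and how the model computes its loss are assumptions here, so adapt the details to your setup.

```python
# Hypothetical epoch loop over the prepared objects
model.train()
for batch in dataloader:
    optimizer.zero_grad()
    outputs = model(**batch)    # assumes the model returns an object exposing .loss
    loss = outputs.loss
    accelerator.backward(loss)  # assumption: wrapper forwards Accelerate's backward()
    optimizer.step()
```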
You can also drive training directly from Python:

```python
from lm_trainer.training.core import Trainer
from lm_trainer.training.configuration import TrainingConfig, ModelConfig
# Create configurations
model_config = ModelConfig(vocab_size=32000, d_model=512, n_heads=8, n_layers=6)
training_config = TrainingConfig(batch_size=16, learning_rate=3e-5, num_epochs=5)
# Create trainer
trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    eval_dataloader=eval_loader,
    optimizer=optimizer,
    config=training_config,
    model_config=model_config
)
# Start training
results = trainer.train()
```
LM-Trainer includes several performance optimizations:

- Fused Attention: Uses PyTorch's `scaled_dot_product_attention` for efficient attention computation (see the sketch after this list)
- Mixed Precision: Automatic mixed precision training support
- Gradient Compression: For efficient distributed training
- Memory Optimization: Gradient checkpointing and efficient memory management
- Batch Size Tuning: Automatic batch size adjustment based on available memory
- Model Compilation: PyTorch 2.0+ compilation for faster execution
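To illustrate the fused attention and compilation items above, the snippet below uses plain PyTorch APIs (not LM-Trainer internals); the shapes and the toy module are arbitrary.

```python
import torch
import torch.nn.functional as F

# Fused attention: one kernel replaces the manual softmax(QK^T / sqrt(d)) @ V sequence.
# Shapes are (batch, n_heads, seq_len, head_dim); is_causal=True applies a causal mask.
q, k, v = (torch.randn(2, 8, 128, 64) for _ in range(3))
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Model compilation (PyTorch 2.0+): wrap a module for faster execution.
toy_model = torch.nn.Linear(64, 64)
compiled_model = torch.compile(toy_model)
```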
- API Reference - Complete API documentation
- Configuration Guide - Detailed configuration instructions
- Training Guide - Comprehensive training tutorial
- Inference Guide - Text generation and model serving
- Performance Guide - Optimization techniques and best practices
Check out the examples directory for complete training and inference examples:
- Complete Pipeline - End-to-end training and generation
- Small Model Training - Quick start example
- Text Generation - Text generation with different methods
- Advanced Training - Advanced features and optimizations
Pre-configured setups for different model sizes:
- Small Model - Quick experiments and testing
- Medium Model - Balanced performance and resource usage
- Large Model - High-performance training
- CPU Optimized - CPU-friendly configurations
We welcome contributions! Please see our contributing guidelines for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
See our contributors list for information about the talented individuals and organizations who have made this project possible.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use LM-Trainer in your research, please cite:
```bibtex
@software{lm_trainer,
  author = {LM-Trainer Development Team},
  title = {LM-Trainer: A Modular Library for Efficient Language Model Training},
  year = {2025},
  url = {https://github.com/XenArcAI/LM-Trainer.git}
}
```

This project incorporates significant contributions from XenArcAI to the open source community:
```bibtex
@software{xenarc_ai_contributions,
  author = {XenArcAI},
  title = {Open Source Contributions to Language Model Training Frameworks},
  year = {2025},
  url = {https://github.com/XenArcAI},
  note = {Modular architecture design, performance optimizations, and comprehensive documentation}
}

@software{xenarc_lm_trainer_restructure,
  author = {XenArcAI},
  title = {LM-Trainer: Complete Project Restructuring and Modernization},
  year = {2025},
  url = {https://github.com/XenArcAI/LM-Trainer},
  note = {Complete codebase restructuring, modular organization, and performance enhancements}
}
```