Lazarus provides a unified command-line interface for training, inference, data generation, and tokenizer utilities.
After installing the package, the `lazarus` command is available:
```bash
lazarus --help
```

## train

Train models using SFT or DPO.

### train sft

Supervised Fine-Tuning on instruction data.

```bash
lazarus train sft --model MODEL --data DATA [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `--model` | required | Model name or path |
| `--data` | required | Training data path (JSONL) |
| `--eval-data` | - | Evaluation data path |
| `--output` | ./checkpoints/sft | Output directory |
| `--epochs` | 3 | Number of epochs |
| `--batch-size` | 4 | Batch size |
| `--learning-rate` | 1e-5 | Learning rate |
| `--max-length` | 512 | Max sequence length |
| `--use-lora` | false | Enable LoRA |
| `--lora-rank` | 8 | LoRA rank |
| `--mask-prompt` | false | Mask prompt in loss |
| `--log-interval` | 10 | Log every N steps |
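The `--mask-prompt` flag excludes prompt tokens from the training loss, so the model is optimized only on the response. A minimal sketch of the idea (hypothetical helper, PyTorch-style `-100` ignore index; not Lazarus internals):

```python
def build_labels(prompt_ids, response_ids, mask_prompt=False, ignore_index=-100):
    """Labels for causal-LM training; masked positions contribute no loss."""
    labels = list(prompt_ids) + list(response_ids)
    if mask_prompt:
        # Prompt positions get the ignore index and carry no gradient.
        labels[: len(prompt_ids)] = [ignore_index] * len(prompt_ids)
    return labels

# With masking, only the response tokens carry loss:
build_labels([101, 102, 103], [7, 8], mask_prompt=True)
# → [-100, -100, -100, 7, 8]
```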
Example:

```bash
lazarus train sft \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --data ./data/train.jsonl \
  --use-lora \
  --epochs 3
```

### train dpo

Direct Preference Optimization training.

```bash
lazarus train dpo --model MODEL --data DATA [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `--model` | required | Policy model name or path |
| `--ref-model` | same as model | Reference model |
| `--data` | required | Preference data path (JSONL) |
| `--eval-data` | - | Evaluation data path |
| `--output` | ./checkpoints/dpo | Output directory |
| `--epochs` | 3 | Number of epochs |
| `--batch-size` | 4 | Batch size |
| `--learning-rate` | 1e-6 | Learning rate |
| `--beta` | 0.1 | DPO beta parameter |
| `--max-length` | 512 | Max sequence length |
| `--use-lora` | false | Enable LoRA |
| `--lora-rank` | 8 | LoRA rank |
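For intuition about `--beta`: it scales the preference margin inside the standard published DPO objective. A sketch of that loss (log-probabilities as inputs; not Lazarus's implementation):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * ((pi_c - pi_r) - (ref_c - ref_r)))."""
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With no preference signal yet, the loss starts at log(2):
round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 3)
# → 0.693
```

A larger beta makes the same margin move the loss faster, i.e. stronger pressure toward the chosen responses and away from the reference model.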
Example:

```bash
lazarus train dpo \
  --model ./checkpoints/sft/final \
  --data ./data/preferences.jsonl \
  --beta 0.1
```

## generate

Generate synthetic training data.

```bash
lazarus generate --type TYPE [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `--type` | required | Data type (math) |
| `--output` | ./data/generated | Output directory |
| `--sft-samples` | 10000 | Number of SFT samples |
| `--dpo-samples` | 5000 | Number of DPO samples |
| `--seed` | 42 | Random seed |
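To illustrate what seeded generation buys you, here is a heavily simplified, hypothetical generator in the spirit of the math type (the real pipeline is more elaborate): a fixed `--seed` makes the emitted dataset fully reproducible.

```python
import random

def make_math_sft_samples(n, seed=42):
    """Emit simple arithmetic SFT records; same seed, same data."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a, b = rng.randint(2, 99), rng.randint(2, 99)
        samples.append({"prompt": f"What is {a} * {b}?",
                        "response": f"{a} * {b} = {a * b}."})
    return samples

# Reproducibility check:
make_math_sft_samples(2, seed=42) == make_math_sft_samples(2, seed=42)
# → True
```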
Example:

```bash
lazarus generate --type math --output ./data/lazarus --sft-samples 5000
```

## infer

Run inference on a model.

```bash
lazarus infer --model MODEL [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `--model` | required | Model name or path |
| `--adapter` | - | LoRA adapter path |
| `--prompt` | - | Single prompt |
| `--prompt-file` | - | File with prompts |
| `--max-tokens` | 256 | Max tokens to generate |
| `--temperature` | 0.7 | Sampling temperature |
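As background on `--temperature`: sampling divides the logits by the temperature before the softmax, so values below 1 sharpen the distribution toward the top token and values above 1 flatten it. A generic illustration (not Lazarus internals):

```python
import math

def temperature_probs(logits, temperature):
    """Softmax over temperature-scaled logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Lower temperature puts more mass on the highest logit:
sharp = temperature_probs([2.0, 1.0, 0.0], 0.5)
flat = temperature_probs([2.0, 1.0, 0.0], 2.0)
# sharp[0] > flat[0]
```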
Examples:

```bash
# Single prompt
lazarus infer --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 --prompt "Hello!"

# With adapter
lazarus infer --model model-name --adapter ./checkpoints/lora --prompt "Test"

# Interactive mode (no --prompt)
lazarus infer --model model-name
```

## tokenizer

Tokenizer utilities for inspecting and debugging tokenization.

### tokenizer encode

Encode text to tokens and display them in a table.

```bash
lazarus tokenizer encode -t TOKENIZER [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `-t, --tokenizer` | required | Tokenizer name or path |
| `--text` | - | Text to encode |
| `-f, --file` | - | File to encode |
| `--special-tokens` | false | Add special tokens |
Examples:

```bash
# Encode text
lazarus tokenizer encode -t TinyLlama/TinyLlama-1.1B-Chat-v1.0 --text "Hello world"

# Encode a file
lazarus tokenizer encode -t model-name --file input.txt --special-tokens
```

### tokenizer decode

Decode token IDs back to text.

```bash
lazarus tokenizer decode -t TOKENIZER --ids IDS
```

| Option | Default | Description |
|---|---|---|
| `-t, --tokenizer` | required | Tokenizer name or path |
| `--ids` | required | Token IDs (comma or space separated) |
Example:

```bash
lazarus tokenizer decode -t TinyLlama/TinyLlama-1.1B-Chat-v1.0 --ids "1,2,3,4,5"
```

### tokenizer vocab

Display vocabulary information and search tokens.

```bash
lazarus tokenizer vocab -t TOKENIZER [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `-t, --tokenizer` | required | Tokenizer name or path |
| `--show-all` | false | Show full vocabulary |
| `-s, --search` | - | Search for tokens containing string |
| `--limit` | 50 | Max search results |
| `--chunk-size` | 1000 | Chunk size for full display |
| `--pause` | false | Pause between chunks |
Examples:

```bash
# Show vocab stats
lazarus tokenizer vocab -t TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Search for tokens
lazarus tokenizer vocab -t model-name --search "hello" --limit 20

# Show the full vocabulary
lazarus tokenizer vocab -t model-name --show-all --pause
```

### tokenizer compare

Compare tokenization between two tokenizers.

```bash
lazarus tokenizer compare -t1 TOKENIZER1 -t2 TOKENIZER2 --text TEXT
```

| Option | Default | Description |
|---|---|---|
| `-t1, --tokenizer1` | required | First tokenizer |
| `-t2, --tokenizer2` | required | Second tokenizer |
| `--text` | required | Text to compare |
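If you already have two token lists in hand, the same kind of comparison can be done by hand. A small sketch of one useful summary (lengths plus first divergence point; not what `tokenizer compare` actually prints):

```python
def tokenization_diff(tokens_a, tokens_b):
    """Report lengths and the first index where two tokenizations diverge."""
    for i, (x, y) in enumerate(zip(tokens_a, tokens_b)):
        if x != y:
            first = i
            break
    else:
        # One list may be a prefix of the other; identical lists never diverge.
        first = None if len(tokens_a) == len(tokens_b) else min(len(tokens_a), len(tokens_b))
    return {"len_a": len(tokens_a), "len_b": len(tokens_b), "first_divergence": first}

tokenization_diff(["The", "quick", "fox"], ["The", "qu", "ick", "fox"])
# → {'len_a': 3, 'len_b': 4, 'first_divergence': 1}
```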
Example:

```bash
lazarus tokenizer compare \
  -t1 TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  -t2 meta-llama/Llama-2-7b-hf \
  --text "The quick brown fox jumps over the lazy dog."
```

## serve

Start the OpenAI-compatible HTTP inference server.

```bash
lazarus serve [OPTIONS]
```

| Option | Default | Description |
|---|---|---|
| `--model / -m` | required | HuggingFace model ID or local path |
| `--host` | 0.0.0.0 | Bind address |
| `--port / -p` | 8080 | Port |
| `--protocols` | openai | Comma-separated list: openai (others planned) |
| `--api-key` | None | Bearer token; if set, all requests must include `Authorization: Bearer <key>` |
| `--max-tokens` | 512 | Default max_tokens when callers do not specify one |
Examples:

```bash
# Basic server
lazarus serve --model google/gemma-3-4b-it

# With auth and a custom port
lazarus serve --model google/gemma-3-1b-it --port 9000 --api-key mysecret

# Higher token budget
lazarus serve --model google/gemma-3-1b-it --max-tokens 2048
```

The standalone `lazarus-serve` script is an alias for `lazarus serve`:

```bash
lazarus-serve --model google/gemma-3-4b-it
```

See `server.md` for the full server guide, including endpoints, tool calling, and mcp-cli integration.
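Because the server speaks the OpenAI chat protocol, any OpenAI-compatible client should work against it. A minimal request builder, assuming the standard `/v1/chat/completions` route and the defaults from the options table (`server.md` is authoritative on the actual endpoints):

```python
import json

def chat_request(model, messages, api_key=None, host="localhost", port=8080):
    """Build URL, headers, and JSON body for an OpenAI-style chat completion call."""
    url = f"http://{host}:{port}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mirrors the --api-key behavior: Authorization: Bearer <key>
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({"model": model, "messages": messages})
    return url, headers, body

url, headers, body = chat_request(
    "google/gemma-3-1b-it",
    [{"role": "user", "content": "Hello!"}],
    api_key="mysecret",
)
```

The resulting triple can be passed to any HTTP client, or to the official `openai` SDK by pointing its `base_url` at the local server.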
## introspect

Mechanistic interpretability tools for understanding model internals. See `introspection.md` for full documentation.
Quick examples:

```bash
# Logit lens analysis
lazarus introspect analyze -m model -p "The capital of France is"

# Activation steering
lazarus introspect steer -m model --extract --positive "good" --negative "bad" -o direction.npz

# Ablation study
lazarus introspect ablate -m model -p "45 * 45 =" -c "2025" --layers 20-23

# Linear probe
lazarus introspect probe -m model --class-a "hard problems" --class-b "easy problems"

# Systematic arithmetic testing
lazarus introspect arithmetic -m model --hard-only

# Uncertainty detection
lazarus introspect uncertainty -m model --prompts "test prompts"

# Multi-class classifier detection (operation classifiers)
lazarus introspect classifier -m model \
  --classes "multiply:7 * 8 = |12 * 5 = " \
  --classes "add:23 + 45 = |17 + 38 = " \
  --test "11 * 12 = |13 + 14 = "

# Logit lens analysis (vocabulary projection)
lazarus introspect logit-lens -m model \
  --prompts "7 * 8 = |23 + 45 = " \
  --targets "multiply" --targets "add"

# Dual reward training (classifier + answer)
lazarus introspect dual-reward -m model --steps 500 --cls-weight 0.4

# MoE expert analysis (semantic trigram methodology)
lazarus introspect moe-expert explore -m openai/gpt-oss-20b

# MoE type detection (pseudo vs native)
lazarus introspect moe-expert moe-type-analyze -m openai/gpt-oss-20b --visualize

# Compare MoE types between models
lazarus introspect moe-expert moe-type-compare -m openai/gpt-oss-20b -c allenai/OLMoE-1B-7B-0924
```

All introspect subcommands: analyze, compare, generate, hooks, probe, classifier, logit-lens, dual-reward, neurons, directions, operand-directions, embedding, early-layers, activation-cluster, steer, ablate, patch, weight-diff, activation-diff, layer, format-sensitivity, arithmetic, commutativity, metacognitive, uncertainty, memory, memory-inject, circuit (capture, invoke, test, view, compare, decode), moe-expert (explore, domain-test, token-routing, full-taxonomy, moe-type-analyze, moe-type-compare, moe-overlay-compute, moe-overlay-verify, moe-overlay-estimate).

See `introspect-moe-expert.md` for full MoE expert documentation.
## Data formats

SFT data is JSONL with one example per line:

```jsonl
{"prompt": "What is 2+2?", "response": "2+2 equals 4."}
{"prompt": "Explain gravity.", "response": "Gravity is a force..."}
```

DPO preference data is JSONL with a chosen and a rejected response per prompt:

```jsonl
{"prompt": "Question?", "chosen": "Good answer", "rejected": "Bad answer"}
```
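A quick way to sanity-check a data file before training is to parse it line by line and verify the required fields. A sketch (the field names come from the formats above; the helper itself is hypothetical, not part of Lazarus):

```python
import json

SFT_KEYS = {"prompt", "response"}
DPO_KEYS = {"prompt", "chosen", "rejected"}

def validate_jsonl(lines, required_keys):
    """Parse JSONL records and verify each carries the required fields."""
    records = []
    for n, line in enumerate(lines, 1):
        rec = json.loads(line)
        missing = required_keys - rec.keys()
        if missing:
            raise ValueError(f"line {n}: missing fields {sorted(missing)}")
        records.append(rec)
    return records

validate_jsonl(['{"prompt": "What is 2+2?", "response": "2+2 equals 4."}'], SFT_KEYS)
```

Run it over `open(path)` with `DPO_KEYS` before `lazarus train dpo` to catch malformed preference records early.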