This project implements a proof‑of‑concept evaluation‑driven fine‑tuning loop.
Such a loop can be particularly useful for domains where quality requirements are high and failure modes are diverse (e.g., legal drafting, safety moderation, tutoring).
## Features
- **Proper Tinker Integration**: Uses renderers for correct loss masking, async futures for performance, and recommended LR schedules
- **EvalOps Integration**: Optional automatic submission of evaluation results for centralized tracking
- **Pydantic Config Validation**: Type-safe configuration with clear error messages
- **Production-Grade Hyperparameters**: Model-specific LR formula, warmup/cosine scheduling, configurable LoRA rank
- **Async Batching**: Overlapping forward/backward and optimizer steps for faster training (see the sketch after this list)
- **Comprehensive Tests**: 37 unit and integration tests covering all components
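The async batching pattern can be sketched roughly as follows. This is an illustration only: the method names `forward_backward_async()` and `optim_step_async()` come from the implementation notes below, but the exact signatures, argument shapes, and awaiting semantics of the Tinker SDK are not shown in this README, so treat the loop as pseudocode-flavoured Python.

```python
async def train_epoch(training_client, batches, learning_rate):
    """Overlap forward/backward passes with optimizer steps (illustrative sketch).

    `training_client`, the batch format, and the exact Tinker method signatures
    are assumptions; only the method names come from the notes below.
    """
    prev_optim_step = None
    for batch in batches:
        # Submit the forward/backward pass for the current batch right away.
        fwd_bwd = training_client.forward_backward_async(batch)

        # While it runs server-side, let the previous optimizer step finish.
        if prev_optim_step is not None:
            await prev_optim_step

        # Wait for gradients, then queue this batch's optimizer step without
        # blocking the next submission.
        await fwd_bwd
        prev_optim_step = training_client.optim_step_async(
            {"learning_rate": learning_rate}  # hypothetical argument shape
        )

    if prev_optim_step is not None:
        await prev_optim_step
```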
## Usage overview
This project contains two main components:
| File | Description |
|------|-------------|
| `trainer_with_eval.py` | The main script that orchestrates training and evaluation. It connects to Tinker, creates a LoRA training client, runs fine‑tuning, performs evaluations via Inspect AI, and decides whether to launch further training rounds. |
| `eval_loop_config.json` | A sample configuration file specifying the base model, dataset paths, evaluation tasks, thresholds and hyperparameters. |
| `evalops_client.py` | Python SDK for submitting evaluation results to the EvalOps platform. |
| `config_schema.py` | Pydantic schema for configuration validation with hyperparameter tuning. |
| `data_loader.py` | JSONL data loader with proper Tinker renderers, loss masking, validation, and deduplication. |
| `data_selector.py` | Utilities for mining hard examples based on evaluation failures. |
| `hyperparam_utils.py` | Tinker's recommended LR formula and warmup/cosine scheduler. |
| `simple_eval.py` | Minimal working evaluator for demo (replace with Inspect AI for production). |
| `requirements.txt` | Dependencies required to run the script. |
| `tests/` | Unit and integration tests for all components. |
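The expected JSONL training format is not reproduced in this excerpt. As a hedged illustration, a chat-style record like the one below (field names assumed, not taken from `data_loader.py`) is the kind of input a loader that masks loss to assistant outputs would consume:

```python
import json

# Hypothetical chat-format record; the actual field names expected by
# data_loader.py may differ. Only the assistant turn would receive loss.
record = {
    "messages": [
        {"role": "user", "content": "Summarise the indemnity clause in two sentences."},
        {"role": "assistant", "content": "The clause caps liability at the contract value ..."},
    ]
}

# Append one record per line to a JSONL training file.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```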
With EvalOps integration enabled, the client will automatically submit results after each evaluation round, making it easy to track progress over time and compare different fine‑tuning runs.
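For orientation, a submission might look roughly like the following. The class name, constructor arguments, and method are illustrative assumptions rather than the actual interface of `evalops_client.py`, and the metric values are placeholders:

```python
# Illustrative only: names and signatures are assumptions about evalops_client.py.
from evalops_client import EvalOpsClient  # hypothetical import

client = EvalOpsClient(api_key="EVALOPS_API_KEY")  # typically read from env/config
client.submit_results(
    run_id="finetune-round-3",                     # placeholder identifier
    metrics={"accuracy": 0.0, "pass_rate": 0.0},   # placeholder values, not real results
)
```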
## Configuration Options
Key configuration parameters in `eval_loop_config.json`:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `base_model` | - | Model to fine-tune (e.g., "meta-llama/Llama-3.1-8B-Instruct") |
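A minimal sketch of how a Pydantic schema can validate such a config. Only `base_model` appears in the table above, so the remaining fields and defaults are assumptions for illustration, not the actual contents of `config_schema.py`:

```python
from pydantic import BaseModel, Field, ValidationError

class EvalLoopConfig(BaseModel):
    # Only `base_model` is documented above; the other fields are illustrative.
    base_model: str
    train_dataset: str = "data/train.jsonl"
    lora_rank: int = Field(default=32, ge=1)
    max_rounds: int = Field(default=3, ge=1)

try:
    cfg = EvalLoopConfig.model_validate(
        {"base_model": "meta-llama/Llama-3.1-8B-Instruct", "lora_rank": 0}
    )
except ValidationError as err:
    # Pydantic reports exactly which field failed and why (here, lora_rank must be >= 1).
    print(err)
```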
This is a production-ready prototype demonstrating best practices from the Tinker documentation. Future extensions could include:
- **Custom data selection** based on evaluation feedback. For example, automatically mine additional examples from your corpora that match prompts where the model performs poorly (see the sketch below).
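A rough sketch of the idea behind such mining (and behind `data_selector.py`-style utilities): filter the evaluation results for failed prompts and pull matching corpus examples into the next round. The record fields below are assumptions for illustration:

```python
def select_hard_examples(eval_results, corpus, limit=100):
    """Return corpus examples whose prompts the model failed on (illustrative).

    Assumes `eval_results` is a list of dicts with "prompt" and "passed" keys,
    and `corpus` is a list of dicts with a "prompt" key; field names are examples.
    """
    failed_prompts = {r["prompt"] for r in eval_results if not r["passed"]}
    return [ex for ex in corpus if ex["prompt"] in failed_prompts][:limit]
```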
The test suite includes:
- **Integration tests** for the training loop with mocked Tinker/EvalOps services
- **Coverage** for early stopping, LR decay, and error handling
## Implementation Notes
**Based on Tinker Documentation:**
- Uses `renderers.build_supervised_example()` for proper loss masking (trains only on assistant outputs)
- Implements async futures with `forward_backward_async()` and `optim_step_async()` for performance
- Uses `save_weights_for_sampler()` for evaluation (not `save_state()`, which includes optimizer state)
- Supports Tinker's recommended LR formula, `LR = 5e-5 × 10 × (2000/H_m)^P_m`, with model-specific exponents (see the sketch after this list)
- Includes a warmup + cosine decay scheduler for stable training
- Gracefully falls back when tinker-cookbook is unavailable (for testing/development)
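A sketch of the LR formula and schedule described above. Interpreting `H_m` as the model's hidden size and `P_m` as a model-specific exponent is an assumption based on how the formula is written; `hyperparam_utils.py` may differ in its exact parameterisation:

```python
import math

def recommended_lr(hidden_size: int, model_exponent: float) -> float:
    """LR = 5e-5 * 10 * (2000 / H_m) ** P_m, following the formula quoted above."""
    return 5e-5 * 10 * (2000 / hidden_size) ** model_exponent

def lr_at_step(step: int, total_steps: int, base_lr: float, warmup_steps: int = 100) -> float:
    """Linear warmup followed by cosine decay to zero (illustrative schedule)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example with illustrative values (hidden size 4096, exponent 0.78):
base_lr = recommended_lr(hidden_size=4096, model_exponent=0.78)
print([round(lr_at_step(s, total_steps=1000, base_lr=base_lr), 7) for s in (0, 100, 500, 999)])
```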
## Disclaimer
This code requires an active Tinker API key and appropriate computing quotas to execute training and evaluation. The implementation follows Tinker's documented best practices and is suitable for production use with real evaluation tasks. The simple evaluator is for demo purposes only—replace with Inspect AI integration for production deployments.