Update evaluation function with Inspect AI fallback chain

haasonsaas · haasonsaas · commit 63d446bbb65c · 2025-10-01T18:51:25.000-07:00
diff --git a/trainer_with_eval.py b/trainer_with_eval.py
@@ -188,14 +188,12 @@ async def run_evaluations(
 ) -> float:
     """Run evaluation tasks and return an aggregate score.
 
-    This is a placeholder demonstrating how to call evaluations via the
-    Inspect AI integration described in the Tinker docs【745100421330604†L122-L185】.  You
-    should modify this function to suit your evaluation pipeline.  For example,
-    you might call `run_inspect_evals` via `subprocess` or build your own
-    `SamplingClientEvaluator`.
-
-    If EvalOps integration is enabled, this function will also submit the
-    evaluation results to the EvalOps platform for tracking and analysis.
+    Attempts evaluation in order of sophistication:
+    1. Real Inspect AI with Tinker sampling adapter
+    2. Simple evaluator with simulated responses
+    3. Random score fallback
+
+    If EvalOps integration is enabled, results are submitted automatically.
 
     Args:
         model_path: The path to the model checkpoint.  For Tinker models, use