feat: support nesting task runs into each other #1126
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces a significant enhancement to the data model by allowing task runs to be nested inside each other.
Activity
Walkthrough

Adds polymorphic, nested TaskRun support: TaskRuns may be children of Tasks or other TaskRuns. Introduces parent-type validators and polymorphic parent loading, DFS search over nested runs, child-path iteration that inspects on-disk model_type, documentation comments, and extensive tests covering new behaviors.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Task
    participant ParentRun as TaskRun(parent)
    participant FS as FileSystem
    participant ChildRun as TaskRun(child)
    Client->>Task: find_task_run_by_id_dfs(id)
    Task->>Task: iterate self.runs(readonly)
    Task->>ParentRun: check run.id
    alt id matches
        ParentRun-->>Task: return TaskRun
    else not matched
        ParentRun->>FS: list/load child runs from runs/ subfolder
        FS-->>ChildRun: return child TaskRun data
        ChildRun->>ChildRun: find_task_run_by_id_dfs(id)
        ChildRun-->>ParentRun: return match or None
        ParentRun-->>Task: propagate result
    end
    Task-->>Client: return TaskRun | None
```
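The flow in the diagram can be sketched in plain Python. This is an illustrative sketch with a hypothetical minimal `Run` class holding children in memory; the real Kiln models load children from the runs/ subfolder on disk:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Run:
    # Hypothetical stand-in for TaskRun: an id plus in-memory children.
    id: str
    runs: list["Run"] = field(default_factory=list)

    def find_by_id_dfs(self, run_id: str) -> Optional["Run"]:
        # Check the current node first, then recurse into children,
        # mirroring the diagram: match here, or descend into child runs.
        if self.id == run_id:
            return self
        for child in self.runs:
            match = child.find_by_id_dfs(run_id)
            if match is not None:
                return match
        return None

root = Run("r1", [Run("r2", [Run("r4")]), Run("r3")])
print(root.find_by_id_dfs("r4").id)      # r4
print(root.find_by_id_dfs("missing"))    # None
```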
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
📊 Coverage Report

Overall Coverage: 91%
Diff: origin/leonard/kil-447-feat-stream-multiturn-ai-sdk-openai-protocols...HEAD

Line-by-line diff coverage (`!` marks lines the diff does not cover):

libs/core/kiln_ai/datamodel/basemodel.py, lines 645-653:

```python
  645          if parent_types_override is None:
  646              # Default: single expected parent type — original behaviour
  647              parent = cls.parent_type().load_from_file(parent_path)
  648              if parent is None:
! 649                  raise ValueError("Parent must be set to load children")
  650          else:
  651              # Polymorphic parent: read only the model_type field to avoid a full load.
  652              with open(parent_path, "r", encoding="utf-8") as fh:
  653                  actual_parent_type_name = json.loads(fh.read()).get("model_type", "")
```

libs/core/kiln_ai/datamodel/basemodel.py, lines 664-672:

```python
  664                  if t.type_name() == actual_parent_type_name
  665              )
  666              parent = parent_type.load_from_file(parent_path)
  667              if parent is None:
! 668                  raise ValueError("Parent must be set to load children")
  669
  670          # Ignore type error: this is abstract base class, but children must implement relationship_name
  671          relationship_folder = parent_folder / Path(cls.relationship_name())  # type: ignore
```

libs/core/kiln_ai/datamodel/task_run.py, lines 159-167:

```python
  159
  160          if not isinstance(parent, TaskRun):
  161              # the parent is not a TaskRun, but also not a Task, so it is not
  162              # a real parent
! 163              return None
  164
  165          # the parent is a TaskRun, so we just walk up the tree until we find a Task
  166          current = parent
```

libs/core/kiln_ai/datamodel/task_run.py, lines 180-188:

```python
  180          return [Task, TaskRun]
  181
  182      def runs(self, readonly: bool = False) -> list["TaskRun"]:
  183          """The list of child task runs."""
! 184          return super().runs(readonly=readonly)  # type: ignore
  185
  186      def is_root_task_run(self) -> bool:
  187          """Is this the root task run? (not nested under another task run)"""
  188          # lazy import to avoid circular dependency
```

libs/core/kiln_ai/datamodel/task_run.py, lines 232-240:

```python
  232              loaded_parent_task = Task.load_from_file(task_path)
  233              super().__setattr__("parent", loaded_parent_task)
  234              return loaded_parent_task
  235
! 236          return None
  237
  238      @model_validator(mode="after")
  239      def check_parent_type(self) -> Self:
  240          """Check that the parent is a Task or TaskRun. This overrides the default parent type check
```
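The polymorphic branch in the coverage snippet peeks at the model_type field before deciding which class to load, avoiding a full model load just to discover the parent's type. A minimal sketch of that pattern, with a hypothetical `REGISTRY` standing in for the real mapping from `type_name()` to model classes:

```python
import json
import os
import tempfile

def peek_model_type(path: str) -> str:
    # Read just the model_type field instead of doing a full model load.
    with open(path, "r", encoding="utf-8") as fh:
        return json.loads(fh.read()).get("model_type", "")

# Hypothetical registry mapping on-disk type names to loader callables.
REGISTRY = {
    "task": lambda data: ("Task", data["id"]),
    "task_run": lambda data: ("TaskRun", data["id"]),
}

def load_polymorphic(path: str):
    type_name = peek_model_type(path)
    try:
        loader = REGISTRY[type_name]
    except KeyError:
        raise ValueError(f"Unknown parent model_type: {type_name!r}")
    with open(path, "r", encoding="utf-8") as fh:
        return loader(json.load(fh))

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "task_run.kiln")
    with open(p, "w", encoding="utf-8") as fh:
        json.dump({"model_type": "task_run", "id": "abc"}, fh)
    print(load_polymorphic(p))  # ('TaskRun', 'abc')
```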
Code Review
This pull request introduces a significant feature by enabling the nesting of TaskRun objects, which is crucial for modeling conversations or multi-step tasks. The implementation is robust, with thoughtful changes to the data models, including support for polymorphic parents in KilnBaseModel. The addition of DFS search methods on Task and TaskRun is a valuable utility for navigating the new nested structure, and the test coverage is comprehensive. I have a few suggestions to enhance maintainability and robustness, primarily concerning code duplication and a recursive implementation that could be made iterative. Overall, this is an excellent and well-executed pull request.
```python
@model_validator(mode="after")
def check_parent_type(self) -> Self:

def _check_parent_type(
```
This just wraps the internals of the default check_parent_type so it can be called from the subclass (TaskRun) with the multiple parent types we allow.
```python
# some models support having multiple parent types, so we allow overriding the expected parent
if expected_parent_types is not None:
```
This if branch is the new part: we can override the default check_parent_type from the subclass and pass in the types we allow (in the case of TaskRun, we allow Task and TaskRun as valid parents).
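The override mechanism described here can be sketched in plain Python. The real code uses Pydantic model validators; all class and method names below are simplified stand-ins:

```python
class Base:
    def expected_parent_types(self) -> list:
        # Default: a single expected parent type.
        return [Base]

    def _check_parent_type(self, parent, expected_parent_types=None):
        # Shared internals: use the default expected type unless the
        # subclass passes an override list of allowed parent types.
        allowed = expected_parent_types or self.expected_parent_types()
        if parent is not None and not isinstance(parent, tuple(allowed)):
            raise ValueError(f"Parent must be one of {allowed}")
        return self

class Task(Base):
    pass

class TaskRun(Base):
    def check_parent_type(self, parent):
        # TaskRun allows both Task and TaskRun parents.
        return self._check_parent_type(parent, expected_parent_types=[Task, TaskRun])

run = TaskRun()
run.check_parent_type(Task())     # accepted
run.check_parent_type(TaskRun())  # accepted
try:
    run.check_parent_type(object())
except ValueError as e:
    print("rejected:", e)
```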
```python
@model_validator(mode="after")
def check_parent_type(self) -> Self:
```
Backward compatible - no change in logic here.
```diff
- class TaskRun(KilnParentedModel):
+ class TaskRun(KilnParentedModel, KilnParentModel, parent_of={}):
```
The parent_of is set further down procedurally (it cannot be declared here because it references the class itself).
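The reason the wiring is procedural: a class body cannot reference the class being defined, because its name is only bound after the class statement finishes executing. A minimal illustration (hypothetical parent_of shape, not the real Kiln mechanics):

```python
class TaskRun:
    # parent_of cannot name TaskRun inside the class body, because the
    # name "TaskRun" is not bound until the class statement completes.
    parent_of: dict = {}

# Wire the self-referential child relationship procedurally, after the
# class object exists and the name "TaskRun" is bound.
TaskRun.parent_of = {"runs": TaskRun}

print(TaskRun.parent_of["runs"] is TaskRun)  # True
```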
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/datamodel/basemodel.py`:
- Around line 626-629: Restore the parent-type guard for non-polymorphic child
lookups by reinstating a check that verifies the provided parent is an allowed
type before proceeding (do not rely solely on parent_path.exists()). Add an
overridable hook on KilnParentedModel (e.g., accepted_parent_types or
is_accepted_parent(parent_type)) and use it in all_children_of_parent_path() and
from_id_and_parent_path() to enforce the original defensive validation for
typical models while allowing TaskRun (or other polymorphic cases) to override
the hook to accept multiple parent types; if the parent type is not accepted,
raise the same ValueError("Parent must be set to load children").
In `@libs/core/kiln_ai/datamodel/task_run.py`:
- Around line 198-205: The code currently swallows a ValueError when loading a
sibling task_run file, causing malformed nested runs to appear as root runs;
modify the block that calls TaskRun.load_from_file (referencing
TaskRun.base_filename(), TaskRun.load_from_file, and the "parent" attribute
assignment) to not silently pass on exceptions—instead re-raise or wrap the
caught ValueError with context (e.g., include the problematic task_run_path) so
the failure surfaces immediately; this ensures parent_task(), parent_run(), and
is_root_task_run() cannot mistakenly treat a broken parent as missing.
- Around line 201-213: The lazy parent-loading in TaskRun.parent (during
TaskRun.load_from_file) ignores the readonly flag and always constructs mutable
ancestors; update the loading calls to propagate the original readonly intent by
passing readonly=True when the current TaskRun was loaded readonly (e.g.,
forward the readonly parameter into TaskRun.load_from_file and
Task.load_from_file calls inside the parent resolution path), so
parent_run()/parent_task() return immutable instances and participate in the
shared readonly cache (refer to TaskRun.load_from_file, the parent attribute
assignment, and parent_run()/parent_task() resolution logic).
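The fix the second comment asks for, surfacing the failure with the offending path rather than swallowing it, is the standard exception-chaining pattern. A sketch with hypothetical `load_run` / `load_parent_run` helpers standing in for the real loaders:

```python
import json
import os
import tempfile

def load_run(task_run_path: str) -> dict:
    # Hypothetical loader standing in for TaskRun.load_from_file.
    with open(task_run_path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    if "id" not in data:
        raise ValueError("task run file is missing an id")
    return data

def load_parent_run(task_run_path: str) -> dict:
    try:
        return load_run(task_run_path)
    except ValueError as err:
        # Re-raise with the problematic path instead of silently returning
        # None, so a malformed nested run cannot masquerade as a root run.
        raise ValueError(f"Failed to load parent run at {task_run_path}") from err

with tempfile.TemporaryDirectory() as d:
    bad = os.path.join(d, "task_run.kiln")
    with open(bad, "w", encoding="utf-8") as fh:
        json.dump({"not_an_id": True}, fh)
    try:
        load_parent_run(bad)
    except ValueError as err:
        print("surfaced:", err)
```

Note that `json.JSONDecodeError` is a subclass of `ValueError`, so a corrupt file is wrapped with path context the same way as a structurally invalid one.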
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 406f488e-1721-49ee-a144-b0706b755b64
📒 Files selected for processing (5)
- app/web_ui/src/lib/api_schema.d.ts
- libs/core/kiln_ai/datamodel/basemodel.py
- libs/core/kiln_ai/datamodel/task.py
- libs/core/kiln_ai/datamodel/task_run.py
- libs/core/kiln_ai/datamodel/test_models.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
libs/core/kiln_ai/datamodel/test_models.py (1)
Lines 792-976: Consider extracting a helper for repeated nested-run setup. Several tests recreate the same 2–3 level run chain. A small factory/helper would reduce duplication and make intent clearer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/datamodel/test_models.py`:
- Line 3: The import "Union" from typing in test_models.py is unused and causing
a Ruff F401 lint failure; remove "Union" from the import statement (the line
"from typing import Union") or replace the import with only the actually used
names so that the module no longer imports the unused symbol.
📒 Files selected for processing (1)
libs/core/kiln_ai/datamodel/test_models.py
Actionable comments posted: 2
🧹 Nitpick comments (1)
libs/core/kiln_ai/datamodel/task_run.py (1)
Lines 180-182: `runs()` is defined twice; the later dynamic wiring overwrites the class method. This creates dead code/confusion around which implementation is active. Keep one source of truth for `runs()`. Also applies to: lines 369-370.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/datamodel/task_run.py`:
- Around line 188-201: The DFS in find_task_run_by_id_dfs misses the current
node because it seeds stack with list(self.runs(...)) only; update the traversal
to include self in the search (respecting the readonly argument) by either
checking if self.id == task_run_id before the loop or seeding stack with [self]
+ list(self.runs(readonly=readonly)) so the method correctly searches the entire
task run tree starting at this node.
- Around line 166-171: parent_run() currently only checks cached_parent() and
thus can miss a lazily-loadable parent; change it to call the loader (use
self.parent()) so the parent is loaded if necessary, then validate its type
(isinstance(..., TaskRun)) and return it or None; update the logic in the
parent_run method to prefer self.parent() over self.cached_parent() while
keeping the same return behavior.
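The suggested DFS fix, seeding the stack with the current node so the search also matches self, can be sketched with a hypothetical minimal `Node` class:

```python
from typing import Optional

class Node:
    def __init__(self, id: str, children: Optional[list] = None):
        self.id = id
        self.children = children or []

    def find_by_id_dfs(self, node_id: str) -> Optional["Node"]:
        # Seed the stack with self so the search also matches this node,
        # not just its descendants (the bug the comment flags).
        stack = [self]
        while stack:
            current = stack.pop()
            if current.id == node_id:
                return current
            stack.extend(current.children)
        return None

root = Node("root", [Node("a"), Node("b", [Node("c")])])
assert root.find_by_id_dfs("root") is root  # matches the starting node itself
assert root.find_by_id_dfs("c").id == "c"
```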
📒 Files selected for processing (1)
libs/core/kiln_ai/datamodel/task_run.py
```python
return schema_from_json_str(self.input_json_schema, require_object=False)

# These wrappers help for typechecking. We should fix this in KilnParentModel
def runs(self, readonly: bool = False) -> list[TaskRun]:
```
Does nesting runs change the meaning of task.runs() here? Does this API need to traverse all the nested runs? This scans only the runs under the task's runs/ folder and skips the nested runs.
It does not change the meaning - it works the same, returning the direct children of this task (not all the descendants). And there are utils to DFS to a specific TaskRun ID given the root TaskRun ID or the Task ID.

The point of nested TaskRuns is to allow saving an entire snapshot of a conversation - each turn produces a new snapshot of the conversation (its path is the path from the first message through all of the descendant TaskRuns in this conversation). Each TaskRun in the tree has the entire trace.
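The snapshot-per-turn idea can be illustrated with plain dictionaries (hypothetical shapes and IDs; the real snapshots are TaskRun models). Each snapshot carries the full trace, so it is usable without walking its ancestry, and forking is just adding a second child of the same parent:

```python
# Each "snapshot" is self-contained: it holds the entire trace so far, so a
# consumer never needs to walk the ancestry to reconstruct the conversation.
turn1 = {"id": "run-1", "trace": ["user: hi", "assistant: hello"]}

# The next turn is a child snapshot that extends the parent's full trace.
turn2 = {
    "id": "run-2",
    "trace": turn1["trace"] + ["user: how are you?", "assistant: fine"],
}

# Forking: a second child of turn1 with a different continuation.
turn2b = {
    "id": "run-2b",
    "trace": turn1["trace"] + ["user: bye", "assistant: bye!"],
}

print(len(turn2["trace"]))  # 4
print(turn2b["trace"][0])   # user: hi
```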
…tocols' of github.com:Kiln-AI/Kiln into leonard/kil-461-feat-nesting-task-runs-into-each-other
```python
return [Task, TaskRun]

def runs(self, readonly: bool = False) -> list["TaskRun"]:
```
In the future we will have a structure like task/runs/<run1>/runs/<run2>/runs/<run3>/task_run.kiln, and some of the APIs still assume there is only one level of runs.

run_api.py:
- get_runs()
- get_runs_summary(), etc.

dataset_split.py:
- _get_runs(), etc.

In general, do we have to worry about those? What about usages in eval / fine-tune, etc.? Or do we assume that at those call paths we only care about immediate children and not all the descendants?
We most likely won't have the hierarchy surfacing in the API.
Each TaskRun is a full, self-contained snapshot / version of the conversation at some point in time - it does not need to know its ancestry path to be used, as it has the full trace. We will likely only ever need to find a TaskRun by its ID. For that we have the find_task_run_by_id_dfs method.
Collecting all descendant runs of the whole subtree is trivial tree traversal if we ever need it. We can put that in a helper if we ever need it, but it does not need to replace the method for getting the children.
And this is only used for multiturn - we don't support a lot of downstream use cases for multiturn at the moment. Eval, fine-tune, and UI are up in the air. This is at the moment mostly for persistence in a multiturn scenario using Kiln as an SDK.
Once we know what the UI will look like for multiturn, we will probably do another pass to see where we want to surface the whole tree, where we want only the leaves, and where only the roots.
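The "trivial tree traversal" mentioned above could live in a helper like the sketch below (hypothetical in-memory `Run` class; the real `runs()` reads children from disk):

```python
from typing import Iterator, Optional

class Run:
    def __init__(self, id: str, runs: Optional[list] = None):
        self.id = id
        self._runs = runs or []

    def runs(self) -> list:
        # Direct children only, mirroring the existing API.
        return self._runs

    def all_descendant_runs(self) -> Iterator["Run"]:
        # Yield every run in the subtree below this one (excluding self),
        # using an explicit stack instead of recursion.
        stack = list(self.runs())
        while stack:
            run = stack.pop()
            yield run
            stack.extend(run.runs())

root = Run("r1", [Run("r2", [Run("r4")]), Run("r3")])
print(sorted(r.id for r in root.all_descendant_runs()))  # ['r2', 'r3', 'r4']
```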
```python
return None

# the parent is a TaskRun, so we just walk up the tree until we find a Task
current = parent  # type: ignore
```
Should we avoid # type: ignore by checking directly with TaskRun class type?
Was using the class name for consistency with how we do it in the other datamodels, but isinstance works too - swapped them out here: e26e52d
```python
def is_root_task_run(self) -> bool:
    """Is this the root task run? (not nested under another task run)"""
    return self.parent is None or self.parent.__class__.__name__ == "Task"
```
Should we use self.cached_parent()? Otherwise it might not be valid until load_parent is called.
self.parent goes through the attribute-accessor override (which overrides the dot-syntax accessor) on KilnBaseModel (the class this inherits from).
That override lazy-loads the parent from disk if it is not cached.
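The lazy-load behaviour described here is the usual cache-on-first-access pattern. A sketch with a hypothetical `LazyParent` class and `_load_parent_from_disk` stand-in (the real override lives on KilnBaseModel):

```python
class LazyParent:
    def __init__(self):
        # None means "not loaded yet"; populated on first access.
        self._cached_parent = None

    def _load_parent_from_disk(self):
        # Hypothetical stand-in for the real disk load.
        return {"model_type": "task"}

    @property
    def parent(self):
        # Dot access lazily loads and caches the parent.
        if self._cached_parent is None:
            self._cached_parent = self._load_parent_from_disk()
        return self._cached_parent

    def cached_parent(self):
        # Returns only what is already in memory; never touches disk.
        return self._cached_parent

obj = LazyParent()
print(obj.cached_parent())            # None  (nothing loaded yet)
print(obj.parent["model_type"])       # task  (loaded on access)
print(obj.cached_parent() is obj.parent)  # True  (now cached)
```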
f9af9d1 into leonard/kil-447-feat-stream-multiturn-ai-sdk-openai-protocols
What does this PR do?
PR going into #1107 (multiturn + streaming)
Allow nesting `TaskRun` into each other. This allows forking, and it lets each step in the conversation be a complete and immutable snapshot with its own ID - avoids headaches with concurrent writes and so on.

Our current modeling only allows for one parent type per model, so this required refactoring and dropping a check on the `BaseModel` - if a parent is of the wrong type up the tree, we won't find out, but it will break elsewhere anyway, so it seems like a reasonable tradeoff to make.

Also added helper query wrappers on `Task` and `TaskRun` - since now, given a TaskRun ID, we need to DFS over all the descendants of the Task, unless we know any one of its ancestors' IDs (in which case we can query starting with that ancestor as the root for our query).

Checklists