
refactor: support multiturn chat#1088

Closed
leonardmq wants to merge 6 commits into main from leonard/kil-421-adapter-support-multiturn

Conversation

@leonardmq
Collaborator

@leonardmq leonardmq commented Feb 26, 2026

What does this PR do?

Issue: we currently only support single-turn chats (system -> user -> assistant, plus a couple of slight variants we implement internally, like two-message CoT).

This PR adds support for multiturn conversations, letting the conversation history accumulate. It currently works as follows:

  • create a run as usual; this produces a TaskRun that contains the trace, with output and intermediate_outputs extracted from the latest (assistant) message; no change there
  • to continue a conversation, call the same method but pass an existing TaskRun.id; we internally load the existing TaskRun.trace, create a new run (appending the new user message to the existing trace), and replace output and intermediate_outputs with the new latest message

This is fine for plain inference via the SDK, but it opens up downstream side effects in our app:

  • TaskRuns are no longer immutable; a single-turn run may become a multiturn run as the user adds new messages, and an existing multiturn run may get further messages appended, changing its latest message
  • Evals depend on TaskRuns, but if task runs change, evals can no longer be compared against previous runs; or at least we cannot guarantee that through the modeling
  • etc.

We will need to implement restrictions on the UI / user side to prevent users from causing downstream side effects we do not want.

Checklists

  • Tests have been run locally and passed
  • New tests have been added for any work in /lib

Summary by CodeRabbit

Release Notes

  • New Features

    • Session continuation: Tasks can be resumed from previous runs with full conversation history preserved, enabling users to extend multi-turn interactions and build upon prior exchanges without losing context.
  • Tests

    • Added comprehensive test coverage for session continuation and multi-turn conversation formatting.
  • Limitations

    • MCP adapter: Does not support session continuation and maintains single-turn behavior.

@gemini-code-assist
Contributor

Summary of Changes

Hello @leonardmq, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AI system's conversational capabilities by enabling multi-turn interactions. It refactors the core logic to allow existing TaskRun instances to be updated with new messages, effectively maintaining conversation history. This change introduces a specialized formatter for managing the flow of multi-turn dialogues and ensures that certain adapters, like the MCP adapter, correctly handle their single-turn nature. The overall impact is a more dynamic and stateful conversational experience for users.

Highlights

  • Multi-turn Chat Support: Introduced the ability to continue existing chat sessions by passing a task_run_id, allowing new user messages to be appended to a run's trace.
  • Mutable TaskRun Objects: Modified TaskRun objects to be mutable, enabling updates to their output, trace, usage, and intermediate outputs when a session is continued.
  • MultiturnFormatter: Added a new MultiturnFormatter class to handle the formatting of messages for multi-turn conversations, seeding new turns with the prior conversation trace.
  • MCP Adapter Limitations: Explicitly marked the MCP (Model Context Protocol) adapter as not supporting multi-turn conversations, raising NotImplementedError if a task_run_id or prior_trace is provided.
  • API Schema Update: Updated the API schema to include an optional task_run_id field in the RunTaskRequest for initiating multi-turn interactions.


Changelog
  • app/web_ui/src/lib/api_schema.d.ts
    • Added task_run_id property to the components.schemas.RunTaskRequest interface, allowing clients to specify an existing run to continue a session.
  • libs/core/kiln_ai/adapters/chat/__init__.py
    • Imported MultiturnFormatter.
    • Exported MultiturnFormatter in __all__.
  • libs/core/kiln_ai/adapters/chat/chat_formatter.py
    • Imported ChatCompletionMessageParam type.
    • Added initial_messages abstract method to ChatFormatter base class.
    • Implemented MultiturnFormatter class to handle multi-turn conversation logic, initializing with a prior_trace and appending new user input.
  • libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
    • Imported MultiturnFormatter for testing.
    • Added test_multiturn_formatter_initial_messages to verify initial message seeding.
    • Added test_multiturn_formatter_next_turn to test the multi-turn message progression.
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
    • Imported MultiturnFormatter and ID_TYPE.
    • Modified invoke and _run_returning_run_output methods to accept an optional task_run_id parameter.
    • Added logic to load an existing TaskRun based on task_run_id and retrieve its trace.
    • Updated _run abstract method signature to include an optional prior_trace parameter.
    • Modified build_chat_formatter to return a MultiturnFormatter if prior_trace is provided.
    • Refactored generate_run to accept an optional existing_run and merge new output, usage, and intermediate outputs into it if present, otherwise creating a new TaskRun.
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
    • Updated _run method signature to accept an optional prior_trace parameter.
    • Modified _run to use chat_formatter.initial_messages() to seed the conversation, which will include the prior_trace if provided.
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
    • Updated _run, invoke, and invoke_returning_run_output method signatures to accept optional prior_trace or task_run_id parameters.
    • Added NotImplementedError checks in _run, invoke, and invoke_returning_run_output to prevent multi-turn conversations with the MCP adapter.
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
    • Updated _run method signature in MockAdapter to accept prior_trace.
    • Updated mock_run function signatures in various tests to accept prior_trace.
    • Added test_build_chat_formatter_with_prior_trace_returns_multiturn_formatter to verify correct formatter selection.
    • Added test_task_run_id_task_path_none_raises to ensure error handling for missing task path.
    • Added test_invoke_returning_run_output_passes_task_run_id_to_run to confirm task_run_id is passed correctly.
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
    • Imported ModelTurnResult for testing.
    • Added test_run_with_prior_trace_uses_multiturn_formatter to verify multi-turn behavior in LiteLLM adapter.
  • libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
    • Added test_mcp_adapter_rejects_multiturn_invoke_returning_run_output to confirm multi-turn rejection.
    • Added test_mcp_adapter_rejects_multiturn_invoke to confirm multi-turn rejection.
    • Added test_mcp_adapter_rejects_prior_trace_in_run to confirm multi-turn rejection.
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
    • Imported MagicMock for testing.
    • Updated _run method signature in MockAdapter to accept prior_trace.
    • Added test_invoke_continue_session to verify session continuation and run updates.
    • Added test_invoke_continue_invalid_task_run_id for invalid run ID handling.
    • Added test_invoke_continue_run_without_trace for runs missing trace history.
    • Added test_generate_run_with_existing_run_merges_usage_and_intermediate_outputs to test data merging.
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
    • Updated _run method signature in MockAdapter to accept prior_trace.
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
    • Updated _run method signature in MockAdapter to accept prior_trace.
  • libs/core/kiln_ai/datamodel/test_basemodel.py
    • Updated _run method signature in MockAdapter to accept prior_trace.
  • libs/server/kiln_server/run_api.py
    • Added task_run_id field to RunTaskRequest Pydantic model.
    • Modified run_task endpoint to pass request.task_run_id to adapter.invoke.
  • libs/server/kiln_server/test_run_api.py
    • Imported os, Path, KilnBuiltInToolId, and Config for new tests and fixtures.
    • Added test_run_task_with_task_run_id_continues_session to verify API endpoint passes task_run_id for session continuation.
    • Added adapter_sanity_check_setup and adapter_sanity_check_math_tools_setup fixtures for integration testing.
    • Added _assert_math_tools_response helper function for math tool test assertions.
    • Added test_run_task_adapter_sanity_math_tools for multi-turn math tool interaction with gpt_4o_mini.
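The generate_run merge described in the changelog can be pictured roughly as follows (dict-based stand-ins; the real TaskRun is a Pydantic model, and the exact merge rules may differ):

```python
# Rough stand-in for the generate_run merge path described above; the real
# TaskRun is a Pydantic model with different fields, so this is illustrative.
def generate_run(new_output, new_trace, new_usage, new_intermediates, existing_run=None):
    if existing_run is None:
        # Fresh session: create a new run.
        return {
            "output": new_output,
            "trace": new_trace,
            "usage": dict(new_usage),
            "intermediate_outputs": dict(new_intermediates),
        }
    # Continuing a session: replace output/trace, merge usage and intermediates.
    existing_run["output"] = new_output
    existing_run["trace"] = new_trace
    for key, value in new_usage.items():
        existing_run["usage"][key] = existing_run["usage"].get(key, 0) + value
    existing_run["intermediate_outputs"].update(new_intermediates)
    return existing_run

first = generate_run("4", ["..."], {"total_tokens": 10}, {"cot": "a"})
second = generate_run("12", ["...", "..."], {"total_tokens": 7}, {"cot": "b"}, existing_run=first)
```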
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR implements multi-turn conversation continuation by introducing a MultiturnFormatter class that resumes from prior message traces. The base adapter now accepts existing_run parameters to retrieve prior conversation history, passing it through the chat formatting pipeline, while model-specific adapters (LiteLLM supports continuation; MCP rejects it) handle the resumed trace appropriately.

Changes

Cohort / File(s) Summary
Chat Formatter Core
libs/core/kiln_ai/adapters/chat/__init__.py, libs/core/kiln_ai/adapters/chat/chat_formatter.py
Added ChatCompletionMessageIncludingLiteLLM type alias and new MultiturnFormatter class that accepts prior traces and resumes conversation from previous messages. Added initial_messages() method to base ChatFormatter class. Exported both new entities as public API.
BaseAdapter Enhancement
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
Extended invoke, _run_returning_run_output, and invoke_returning_run_output with optional existing_run parameter. Added prior_trace parameter to _run and build_chat_formatter. Modified generate_run to merge outputs and usage when continuing existing runs, extracting prior trace from existing run's message history.
LiteLLMAdapter Continuation Support
libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
Added prior_trace parameter to _run method. Changed initial messages initialization to use chat_formatter.initial_messages() instead of empty list, enabling continuation of prior traces through the same model turn loop.
MCPAdapter Single-Turn Enforcement
libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
Added prior_trace and existing_run parameters to _run, invoke, and invoke_returning_run_output methods; raises NotImplementedError when these are provided to enforce single-turn-only behavior.
Formatter Tests
libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
Added tests for MultiturnFormatter: validates initial_messages returns prior trace and next_turn handles first turn with final call and returns None on subsequent turns.
BaseAdapter Tests
libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
Extended MockAdapter._run signature with prior_trace parameter. Added tests validating MultiturnFormatter creation with prior trace, continuation error handling (missing trace), and prior trace threading through run invocations.
LiteLLMAdapter & MCP Tests
libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py, libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
Added test for multi-turn formatter usage with prior trace in LiteLLM. Added tests validating MCP adapter rejects continuation attempts via NotImplementedError.
Session Continuation Tests
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
Added tests for session continuation: validating trace updates, output merging, run reference preservation, and error handling when continuing without trace history.
Remaining Model Tests
libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py, libs/core/kiln_ai/adapters/test_prompt_builders.py, libs/core/kiln_ai/datamodel/test_basemodel.py
Updated MockAdapter._run signature across test files to accept optional prior_trace parameter for consistency with new base adapter protocol.
Server Integration Test
libs/server/kiln_server/test_run_api.py
Added sanity test for math tools via OpenRouter with trace extension support; includes helper fixtures and assertion utilities for end-to-end adapter validation.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant BaseAdapter
    participant ChatFormatter
    participant LiteLLMAdapter
    participant LLM

    User->>BaseAdapter: invoke(input, existing_run)
    Note over BaseAdapter: Detect prior_trace from<br/>existing_run.trace
    BaseAdapter->>ChatFormatter: build_chat_formatter(input, prior_trace)
    alt prior_trace provided
        ChatFormatter-->>ChatFormatter: Create MultiturnFormatter
    else no prior_trace
        ChatFormatter-->>ChatFormatter: Create standard Formatter
    end
    ChatFormatter-->>BaseAdapter: Formatter instance
    BaseAdapter->>BaseAdapter: _run_returning_run_output(input, prior_trace)
    BaseAdapter->>LiteLLMAdapter: _run(input, prior_trace)
    LiteLLMAdapter->>LiteLLMAdapter: messages = initial_messages()
    alt prior_trace provided
        LiteLLMAdapter->>LiteLLMAdapter: messages includes prior conversation
    end
    LiteLLMAdapter->>LiteLLMAdapter: next_turn() processes new input
    LiteLLMAdapter->>LLM: Call LLM with full message history
    LLM-->>LiteLLMAdapter: Response
    LiteLLMAdapter-->>BaseAdapter: RunOutput with merged trace
    BaseAdapter->>BaseAdapter: generate_run(merged_outputs)
    BaseAdapter-->>User: TaskRun (continued session)
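The alt branch in the diagram corresponds to a simple dispatch in build_chat_formatter; a sketch with stub formatter classes standing in for the real ChatFormatter subclasses:

```python
# Dispatch sketch for build_chat_formatter's alt branch; stub classes stand in
# for the real ChatFormatter subclasses, whose constructors differ.
class StandardFormatterStub:
    def __init__(self, user_input):
        self.user_input = user_input

    def initial_messages(self):
        return []  # a fresh conversation starts empty

class MultiturnFormatterStub:
    def __init__(self, prior_trace, user_input):
        self.prior_trace = list(prior_trace)
        self.user_input = user_input

    def initial_messages(self):
        return list(self.prior_trace)  # resume from the prior trace

def build_chat_formatter(user_input, prior_trace=None):
    if prior_trace is not None:
        return MultiturnFormatterStub(prior_trace, user_input)
    return StandardFormatterStub(user_input)
```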

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • scosman
  • sfierro

Poem

🐰 bounces with glee
Prior words are kept, no need to forget!
The traces live on, a conversation set,
Each turn remembers what came before,
MultiturnFormatter opens the door!
Hop through the context, message by message bright,
Where multi-turn chats reach their full flight! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 31.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed. The title 'refactor: support multiturn chat' accurately and concisely summarizes the main change: adding multiturn conversation support to the chat system.
  • Description check — ✅ Passed. The PR description provides a comprehensive explanation of what the PR does, related issue context, and completed checklists matching the template requirements.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)

131-136: Type inconsistency: task_run_id parameter type.

In invoke() the parameter is typed as ID_TYPE | None, but in mcp_adapter.py (lines 88, 105) it's typed as str | None. For consistency and to leverage the type alias, consider using ID_TYPE throughout or verifying ID_TYPE is indeed str.

#!/bin/bash
# Verify ID_TYPE definition and usage consistency
ast-grep --pattern 'ID_TYPE = $_'
rg -n "task_run_id:" --type=py | head -20
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py` around lines 131 -
136, The task_run_id parameter type is inconsistent between base_adapter.invoke
(uses ID_TYPE | None) and mcp_adapter (uses str | None); update the code to use
the same type alias everywhere by replacing explicit str | None with ID_TYPE |
None (or alternatively ensure the ID_TYPE alias is defined as str and update its
definition) so invoke(), invoke_returning_run_output, and the mcp adapter
methods share a consistent type for task_run_id; locate symbols invoke,
invoke_returning_run_output, and the task_run_id annotations in mcp_adapter to
apply the change.
libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py (1)

88-88: Consider using ID_TYPE for consistency with BaseAdapter.

The task_run_id parameter is typed as str | None here but as ID_TYPE | None in base_adapter.py. For type consistency across the codebase, consider aligning with the type alias.

Also applies to: 105-105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py` at line 88, Replace
the explicit str | None typing for the task_run_id parameter with the shared
alias ID_TYPE | None to match BaseAdapter; update each function/method in
mcp_adapter.py where the parameter task_run_id appears (both occurrences
flagged) so their signatures use ID_TYPE | None and adjust any related imports
to import ID_TYPE from the module where it is defined.
libs/server/kiln_server/test_run_api.py (1)

1725-1732: Unused helper functions — consider removing.

_adapter_sanity_check_output_path() and _append_to_sanity_check() are defined but not called anywhere in this file. If these are leftover from development, they should be removed to avoid confusion.

🗑️ Remove unused code
-def _adapter_sanity_check_output_path() -> Path:
-    return Path(__file__).resolve().parent / "adapter_sanity_check.txt"
-
-
-def _append_to_sanity_check(content: str, output_path: Path) -> None:
-    with open(output_path, "a", encoding="utf-8") as f:
-        f.write(content)
-        f.write("\n")
-
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1725 - 1732, Remove the
two unused helper functions _adapter_sanity_check_output_path() and
_append_to_sanity_check() from the file: delete their definitions (the functions
named _adapter_sanity_check_output_path and _append_to_sanity_check) since they
are not referenced anywhere in the test_run_api.py module to avoid dead code and
confusion; if they are intended for future use, alternatively add a clear TODO
comment and a test that exercises them, but otherwise simply remove both
function definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py`:
- Around line 433-451: The test test_task_run_id_task_path_none_raises fails
because model_provider() is invoked during MockAdapter initialization and raises
an API key error; update the test to mock/stub model_provider (or the OpenAI
provider init) so it doesn't perform real API key validation before exercising
the path check. Specifically, in the test_task_run_id_task_path_none_raises
where MockAdapter is constructed with KilnAgentRunConfigProperties and
adapter.invoke is awaited, patch the model_provider function (or the
class/method used by MockAdapter to obtain the provider) to return a benign/mock
provider object so the ValueError from adapter.invoke("input",
task_run_id="some-id") (about task.path) is the raised exception as expected.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py`:
- Around line 1309-1353: The test
test_run_with_prior_trace_uses_multiturn_formatter calls adapter._run, which
internally calls model_provider() before _run_model_turn, causing an API key
lookup; update the test to stub or mock the adapter.model_provider method (or
attribute) to return a simple dummy provider object (e.g., a dict or a small
stub class) so no external key lookup happens, then set
adapter._run_model_turn as already done; ensure the mock is assigned on the same
adapter instance created from LiteLlmAdapter so build_chat_formatter and _run
use the stubbed provider.

In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1735-1778: The adapter_sanity_check_setup fixture contains a
hardcoded user path, an unused tmp_path parameter, and appears unused; either
delete the adapter_sanity_check_setup fixture entirely (and remove its tmp_path
parameter) or refactor it to mirror adapter_sanity_check_math_tools_setup by
using tmp_path for project_path (e.g., project_path = tmp_path /
"adapter_sanity_project" / "project.kiln"), create the directory and
Project/Task there via Project(name=..., path=str(project_path)) and Task(...,
parent=project), register the temporary project into
Config.shared()._settings["projects"], yield the {"project","task"} dict, and
restore original_projects after the yield; ensure you update references to
adapter_sanity_check_setup or remove any unused fixture imports.


ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05f9d4d and baba621.

📒 Files selected for processing (16)
  • app/web_ui/src/lib/api_schema.d.ts
  • libs/core/kiln_ai/adapters/chat/__init__.py
  • libs/core/kiln_ai/adapters/chat/chat_formatter.py
  • libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/datamodel/test_basemodel.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py

Comment on lines +433 to +451
@pytest.mark.asyncio
async def test_task_run_id_task_path_none_raises(base_project):
    task = Task(
        name="test_task",
        instruction="test_instruction",
        parent=base_project,
    )
    assert task.path is None
    adapter = MockAdapter(
        task=task,
        run_config=KilnAgentRunConfigProperties(
            model_name="test_model",
            model_provider_name="openai",
            prompt_id="simple_prompt_builder",
            structured_output_mode="json_schema",
        ),
    )
    with pytest.raises(ValueError, match="task has no path"):
        await adapter.invoke("input", task_run_id="some-id")
Contributor


⚠️ Potential issue | 🔴 Critical

Pipeline failure: Test raises wrong exception due to missing mocks.

The test expects a ValueError with message "task has no path", but the model_provider() call happens first and raises an API key error because OpenAI provider initialization isn't mocked.

🐛 Proposed fix: Mock model_provider to avoid API key check
 @pytest.mark.asyncio
 async def test_task_run_id_task_path_none_raises(base_project):
     task = Task(
         name="test_task",
         instruction="test_instruction",
         parent=base_project,
     )
     assert task.path is None
     adapter = MockAdapter(
         task=task,
         run_config=KilnAgentRunConfigProperties(
             model_name="test_model",
             model_provider_name="openai",
             prompt_id="simple_prompt_builder",
             structured_output_mode="json_schema",
         ),
     )
+
+    # Mock model_provider to avoid API key validation
+    provider = MagicMock()
+    provider.parser = None
+    provider.formatter = None
+    provider.reasoning_capable = False
+    adapter.model_provider = MagicMock(return_value=provider)
+
     with pytest.raises(ValueError, match="task has no path"):
         await adapter.invoke("input", task_run_id="some-id")
🧰 Tools
🪛 GitHub Actions: Build and Test

[error] 450-451: Regex pattern 'task has no path' did not match. The test expected a ValueError with message 'task has no path' but OpenAI API key initialization raised a different error: 'Attempted to use OpenAI without an API key set. Get your API key from https://platform.openai.com/account/api-keys'.


Comment on lines +1309 to +1353
@pytest.mark.asyncio
async def test_run_with_prior_trace_uses_multiturn_formatter(config, mock_task):
    prior_trace = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)

    build_chat_formatter_calls = []

    original_build = adapter.build_chat_formatter

    def capturing_build(input, prior_trace_arg=None):
        build_chat_formatter_calls.append((input, prior_trace_arg))
        return original_build(input, prior_trace_arg)

    adapter.build_chat_formatter = capturing_build

    async def mock_run_model_turn(
        provider, prior_messages, top_logprobs, skip_response_format
    ):
        extended = list(prior_messages)
        extended.append({"role": "assistant", "content": "How can I help?"})
        return ModelTurnResult(
            assistant_message="How can I help?",
            all_messages=extended,
            model_response=None,
            model_choice=None,
            usage=Usage(),
        )

    adapter._run_model_turn = mock_run_model_turn

    run_output, _ = await adapter._run("follow-up", prior_trace=prior_trace)

    assert len(build_chat_formatter_calls) == 1
    assert build_chat_formatter_calls[0][0] == "follow-up"
    assert build_chat_formatter_calls[0][1] == prior_trace

    assert run_output.trace is not None
    assert len(run_output.trace) == 4
    assert run_output.trace[0]["content"] == "hi"
    assert run_output.trace[1]["content"] == "hello"
    assert run_output.trace[2]["content"] == "follow-up"
    assert run_output.trace[3]["content"] == "How can I help?"

⚠️ Potential issue | 🟡 Minor

Test requires additional mocking to avoid API key lookup.

The pipeline failure indicates model_provider() is called before _run_model_turn, triggering the OpenRouter API key check. The test should mock model_provider() to bypass this.

🐛 Proposed fix to mock model_provider
 @pytest.mark.asyncio
 async def test_run_with_prior_trace_uses_multiturn_formatter(config, mock_task):
     prior_trace = [
         {"role": "user", "content": "hi"},
         {"role": "assistant", "content": "hello"},
     ]
     adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)
 
     build_chat_formatter_calls = []
 
     original_build = adapter.build_chat_formatter
 
     def capturing_build(input, prior_trace_arg=None):
         build_chat_formatter_calls.append((input, prior_trace_arg))
         return original_build(input, prior_trace_arg)
 
     adapter.build_chat_formatter = capturing_build
 
+    # Mock model_provider to avoid API key lookup
+    mock_provider = Mock()
+    mock_provider.model_id = "test-model"
+    adapter.model_provider = Mock(return_value=mock_provider)
+
     async def mock_run_model_turn(
         provider, prior_messages, top_logprobs, skip_response_format
     ):
🧰 Tools
🪛 GitHub Actions: Build and Test

[error] 1342-1342: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py` around
lines 1309 - 1353, The test test_run_with_prior_trace_uses_multiturn_formatter
calls adapter._run which internally calls model_provider() before
_run_model_turn, causing an API key lookup; update the test to stub or mock the
adapter.model_provider method (or attribute) to return a simple dummy provider
object (e.g., a dict or a small stub class) so no external key lookup happens,
then proceed to set adapter._run_model_turn as already done; ensure the mock is
assigned on the same adapter instance created from LiteLlmAdapter so
build_chat_formatter and _run use the stubbed provider.

Comment on lines +1735 to +1778
@pytest.fixture
def adapter_sanity_check_setup(tmp_path):
    """Setup for paid adapter sanity check tests - real project/task, no adapter mocking."""
    # if project at the path does not exist, create it, otherwise reuse
    project_path = (
        Path("/Users/leonardmarcq/Downloads/")
        / "adapter_sanity_project"
        / "project.kiln"
    )
    if not project_path.exists():
        project_path.parent.mkdir()

        project = Project(name="Adapter Sanity Project", path=str(project_path))
        project.save_to_file()

        task = Task(
            name="Adapter Sanity Task",
            instruction="You are a helpful assistant. Respond concisely.",
            description="Task for adapter sanity checking",
            parent=project,
        )
        task.save_to_file()

    else:
        project = Project.load_from_file(project_path)
        task = next(
            (
                t
                for t in project.tasks(readonly=True)
                if t.name == "Adapter Sanity Task"
            ),
            None,
        )
        if task is None:
            raise ValueError("Task not found")

    config = Config.shared()
    original_projects = list(config.projects) if config.projects else []
    config._settings["projects"] = [*original_projects, str(project.path)]

    yield {"project": project, "task": task}

    config._settings["projects"] = original_projects


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for usages of adapter_sanity_check_setup fixture
rg -n "adapter_sanity_check_setup" --type py

Repository: Kiln-AI/Kiln

Length of output: 142


🏁 Script executed:

# Also check the structure of the test file to understand fixture usage
rg -n "def test_" libs/server/kiln_server/test_run_api.py | head -20

Repository: Kiln-AI/Kiln

Length of output: 1130


🏁 Script executed:

# Look for the math tools fixture mentioned in the comment for comparison
rg -n "adapter_sanity_check_math_tools_setup" --type py

Repository: Kiln-AI/Kiln

Length of output: 458


🏁 Script executed:

# Get the math_tools fixture implementation for comparison
sed -n '1781,1840p' libs/server/kiln_server/test_run_api.py

Repository: Kiln-AI/Kiln

Length of output: 2403


🏁 Script executed:

# Get the test that uses the math_tools fixture to see the pattern
sed -n '1839,1860p' libs/server/kiln_server/test_run_api.py

Repository: Kiln-AI/Kiln

Length of output: 934


Remove unused fixture or refactor to use tmp_path instead of hardcoded path.

This fixture has three critical issues:

  1. Hardcoded user-specific path: /Users/leonardmarcq/Downloads/adapter_sanity_project/project.kiln will fail on CI and on other developers' machines.

  2. Fixture is unused — No test in this file calls adapter_sanity_check_setup. The similar math tools fixture (adapter_sanity_check_math_tools_setup) is actively used and properly implements the pattern with tmp_path.

  3. Unused parameter — The tmp_path fixture parameter is accepted but never referenced.

Either remove this fixture or refactor it to use tmp_path like the working adapter_sanity_check_math_tools_setup fixture does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1735 - 1778, The
adapter_sanity_check_setup fixture contains a hardcoded user path, an unused
tmp_path parameter, and appears unused; either delete the
adapter_sanity_check_setup fixture entirely (and remove its tmp_path parameter)
or refactor it to mirror adapter_sanity_check_math_tools_setup by using tmp_path
for project_path (e.g., project_path = tmp_path / "adapter_sanity_project" /
"project.kiln"), create the directory and Project/Task there via
Project(name=..., path=str(project_path)) and Task(..., parent=project),
register the temporary project into Config.shared()._settings["projects"], yield
the {"project","task"} dict, and restore original_projects after the yield;
ensure you update references to adapter_sanity_check_setup or remove any unused
fixture imports.
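The tmp_path refactor this prompt describes amounts to the following path handling. `Project` and `Task` are Kiln classes, so this sketch stubs the file write with tempfile to keep the path logic runnable anywhere.

```python
# Portable path handling for the fixture: derive the project location
# from a temporary directory instead of a hardcoded home path. In the
# real fixture, pytest's tmp_path fixture supplies the base directory
# and Project.save_to_file() performs the write simulated here.
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)  # what pytest would inject as `tmp_path`
    project_path = tmp_path / "adapter_sanity_project" / "project.kiln"
    project_path.parent.mkdir(parents=True, exist_ok=True)
    project_path.write_text("{}")  # stand-in for Project.save_to_file()
    created = project_path.exists()

print(created, project_path.name)
```

Because pytest creates a fresh tmp_path per test, the "reuse if it exists" branch of the original fixture becomes unnecessary.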

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-implemented feature to support multi-turn conversations. The core logic change involves passing a task_run_id to continue an existing chat session, which then mutates the TaskRun to append the new conversation turn. The changes are extensive, touching core adapters, API schemas, and tests. The implementation is robust, with good error handling for unsupported adapters and comprehensive test coverage for the new functionality. I've identified a high-severity issue in a test file related to a hardcoded path that needs to be addressed to ensure portability. Additionally, I've suggested a medium-severity improvement for type hinting to enhance code quality and maintainability. Overall, this is a solid contribution.

Comment on lines +1739 to +1743
    project_path = (
        Path("/Users/leonardmarcq/Downloads/")
        / "adapter_sanity_project"
        / "project.kiln"
    )

high

This test fixture uses a hardcoded local path (/Users/leonardmarcq/Downloads/), which will cause the test to fail on any other developer's machine or in a CI environment. Please use the tmp_path fixture provided by pytest to create temporary directories for tests, ensuring they are portable and isolated.

    project_path = (
        tmp_path
        / "adapter_sanity_project"
        / "project.kiln"
    )

Comment on lines +96 to +99
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    # TODO: fix the type somehow
    return []

medium

The type hint for initial_messages can be more specific. Instead of list[Any], it can be list[ChatCompletionMessageParam], which is already imported in this file. This improves type safety and allows removing the TODO.

Suggested change
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    # TODO: fix the type somehow
    return []
def initial_messages(self) -> list[ChatCompletionMessageParam]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    return []

Comment on lines +267 to +271
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation (prior trace)."""
    # TODO: use the type we need, but trace is untyped, and we cannot import from litellm adapter here
    # or we get circular imports
    return list(self._prior_trace)

medium

The type hint for this overridden method should also be updated from list[Any] to list[ChatCompletionMessageParam] to improve type safety. The _prior_trace attribute is already typed as list[ChatCompletionMessageParam]. The comment about circular imports seems to be outdated or incorrect, as ChatCompletionMessageParam is available from kiln_ai.utils.open_ai_types.

Suggested change
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation (prior trace)."""
    # TODO: use the type we need, but trace is untyped, and we cannot import from litellm adapter here
    # or we get circular imports
    return list(self._prior_trace)
def initial_messages(self) -> list[ChatCompletionMessageParam]:
    """Messages to seed the conversation (prior trace)."""
    return list(self._prior_trace)

@leonardmq leonardmq force-pushed the leonard/kil-421-adapter-support-multiturn branch from ee13268 to 3f83187 Compare February 26, 2026 11:44
@leonardmq leonardmq marked this pull request as draft February 26, 2026 11:45
@github-actions
Copy link

github-actions bot commented Feb 26, 2026

📊 Coverage Report

Overall Coverage: 91%

Diff: origin/main...HEAD

  • libs/core/kiln_ai/adapters/chat/chat_formatter.py (92.0%): Missing lines 288,293
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py (100%)
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py (87.5%): Missing lines 96

Summary

  • Total: 64 lines
  • Missing: 3 lines
  • Coverage: 95%

Line-by-line

View line-by-line diff coverage

libs/core/kiln_ai/adapters/chat/chat_formatter.py

Lines 284-297

  284             return ChatTurn(messages=[user_msg], final_call=True)
  285 
  286         if self._state == "awaiting_final":
  287             if previous_output is None:
! 288                 raise ValueError("previous_output required for final step")
  289             self._messages.append(BasicChatMessage("assistant", previous_output))
  290             self._state = "done"
  291             return None
  292 
! 293         return None
  294 
  295 
  296 def get_chat_formatter(
  297     strategy: ChatStrategy,

libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py

Lines 92-100

   92                 "Session continuation is not supported for MCP adapter. "
   93                 "MCP tools are single-turn and do not maintain conversation state."
   94             )
   95 
!  96         run_output, _ = await self.invoke_returning_run_output(
   97             input, input_source, existing_run
   98         )
   99         return run_output


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)

253-261: ⚠️ Potential issue | 🟠 Major

Continuation can return a non-persisted updated run with a persisted ID.

When existing_run is used, Line 259 skips ID clearing, but the save is skipped if autosave/allow_saving conditions fail. That returns a mutated run object that looks persisted while disk state remains stale. Next continuation can reload old trace/output and lose conversational state.

💡 Suggested fix
-        if (
-            self.base_adapter_config.allow_saving
-            and Config.shared().autosave_runs
-            and self.task.path is not None
-        ):
+        if existing_run is not None and (
+            not self.base_adapter_config.allow_saving
+            or not Config.shared().autosave_runs
+            or self.task.path is None
+        ):
+            raise ValueError(
+                "Session continuation requires persistence (allow_saving, autosave_runs, and task.path)."
+            )
+
+        if (
+            self.base_adapter_config.allow_saving
+            and Config.shared().autosave_runs
+            and self.task.path is not None
+        ):
             run.save_to_file()
         elif existing_run is None:
             # Clear the ID to indicate it's not persisted
             run.id = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py` around lines 253 -
261, The current continuation logic can return a run with a persisted-looking ID
when save was skipped; update the block around base_adapter_config.allow_saving
/ Config.shared().autosave_runs / self.task.path so that whenever
run.save_to_file() is NOT called you clear run.id (e.g., set run.id = None)
regardless of existing_run, and only preserve the ID when save_to_file()
actually succeeded; adjust the branch around run.save_to_file() and the
existing_run check to ensure run.id reflects actual persistence state.
♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)

1736-1778: ⚠️ Potential issue | 🟠 Major

Remove the machine-specific fixture path (and unused setup if not referenced).

Line 1740 hardcodes /Users/leonardmarcq/Downloads/..., which is non-portable and will fail outside one machine. Also, tmp_path is unused in this fixture, and this setup appears unused in this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1736 - 1778, The
fixture adapter_sanity_check_setup hardcodes a machine-specific path in
project_path and never uses the tmp_path param; replace the hardcoded
Path("/Users/leonardmarcq/Downloads/…") with a portable path using the provided
tmp_path (e.g., tmp_path / "adapter_sanity_project" / "project.kiln") or remove
the fixture entirely if unused; update references inside
adapter_sanity_check_setup (project_path, project.save_to_file(),
Project.load_from_file(), and config._settings modification) to operate on the
tmp_path-based location, and remove the unused tmp_path parameter only if you
delete the fixture after confirming it is not referenced elsewhere.
🧹 Nitpick comments (2)
app/web_ui/src/lib/api_schema.d.ts (1)

6235-6239: Clarify run mutability in the task_run_id API description.

Line 6237 explains trace appending, but it does not explicitly state that continuing a session can mutate the referenced TaskRun’s latest output/intermediate_outputs. Please expand the backend OpenAPI field description to include this side effect, then regenerate this file.

Based on learnings: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/web_ui/src/lib/api_schema.d.ts` around lines 6235 - 6239, The OpenAPI
description for the query/body field task_run_id must explicitly state that
continuing a session will append to and may mutate the referenced TaskRun’s
latest output and intermediate_outputs; update the FastAPI schema/Field
description that defines task_run_id (look for the parameter/Field named
"task_run_id" used by the endpoint in data_gen_api.py or the Pydantic model
representing TaskRun in the kiln_server modules) to include a sentence like
“Continuing a session will append to the run’s trace and may update the
TaskRun’s latest output and intermediate_outputs,” then re-generate the
TypeScript types (openapi-typescript) so app/web_ui/src/lib/api_schema.d.ts is
updated.
libs/core/kiln_ai/adapters/chat/chat_formatter.py (1)

96-99: Resolve the initial_messages TODO typing debt before merge.

Line 98 and Line 269 leave Any + TODOs in the core formatter interface. This weakens type safety across the new multiturn path and keeps pipeline warnings unresolved.

I can draft a small follow-up patch that introduces a shared concrete message-seed type alias and removes both TODOs.

Also applies to: 267-271

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/chat/chat_formatter.py` around lines 96 - 99,
Introduce a concrete seed-message type and use it instead of Any: define a
shared type alias/TypedDict (e.g., MessageSeed or ChatSeedMessage with required
fields like role: str, content: str and optional metadata: dict[str, Any]) and
replace occurrences of list[Any] in initial_messages with list[MessageSeed];
update the signature of the initial_messages method in chat_formatter.py (and
the matching spot around lines 267-271) to return list[MessageSeed], remove the
TODO comments, and update any imports/annotations to reference the new alias so
the multiturn path uses the concrete message-seed type.
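One way to realize the alias this prompt suggests is a TypedDict, shown here as a sketch; the field names follow the review comment, and the real codebase may prefer its existing ChatCompletionMessageParam type instead.

```python
# Sketch of the suggested MessageSeed alias: a TypedDict replacing
# list[Any] in initial_messages. Field names follow the review comment;
# the optional metadata field lives in a total=False subclass so it can
# be omitted.
from typing import Any, Dict, List, TypedDict


class _MessageSeedRequired(TypedDict):
    role: str
    content: str


class MessageSeed(_MessageSeedRequired, total=False):
    metadata: Dict[str, Any]


def initial_messages(prior_trace: List[MessageSeed]) -> List[MessageSeed]:
    """Messages to seed the conversation (prior trace)."""
    return list(prior_trace)


seed = initial_messages([{"role": "user", "content": "hi"}])
print(seed[0]["role"], seed[0]["content"])
```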
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py`:
- Around line 253-261: The current continuation logic can return a run with a
persisted-looking ID when save was skipped; update the block around
base_adapter_config.allow_saving / Config.shared().autosave_runs /
self.task.path so that whenever run.save_to_file() is NOT called you clear
run.id (e.g., set run.id = None) regardless of existing_run, and only preserve
the ID when save_to_file() actually succeeded; adjust the branch around
run.save_to_file() and the existing_run check to ensure run.id reflects actual
persistence state.

---

Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1736-1778: The fixture adapter_sanity_check_setup hardcodes a
machine-specific path in project_path and never uses the tmp_path param; replace
the hardcoded Path("/Users/leonardmarcq/Downloads/…") with a portable path using
the provided tmp_path (e.g., tmp_path / "adapter_sanity_project" /
"project.kiln") or remove the fixture entirely if unused; update references
inside adapter_sanity_check_setup (project_path, project.save_to_file(),
Project.load_from_file(), and config._settings modification) to operate on the
tmp_path-based location, and remove the unused tmp_path parameter only if you
delete the fixture after confirming it is not referenced elsewhere.

---

Nitpick comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 6235-6239: The OpenAPI description for the query/body field
task_run_id must explicitly state that continuing a session will append to and
may mutate the referenced TaskRun’s latest output and intermediate_outputs;
update the FastAPI schema/Field description that defines task_run_id (look for
the parameter/Field named "task_run_id" used by the endpoint in data_gen_api.py
or the Pydantic model representing TaskRun in the kiln_server modules) to
include a sentence like “Continuing a session will append to the run’s trace and
may update the TaskRun’s latest output and intermediate_outputs,” then
re-generate the TypeScript types (openapi-typescript) so
app/web_ui/src/lib/api_schema.d.ts is updated.

In `@libs/core/kiln_ai/adapters/chat/chat_formatter.py`:
- Around line 96-99: Introduce a concrete seed-message type and use it instead
of Any: define a shared type alias/TypedDict (e.g., MessageSeed or
ChatSeedMessage with required fields like role: str, content: str and optional
metadata: dict[str, Any]) and replace occurrences of list[Any] in
initial_messages with list[MessageSeed]; update the signature of the
initial_messages method in chat_formatter.py (and the matching spot around lines
267-271) to return list[MessageSeed], remove the TODO comments, and update any
imports/annotations to reference the new alias so the multiturn path uses the
concrete message-seed type.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee13268 and 3f83187.

📒 Files selected for processing (16)
  • app/web_ui/src/lib/api_schema.d.ts
  • libs/core/kiln_ai/adapters/chat/__init__.py
  • libs/core/kiln_ai/adapters/chat/chat_formatter.py
  • libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
  • libs/core/kiln_ai/adapters/test_prompt_builders.py
  • libs/core/kiln_ai/datamodel/test_basemodel.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
  • libs/core/kiln_ai/adapters/chat/__init__.py
  • libs/server/kiln_server/run_api.py
  • libs/core/kiln_ai/adapters/chat/test_chat_formatter.py

@leonardmq leonardmq marked this pull request as ready for review February 27, 2026 10:40
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)

1816-1826: ⚠️ Potential issue | 🟠 Major

Remove the hardcoded local path from the sanity fixture.

Line 1820 hardcodes /Users/leonardmarcq/..., which is non-portable, and this fixture still takes tmp_path without using it. This will fail outside one machine setup and adds brittle test infra.

Also applies to: 1838-1850

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1816 - 1826, The
fixture adapter_sanity_check_setup contains a hardcoded absolute path
(/Users/...) and ignores the tmp_path fixture; replace that hardcoded Path usage
with a test-local path derived from the tmp_path fixture (e.g., tmp_path /
"adapter_sanity_project" / "project.kiln"), ensure parent directories are
created and the project file is initialized as before, and do the same change
for the duplicate block around the adapter_sanity_check_setup sibling (the
second sanity fixture in the file); update any references that expect
project_path so they use the new tmp_path-based Path.
🧹 Nitpick comments (1)
libs/server/kiln_server/test_run_api.py (1)

1990-1993: Prefer normalized/structured assertion over exact LLM string equality.

Line 1993 asserts exact text ("[4, 12, 59]"), which is fragile for provider formatting variance (spacing/newlines). Parse or normalize before asserting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1990 - 1993, The test
is brittle because it asserts exact LLM output string equality; change the
assertion to parse/normalize the output before comparing: take
res4["output"]["output"] (where response4/res4 and task_run_id are used) and
convert it into a structured Python list (e.g., via json.loads or
ast.literal_eval after stripping whitespace/newlines) and then assert that the
resulting list equals [4, 12, 59] and that res4["id"] == task_run_id remains.
This ensures formatting differences (spaces/newlines) won't break the test.
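The normalization this prompt describes can be as small as the helper below; `parse_output` is an illustrative name, not a function from the codebase.

```python
# Normalizing the model output before asserting, as the prompt suggests:
# parse the string into a list so whitespace and formatting variance
# between providers cannot fail the test. ast.literal_eval also accepts
# Python-style literals that strict JSON parsing would reject.
import ast


def parse_output(raw: str) -> list:
    return list(ast.literal_eval(raw.strip()))


# Several plausible provider formattings normalize to the same value.
for raw in ["[4, 12, 59]", "[4,12,59]", "\n[ 4, 12, 59 ]\n"]:
    assert parse_output(raw) == [4, 12, 59]
print("ok")
```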
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py`:
- Around line 406-414: The current code merges previous and new intermediate
outputs (using merged_intermediate and existing_run.intermediate_outputs), which
can retain stale keys; instead, replace the merge with a direct assignment so
the latest assistant message wins: set existing_run.intermediate_outputs =
dict(run_output.intermediate_outputs or {}) (or None if you prefer explicit
clearing when there are no intermediate outputs) and remove the
merged_intermediate construction and usage so only
run_output.intermediate_outputs is stored.

In `@libs/server/kiln_server/run_api.py`:
- Around line 295-309: The continuation path reads and then mutates TaskRun (via
TaskRun.from_id_and_parent_path and later adapter.invoke using existing_run)
without synchronization, so concurrent continuations can clobber appended
trace/output; wrap the read/modify/write sequence with a guard (e.g., a per-run
async lock keyed by request.task_run_id or a DB transaction/row-level lock) to
serialize continuations for the same run before calling adapter.invoke, ensuring
only one coroutine can mutate existing_run at a time and releasing the lock
after the update completes or errors.
- Line 309: Wrap the call to adapter.invoke (the line returning await
adapter.invoke(input, existing_run=existing_run)) in a try/except that catches
NotImplementedError and re-raises a FastAPI HTTPException so clients receive a
4xx response; e.g. catch NotImplementedError as e and raise
HTTPException(status_code=400, detail=str(e)) (or use
status.HTTP_400_BAD_REQUEST) from e, and add the necessary import for
HTTPException (and status if used).

---

Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1816-1826: The fixture adapter_sanity_check_setup contains a
hardcoded absolute path (/Users/...) and ignores the tmp_path fixture; replace
that hardcoded Path usage with a test-local path derived from the tmp_path
fixture (e.g., tmp_path / "adapter_sanity_project" / "project.kiln"), ensure
parent directories are created and the project file is initialized as before,
and do the same change for the duplicate block around the
adapter_sanity_check_setup sibling (the second sanity fixture in the file);
update any references that expect project_path so they use the new
tmp_path-based Path.

---

Nitpick comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1990-1993: The test is brittle because it asserts exact LLM output
string equality; change the assertion to parse/normalize the output before
comparing: take res4["output"]["output"] (where response4/res4 and task_run_id
are used) and convert it into a structured Python list (e.g., via json.loads or
ast.literal_eval after stripping whitespace/newlines) and then assert that the
resulting list equals [4, 12, 59] and that res4["id"] == task_run_id remains.
This ensures formatting differences (spaces/newlines) won't break the test.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f83187 and 04748e4.

📒 Files selected for processing (10)
  • libs/core/kiln_ai/adapters/chat/__init__.py
  • libs/core/kiln_ai/adapters/chat/chat_formatter.py
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/server/kiln_server/run_api.py
  • libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py

Comment on lines +295 to +309
existing_run = TaskRun.from_id_and_parent_path(
    request.task_run_id, task.path
)
if existing_run is None:
    raise HTTPException(
        status_code=404,
        detail="Run not found. Cannot continue session.",
    )
if not existing_run.trace or len(existing_run.trace) == 0:
    raise HTTPException(
        status_code=400,
        detail="Run has no trace. Cannot continue session without conversation history.",
    )

return await adapter.invoke(input, existing_run=existing_run)

⚠️ Potential issue | 🟠 Major

Guard continuation writes with a lock to prevent lost updates.

Line 31 already documents this load/update/write pattern as non-atomic, but Lines 295-309 perform continuation mutation without synchronization. Concurrent continuation calls for the same run can overwrite each other’s appended trace/output.

🔧 Suggested fix
-        return await adapter.invoke(input, existing_run=existing_run)
+        if existing_run is not None:
+            async with update_run_lock:
+                return await adapter.invoke(input, existing_run=existing_run)
+        return await adapter.invoke(input, existing_run=None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/run_api.py` around lines 295 - 309, The continuation
path reads and then mutates TaskRun (via TaskRun.from_id_and_parent_path and
later adapter.invoke using existing_run) without synchronization, so concurrent
continuations can clobber appended trace/output; wrap the read/modify/write
sequence with a guard (e.g., a per-run async lock keyed by request.task_run_id
or a DB transaction/row-level lock) to serialize continuations for the same run
before calling adapter.invoke, ensuring only one coroutine can mutate
existing_run at a time and releasing the lock after the update completes or
errors.
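A per-run lock of the kind this prompt describes can be sketched as below; the endpoint wiring is omitted and the names are illustrative, only the locking pattern is the point.

```python
# Sketch of a per-run lock keyed by task_run_id: continuations of the
# same run are serialized, while different runs proceed concurrently.
# asyncio.sleep(0) stands in for awaiting adapter.invoke.
import asyncio
from collections import defaultdict

_run_locks: dict = defaultdict(asyncio.Lock)


async def continue_run(task_run_id: str, log: list) -> None:
    async with _run_locks[task_run_id]:
        # The load -> mutate -> save sequence runs atomically per run ID.
        log.append(f"start:{task_run_id}")
        await asyncio.sleep(0)  # stand-in for awaiting the model call
        log.append(f"end:{task_run_id}")


async def main() -> list:
    log: list = []
    await asyncio.gather(continue_run("r1", log), continue_run("r1", log))
    return log


log = asyncio.run(main())
print(log)  # the two continuations of "r1" never interleave
```

Note that a defaultdict of locks grows with distinct run IDs; a production version would evict idle entries, and this still only guards a single server process, not concurrent processes writing the same file.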

detail="Run has no trace. Cannot continue session without conversation history.",
)

return await adapter.invoke(input, existing_run=existing_run)

⚠️ Potential issue | 🟠 Major

Map unsupported continuation to a client-facing HTTP error.

On Line 309, adapter-level NotImplementedError (for example MCP continuation) can bubble as a server error. Convert it to an explicit HTTPException so clients get a clear 4xx response.

🔧 Suggested fix
-        return await adapter.invoke(input, existing_run=existing_run)
+        try:
+            return await adapter.invoke(input, existing_run=existing_run)
+        except NotImplementedError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        return await adapter.invoke(input, existing_run=existing_run)
+        try:
+            return await adapter.invoke(input, existing_run=existing_run)
+        except NotImplementedError as e:
+            raise HTTPException(status_code=400, detail=str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/run_api.py` at line 309, Wrap the call to
adapter.invoke (the line returning await adapter.invoke(input,
existing_run=existing_run)) in a try/except that catches NotImplementedError and
re-raises a FastAPI HTTPException so clients receive a 4xx response; e.g. catch
NotImplementedError as e and raise HTTPException(status_code=400, detail=str(e))
(or use status.HTTP_400_BAD_REQUEST) from e, and add the necessary import for
HTTPException (and status if used).

@leonardmq leonardmq requested a review from scosman February 27, 2026 10:57
@leonardmq leonardmq mentioned this pull request Mar 3, 2026
2 tasks
@scosman
Collaborator

scosman commented Mar 5, 2026

TaskRun are no longer immutable

This makes sense. But it also breaks one of the core ideas behind the collaboration-with-git concept, where the files are small, mostly immutable, and unlikely to be edited by others, so unlikely to hit merge conflicts.

We could add a tree/fork concept where new runs are branched off a parent. That way, two people editing end up with two chains. This adds complexity when rendering the dataset list (we need to filter out items that have children and only show the leaf nodes), but seems manageable?
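
The leaf-node filtering mentioned above is cheap to do over a flat list. A minimal sketch, assuming each run is exposed as a dict with `id` and `parent_id` keys (a hypothetical shape, not the real TaskRun model):

```python
def leaf_runs(runs: list[dict]) -> list[dict]:
    # A run is a leaf iff no other run names it as its parent.
    parent_ids = {r["parent_id"] for r in runs if r["parent_id"] is not None}
    return [r for r in runs if r["id"] not in parent_ids]
```

Rendering only the leaves hides intermediate turns while still letting each chain be walked back via `parent_id`.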

As this PR is SDK only, and persisting is off by default in the SDK, I say leave it as is for now, and debate the above later when we add this to the app?


return await adapter.invoke(input)
existing_run: TaskRun | None = None
if request.task_run_id is not None:
maybe just remove the run_api support for now? Wait until we add it to app, and can design the API then?

Most SDK users won't have runs saved, so this will just error (off by default now).

P2: this works too, but we might want a different design later when doing the app, so it might just make sense to wait.

self,
input: InputType,
input_source: DataSource | None = None,
existing_run: TaskRun | None = None,
naming maybe one of: continue_from? prior_task_run? parent_task_run?

input, input_source, parsed_output, usage, run_output.trace
)
# Create the run and output - merge if there is an existing run
if existing_run is not None:
@scosman scosman Mar 5, 2026

see comment here: #1088 (comment)

I'm leaning more towards always saving a new task_run, and just setting a new parent_id field on the child? Keep it immutable.

Will add a bit of work in UI, but much more robust for collisions.
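
A minimal sketch of that parent_id approach, using a hypothetical frozen record in place of the real TaskRun model: continuation writes a new child instead of mutating the parent, so two people continuing the same run produce two sibling chains rather than a merge conflict.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRunRecord:
    # Hypothetical stand-in for TaskRun; the real model has many more fields.
    id: str
    parent_id: str | None
    trace: tuple[str, ...]


def continue_run(parent: TaskRunRecord, new_messages: list[str]) -> TaskRunRecord:
    # The parent is never mutated: continuation writes a fresh record whose
    # parent_id points back at it, keeping every saved run immutable.
    return TaskRunRecord(
        id=str(uuid.uuid4()),
        parent_id=parent.id,
        trace=parent.trace + tuple(new_messages),
    )
```

Evals that reference the parent run stay valid, since the record they point at never changes.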

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)

1686-1729: ⚠️ Potential issue | 🟠 Major

Remove unused fixture with hardcoded user-specific path.

This fixture contains a hardcoded path (/Users/leonardmarcq/Downloads/...) that will fail on CI and other machines. The tmp_path parameter is accepted but never used. No test in this file references adapter_sanity_check_setup.

Either remove this fixture entirely, or refactor it to use tmp_path like adapter_sanity_check_math_tools_setup does.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1686 - 1729, The
fixture adapter_sanity_check_setup contains a hard-coded user path and never
uses the tmp_path param (and is unused by tests); remove this fixture entirely
or refactor it to use tmp_path: replace the hardcoded Path("/Users/...") with
tmp_path / "adapter_sanity_project" / "project.kiln", ensure Project/Task are
created/loaded from that temp path, update Config.shared() modifications to use
the temp project.path, and keep the teardown that restores
config._settings["projects"]; if no tests reference adapter_sanity_check_setup
simply delete the fixture to avoid CI breakage.
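
The tmp_path refactor the prompt suggests boils down to deriving every path from the fixture argument. A sketch with the Project/Task creation and Config changes elided (the helper name is hypothetical):

```python
from pathlib import Path


def build_sanity_project(tmp_path: Path) -> Path:
    # Everything lives under pytest's per-test tmp_path instead of a
    # hard-coded /Users/... directory, so the fixture works on CI and
    # leaves no files behind. The real fixture would also create the
    # Project/Task objects and point Config.shared() at project_file.
    project_file = tmp_path / "adapter_sanity_project" / "project.kiln"
    project_file.parent.mkdir(parents=True, exist_ok=True)
    project_file.write_text("{}", encoding="utf-8")
    return project_file
```

In a pytest fixture this would simply take `tmp_path` as a parameter; pytest handles cleanup automatically.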
🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1)

13-17: Consider adding type hint for prior_trace parameter.

The base class defines prior_trace: list[ChatCompletionMessageParam] | None, but the mock omits the type. Adding it improves consistency.

♻️ Suggested fix
+from kiln_ai.utils.open_ai_types import ChatCompletionMessageParam
+
 class MockAdapter(BaseAdapter):
     async def _run(
         self,
         input: InputType,
-        prior_trace=None,
+        prior_trace: list[ChatCompletionMessageParam] | None = None,
     ) -> tuple[RunOutput, Usage | None]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py`
around lines 13 - 17, The mock async method _run is missing the explicit type
for the prior_trace parameter; update its signature to match the base class by
annotating prior_trace as list[ChatCompletionMessageParam] | None (or the
appropriate alias used in the codebase) so the mock matches the base typing for
_run and tools like static checkers; ensure any required imports or forward
references for ChatCompletionMessageParam are present.
libs/server/kiln_server/test_run_api.py (1)

1676-1684: Remove unused helper functions.

_adapter_sanity_check_output_path() and _append_to_sanity_check() are defined but never called. Additionally, _adapter_sanity_check_output_path() writes to the source directory rather than a temporary location, which would pollute the source tree.

Suggested removal
-def _adapter_sanity_check_output_path() -> Path:
-    return Path(__file__).resolve().parent / "adapter_sanity_check.txt"
-
-
-def _append_to_sanity_check(content: str, output_path: Path) -> None:
-    with open(output_path, "a", encoding="utf-8") as f:
-        f.write(content)
-        f.write("\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/server/kiln_server/test_run_api.py` around lines 1676 - 1684, Remove the
unused helper functions _adapter_sanity_check_output_path() and
_append_to_sanity_check() from the file: these helpers are never called and the
path helper writes into the source tree; delete both definitions and update any
tests that relied on them to instead use a temporary path (e.g., pytest
tmp_path) or test-local fixtures if output needs to be captured. Ensure there
are no remaining imports or references to _adapter_sanity_check_output_path or
_append_to_sanity_check in the module after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py`:
- Around line 353-397: The test
test_generate_run_with_existing_run_merges_usage_and_intermediate_outputs
expects merging of intermediate_outputs but the PR changes specify that
intermediate_outputs should be taken from the latest assistant message (i.e.,
replaced, not merged); update the assertions in this test (which calls
adapter.generate_run with existing_run=initial_run and
RunOutput.intermediate_outputs on the second call) to assert that
result.intermediate_outputs equals only the latest RunOutput
intermediate_outputs ({"new_key":"new_val"}) instead of containing the old
"chain_of_thought" key, keeping other assertions (id change, input, usage
accumulation, and output) as-is.
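
The expected semantics can be restated as a sketch (field names and dict shapes are assumptions, not the real TaskRun API): usage counters accumulate across turns, while output and intermediate_outputs are replaced by the latest turn.

```python
def merge_continuation(existing: dict, latest: dict) -> dict:
    # Usage accumulates across turns...
    usage = {
        k: existing["usage"].get(k, 0) + latest["usage"].get(k, 0)
        for k in set(existing["usage"]) | set(latest["usage"])
    }
    # ...while intermediate_outputs and output come only from the
    # latest assistant message (replaced, not merged).
    return {
        "usage": usage,
        "intermediate_outputs": latest["intermediate_outputs"],
        "output": latest["output"],
    }
```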

---

Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1686-1729: The fixture adapter_sanity_check_setup contains a
hard-coded user path and never uses the tmp_path param (and is unused by tests);
remove this fixture entirely or refactor it to use tmp_path: replace the
hardcoded Path("/Users/...") with tmp_path / "adapter_sanity_project" /
"project.kiln", ensure Project/Task are created/loaded from that temp path,
update Config.shared() modifications to use the temp project.path, and keep the
teardown that restores config._settings["projects"]; if no tests reference
adapter_sanity_check_setup simply delete the fixture to avoid CI breakage.

---

Nitpick comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py`:
- Around line 13-17: The mock async method _run is missing the explicit type for
the prior_trace parameter; update its signature to match the base class by
annotating prior_trace as list[ChatCompletionMessageParam] | None (or the
appropriate alias used in the codebase) so the mock matches the base typing for
_run and tools like static checkers; ensure any required imports or forward
references for ChatCompletionMessageParam are present.

In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1676-1684: Remove the unused helper functions
_adapter_sanity_check_output_path() and _append_to_sanity_check() from the file:
these helpers are never called and the path helper writes into the source tree;
delete both definitions and update any tests that relied on them to instead use
a temporary path (e.g., pytest tmp_path) or test-local fixtures if output needs
to be captured. Ensure there are no remaining imports or references to
_adapter_sanity_check_output_path or _append_to_sanity_check in the module after
removal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ece44a3f-dca5-4fae-a601-325afa5c5f54

📥 Commits

Reviewing files that changed from the base of the PR and between 04748e4 and e71efe6.

📒 Files selected for processing (3)
  • libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
  • libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
  • libs/server/kiln_server/test_run_api.py

@leonardmq leonardmq requested a review from scosman March 6, 2026 10:01
@leonardmq
Collaborator Author

Taken over by #1107


@leonardmq leonardmq closed this Mar 9, 2026