Conversation
Summary of Changes
Hello @leonardmq, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the AI system's conversational capabilities by enabling multi-turn interactions. It refactors the core logic to allow existing runs to be continued with new conversation turns.
Highlights
Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the relevant setting.
Walkthrough
This PR implements multi-turn conversation continuation by introducing a prior_trace parameter threaded from the base adapter through to the LiteLLM adapter.
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant BaseAdapter
participant ChatFormatter
participant LiteLLMAdapter
participant LLM
User->>BaseAdapter: invoke(input, existing_run)
Note over BaseAdapter: Detect prior_trace from<br/>existing_run.trace
BaseAdapter->>ChatFormatter: build_chat_formatter(input, prior_trace)
alt prior_trace provided
ChatFormatter-->>ChatFormatter: Create MultiturnFormatter
else no prior_trace
ChatFormatter-->>ChatFormatter: Create standard Formatter
end
ChatFormatter-->>BaseAdapter: Formatter instance
BaseAdapter->>BaseAdapter: _run_returning_run_output(input, prior_trace)
BaseAdapter->>LiteLLMAdapter: _run(input, prior_trace)
LiteLLMAdapter->>LiteLLMAdapter: messages = initial_messages()
alt prior_trace provided
LiteLLMAdapter->>LiteLLMAdapter: messages includes prior conversation
end
LiteLLMAdapter->>LiteLLMAdapter: next_turn() processes new input
LiteLLMAdapter->>LLM: Call LLM with full message history
LLM-->>LiteLLMAdapter: Response
LiteLLMAdapter-->>BaseAdapter: RunOutput with merged trace
BaseAdapter->>BaseAdapter: generate_run(merged_outputs)
BaseAdapter-->>User: TaskRun (continued session)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (3)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
131-136: Type inconsistency: task_run_id parameter type.
In invoke() the parameter is typed as ID_TYPE | None, but in mcp_adapter.py (lines 88, 105) it's typed as str | None. For consistency and to leverage the type alias, consider using ID_TYPE throughout or verifying ID_TYPE is indeed str.
#!/bin/bash
# Verify ID_TYPE definition and usage consistency
ast-grep --pattern 'ID_TYPE = $_'
rg -n "task_run_id:" --type=py | head -20
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py` around lines 131 - 136, The task_run_id parameter type is inconsistent between base_adapter.invoke (uses ID_TYPE | None) and mcp_adapter (uses str | None); update the code to use the same type alias everywhere by replacing explicit str | None with ID_TYPE | None (or alternatively ensure the ID_TYPE alias is defined as str and update its definition) so invoke(), invoke_returning_run_output, and the mcp adapter methods share a consistent type for task_run_id; locate symbols invoke, invoke_returning_run_output, and the task_run_id annotations in mcp_adapter to apply the change.
libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py (1)
88-88: Consider using ID_TYPE for consistency with BaseAdapter.
The task_run_id parameter is typed as str | None here but as ID_TYPE | None in base_adapter.py. For type consistency across the codebase, consider aligning with the type alias.
Also applies to: 105-105
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py` at line 88, Replace the explicit str | None typing for the task_run_id parameter with the shared alias ID_TYPE | None to match BaseAdapter; update each function/method in mcp_adapter.py where the parameter task_run_id appears (both occurrences flagged) so their signatures use ID_TYPE | None and adjust any related imports to import ID_TYPE from the module where it is defined.
libs/server/kiln_server/test_run_api.py (1)
1725-1732: Unused helper functions — consider removing.
_adapter_sanity_check_output_path() and _append_to_sanity_check() are defined but not called anywhere in this file. If these are leftover from development, they should be removed to avoid confusion.
🗑️ Remove unused code
-def _adapter_sanity_check_output_path() -> Path:
-    return Path(__file__).resolve().parent / "adapter_sanity_check.txt"
-
-
-def _append_to_sanity_check(content: str, output_path: Path) -> None:
-    with open(output_path, "a", encoding="utf-8") as f:
-        f.write(content)
-        f.write("\n")
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1725 - 1732, Remove the two unused helper functions _adapter_sanity_check_output_path() and _append_to_sanity_check() from the file: delete their definitions (the functions named _adapter_sanity_check_output_path and _append_to_sanity_check) since they are not referenced anywhere in the test_run_api.py module to avoid dead code and confusion; if they are intended for future use, alternatively add a clear TODO comment and a test that exercises them, but otherwise simply remove both function definitions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py`:
- Around line 433-451: The test test_task_run_id_task_path_none_raises fails
because model_provider() is invoked during MockAdapter initialization and raises
an API key error; update the test to mock/stub model_provider (or the OpenAI
provider init) so it doesn't perform real API key validation before exercising
the path check. Specifically, in the test_task_run_id_task_path_none_raises
where MockAdapter is constructed with KilnAgentRunConfigProperties and
adapter.invoke is awaited, patch the model_provider function (or the
class/method used by MockAdapter to obtain the provider) to return a benign/mock
provider object so the ValueError from adapter.invoke("input",
task_run_id="some-id") (about task.path) is the raised exception as expected.
In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py`:
- Around line 1309-1353: The test
test_run_with_prior_trace_uses_multiturn_formatter calls adapter._run which
internally calls model_provider() before _run_model_turn, causing an API key
lookup; update the test to stub/mocking the adapter.model_provider method (or
attribute) to return a simple dummy provider object (e.g., a dict or a small
stub class) so no external key lookup happens, then proceed to set
adapter._run_model_turn as already done; ensure the mock is assigned on the same
adapter instance created from LiteLlmAdapter so build_chat_formatter and _run
use the stubbed provider.
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1735-1778: The adapter_sanity_check_setup fixture contains a
hardcoded user path, an unused tmp_path parameter, and appears unused; either
delete the adapter_sanity_check_setup fixture entirely (and remove its tmp_path
parameter) or refactor it to mirror adapter_sanity_check_math_tools_setup by
using tmp_path for project_path (e.g., project_path = tmp_path /
"adapter_sanity_project" / "project.kiln"), create the directory and
Project/Task there via Project(name=..., path=str(project_path)) and Task(...,
parent=project), register the temporary project into
Config.shared()._settings["projects"], yield the {"project","task"} dict, and
restore original_projects after the yield; ensure you update references to
adapter_sanity_check_setup or remove any unused fixture imports.
---
Nitpick comments:
In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py`:
- Around line 131-136: The task_run_id parameter type is inconsistent between
base_adapter.invoke (uses ID_TYPE | None) and mcp_adapter (uses str | None);
update the code to use the same type alias everywhere by replacing explicit str
| None with ID_TYPE | None (or alternatively ensure the ID_TYPE alias is defined
as str and update its definition) so invoke(), invoke_returning_run_output, and
the mcp adapter methods share a consistent type for task_run_id; locate symbols
invoke, invoke_returning_run_output, and the task_run_id annotations in
mcp_adapter to apply the change.
In `@libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py`:
- Line 88: Replace the explicit str | None typing for the task_run_id parameter
with the shared alias ID_TYPE | None to match BaseAdapter; update each
function/method in mcp_adapter.py where the parameter task_run_id appears (both
occurrences flagged) so their signatures use ID_TYPE | None and adjust any
related imports to import ID_TYPE from the module where it is defined.
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1725-1732: Remove the two unused helper functions
_adapter_sanity_check_output_path() and _append_to_sanity_check() from the file:
delete their definitions (the functions named _adapter_sanity_check_output_path
and _append_to_sanity_check) since they are not referenced anywhere in the
test_run_api.py module to avoid dead code and confusion; if they are intended
for future use, alternatively add a clear TODO comment and a test that exercises
them, but otherwise simply remove both function definitions.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- app/web_ui/src/lib/api_schema.d.ts
- libs/core/kiln_ai/adapters/chat/__init__.py
- libs/core/kiln_ai/adapters/chat/chat_formatter.py
- libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
- libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
- libs/core/kiln_ai/adapters/test_prompt_builders.py
- libs/core/kiln_ai/datamodel/test_basemodel.py
- libs/server/kiln_server/run_api.py
- libs/server/kiln_server/test_run_api.py
@pytest.mark.asyncio
async def test_task_run_id_task_path_none_raises(base_project):
    task = Task(
        name="test_task",
        instruction="test_instruction",
        parent=base_project,
    )
    assert task.path is None
    adapter = MockAdapter(
        task=task,
        run_config=KilnAgentRunConfigProperties(
            model_name="test_model",
            model_provider_name="openai",
            prompt_id="simple_prompt_builder",
            structured_output_mode="json_schema",
        ),
    )
    with pytest.raises(ValueError, match="task has no path"):
        await adapter.invoke("input", task_run_id="some-id")
Pipeline failure: Test raises wrong exception due to missing mocks.
The test expects a ValueError with message "task has no path", but the model_provider() call happens first and raises an API key error because OpenAI provider initialization isn't mocked.
🐛 Proposed fix: Mock model_provider to avoid API key check
@pytest.mark.asyncio
async def test_task_run_id_task_path_none_raises(base_project):
    task = Task(
        name="test_task",
        instruction="test_instruction",
        parent=base_project,
    )
    assert task.path is None
    adapter = MockAdapter(
        task=task,
        run_config=KilnAgentRunConfigProperties(
            model_name="test_model",
            model_provider_name="openai",
            prompt_id="simple_prompt_builder",
            structured_output_mode="json_schema",
        ),
    )
+
+    # Mock model_provider to avoid API key validation
+    provider = MagicMock()
+    provider.parser = None
+    provider.formatter = None
+    provider.reasoning_capable = False
+    adapter.model_provider = MagicMock(return_value=provider)
+
    with pytest.raises(ValueError, match="task has no path"):
        await adapter.invoke("input", task_run_id="some-id")
🧰 Tools
🪛 GitHub Actions: Build and Test
[error] 450-451: Regex pattern 'task has no path' did not match. The test expected a ValueError with message 'task has no path' but OpenAI API key initialization raised a different error: 'Attempted to use OpenAI without an API key set. Get your API key from https://platform.openai.com/account/api-keys'.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py` around lines
433 - 451, The test test_task_run_id_task_path_none_raises fails because
model_provider() is invoked during MockAdapter initialization and raises an API
key error; update the test to mock/stub model_provider (or the OpenAI provider
init) so it doesn't perform real API key validation before exercising the path
check. Specifically, in the test_task_run_id_task_path_none_raises where
MockAdapter is constructed with KilnAgentRunConfigProperties and adapter.invoke
is awaited, patch the model_provider function (or the class/method used by
MockAdapter to obtain the provider) to return a benign/mock provider object so
the ValueError from adapter.invoke("input", task_run_id="some-id") (about
task.path) is the raised exception as expected.
@pytest.mark.asyncio
async def test_run_with_prior_trace_uses_multiturn_formatter(config, mock_task):
    prior_trace = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)

    build_chat_formatter_calls = []

    original_build = adapter.build_chat_formatter

    def capturing_build(input, prior_trace_arg=None):
        build_chat_formatter_calls.append((input, prior_trace_arg))
        return original_build(input, prior_trace_arg)

    adapter.build_chat_formatter = capturing_build

    async def mock_run_model_turn(
        provider, prior_messages, top_logprobs, skip_response_format
    ):
        extended = list(prior_messages)
        extended.append({"role": "assistant", "content": "How can I help?"})
        return ModelTurnResult(
            assistant_message="How can I help?",
            all_messages=extended,
            model_response=None,
            model_choice=None,
            usage=Usage(),
        )

    adapter._run_model_turn = mock_run_model_turn

    run_output, _ = await adapter._run("follow-up", prior_trace=prior_trace)

    assert len(build_chat_formatter_calls) == 1
    assert build_chat_formatter_calls[0][0] == "follow-up"
    assert build_chat_formatter_calls[0][1] == prior_trace

    assert run_output.trace is not None
    assert len(run_output.trace) == 4
    assert run_output.trace[0]["content"] == "hi"
    assert run_output.trace[1]["content"] == "hello"
    assert run_output.trace[2]["content"] == "follow-up"
    assert run_output.trace[3]["content"] == "How can I help?"
Test requires additional mocking to avoid API key lookup.
The pipeline failure indicates model_provider() is called before _run_model_turn, triggering the OpenRouter API key check. The test should mock model_provider() to bypass this.
🐛 Proposed fix to mock model_provider
@pytest.mark.asyncio
async def test_run_with_prior_trace_uses_multiturn_formatter(config, mock_task):
    prior_trace = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
    adapter = LiteLlmAdapter(config=config, kiln_task=mock_task)
    build_chat_formatter_calls = []
    original_build = adapter.build_chat_formatter
    def capturing_build(input, prior_trace_arg=None):
        build_chat_formatter_calls.append((input, prior_trace_arg))
        return original_build(input, prior_trace_arg)
    adapter.build_chat_formatter = capturing_build
+
+    # Mock model_provider to avoid API key lookup
+    mock_provider = Mock()
+    mock_provider.model_id = "test-model"
+    adapter.model_provider = Mock(return_value=mock_provider)
+
    async def mock_run_model_turn(
        provider, prior_messages, top_logprobs, skip_response_format
    ):
🧰 Tools
🪛 GitHub Actions: Build and Test
[error] 1342-1342: ValueError: Attempted to use OpenRouter without an API key set. Get your API key from https://openrouter.ai/settings/keys
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py` around
lines 1309 - 1353, The test test_run_with_prior_trace_uses_multiturn_formatter
calls adapter._run which internally calls model_provider() before
_run_model_turn, causing an API key lookup; update the test to stub/mocking the
adapter.model_provider method (or attribute) to return a simple dummy provider
object (e.g., a dict or a small stub class) so no external key lookup happens,
then proceed to set adapter._run_model_turn as already done; ensure the mock is
assigned on the same adapter instance created from LiteLlmAdapter so
build_chat_formatter and _run use the stubbed provider.
@pytest.fixture
def adapter_sanity_check_setup(tmp_path):
    """Setup for paid adapter sanity check tests - real project/task, no adapter mocking."""
    # if project at the path does not exist, create it, otherwise reuse
    project_path = (
        Path("/Users/leonardmarcq/Downloads/")
        / "adapter_sanity_project"
        / "project.kiln"
    )
    if not project_path.exists():
        project_path.parent.mkdir()

        project = Project(name="Adapter Sanity Project", path=str(project_path))
        project.save_to_file()

        task = Task(
            name="Adapter Sanity Task",
            instruction="You are a helpful assistant. Respond concisely.",
            description="Task for adapter sanity checking",
            parent=project,
        )
        task.save_to_file()

    else:
        project = Project.load_from_file(project_path)
        task = next(
            (
                t
                for t in project.tasks(readonly=True)
                if t.name == "Adapter Sanity Task"
            ),
            None,
        )
        if task is None:
            raise ValueError("Task not found")

    config = Config.shared()
    original_projects = list(config.projects) if config.projects else []
    config._settings["projects"] = [*original_projects, str(project.path)]

    yield {"project": project, "task": task}

    config._settings["projects"] = original_projects
🧩 Analysis chain
🏁 Script executed:
# Search for usages of adapter_sanity_check_setup fixture
rg -n "adapter_sanity_check_setup" --type pyRepository: Kiln-AI/Kiln
Length of output: 142
🏁 Script executed:
# Also check the structure of the test file to understand fixture usage
rg -n "def test_" libs/server/kiln_server/test_run_api.py | head -20Repository: Kiln-AI/Kiln
Length of output: 1130
🏁 Script executed:
# Look for the math tools fixture mentioned in the comment for comparison
rg -n "adapter_sanity_check_math_tools_setup" --type pyRepository: Kiln-AI/Kiln
Length of output: 458
🏁 Script executed:
# Get the math_tools fixture implementation for comparison
sed -n '1781,1840p' libs/server/kiln_server/test_run_api.py
Repository: Kiln-AI/Kiln
Length of output: 2403
🏁 Script executed:
# Get the test that uses the math_tools fixture to see the pattern
sed -n '1839,1860p' libs/server/kiln_server/test_run_api.py
Repository: Kiln-AI/Kiln
Length of output: 934
Remove unused fixture or refactor to use tmp_path instead of hardcoded path.
This fixture has three critical issues:
1. Hardcoded user-specific path — /Users/leonardmarcq/Downloads/adapter_sanity_project/project.kiln will fail on CI and other developer machines.
2. Fixture is unused — No test in this file calls adapter_sanity_check_setup. The similar math tools fixture (adapter_sanity_check_math_tools_setup) is actively used and properly implements the pattern with tmp_path.
3. Unused parameter — The tmp_path fixture parameter is accepted but never referenced.
Either remove this fixture or refactor it to use tmp_path like the working adapter_sanity_check_math_tools_setup fixture does.
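A rough sketch of the tmp_path-based refactor suggested above. It reuses the Project, Task, and Config calls from the fixture under review (imports as in the test module); the constructor arguments are copied from that fixture and may need adjusting to the current code. Because tmp_path is unique per test, the exists()/reuse branch of the original fixture becomes unnecessary.

import pytest

@pytest.fixture
def adapter_sanity_check_setup(tmp_path):
    """Portable setup: build the sanity-check project inside pytest's tmp_path."""
    project_path = tmp_path / "adapter_sanity_project" / "project.kiln"
    project_path.parent.mkdir(parents=True)

    project = Project(name="Adapter Sanity Project", path=str(project_path))
    project.save_to_file()

    task = Task(
        name="Adapter Sanity Task",
        instruction="You are a helpful assistant. Respond concisely.",
        description="Task for adapter sanity checking",
        parent=project,
    )
    task.save_to_file()

    config = Config.shared()
    original_projects = list(config.projects) if config.projects else []
    config._settings["projects"] = [*original_projects, str(project.path)]

    yield {"project": project, "task": task}

    config._settings["projects"] = original_projects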
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/server/kiln_server/test_run_api.py` around lines 1735 - 1778, The
adapter_sanity_check_setup fixture contains a hardcoded user path, an unused
tmp_path parameter, and appears unused; either delete the
adapter_sanity_check_setup fixture entirely (and remove its tmp_path parameter)
or refactor it to mirror adapter_sanity_check_math_tools_setup by using tmp_path
for project_path (e.g., project_path = tmp_path / "adapter_sanity_project" /
"project.kiln"), create the directory and Project/Task there via
Project(name=..., path=str(project_path)) and Task(..., parent=project),
register the temporary project into Config.shared()._settings["projects"], yield
the {"project","task"} dict, and restore original_projects after the yield;
ensure you update references to adapter_sanity_check_setup or remove any unused
fixture imports.
Code Review
This pull request introduces a significant and well-implemented feature to support multi-turn conversations. The core logic change involves passing a task_run_id to continue an existing chat session, which then mutates the TaskRun to append the new conversation turn. The changes are extensive, touching core adapters, API schemas, and tests. The implementation is robust, with good error handling for unsupported adapters and comprehensive test coverage for the new functionality. I've identified a high-severity issue in a test file related to a hardcoded path that needs to be addressed to ensure portability. Additionally, I've suggested a medium-severity improvement for type hinting to enhance code quality and maintainability. Overall, this is a solid contribution.
project_path = (
    Path("/Users/leonardmarcq/Downloads/")
    / "adapter_sanity_project"
    / "project.kiln"
)
This test fixture uses a hardcoded local path (/Users/leonardmarcq/Downloads/), which will cause the test to fail on any other developer's machine or in a CI environment. Please use the tmp_path fixture provided by pytest to create temporary directories for tests, ensuring they are portable and isolated.
project_path = (
    tmp_path
    / "adapter_sanity_project"
    / "project.kiln"
)

def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    # TODO: fix the type somehow
    return []
The type hint for initial_messages can be more specific. Instead of list[Any], it can be list[ChatCompletionMessageParam], which is already imported in this file. This improves type safety and allows removing the TODO.
Before:
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    # TODO: fix the type somehow
    return []
After:
def initial_messages(self) -> list[ChatCompletionMessageParam]:
    """Messages to seed the conversation. Empty for fresh runs; prior trace for continuation."""
    return []
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation (prior trace)."""
    # TODO: use the type we need, but trace is untyped, and we cannot import from litellm adapter here
    # or we get circular imports
    return list(self._prior_trace)
The type hint for this overridden method should also be updated from list[Any] to list[ChatCompletionMessageParam] to improve type safety. The _prior_trace attribute is already typed as list[ChatCompletionMessageParam]. The comment about circular imports seems to be outdated or incorrect, as ChatCompletionMessageParam is available from kiln_ai.utils.open_ai_types.
Before:
def initial_messages(self) -> list[Any]:
    """Messages to seed the conversation (prior trace)."""
    # TODO: use the type we need, but trace is untyped, and we cannot import from litellm adapter here
    # or we get circular imports
    return list(self._prior_trace)
After:
def initial_messages(self) -> list[ChatCompletionMessageParam]:
    """Messages to seed the conversation (prior trace)."""
    return list(self._prior_trace)
Force-pushed from ee13268 to 3f83187.
📊 Coverage Report
Overall Coverage: 91%
Diff: origin/main...HEAD
Summary
Line-by-line: view line-by-line diff coverage
libs/core/kiln_ai/adapters/chat/chat_formatter.py, Lines 284-297
284 return ChatTurn(messages=[user_msg], final_call=True)
285
286 if self._state == "awaiting_final":
287 if previous_output is None:
! 288 raise ValueError("previous_output required for final step")
289 self._messages.append(BasicChatMessage("assistant", previous_output))
290 self._state = "done"
291 return None
292
! 293 return None
294
295
296 def get_chat_formatter(
297 strategy: ChatStrategy,libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.pyLines 92-100 92 "Session continuation is not supported for MCP adapter. "
93 "MCP tools are single-turn and do not maintain conversation state."
94 )
95
! 96 run_output, _ = await self.invoke_returning_run_output(
97 input, input_source, existing_run
98 )
99 return run_output
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libs/core/kiln_ai/adapters/model_adapters/base_adapter.py (1)
253-261: ⚠️ Potential issue | 🟠 Major
Continuation can return a non-persisted updated run with a persisted ID.
When existing_run is used, Line 259 skips ID clearing, but the save is skipped if autosave/allow_saving conditions fail. That returns a mutated run object that looks persisted while disk state remains stale. Next continuation can reload old trace/output and lose conversational state.
💡 Suggested fix
-        if (
-            self.base_adapter_config.allow_saving
-            and Config.shared().autosave_runs
-            and self.task.path is not None
-        ):
+        if existing_run is not None and (
+            not self.base_adapter_config.allow_saving
+            or not Config.shared().autosave_runs
+            or self.task.path is None
+        ):
+            raise ValueError(
+                "Session continuation requires persistence (allow_saving, autosave_runs, and task.path)."
+            )
+
+        if (
+            self.base_adapter_config.allow_saving
+            and Config.shared().autosave_runs
+            and self.task.path is not None
+        ):
             run.save_to_file()
         elif existing_run is None:
             # Clear the ID to indicate it's not persisted
             run.id = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py` around lines 253 - 261, The current continuation logic can return a run with a persisted-looking ID when save was skipped; update the block around base_adapter_config.allow_saving / Config.shared().autosave_runs / self.task.path so that whenever run.save_to_file() is NOT called you clear run.id (e.g., set run.id = None) regardless of existing_run, and only preserve the ID when save_to_file() actually succeeded; adjust the branch around run.save_to_file() and the existing_run check to ensure run.id reflects actual persistence state.
♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)
1736-1778: ⚠️ Potential issue | 🟠 Major
Remove the machine-specific fixture path (and unused setup if not referenced).
Line 1740 hardcodes /Users/leonardmarcq/Downloads/..., which is non-portable and will fail outside one machine. Also, tmp_path is unused in this fixture, and this setup appears unused in this file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1736 - 1778, The fixture adapter_sanity_check_setup hardcodes a machine-specific path in project_path and never uses the tmp_path param; replace the hardcoded Path("/Users/leonardmarcq/Downloads/…") with a portable path using the provided tmp_path (e.g., tmp_path / "adapter_sanity_project" / "project.kiln") or remove the fixture entirely if unused; update references inside adapter_sanity_check_setup (project_path, project.save_to_file(), Project.load_from_file(), and config._settings modification) to operate on the tmp_path-based location, and remove the unused tmp_path parameter only if you delete the fixture after confirming it is not referenced elsewhere.
🧹 Nitpick comments (2)
app/web_ui/src/lib/api_schema.d.ts (1)
6235-6239: Clarify run mutability in the task_run_id API description.
Line 6237 explains trace appending, but it does not explicitly state that continuing a session can mutate the referenced TaskRun's latest output/intermediate_outputs. Please expand the backend OpenAPI field description to include this side effect, then regenerate this file.
Based on learnings: app/web_ui/src/lib/api_schema.d.ts is generated by openapi-typescript; do not propose manual edits. Schema changes should be made in the FastAPI backend (e.g., app/desktop/studio_server/data_gen_api.py or libs/server/kiln_server/*), then re-generate the TS types.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/web_ui/src/lib/api_schema.d.ts` around lines 6235 - 6239, The OpenAPI description for the query/body field task_run_id must explicitly state that continuing a session will append to and may mutate the referenced TaskRun's latest output and intermediate_outputs; update the FastAPI schema/Field description that defines task_run_id (look for the parameter/Field named "task_run_id" used by the endpoint in data_gen_api.py or the Pydantic model representing TaskRun in the kiln_server modules) to include a sentence like "Continuing a session will append to the run's trace and may update the TaskRun's latest output and intermediate_outputs," then re-generate the TypeScript types (openapi-typescript) so app/web_ui/src/lib/api_schema.d.ts is updated.
libs/core/kiln_ai/adapters/chat/chat_formatter.py (1)
96-99: Resolve the initial_messages TODO typing debt before merge.
Line 98 and Line 269 leave Any plus TODOs in the core formatter interface. This weakens type safety across the new multiturn path and keeps pipeline warnings unresolved.
I can draft a small follow-up patch that introduces a shared concrete message-seed type alias and removes both TODOs.
Also applies to: 267-271
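One possible shape for the shared seed-message type the agent prompt below describes; the alias name and fields are illustrative, not part of the current code:

from typing import TypedDict

class MessageSeed(TypedDict):
    """Minimal OpenAI-style chat message used to seed a conversation."""
    role: str
    content: str

def initial_messages_from_trace(prior: list[MessageSeed]) -> list[MessageSeed]:
    # Mirrors what the multiturn formatter's initial_messages is described as
    # doing above: return a copy of the prior trace, now with a concrete type.
    return list(prior)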
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/adapters/chat/chat_formatter.py` around lines 96 - 99, Introduce a concrete seed-message type and use it instead of Any: define a shared type alias/TypedDict (e.g., MessageSeed or ChatSeedMessage with required fields like role: str, content: str and optional metadata: dict[str, Any]) and replace occurrences of list[Any] in initial_messages with list[MessageSeed]; update the signature of the initial_messages method in chat_formatter.py (and the matching spot around lines 267-271) to return list[MessageSeed], remove the TODO comments, and update any imports/annotations to reference the new alias so the multiturn path uses the concrete message-seed type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py`:
- Around line 253-261: The current continuation logic can return a run with a
persisted-looking ID when save was skipped; update the block around
base_adapter_config.allow_saving / Config.shared().autosave_runs /
self.task.path so that whenever run.save_to_file() is NOT called you clear
run.id (e.g., set run.id = None) regardless of existing_run, and only preserve
the ID when save_to_file() actually succeeded; adjust the branch around
run.save_to_file() and the existing_run check to ensure run.id reflects actual
persistence state.
---
Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1736-1778: The fixture adapter_sanity_check_setup hardcodes a
machine-specific path in project_path and never uses the tmp_path param; replace
the hardcoded Path("/Users/leonardmarcq/Downloads/…") with a portable path using
the provided tmp_path (e.g., tmp_path / "adapter_sanity_project" /
"project.kiln") or remove the fixture entirely if unused; update references
inside adapter_sanity_check_setup (project_path, project.save_to_file(),
Project.load_from_file(), and config._settings modification) to operate on the
tmp_path-based location, and remove the unused tmp_path parameter only if you
delete the fixture after confirming it is not referenced elsewhere.
---
Nitpick comments:
In `@app/web_ui/src/lib/api_schema.d.ts`:
- Around line 6235-6239: The OpenAPI description for the query/body field
task_run_id must explicitly state that continuing a session will append to and
may mutate the referenced TaskRun’s latest output and intermediate_outputs;
update the FastAPI schema/Field description that defines task_run_id (look for
the parameter/Field named "task_run_id" used by the endpoint in data_gen_api.py
or the Pydantic model representing TaskRun in the kiln_server modules) to
include a sentence like “Continuing a session will append to the run’s trace and
may update the TaskRun’s latest output and intermediate_outputs,” then
re-generate the TypeScript types (openapi-typescript) so
app/web_ui/src/lib/api_schema.d.ts is updated.
In `@libs/core/kiln_ai/adapters/chat/chat_formatter.py`:
- Around line 96-99: Introduce a concrete seed-message type and use it instead
of Any: define a shared type alias/TypedDict (e.g., MessageSeed or
ChatSeedMessage with required fields like role: str, content: str and optional
metadata: dict[str, Any]) and replace occurrences of list[Any] in
initial_messages with list[MessageSeed]; update the signature of the
initial_messages method in chat_formatter.py (and the matching spot around lines
267-271) to return list[MessageSeed], remove the TODO comments, and update any
imports/annotations to reference the new alias so the multiturn path uses the
concrete message-seed type.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- app/web_ui/src/lib/api_schema.d.ts
- libs/core/kiln_ai/adapters/chat/__init__.py
- libs/core/kiln_ai/adapters/chat/chat_formatter.py
- libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_litellm_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
- libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
- libs/core/kiln_ai/adapters/test_prompt_builders.py
- libs/core/kiln_ai/datamodel/test_basemodel.py
- libs/server/kiln_server/run_api.py
- libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (4)
- libs/core/kiln_ai/adapters/model_adapters/test_structured_output.py
- libs/core/kiln_ai/adapters/chat/__init__.py
- libs/server/kiln_server/run_api.py
- libs/core/kiln_ai/adapters/chat/test_chat_formatter.py
Actionable comments posted: 3
♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)
1816-1826: ⚠️ Potential issue | 🟠 Major
Remove the hardcoded local path from the sanity fixture.
Line 1820 hardcodes /Users/leonardmarcq/..., which is non-portable, and this fixture still takes tmp_path without using it. This will fail outside one machine setup and adds brittle test infra.
Also applies to: 1838-1850
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1816 - 1826, The fixture adapter_sanity_check_setup contains a hardcoded absolute path (/Users/...) and ignores the tmp_path fixture; replace that hardcoded Path usage with a test-local path derived from the tmp_path fixture (e.g., tmp_path / "adapter_sanity_project" / "project.kiln"), ensure parent directories are created and the project file is initialized as before, and do the same change for the duplicate block around the adapter_sanity_check_setup sibling (the second sanity fixture in the file); update any references that expect project_path so they use the new tmp_path-based Path.
🧹 Nitpick comments (1)
libs/server/kiln_server/test_run_api.py (1)
1990-1993: Prefer normalized/structured assertion over exact LLM string equality.
Line 1993 asserts exact text ("[4, 12, 59]"), which is fragile for provider formatting variance (spacing/newlines). Parse or normalize before asserting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1990 - 1993, The test is brittle because it asserts exact LLM output string equality; change the assertion to parse/normalize the output before comparing: take res4["output"]["output"] (where response4/res4 and task_run_id are used) and convert it into a structured Python list (e.g., via json.loads or ast.literal_eval after stripping whitespace/newlines) and then assert that the resulting list equals [4, 12, 59] and that res4["id"] == task_run_id remains. This ensures formatting differences (spaces/newlines) won't break the test.
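A small, self-contained illustration of the normalization suggested above; the helper name is illustrative and not taken from the test file:

import json

def normalize_list_output(raw: str) -> list:
    """Parse a JSON list from model text so whitespace/newline variance doesn't matter."""
    return json.loads(raw.strip())

# Both spellings normalize to the same value, so the assertion stays stable:
assert normalize_list_output("[4, 12, 59]") == [4, 12, 59]
assert normalize_list_output("[4,\n 12,\n 59]\n") == [4, 12, 59]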
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/base_adapter.py`:
- Around line 406-414: The current code merges previous and new intermediate
outputs (using merged_intermediate and existing_run.intermediate_outputs), which
can retain stale keys; instead, replace the merge with a direct assignment so
the latest assistant message wins: set existing_run.intermediate_outputs =
dict(run_output.intermediate_outputs or {}) (or None if you prefer explicit
clearing when there are no intermediate outputs) and remove the
merged_intermediate construction and usage so only
run_output.intermediate_outputs is stored.
In `@libs/server/kiln_server/run_api.py`:
- Around line 295-309: The continuation path reads and then mutates TaskRun (via
TaskRun.from_id_and_parent_path and later adapter.invoke using existing_run)
without synchronization, so concurrent continuations can clobber appended
trace/output; wrap the read/modify/write sequence with a guard (e.g., a per-run
async lock keyed by request.task_run_id or a DB transaction/row-level lock) to
serialize continuations for the same run before calling adapter.invoke, ensuring
only one coroutine can mutate existing_run at a time and releasing the lock
after the update completes or errors.
- Line 309: Wrap the call to adapter.invoke (the line returning await
adapter.invoke(input, existing_run=existing_run)) in a try/except that catches
NotImplementedError and re-raises a FastAPI HTTPException so clients receive a
4xx response; e.g. catch NotImplementedError as e and raise
HTTPException(status_code=400, detail=str(e)) (or use
status.HTTP_400_BAD_REQUEST) from e, and add the necessary import for
HTTPException (and status if used).
---
Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1816-1826: The fixture adapter_sanity_check_setup contains a
hardcoded absolute path (/Users/...) and ignores the tmp_path fixture; replace
that hardcoded Path usage with a test-local path derived from the tmp_path
fixture (e.g., tmp_path / "adapter_sanity_project" / "project.kiln"), ensure
parent directories are created and the project file is initialized as before,
and do the same change for the duplicate block around the
adapter_sanity_check_setup sibling (the second sanity fixture in the file);
update any references that expect project_path so they use the new
tmp_path-based Path.
---
Nitpick comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1990-1993: The test is brittle because it asserts exact LLM output
string equality; change the assertion to parse/normalize the output before
comparing: take res4["output"]["output"] (where response4/res4 and task_run_id
are used) and convert it into a structured Python list (e.g., via json.loads or
ast.literal_eval after stripping whitespace/newlines) and then assert that the
resulting list equals [4, 12, 59] and that res4["id"] == task_run_id remains.
This ensures formatting differences (spaces/newlines) won't break the test.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- libs/core/kiln_ai/adapters/chat/__init__.py
- libs/core/kiln_ai/adapters/chat/chat_formatter.py
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/litellm_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
- libs/server/kiln_server/run_api.py
- libs/server/kiln_server/test_run_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
- libs/core/kiln_ai/adapters/model_adapters/test_mcp_adapter.py
libs/server/kiln_server/run_api.py
Outdated
existing_run = TaskRun.from_id_and_parent_path(
    request.task_run_id, task.path
)
if existing_run is None:
    raise HTTPException(
        status_code=404,
        detail="Run not found. Cannot continue session.",
    )
if not existing_run.trace or len(existing_run.trace) == 0:
    raise HTTPException(
        status_code=400,
        detail="Run has no trace. Cannot continue session without conversation history.",
    )

return await adapter.invoke(input, existing_run=existing_run)
Guard continuation writes with a lock to prevent lost updates.
Line 31 already documents this load/update/write pattern as non-atomic, but Lines 295-309 perform continuation mutation without synchronization. Concurrent continuation calls for the same run can overwrite each other’s appended trace/output.
🔧 Suggested fix
-    return await adapter.invoke(input, existing_run=existing_run)
+    if existing_run is not None:
+        async with update_run_lock:
+            return await adapter.invoke(input, existing_run=existing_run)
+    return await adapter.invoke(input, existing_run=None)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/server/kiln_server/run_api.py` around lines 295 - 309, The continuation
path reads and then mutates TaskRun (via TaskRun.from_id_and_parent_path and
later adapter.invoke using existing_run) without synchronization, so concurrent
continuations can clobber appended trace/output; wrap the read/modify/write
sequence with a guard (e.g., a per-run async lock keyed by request.task_run_id
or a DB transaction/row-level lock) to serialize continuations for the same run
before calling adapter.invoke, ensuring only one coroutine can mutate
existing_run at a time and releasing the lock after the update completes or
errors.
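The suggested fix above references an update_run_lock that isn't shown in the diff. One way to get the per-run serialization described is a lock keyed by task_run_id; this is a sketch with illustrative names, not an existing Kiln API:

import asyncio
from collections import defaultdict

# One lock per task_run_id: continuations of the same run are serialized,
# while unrelated runs proceed concurrently. Note the dict grows with the
# number of distinct run ids; evict entries if that matters.
_run_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def with_run_lock(task_run_id: str, invoke):
    """Await invoke() while holding the lock for this run id."""
    async with _run_locks[task_run_id]:
        return await invoke()

# Usage sketch inside the endpoint:
#   return await with_run_lock(
#       request.task_run_id,
#       lambda: adapter.invoke(input, existing_run=existing_run),
#   )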
libs/server/kiln_server/run_api.py
Outdated
| detail="Run has no trace. Cannot continue session without conversation history.", | ||
| ) | ||
|
|
||
| return await adapter.invoke(input, existing_run=existing_run) |
Map unsupported continuation to a client-facing HTTP error.
On Line 309, adapter-level NotImplementedError (for example MCP continuation) can bubble as a server error. Convert it to an explicit HTTPException so clients get a clear 4xx response.
🔧 Suggested fix
-    return await adapter.invoke(input, existing_run=existing_run)
+    try:
+        return await adapter.invoke(input, existing_run=existing_run)
+    except NotImplementedError as e:
+        raise HTTPException(status_code=400, detail=str(e)) from e
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
return await adapter.invoke(input, existing_run=existing_run)
After:
try:
    return await adapter.invoke(input, existing_run=existing_run)
except NotImplementedError as e:
    raise HTTPException(status_code=400, detail=str(e)) from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/server/kiln_server/run_api.py` at line 309, Wrap the call to
adapter.invoke (the line returning await adapter.invoke(input,
existing_run=existing_run)) in a try/except that catches NotImplementedError and
re-raises a FastAPI HTTPException so clients receive a 4xx response; e.g. catch
NotImplementedError as e and raise HTTPException(status_code=400, detail=str(e))
(or use status.HTTP_400_BAD_REQUEST) from e, and add the necessary import for
HTTPException (and status if used).
This makes sense. But it also breaks one of the core ideas behind the collaboration-with-git concept, where the files are small, mostly immutable, and unlikely to be edited by others, so unlikely to hit merge conflicts. We could add a tree/fork concept where new runs are branched off a parent. That way 2 people editing end up with 2 chains. This adds complexity when rendering the dataset list (need to filter items that have children, only show the leaf nodes), but seems manageable? As this PR is SDK only, and persisting is off by default in SDK, I say leave it as is for now, and debate the above later when we add it to the app?
libs/server/kiln_server/run_api.py
Outdated
return await adapter.invoke(input)
existing_run: TaskRun | None = None
if request.task_run_id is not None:
maybe just remove the run_api support for now? Wait until we add it to app, and can design the API then?
Most SDK users won't have runs saved, so this will just error (off by default now).
P2: this works too, but might want different design later when doing app so might just make sense to wait.
    self,
    input: InputType,
    input_source: DataSource | None = None,
    existing_run: TaskRun | None = None,
naming maybe one of: continue_from? prior_task_run? parent_task_run?
    input, input_source, parsed_output, usage, run_output.trace
)
# Create the run and output - merge if there is an existing run
if existing_run is not None:
see comment here: #1088 (comment)
I'm leaning more towards always saving a new task_run, and just setting a new parent_id field on the child? Keep it immutable.
Will add a bit of work in UI, but much more robust for collisions.
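A rough sketch of the immutable-chain idea being discussed here: every continuation is saved as a new run pointing at its parent, and the dataset list shows only leaf nodes. Field and function names are illustrative, not the actual Kiln datamodel:

from dataclasses import dataclass

@dataclass(frozen=True)
class RunNode:
    id: str
    parent_id: str | None = None  # None for the first turn of a chain
    # trace/output omitted; each node stays immutable once written

def leaf_nodes(runs: list[RunNode]) -> list[RunNode]:
    """Keep only runs that nothing continues from (the chain tips)."""
    parents = {r.parent_id for r in runs if r.parent_id is not None}
    return [r for r in runs if r.id not in parents]

# Two people continuing the same parent yield two leaves instead of a merge conflict:
a = RunNode(id="r1")
b = RunNode(id="r2", parent_id="r1")
c = RunNode(id="r3", parent_id="r1")
assert {r.id for r in leaf_nodes([a, b, c])} == {"r2", "r3"}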
Actionable comments posted: 1
♻️ Duplicate comments (1)
libs/server/kiln_server/test_run_api.py (1)
1686-1729: ⚠️ Potential issue | 🟠 Major
Remove unused fixture with hardcoded user-specific path.
This fixture contains a hardcoded path (/Users/leonardmarcq/Downloads/...) that will fail on CI and other machines. The tmp_path parameter is accepted but never used. No test in this file references adapter_sanity_check_setup.
Either remove this fixture entirely, or refactor it to use tmp_path like adapter_sanity_check_math_tools_setup does.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1686 - 1729, The fixture adapter_sanity_check_setup contains a hard-coded user path and never uses the tmp_path param (and is unused by tests); remove this fixture entirely or refactor it to use tmp_path: replace the hardcoded Path("/Users/...") with tmp_path / "adapter_sanity_project" / "project.kiln", ensure Project/Task are created/loaded from that temp path, update Config.shared() modifications to use the temp project.path, and keep the teardown that restores config._settings["projects"]; if no tests reference adapter_sanity_check_setup simply delete the fixture to avoid CI breakage.
🧹 Nitpick comments (2)
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py (1)
13-17: Consider adding type hint for prior_trace parameter.
The base class defines prior_trace: list[ChatCompletionMessageParam] | None, but the mock omits the type. Adding it improves consistency.
♻️ Suggested fix
+from kiln_ai.utils.open_ai_types import ChatCompletionMessageParam
+
 class MockAdapter(BaseAdapter):
     async def _run(
         self,
         input: InputType,
-        prior_trace=None,
+        prior_trace: list[ChatCompletionMessageParam] | None = None,
     ) -> tuple[RunOutput, Usage | None]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py` around lines 13 - 17, The mock async method _run is missing the explicit type for the prior_trace parameter; update its signature to match the base class by annotating prior_trace as list[ChatCompletionMessageParam] | None (or the appropriate alias used in the codebase) so the mock matches the base typing for _run and tools like static checkers; ensure any required imports or forward references for ChatCompletionMessageParam are present.
libs/server/kiln_server/test_run_api.py (1)
1676-1684: Remove unused helper functions.
_adapter_sanity_check_output_path() and _append_to_sanity_check() are defined but never called. Additionally, _adapter_sanity_check_output_path() writes to the source directory rather than a temporary location, which would pollute the source tree.
Suggested removal
_adapter_sanity_check_output_path()and_append_to_sanity_check()are defined but never called. Additionally,_adapter_sanity_check_output_path()writes to the source directory rather than a temporary location, which would pollute the source tree.Suggested removal
-def _adapter_sanity_check_output_path() -> Path: - return Path(__file__).resolve().parent / "adapter_sanity_check.txt" - - -def _append_to_sanity_check(content: str, output_path: Path) -> None: - with open(output_path, "a", encoding="utf-8") as f: - f.write(content) - f.write("\n")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@libs/server/kiln_server/test_run_api.py` around lines 1676 - 1684, Remove the unused helper functions _adapter_sanity_check_output_path() and _append_to_sanity_check() from the file: these helpers are never called and the path helper writes into the source tree; delete both definitions and update any tests that relied on them to instead use a temporary path (e.g., pytest tmp_path) or test-local fixtures if output needs to be captured. Ensure there are no remaining imports or references to _adapter_sanity_check_output_path or _append_to_sanity_check in the module after removal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py`:
- Around line 353-397: The test
test_generate_run_with_existing_run_merges_usage_and_intermediate_outputs
expects merging of intermediate_outputs but the PR changes specify that
intermediate_outputs should be taken from the latest assistant message (i.e.,
replaced, not merged); update the assertions in this test (which calls
adapter.generate_run with existing_run=initial_run and
RunOutput.intermediate_outputs on the second call) to assert that
result.intermediate_outputs equals only the latest RunOutput
intermediate_outputs ({"new_key":"new_val"}) instead of containing the old
"chain_of_thought" key, keeping other assertions (id change, input, usage
accumulation, and output) as-is.
---
Duplicate comments:
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1686-1729: The fixture adapter_sanity_check_setup contains a
hard-coded user path and never uses the tmp_path param (and is unused by tests);
remove this fixture entirely or refactor it to use tmp_path: replace the
hardcoded Path("/Users/...") with tmp_path / "adapter_sanity_project" /
"project.kiln", ensure Project/Task are created/loaded from that temp path,
update Config.shared() modifications to use the temp project.path, and keep the
teardown that restores config._settings["projects"]; if no tests reference
adapter_sanity_check_setup simply delete the fixture to avoid CI breakage.
---
Nitpick comments:
In `@libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py`:
- Around line 13-17: The mock async method _run is missing the explicit type for
the prior_trace parameter; update its signature to match the base class by
annotating prior_trace as list[ChatCompletionMessageParam] | None (or the
appropriate alias used in the codebase) so the mock matches the base typing for
_run and tools like static checkers; ensure any required imports or forward
references for ChatCompletionMessageParam are present.
In `@libs/server/kiln_server/test_run_api.py`:
- Around line 1676-1684: Remove the unused helper functions
_adapter_sanity_check_output_path() and _append_to_sanity_check() from the file:
these helpers are never called and the path helper writes into the source tree;
delete both definitions and update any tests that relied on them to instead use
a temporary path (e.g., pytest tmp_path) or test-local fixtures if output needs
to be captured. Ensure there are no remaining imports or references to
_adapter_sanity_check_output_path or _append_to_sanity_check in the module after
removal.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ece44a3f-dca5-4fae-a601-325afa5c5f54
📒 Files selected for processing (3)
- libs/core/kiln_ai/adapters/model_adapters/base_adapter.py
- libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
- libs/server/kiln_server/test_run_api.py
libs/core/kiln_ai/adapters/model_adapters/test_saving_adapter_results.py
Taken over by #1107
What does this PR do?
Issue: we currently only support single turn chats (system -> user -> assistant, and a couple of slight variants we internally implement like two message CoT).
This PR adds support for multiturn conversation, letting the history of the conversation accumulate. It currently works as follows (a small sketch follows the lists below):
- A run still returns a TaskRun that contains the trace, and the output and intermediate_outputs are extracted from the latest message (coming from the assistant); no change there
- To continue a session, the caller passes the existing TaskRun->id; we internally load the existing TaskRun->trace, create a new run (appending the new user message to the existing trace), and replace the output and intermediate_outputs with the new latest message

This is fine for SDK usage to just do inference; but this opens up downstream side-effects on our app:
- TaskRuns are no longer immutable; a single turn run may turn into a multiturn run as a result of the user adding new messages into the run; an existing multiturn run may get new messages appended to it, thereby changing the latest message
- evals reference a TaskRun, but if the task runs change, evals cannot be compared to previous runs; or at least we cannot guarantee that through the modeling

We will need to implement restrictions on the UI / user side to prevent users from causing downstream side effects we do not want.
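A minimal, runnable sketch of the continuation semantics described above: append the new user turn to the prior trace, and take output from the latest assistant message. It mirrors the behavior in plain Python rather than the actual Kiln SDK API:

def continue_trace(
    prior_trace: list[dict],
    user_message: str,
    assistant_reply: str,
) -> tuple[list[dict], str]:
    """Append one user/assistant turn to an existing trace.

    Returns the merged trace and the new output (the latest assistant message).
    """
    merged = list(prior_trace)
    merged.append({"role": "user", "content": user_message})
    merged.append({"role": "assistant", "content": assistant_reply})
    return merged, assistant_reply

trace = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
trace, output = continue_trace(trace, "follow-up", "How can I help?")
assert output == "How can I help?"
assert len(trace) == 4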
Checklists
Summary by CodeRabbit
Release Notes
New Features
Tests
Limitations