feat(ai): defer marimo-pair references and improve code mode prompts #9985
Load skill reference docs as on-demand pydantic-ai capabilities to keep code mode context lean, streamline the system prompt, and expose code mode in the chat UI.

Co-authored-by: Cursor <cursoragent@cursor.com>
No issues found across 13 files
Architecture diagram
sequenceDiagram
participant Client as Chat Client (Frontend)
participant API as API Route (stream_completion)
participant Provider as AI Provider (stream_completion_harness)
participant Agent as pydantic-ai Agent
participant Capability as Deferred Capability
participant Tool as execute_code Tool
participant Session as Kernel Session
participant SkillLoader as Skill/Reference Loader
Note over Client,SkillLoader: Code Mode Chat Flow - NEW: Chat Mode Picker Unrestricted
Client->>API: POST /api/ai/completion (mode=code_mode)
Note over API,Session: NEW: Code mode no longer receives cell code injection
Provider->>Provider: Build system prompt (no language rules, no other cell code)
Provider->>SkillLoader: Load SKILL.md (marimo-pair)
SkillLoader-->>Provider: SKILL.md content (with load_capability instructions)
Provider->>Provider: Create pydantic-ai model
Provider->>Capability: NEW: references_capability()
Capability->>SkillLoader: Load gotchas.md (deferred)
Capability->>SkillLoader: Load notebook-improvements.md (deferred)
Capability->>SkillLoader: Load rich-representations.md (deferred)
SkillLoader-->>Capability: Reference content (cached, not inlined)
Provider->>Agent: Create Agent(toolsets=[build_execute_code_toolset], capabilities=3 deferred refs, instructions=system_prompt)
Provider->>Agent: Submit message
alt Model needs reference material
Agent->>Capability: load_capability("gotchas")
Capability-->>Agent: Gotchas content (instructions)
Agent->>Capability: load_capability("rich-representations")
Capability-->>Agent: Rich representations content
end
alt Model needs to inspect/run code
Agent->>Tool: execute_code(code)
Tool->>Session: Run scratchpad code with credentials
Session-->>Tool: Execution result
Tool-->>Agent: CodeExecutionResult
end
Agent-->>Provider: Stream response tokens
Provider->>API: StreamingResponse
API-->>Client: Stream chat response
Note over Client,Agent: NEW: Reference docs loaded on-demand, not inline in prompt
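
The on-demand loading shown in the diagram can be sketched in plain Python, without pydantic-ai. This is a minimal illustration under stated assumptions, not the PR's implementation: `DeferredReference` and `ReferenceRegistry` are hypothetical names, and only `load_capability` and the reference names (such as `gotchas`) come from the diagram. The idea is that the system prompt carries only a short index, while the full reference text is read on first request and cached afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional


@dataclass
class DeferredReference:
    """Hypothetical stand-in for a deferred capability: metadata now, body later."""

    name: str
    description: str
    loader: Callable[[], str]  # invoked only the first time the body is needed
    _cached: Optional[str] = field(default=None, init=False, repr=False)

    def load(self) -> str:
        # Read the reference body lazily and keep it for later calls.
        if self._cached is None:
            self._cached = self.loader()
        return self._cached


class ReferenceRegistry:
    """Hypothetical registry the agent harness could consult."""

    def __init__(self, references: Iterable[DeferredReference]) -> None:
        self._by_name: Dict[str, DeferredReference] = {r.name: r for r in references}

    def prompt_index(self) -> str:
        # Names and descriptions only; this is all that goes into the prompt.
        return "\n".join(
            f"- {ref.name}: {ref.description}" for ref in self._by_name.values()
        )

    def load_capability(self, name: str) -> str:
        # Return the full reference text when the model asks for it.
        try:
            return self._by_name[name].load()
        except KeyError:
            known = ", ".join(sorted(self._by_name))
            raise ValueError(f"Unknown reference {name!r}; known: {known}") from None
```

Keeping the index and the body separate is what keeps the static prompt prefix small: adding a new reference grows the prompt by one line rather than by the whole document.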
## 📝 Summary

Previously, every /chat turn rebuilt AiCompletionContext from current notebook state and appended it to the system prompt. Two problems:

- Prompt caching: dynamic context in the (otherwise static) system prefix changed the cacheable prefix every turn, defeating provider prompt caching.
- Correctness/cost: variables mentioned in turn 1 were re-resolved against turn-N's live values and resent in full on each request.

The same could be done for the /completion endpoints, but because those endpoints are not multi-turn conversations today, the benefits are not material.

### Approach

Adopt the canonical AI SDK pattern for custom structured context:

- Frontend resolves @-mentions once at send time (`resolveChatContext`) and embeds the rendered string in the user message as a `data-marimo-context` part (typed via the SDK's `DataUIPart`). Because it lives in chat history, it's replayed verbatim on later turns: a point-in-time snapshot of the rendered context, not a live re-resolution.
- Server lowers that data part into a `<context>…</context>` text part placed below the user's message before validation (`_expand_marimo_context_part`), mirroring the SDK's `convertDataPart`. This is required because pydantic-ai's `VercelAIAdapter` drops `DataUIPart`s entirely.

## 📋 Pre-Review Checklist

- [ ] For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on [Discord](https://marimo.io/discord?ref=pr), or the community [discussions](https://github.com/marimo-team/marimo/discussions) (Please provide a link if applicable).
- [x] Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
- [ ] Video or media evidence is provided for any visual changes (optional).

## ✅ Merge Checklist

- [x] I have read the [contributor guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] Documentation has been updated where applicable, including docstrings for API changes.
- [x] Tests have been added for the changes made.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
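
The server-side lowering step described under "Approach" can be sketched as follows. This is a hedged illustration, not the PR's `_expand_marimo_context_part`: the function name `expand_context_parts` is hypothetical, messages are modeled as plain dicts, and the assumption that the part's `data` field holds the rendered string directly is mine (the description only says the rendered string is embedded in the part).

```python
from typing import Any, Dict, List

CONTEXT_PART_TYPE = "data-marimo-context"  # part type named in the PR description


def expand_context_parts(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `message` with context data parts lowered to text parts.

    Assumes each context part looks like {"type": ..., "data": "<rendered string>"}.
    """
    kept: List[Dict[str, Any]] = []
    lowered: List[Dict[str, Any]] = []
    for part in message.get("parts", []):
        if part.get("type") == CONTEXT_PART_TYPE:
            data = part.get("data")
            # Skip empty or malformed payloads instead of emitting an empty tag.
            if isinstance(data, str) and data.strip():
                lowered.append(
                    {"type": "text", "text": f"<context>\n{data}\n</context>"}
                )
        else:
            kept.append(part)
    # Context goes after the user's own parts, i.e. below the user's message.
    return {**message, "parts": kept + lowered}
```

Because the function only reads what is already stored in the message, replaying the same history on a later turn yields the same text, which is what keeps the prompt prefix stable for caching.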
    def references_capability() -> list[Capability]:
        from pydantic_ai.capabilities import Capability

        gotchas_capability: Capability = Capability(
i wonder if you can make this dynamic from reading the references file. not necessary now, but maybe in future if we vendor this from marimo-pair
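
One way the suggestion above might look, sketched under assumptions: the references live as `*.md` files in a single directory, the file stem serves as the capability name, and the first non-empty line (usually a heading) serves as the description. `ReferenceDoc` and `discover_references` are hypothetical names, not part of marimo or pydantic-ai.

```python
from pathlib import Path
from typing import Dict, NamedTuple


class ReferenceDoc(NamedTuple):
    """Hypothetical record describing one reference file."""

    name: str
    description: str
    path: Path


def discover_references(directory: Path) -> Dict[str, ReferenceDoc]:
    """Build a name -> ReferenceDoc mapping from the Markdown files in `directory`."""
    docs: Dict[str, ReferenceDoc] = {}
    for path in sorted(directory.glob("*.md")):
        description = path.stem  # fallback when the file has no usable first line
        for line in path.read_text(encoding="utf-8").splitlines():
            stripped = line.strip()
            if stripped:
                # Use the first non-empty line, minus any Markdown heading marks.
                description = stripped.lstrip("#").strip() or path.stem
                break
        docs[path.stem] = ReferenceDoc(path.stem, description, path)
    return docs
```

Each discovered entry could then be turned into one deferred capability, so a reference file added upstream would be picked up without a code change.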
2 issues found across 12 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="frontend/src/components/chat/chat-panel.tsx">
<violation number="1" location="frontend/src/components/chat/chat-panel.tsx:537">
P2: Potential loss of legacy context resolution for old persisted chats after removing `buildCompletionRequestBody`. The old helper re-scanned all historical message text for `@` references on every completion request (regenerate/retry). The new code only preserves context for messages that already contain the new `data-marimo-context` part. Old persisted chats without that part will silently lose `@` context on regeneration.</violation>
</file>
<file name="frontend/src/components/editor/ai/completion-utils.ts">
<violation number="1" location="frontend/src/components/editor/ai/completion-utils.ts:95">
P2: New `resolveChatContext` duplicates core parsing/attachment logic from `getAICompletionBodyWithAttachments`, creating drift risk. Only the new path stamps `providerMetadata.marimo.source = "context"`, so `isContextAttachment`-based filtering behaves inconsistently across chat and completion attachments. Extract a shared helper for the common registry parsing/attachment fetching logic.</violation>
</file>
    - const completionBody = await buildCompletionRequestBody(
    -   options.messages,
    - );
    + const completionBody = {
P2: Potential loss of legacy context resolution for old persisted chats after removing `buildCompletionRequestBody`. The old helper re-scanned all historical message text for `@` references on every completion request (regenerate/retry). The new code only preserves context for messages that already contain the new `data-marimo-context` part. Old persisted chats without that part will silently lose `@` context on regeneration.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At frontend/src/components/chat/chat-panel.tsx, line 537:
<comment>Potential loss of legacy context resolution for old persisted chats after removing `buildCompletionRequestBody`. The old helper re-scanned all historical message text for `@` references on every completion request (regenerate/retry). The new code only preserves context for messages that already contain the new `data-marimo-context` part. Old persisted chats without that part will silently lose `@` context on regeneration.</comment>
<file context>
@@ -532,9 +534,10 @@ const ChatPanelBody = () => {
- const completionBody = await buildCompletionRequestBody(
- options.messages,
- );
+ const completionBody = {
+ uiMessages: options.messages,
+ includeOtherCode: getCodes(""),
</file context>
     * Resolve @-context for messages. They represent referenced
     * datasets, variables, or other context from the user's prompt.
     */
    export async function resolveChatContext(
P2: New `resolveChatContext` duplicates core parsing/attachment logic from `getAICompletionBodyWithAttachments`, creating drift risk. Only the new path stamps `providerMetadata.marimo.source = "context"`, so `isContextAttachment`-based filtering behaves inconsistently across chat and completion attachments. Extract a shared helper for the common registry parsing/attachment fetching logic.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At frontend/src/components/editor/ai/completion-utils.ts, line 95:
<comment>New `resolveChatContext` duplicates core parsing/attachment logic from `getAICompletionBodyWithAttachments`, creating drift risk. Only the new path stamps `providerMetadata.marimo.source = "context"`, so `isContextAttachment`-based filtering behaves inconsistently across chat and completion attachments. Extract a shared helper for the common registry parsing/attachment fetching logic.</comment>
<file context>
@@ -50,6 +50,91 @@ export function getAICompletionBody({
+ * Resolve @-context for messages. They represent referenced
+ * datasets, variables, or other context from the user's prompt.
+ */
+export async function resolveChatContext(
+ input: string,
+): Promise<ResolvedChatContext> {
</file context>
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.23.12-dev3
## 📝 Summary

Code mode had no way to load deeper reference docs without bloating the system prompt. I split the reference material into deferred pydantic-ai capabilities so the model pulls in gotchas, notebook-improvements, and rich-representations only when needed.

- … `Capability` objects wired into the code-mode agent harness
- … `SKILL.md` and reference docs to tell the model to use `load_capability` instead of reading files from disk
- … `execute_code`), and add a nudge to inspect notebook state first

## 📋 Pre-Review Checklist

## ✅ Merge Checklist

Made with Cursor