Skip to content

🧑‍💻 refactor: Clarify Code Artifact Guidance#187

Merged
danny-avila merged 1 commit into
mainfrom
danny-avila/code-tool-output-guidance
May 21, 2026
Merged

🧑‍💻 refactor: Clarify Code Artifact Guidance#187
danny-avila merged 1 commit into
mainfrom
danny-avila/code-tool-output-guidance

Conversation

@danny-avila
Copy link
Copy Markdown
Owner

Summary

I clarified code execution artifact guidance and restored a compact generated-file summary without listing every file.

  • Added shared guidance for persistent /mnt/data artifacts, same-call /tmp scratch files, and conventional supported extensions.
  • Appended compact session-file counts to code, bash, and programmatic tool outputs while keeping exact file refs in artifacts.
  • Updated tests and stale session-file comments so the executor behavior is clear.

Change Type

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update

Testing

  • npx jest src/tools/__tests__/ProgrammaticToolCalling.test.ts src/tools/__tests__/BashExecutor.test.ts --runInBand
  • git diff --check

Test Configuration:

  • Node dependencies borrowed from the main worktree via ignored node_modules symlink for the dedicated worktree test run.

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • I have made pertinent documentation changes
  • My changes do not introduce new warnings
  • I have written tests demonstrating that my changes are effective or that my feature works
  • Local unit tests pass with my changes

Copy link
Copy Markdown
Owner Author

Post-open live testing against http://localhost:3112/v1:

  • Verified raw /exec bash artifact behavior: supported /mnt/data/*.txt persisted, /tmp scratch did not persist across calls, and reinjection worked when response file refs were converted to CodeEnvFile shape.
  • Verified .raw is not persisted while .json is persisted from the same /mnt/data run, matching CodeAPI's supported-extension allowlist.
  • Verified /exec/programmatic Python and bash artifact behavior, including files generated after a tool-call continuation.
  • Verified bash PTC continuation directly with lang: "bash" and a get_weather tool.
  • Ran live @librechat/agents PTC integration against localhost. The full run passed the first 4 cases, then hit local CodeAPI 429 rate limiting; I reran the remaining 5 cases in spaced batches and they passed.
  • Ran a temporary local Jest smoke for run_tools_with_bash against localhost and confirmed the new compact session-file summary appears without listing the generated filename. Removed the temporary test file after the smoke.

@danny-avila danny-avila marked this pull request as ready for review May 21, 2026 19:23
@danny-avila danny-avila changed the title 🧭 fix: Clarify Code Artifact Guidance 🧑‍💻 fix: Clarify Code Artifact Guidance May 21, 2026
@danny-avila danny-avila changed the title 🧑‍💻 fix: Clarify Code Artifact Guidance 🧑‍💻 refactor: Clarify Code Artifact Guidance May 21, 2026
@danny-avila danny-avila force-pushed the danny-avila/code-tool-output-guidance branch from 5f85546 to 06efcdd Compare May 21, 2026 19:37
Copy link
Copy Markdown
Owner Author

Added one terse bash-specific guidance sentence after the live LLM smoke exposed literal \n CSV writes:

For multi-line file writes, use heredoc or printf; echo "\\n" may stay literal.

Re-ran focused tests:

  • npx jest src/tools/__tests__/ProgrammaticToolCalling.test.ts src/tools/__tests__/BashExecutor.test.ts --runInBand

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown
Owner Author

Reran the real LLM smoke after the terse bash multiline hint.

Prompt did not mention /mnt/data, /tmp, heredoc/printf, supported extensions, persistence, or Markdown image behavior. The model wrote the CSV with a heredoc on the first attempt:

cat <<EOF > /mnt/data/support_ticket_categories.csv
Category,Count
Login Issues,15
Payment Problems,9
Bug Reports,12
Feature Requests,7
Account Management,10
Other,5
EOF

It also saved the chart as /mnt/data/support_ticket_categories.png, reused the known CSV path on the follow-up without ls/find, avoided /tmp for persistent artifacts, and did not wrap the image in Markdown or invent a download link.

One unrelated bash-mode stumble remains: on the follow-up it first pasted Python statements directly into bash, got a syntax error, then self-corrected to python3 -c. The multiline file-write behavior improved as intended.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06efcdd46d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/CodeExecutor.ts Outdated
Comment thread src/tools/CodeExecutor.ts Outdated
Comment thread src/tools/CodeExecutor.ts Outdated
@danny-avila danny-avila force-pushed the danny-avila/code-tool-output-guidance branch from 06efcdd to 082fd63 Compare May 21, 2026 19:46
Copy link
Copy Markdown
Owner Author

Condensed the code-artifact/tool guidance while keeping the two live-smoke fixes:

  • Shortened the shared artifact rule to one compact /mnt/data + extension + /tmp sentence.
  • Shortened the PTC state/async rules and removed duplicated generated-file availability wording.
  • Replaced the previous bash multiline note with one compact shell guard: Bash: multi-line files use heredoc/printf; run Python via python3 -c/heredoc, not bare Python.

Validation:

  • npx jest src/tools/__tests__/ProgrammaticToolCalling.test.ts src/tools/__tests__/BashExecutor.test.ts --runInBand
  • Real LLM smoke against http://localhost:3112/v1 with no explicit artifact/path/heredoc/Python-wrapper instructions in the prompt.

Live-smoke result: the model wrote the CSV with a heredoc, wrote the PNG under /mnt/data, reused the known CSV path in the follow-up, and used python3 -c instead of pasting bare Python into bash. It avoided /tmp, ls/find, Markdown image wrapping, and download-link wording.

@danny-avila danny-avila force-pushed the danny-avila/code-tool-output-guidance branch from 082fd63 to 325955b Compare May 21, 2026 19:53
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila danny-avila force-pushed the danny-avila/code-tool-output-guidance branch from 325955b to 5f77b8c Compare May 21, 2026 20:13
@danny-avila
Copy link
Copy Markdown
Owner Author

danny-avila commented May 21, 2026

Added a small PTC inner-tool output contract after live testing against the remote CodeAPI-backed engine. Direct continuation checks confirmed Python receives object tool results as dicts and bash receives object/string tool results as JSON on stdout. Live LLM smoke before the change showed Python sometimes passed a positional dict and bash preserved JSON quotes for string results; after the added guidance, Python used keyword args on the first try and bash used jq -r . to unquote string results.

Validation:

  • Remote CodeAPI /exec/programmatic direct continuation checks for Python and bash
  • Live agent smoke using run_tools_with_code and run_tools_with_bash against http://localhost:3112/v1
  • npx jest src/tools/__tests__/ProgrammaticToolCalling.test.ts src/tools/__tests__/BashExecutor.test.ts src/tools/__tests__/ToolNode.outputReferences.test.ts --runInBand
  • npx eslint src/tools/ProgrammaticToolCalling.ts src/tools/BashProgrammaticToolCalling.ts src/tools/__tests__/ProgrammaticToolCalling.test.ts
  • npx tsc --noEmit

@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f77b8cdb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/CodeExecutor.ts Outdated
@danny-avila danny-avila force-pushed the danny-avila/code-tool-output-guidance branch from 5f77b8c to d6bbecb Compare May 21, 2026 20:22
@danny-avila
Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danny-avila danny-avila merged commit ded724b into main May 21, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant