[F:746] debug mode #777

dang232 · 2026-01-14T07:41:08Z

Summary

Add Sherlock, a hypothesis-driven debugging specialist that diagnoses bugs from runtime evidence rather than guesswork. This PR implements the Oracle → Sherlock debugging flow: Oracle provides system context first, then Sherlock performs evidence-based debugging.

Key Features

- 8-Phase Debugging Workflow: Problem Report → Hypothesis Generation → Instrumentation → Reproduction → Log Analysis → Fix → Verification → Cleanup
- Oracle-First Context: After 2 failed fix attempts, consult Oracle for system architecture/gotchas, then delegate to Sherlock with that context
- Docker/Container Debugging: 5 strategies for debugging code inside Docker (logs, exec, volumes, network, environment)
- System Boundary Analysis: Specialized handling for ORM/DB/API bugs (timezone, type coercion, data transformation)
- Playwright MCP Integration: Browser automation for UI debugging via skill_mcp

Debugging Flow

Sisyphus fails 2x → STOP
       ↓
Oracle (system context: architecture, known gotchas, focus areas)
       ↓
Sherlock (with Oracle context → better hypotheses)
       ↓
Fix with runtime evidence
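
The flow above can be read as a small routing rule. The following is only an illustrative sketch: `DebugStep`, `FAILURE_THRESHOLD`, and `nextDebugStep` are hypothetical names, not identifiers taken from `src/agents/sisyphus.ts`. The threshold of 2 is the one stated in the diagram.

```typescript
// Hypothetical sketch of the Oracle → Sherlock routing; not the PR's actual code.
type DebugStep = "retry-fix" | "consult-oracle" | "delegate-to-sherlock";

// "Sisyphus fails 2x → STOP"
const FAILURE_THRESHOLD = 2;

function nextDebugStep(failedAttempts: number, hasOracleContext: boolean): DebugStep {
  // Below the threshold, the orchestrator keeps trying on its own.
  if (failedAttempts < FAILURE_THRESHOLD) return "retry-fix";
  // At the threshold, Oracle is consulted first; Sherlock only runs with that context.
  return hasOracleContext ? "delegate-to-sherlock" : "consult-oracle";
}
```

The ordering is the point of the design: Sherlock is never reached without Oracle context, which is what the description credits for better hypotheses.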

Changes

| File | Lines | Description |
| --- | --- | --- |
| `src/agents/sherlock.ts` | +771 | 8-phase debugging workflow, Docker/boundary strategies, subagent mode |
| `src/agents/sherlock.test.ts` | +129 | Unit tests for sherlock agent configuration |
| `src/agents/sisyphus.ts` | +65 | Add debugging step to pre-delegation, Phase 2C Oracle→Sherlock flow |
| `src/agents/sisyphus-prompt-builder.ts` | +64 | `buildSherlockSection()` with Oracle vs Sherlock comparison |
| `src/agents/orchestrator-sisyphus.ts` | +41/-17 | Replace `debugging-master` with `sherlock`, Oracle→Sherlock flow |
| `src/tools/sisyphus-task/tools.ts` | +97 | Add sherlock as subagent, consolidate partial-response parsing, timeout guards |
| `src/agents/types.ts` | +1 | Add "sherlock" to SubAgentName union |
| `src/agents/utils.ts` | +3 | Register sherlock metadata |
| `src/agents/index.ts` | +2 | Export sherlock agent |
| `src/hooks/sisyphus-orchestrator/index.ts` | +17 | `isGitRepo()` guard to prevent git diff spam |
| `src/features/skill-mcp-manager/env-cleaner.ts` | +1 | Skip `"undefined"` string values |
| `src/auth/antigravity/storage.test.ts` | +7 | Skip POSIX chmod assertion on Windows |

Total: 13 files changed, +1182 lines, -18 lines
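
One plausible reading of the `env-cleaner.ts` entry above is that environment values equal to the literal string `"undefined"` are dropped before being passed to an MCP process. The sketch below illustrates that reading only; `cleanEnv` is a hypothetical name and the real one-line change may look different.

```typescript
// Hypothetical sketch; the actual env-cleaner.ts implementation is not shown in this PR page.
function cleanEnv(env: Record<string, string | undefined>): Record<string, string> {
  const cleaned: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    // Drop unset values and the literal string "undefined", which commonly
    // appears when an unset variable is interpolated into a string.
    if (value === undefined || value === "undefined") continue;
    cleaned[key] = value;
  }
  return cleaned;
}
```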

Commits

28f176d feat(orchestrator): integrate Sherlock with Oracle-first debugging flow
84bea8a feat(sherlock): Oracle-first debugging flow for better system context
7e30d3d feat(sherlock): integrate into Sisyphus delegation with Docker/system boundary support
576e947 feat(sherlock): add Playwright MCP integration for browser debugging
c4c2432 refactor(sisyphus-task): consolidate partial-response parsing logic
2c3ea3a fix(sisyphus-task): guard timeout partial fetch against session errors
901047c update prompt
e49802c feat: update base sherlock
934a7f7 fix(sisyphus-task): surface sync polling timeout with clear error

Testing

bun run typecheck                                    # Clean
bun test src/agents/sherlock.test.ts                 # 15 pass
bun test src/agents/                                 # 52 pass
bun test src/hooks/sisyphus-orchestrator/            # 28 pass (was timing out)
bun test src/features/skill-mcp-manager/env-cleaner.test.ts  # 10 pass

Sherlock Capabilities

Escalation Path

| Iteration | Strategy |
| --- | --- |
| 1-2 | Standard hypothesis → instrumentation → analysis |
| 3 | Dependency Scan: trace callers/callees, map data flow |
| 4+ | System Boundary Analysis: instrument both sides of ORM/DB/API |
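
Read as code, the escalation path is a simple mapping from iteration number to strategy. This is a minimal sketch under that reading; `Strategy` and `strategyForIteration` are hypothetical names, since in the PR the escalation lives in Sherlock's prompt rather than in a function.

```typescript
// Hypothetical sketch of the escalation table; names are illustrative only.
type Strategy = "standard" | "dependency-scan" | "system-boundary-analysis";

function strategyForIteration(iteration: number): Strategy {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError("iteration must be a positive integer");
  }
  if (iteration <= 2) return "standard"; // hypothesis → instrumentation → analysis
  if (iteration === 3) return "dependency-scan"; // trace callers/callees, map data flow
  return "system-boundary-analysis"; // instrument both sides of ORM/DB/API
}
```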

Docker Debugging Strategies

1. Container logs (docker logs -f)
2. Docker exec (interactive debugging)
3. Volume-mounted logs
4. Network debugging
5. Environment inspection

System Boundary Bug Example (Timezone)

User sets: GMT+7 2024-01-15 14:00
Prisma stores: UTC 2024-01-15 07:00 (timezone stripped)
Display shows: 07:00 (wrong!)

Without Oracle: Sherlock wastes 4 iterations on wrong subsystems
With Oracle: "Prisma strips timezone by default" → Sherlock targets ORM immediately
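
The symptom in this example can be reproduced with plain `Date` handling, independent of any ORM. The sketch below does not model Prisma's behaviour; it only shows how an instant entered as 14:00 at GMT+7 reads as 07:00 once the offset is no longer applied at display time. The `Asia/Bangkok` zone is an assumption standing in for "GMT+7", and `formatTime` is a hypothetical helper.

```typescript
// The user's wall-clock input, with its offset still attached.
const userInput = "2024-01-15T14:00:00+07:00";

// Parsing yields an absolute instant; the "+07:00" is not retained on the Date.
const stored = new Date(userInput);

// Hypothetical helper: render HH:mm for a given IANA time zone.
function formatTime(date: Date, timeZone: string): string {
  return new Intl.DateTimeFormat("en-GB", {
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
    timeZone,
  }).format(date);
}

const storedIso = stored.toISOString(); // "2024-01-15T07:00:00.000Z"
const shownAsUtc = formatTime(stored, "UTC"); // "07:00" (the wrong display)
const shownInUserZone = formatTime(stored, "Asia/Bangkok"); // "14:00"
```

Instrumenting both sides of the boundary (the value written and the value read back) makes this kind of mismatch visible in a single reproduction, which is what the System Boundary Analysis step is meant to do.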

Related Issues

Closes #746

@sherlock

- Add explicit timeout error surfacing for sync polling loop (10 min max)
- Add timeout error surfacing for resume polling loop (60s max)
- Include session ID and partial response in timeout error message
- Properly cleanup toast manager and session state on timeout

This fixes the freeze issue where sisyphus_task would hang silently when an agent (like @sherlock) never reached idle status within the timeout period. Now users see a clear error message with debugging info.
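
A polling loop that surfaces its timeout, as this commit message describes, might look roughly like the sketch below. Everything here is an assumption made for illustration: `PollTimeoutError`, `PollOptions`, `pollUntilIdle`, and the callback names are hypothetical, and the real loop in `src/tools/sisyphus-task/tools.ts` talks to the session client directly.

```typescript
// Hypothetical sketch of a polling loop with an explicit timeout error.
class PollTimeoutError extends Error {
  sessionId: string;
  partialResponse: string;

  constructor(sessionId: string, partialResponse: string, timeoutMs: number) {
    super(
      `Session ${sessionId} did not become idle within ${timeoutMs} ms. ` +
        `Partial response: ${partialResponse || "(none)"}`,
    );
    this.name = "PollTimeoutError";
    this.sessionId = sessionId;
    this.partialResponse = partialResponse;
  }
}

interface PollOptions {
  sessionId: string;
  timeoutMs: number; // e.g. 10 * 60_000 for the sync loop, 60_000 for resume
  intervalMs: number;
  isIdle: () => Promise<boolean>;
  fetchPartial: () => Promise<string>;
  cleanup: () => void; // stand-in for toast manager / session state cleanup
}

async function pollUntilIdle(opts: PollOptions): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if (await opts.isIdle()) return;
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  // A failure while fetching the partial response must not mask the timeout itself.
  const partial = await opts.fetchPartial().catch(() => "");
  opts.cleanup();
  throw new PollTimeoutError(opts.sessionId, partial, opts.timeoutMs);
}
```

Throwing a dedicated error type keeps the session ID and partial response attached to the failure, so the caller can show them instead of hanging silently.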

github-actions · 2026-01-14T07:41:18Z

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

dang232 · 2026-01-14T07:41:50Z

I have read the CLA Document and I hereby sign the CLA

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 901047cb9d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-01-14T07:44:30Z

src/tools/sisyphus-task/tools.ts

+          const partialMsgs = ((partialResult as { data?: unknown }).data ?? partialResult) as Array<{
+            info?: { role?: string; time?: { created?: number } }
+            parts?: Array<{ type?: string; text?: string }>
+          }>
+          const partialAssistant = partialMsgs


Guard timeout partial fetch against session errors

In the timeout branch you coerce client.session.messages into an array without checking for an error result. If the session is expired or the API call fails, messages() can return an { error: ... } object (as handled later in the function), so partialMsgs is not an array and the subsequent .filter/.sort path will throw, preventing the tool from returning the timeout message. Consider checking partialResult.error or guarding with Array.isArray/try-catch (the same pattern appears in the resume timeout block above).

Add error and Array.isArray checks before processing partial messages in both resume and task timeout handlers. Previously, if the session was expired or the API call failed, client.session.messages() could return an { error: ... } object instead of an array, causing .filter() to throw and preventing the timeout message from being returned.
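
The guard described here could be sketched as a single normalising function. The message shape is copied from the snippet under review; `SessionMessage` and `toMessageArray` are hypothetical names, and the assumption that a failed call yields an object with a truthy `error` field comes from the review comment, not from the client's documentation.

```typescript
// Message shape modeled on the reviewed snippet; the interface name is hypothetical.
interface SessionMessage {
  info?: { role?: string; time?: { created?: number } };
  parts?: Array<{ type?: string; text?: string }>;
}

// Returns the message array, or null when the result is an error or malformed.
function toMessageArray(result: unknown): SessionMessage[] | null {
  if (Array.isArray(result)) return result as SessionMessage[];
  if (result === null || typeof result !== "object") return null;
  const { error, data } = result as { error?: unknown; data?: unknown };
  if (error) return null; // e.g. expired session or failed API call
  return Array.isArray(data) ? (data as SessionMessage[]) : null;
}
```

With this in place, the timeout branch can fall back to an empty partial response when `toMessageArray` returns `null`, so `.filter()` is never called on a non-array.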

cubic-dev-ai

No issues found across 10 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

cubic-dev-ai

1 issue found across 10 files

Confidence score: 4/5

The duplicate partial-response parsing logic in src/tools/sisyphus-task/tools.ts across the timeout branches increases the chance the branches drift out of sync, potentially causing inconsistent parser fixes over time.
Pay close attention to src/tools/sisyphus-task/tools.ts - consolidate the repeated timeout parsing logic to prevent divergence.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/tools/sisyphus-task/tools.ts">

<violation number="1" location="src/tools/sisyphus-task/tools.ts:274">
P2: Duplicate partial-response parsing logic added in two timeout branches; should be consolidated to avoid divergence</violation>
</file>

Since this is your first cubic review, here's how it works:

- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

src/tools/delegate-task/tools.ts

gitguardian · 2026-01-14T09:02:58Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| - | - | Generic Password | `e49802c` | src/agents/sherlock.ts |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secret safely. Learn here the best practices.
3. Revoke and rotate this secret.
4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Extract duplicate code from resume and task timeout handlers into reusable helper functions:

- extractPartialResponseText(): handles error/array guards and message parsing
- formatPartialResponse(): consistent truncation and formatting

Reduces code duplication and prevents future divergence between the two timeout branches.
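
The two helpers named in this commit message might be shaped roughly as follows. Only the function names come from the commit; the bodies are a sketch under assumptions: that the partial response is the text of the most recent assistant message, that truncation appends `...`, and that 500 characters is the limit (`MAX_PARTIAL_CHARS` is a made-up constant).

```typescript
// Message shape modeled on the reviewed snippet; the interface name is hypothetical.
interface SessionMessage {
  info?: { role?: string; time?: { created?: number } };
  parts?: Array<{ type?: string; text?: string }>;
}

const MAX_PARTIAL_CHARS = 500; // assumed limit, not taken from the PR

// Guards against error/non-array results, then returns the newest assistant text.
function extractPartialResponseText(result: unknown): string {
  if (result === null || typeof result !== "object") return "";
  if (!Array.isArray(result) && (result as { error?: unknown }).error) return "";
  const payload = Array.isArray(result) ? result : (result as { data?: unknown }).data;
  if (!Array.isArray(payload)) return "";

  const latestAssistant = (payload as SessionMessage[])
    .filter((m) => m.info?.role === "assistant")
    .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))[0];

  return (latestAssistant?.parts ?? [])
    .filter((p) => p.type === "text" && typeof p.text === "string")
    .map((p) => p.text)
    .join("\n");
}

// Applies one truncation rule so both timeout branches format identically.
function formatPartialResponse(text: string, maxChars: number = MAX_PARTIAL_CHARS): string {
  if (!text) return "(no partial response)";
  return text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
}
```

Both the resume and task timeout handlers would then call `formatPartialResponse(extractPartialResponseText(result))`, which is what removes the duplication the cubic review flagged.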

jkoelker

I've run this agent locally and it did a pretty decent job of diagnosing the suspend issues on my laptop (I already know the underlying issue, and it found it as well).

Add skill_mcp tool documentation to Sherlock's prompt for browser-based debugging scenarios. This enables:

- Visual bug diagnosis via screenshots
- Browser console error inspection
- UI interaction debugging
- Network request analysis

Sherlock can now use Playwright MCP when debugging frontend/UI issues.

… boundary support

- Add buildSherlockSection() to sisyphus-prompt-builder for dynamic docs
- Add debugging step (step 5) to pre-delegation planning decision tree
- Update Phase 2C to delegate to Sherlock after 2 failed fix attempts
- Add Docker/Container debugging strategies (5 strategies)
- Add System Boundary Analysis for ORM/DB/API bugs (timezone, type coercion)
- Add Dependency Scan for 3rd iteration escalation
- Add escalation path: iteration 1-2 -> 3 (scan) -> 4+ (boundaries) -> Oracle
- Update triggers to include container debugging and system boundary bugs

- Change flow from 'Sherlock then Oracle' to 'Oracle FIRST then Sherlock'
- Oracle provides system context (architecture, known gotchas, focus areas)
- Sherlock receives Oracle's context to form better hypotheses
- Prevents wasted iterations on wrong subsystems
- Example: Oracle knows 'Prisma strips timezone' -> Sherlock targets ORM immediately
- Update Phase 2C with Oracle consultation and Sherlock delegation formats
- Update buildSherlockSection with Oracle -> Sherlock usage pattern

- Replace deprecated 'debugging-master' with 'sherlock' across orchestrator
- Update agent selection table to include sherlock with Oracle context note
- Update decision matrix: 'Debug complex issue' -> Oracle context -> Sherlock
- Update delegation table: Hard debugging -> Oracle -> Sherlock flow
- Change failure threshold from 3 to 2 consecutive failures
- Add debugging flow diagram: Failure -> Oracle -> Sherlock -> Fix
- Update delegation targets to show Oracle + Sherlock pattern

dang232 added 3 commits January 14, 2026 00:47

feat: update base sherlock

e49802c

update prompt

901047c

github-actions bot added a commit that referenced this pull request Jan 14, 2026

@dang232 has signed the CLA in #777

ffbab8f

chatgpt-codex-connector bot reviewed Jan 14, 2026

cubic-dev-ai bot reviewed Jan 14, 2026

dang232 mentioned this pull request Jan 14, 2026

[Feature]: Debug mode like in cursor #746

Open

6 tasks

jkoelker approved these changes Jan 15, 2026

dang232 added 4 commits January 15, 2026 23:48

dang232 closed this Jan 17, 2026

dang232 reopened this Jan 17, 2026

sssgun pushed a commit to sssgun/oh-my-opencode that referenced this pull request Jan 18, 2026

@dang232 has signed the CLA in code-yeongyu#777

0cfce83

dang232 added 3 commits January 20, 2026 16:31

Merge remote-tracking branch 'upstream/dev' into F-746-Debug-mode

c860f0f

Merge remote-tracking branch 'upstream/dev' into F-746-Debug-mode

d1bcfcb

Merge remote-tracking branch 'upstream/dev' into F-746-Debug-mode

cda1d17
