
dang232 commented Jan 14, 2026

Summary

Add Sherlock, a hypothesis-driven debugging specialist that diagnoses bugs from runtime evidence rather than guesswork. It implements the Oracle → Sherlock debugging flow: Oracle first provides system context, then Sherlock performs evidence-based debugging.

Key Features

  1. 8-Phase Debugging Workflow: Problem Report → Hypothesis Generation → Instrumentation → Reproduction → Log Analysis → Fix → Verification → Cleanup
  2. Oracle-First Context: After 2 failed fix attempts, consult Oracle for system architecture/gotchas, then delegate to Sherlock with that context
  3. Docker/Container Debugging: 5 strategies for debugging code inside Docker (logs, exec, volumes, network, environment)
  4. System Boundary Analysis: Specialized handling for ORM/DB/API bugs (timezone, type coercion, data transformation)
  5. Playwright MCP Integration: Browser automation for UI debugging via skill_mcp
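
The eight phases in the first item form a fixed sequence. A minimal sketch of that ordering as a typed constant follows; it assumes nothing about how `src/agents/sherlock.ts` actually represents phases, and `SHERLOCK_PHASES` and `nextPhase` are hypothetical names.

```typescript
// Hypothetical model of Sherlock's 8-phase workflow. Phase names come from
// the PR summary; the constant and helper are illustrative only.
const SHERLOCK_PHASES = [
  "Problem Report",
  "Hypothesis Generation",
  "Instrumentation",
  "Reproduction",
  "Log Analysis",
  "Fix",
  "Verification",
  "Cleanup",
] as const;

type SherlockPhase = (typeof SHERLOCK_PHASES)[number];

// Returns the phase after `current`, or null once Cleanup has finished.
function nextPhase(current: SherlockPhase): SherlockPhase | null {
  const index = SHERLOCK_PHASES.indexOf(current);
  return SHERLOCK_PHASES[index + 1] ?? null;
}
```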

Debugging Flow

Sisyphus fails 2x → STOP
       ↓
Oracle (system context: architecture, known gotchas, focus areas)
       ↓
Sherlock (with Oracle context → better hypotheses)
       ↓
Fix with runtime evidence
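
The flow above amounts to a small decision rule. This is a sketch only: `decideNextAction`, `DebugState`, and the action names are hypothetical, and only the threshold of two failed fix attempts comes from the PR.

```typescript
// Hypothetical sketch of the "fail 2x → Oracle → Sherlock" escalation rule.
// FAILURE_THRESHOLD mirrors the PR's stated threshold; everything else is assumed.
const FAILURE_THRESHOLD = 2;

type NextAction =
  | { kind: "retry-fix" }
  | { kind: "consult-oracle" }
  | { kind: "delegate-sherlock"; oracleContext: string };

interface DebugState {
  consecutiveFailures: number;
  oracleContext?: string; // set once Oracle has answered
}

function decideNextAction(state: DebugState): NextAction {
  if (state.consecutiveFailures < FAILURE_THRESHOLD) {
    return { kind: "retry-fix" };
  }
  // Threshold reached: stop attempting fixes and gather system context first.
  if (state.oracleContext === undefined) {
    return { kind: "consult-oracle" };
  }
  // Oracle has answered, so Sherlock starts with that context instead of cold.
  return { kind: "delegate-sherlock", oracleContext: state.oracleContext };
}
```

Keeping Oracle's answer in the state is what allows the Sherlock delegation to carry the context forward.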

Changes

| File | Lines | Description |
|---|---|---|
| src/agents/sherlock.ts | +771 | 8-phase debugging workflow, Docker/boundary strategies, subagent mode |
| src/agents/sherlock.test.ts | +129 | Unit tests for sherlock agent configuration |
| src/agents/sisyphus.ts | +65 | Add debugging step to pre-delegation, Phase 2C Oracle→Sherlock flow |
| src/agents/sisyphus-prompt-builder.ts | +64 | buildSherlockSection() with Oracle vs Sherlock comparison |
| src/agents/orchestrator-sisyphus.ts | +41/-17 | Replace debugging-master with sherlock, Oracle→Sherlock flow |
| src/tools/sisyphus-task/tools.ts | +97 | Add sherlock as subagent, consolidate partial-response parsing, timeout guards |
| src/agents/types.ts | +1 | Add "sherlock" to SubAgentName union |
| src/agents/utils.ts | +3 | Register sherlock metadata |
| src/agents/index.ts | +2 | Export sherlock agent |
| src/hooks/sisyphus-orchestrator/index.ts | +17 | isGitRepo() guard to prevent git diff spam |
| src/features/skill-mcp-manager/env-cleaner.ts | +1 | Skip "undefined" string values |
| src/auth/antigravity/storage.test.ts | +7 | Skip POSIX chmod assertion on Windows |

Total: 13 files changed, +1182 lines, -18 lines
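
Two of the smaller fixes in the table can be sketched in isolation. The name `isGitRepo()` comes from the PR, but this body is an assumption: it shells out to `git rev-parse --is-inside-work-tree`. `cleanEnv` is a hypothetical stand-in for the env-cleaner change and is not the file's real export.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical guard: `git rev-parse --is-inside-work-tree` prints "true"
// inside a work tree and exits non-zero outside one.
function isGitRepo(cwd: string): boolean {
  try {
    const result = spawnSync("git", ["rev-parse", "--is-inside-work-tree"], {
      cwd,
      encoding: "utf8",
    });
    // A spawn error (git missing, bad cwd) counts as "not a repo".
    return result.error === undefined && result.status === 0 && result.stdout.trim() === "true";
  } catch {
    // Some runtimes throw instead of setting result.error.
    return false;
  }
}

// Hypothetical env cleaner: drops unset values and the literal string
// "undefined" (what String(undefined) produces) before spawning a process.
function cleanEnv(env: Record<string, string | undefined>): Record<string, string> {
  const cleaned: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined || value === "undefined") continue;
    cleaned[key] = value;
  }
  return cleaned;
}
```

Checking `isGitRepo(process.cwd())` before running `git diff` keeps the hook from emitting git errors when the working directory is not a repository.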

Commits

28f176d feat(orchestrator): integrate Sherlock with Oracle-first debugging flow
84bea8a feat(sherlock): Oracle-first debugging flow for better system context
7e30d3d feat(sherlock): integrate into Sisyphus delegation with Docker/system boundary support
576e947 feat(sherlock): add Playwright MCP integration for browser debugging
c4c2432 refactor(sisyphus-task): consolidate partial-response parsing logic
2c3ea3a fix(sisyphus-task): guard timeout partial fetch against session errors
901047c update prompt
e49802c feat: update base sherlock
934a7f7 fix(sisyphus-task): surface sync polling timeout with clear error

Testing

bun run typecheck                                    # Clean
bun test src/agents/sherlock.test.ts                 # 15 pass
bun test src/agents/                                 # 52 pass
bun test src/hooks/sisyphus-orchestrator/            # 28 pass (was timing out)
bun test src/features/skill-mcp-manager/env-cleaner.test.ts  # 10 pass

Sherlock Capabilities

Escalation Path

| Iteration | Strategy |
|---|---|
| 1-2 | Standard hypothesis → instrumentation → analysis |
| 3 | Dependency Scan: trace callers/callees, map data flow |
| 4+ | System Boundary Analysis: instrument both sides of ORM/DB/API |
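
The escalation table maps directly to a lookup on the iteration count. A hypothetical sketch follows; `strategyForIteration` and the strategy labels are illustrative and are not taken from `sherlock.ts`.

```typescript
// Hypothetical mapping from Sherlock's iteration count to its escalation strategy.
type Strategy = "standard" | "dependency-scan" | "system-boundary";

function strategyForIteration(iteration: number): Strategy {
  if (!Number.isInteger(iteration) || iteration < 1) {
    throw new RangeError(`iteration must be a positive integer, got ${iteration}`);
  }
  if (iteration <= 2) return "standard"; // hypothesis → instrumentation → analysis
  if (iteration === 3) return "dependency-scan"; // trace callers/callees, map data flow
  return "system-boundary"; // instrument both sides of ORM/DB/API
}
```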

Docker Debugging Strategies

  1. Container logs (docker logs -f)
  2. Docker exec (interactive debugging)
  3. Volume-mounted logs
  4. Network debugging
  5. Environment inspection
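
Each of the five strategies corresponds to ordinary Docker CLI commands. The sketch below assumes a hypothetical helper that returns argv arrays for a named container. The `docker logs`, `docker exec`, and `docker inspect --format` invocations are standard CLI usage. Mapping "volume-mounted logs" to `.Mounts` and "network debugging" to `.NetworkSettings.Networks` is one reasonable interpretation, not necessarily what Sherlock's prompt prescribes.

```typescript
// Hypothetical helper that builds Docker CLI argv arrays for each strategy.
type DockerStrategy = "logs" | "exec" | "volumes" | "network" | "environment";

function dockerCommand(strategy: DockerStrategy, container: string): string[] {
  switch (strategy) {
    case "logs":
      // Stream container stdout/stderr with timestamps.
      return ["docker", "logs", "-f", "--timestamps", container];
    case "exec":
      // Open an interactive shell inside the running container.
      return ["docker", "exec", "-it", container, "sh"];
    case "volumes":
      // List mounts so host-side log files can be located and read.
      return ["docker", "inspect", "--format", "{{json .Mounts}}", container];
    case "network":
      // Show the networks the container is attached to, with its IPs.
      return ["docker", "inspect", "--format", "{{json .NetworkSettings.Networks}}", container];
    case "environment":
      // Print the environment the containerized process actually sees.
      return ["docker", "exec", container, "env"];
    default: {
      const unreachable: never = strategy;
      throw new Error(`unknown strategy: ${String(unreachable)}`);
    }
  }
}
```

Returning argv arrays instead of shell strings avoids quoting problems with the Go-template `--format` argument, and the arrays can be passed to a process spawner unchanged.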

System Boundary Bug Example (Timezone)

User sets: GMT+7 2024-01-15 14:00
Prisma stores: UTC 2024-01-15 07:00 (timezone stripped)
Display shows: 07:00 (wrong!)
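
The arithmetic behind this example can be reproduced with plain `Date` objects. The sketch does not model Prisma (the PR attributes the stripping to it); it only shows why rendering the stored UTC hour yields 07:00 and how shifting by the user's +7 offset restores 14:00. The hard-coded offset is an assumption for illustration; production code would usually format with the user's IANA time zone instead.

```typescript
// Reproduces the timezone example with plain Dates; no ORM is involved.
const OFFSET_MINUTES = 7 * 60; // GMT+7, hard-coded for illustration

// What the user entered, with its offset made explicit.
const entered = new Date("2024-01-15T14:00:00+07:00");

// The stored instant, normalized to UTC.
const storedUtc = entered.toISOString();

// Formats the UTC hour and minute of a Date as HH:MM.
function hhmmUtc(d: Date): string {
  return `${String(d.getUTCHours()).padStart(2, "0")}:${String(d.getUTCMinutes()).padStart(2, "0")}`;
}

// Bug: rendering the UTC time directly, as if it were local time.
const buggyDisplay = hhmmUtc(new Date(storedUtc));

// Fix: shift by the user's offset before formatting.
const fixedDisplay = hhmmUtc(new Date(Date.parse(storedUtc) + OFFSET_MINUTES * 60_000));
```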

Without Oracle: Sherlock wastes 4 iterations on the wrong subsystems
With Oracle: Oracle flags "Prisma strips timezone by default", so Sherlock targets the ORM immediately

Related Issues

Closes #746

- Add explicit timeout error surfacing for sync polling loop (10 min max)
- Add timeout error surfacing for resume polling loop (60s max)
- Include session ID and partial response in timeout error message
- Properly clean up the toast manager and session state on timeout

This fixes the freeze issue where sisyphus_task would hang silently
when an agent (like @sherlock) never reached idle status within the
timeout period. Now users see a clear error message with debugging info.
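
The timeout behavior described in this commit message can be sketched as a deadline-bounded polling loop. `pollUntilIdle`, `PollTimeoutError`, and the callback parameters are hypothetical stand-ins for the real session API; the 10-minute and 60-second limits from the commit would be passed in as `timeoutMs`.

```typescript
// Hypothetical error carrying the debugging info the commit describes.
class PollTimeoutError extends Error {
  readonly sessionId: string;
  readonly partialResponse: string;

  constructor(sessionId: string, timeoutMs: number, partialResponse: string) {
    super(
      `Session ${sessionId} did not reach idle within ${timeoutMs} ms.` +
        (partialResponse ? `\nPartial response:\n${partialResponse}` : ""),
    );
    this.name = "PollTimeoutError";
    this.sessionId = sessionId;
    this.partialResponse = partialResponse;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the session is idle or the deadline passes, then fails loudly.
async function pollUntilIdle(opts: {
  sessionId: string;
  getStatus: () => Promise<"idle" | "busy">;
  getPartialText: () => Promise<string>;
  timeoutMs: number;
  intervalMs: number;
  onTimeout?: () => void; // cleanup hook (e.g. toast manager, session state)
}): Promise<void> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if ((await opts.getStatus()) === "idle") return;
    await sleep(opts.intervalMs);
  }
  opts.onTimeout?.();
  // Best effort: a failed partial fetch must not mask the timeout itself.
  const partial = await opts.getPartialText().catch(() => "");
  throw new PollTimeoutError(opts.sessionId, opts.timeoutMs, partial);
}
```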

github-actions bot commented Jan 14, 2026

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

dang232 commented Jan 14, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Jan 14, 2026

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 901047cb9d

Comment on lines 576 to 580
const partialMsgs = ((partialResult as { data?: unknown }).data ?? partialResult) as Array<{
  info?: { role?: string; time?: { created?: number } }
  parts?: Array<{ type?: string; text?: string }>
}>
const partialAssistant = partialMsgs

P2: Guard timeout partial fetch against session errors

In the timeout branch you coerce client.session.messages into an array without checking for an error result. If the session is expired or the API call fails, messages() can return an { error: ... } object (as handled later in the function), so partialMsgs is not an array and the subsequent .filter/.sort path will throw, preventing the tool from returning the timeout message. Consider checking partialResult.error or guarding with Array.isArray/try-catch (the same pattern appears in the resume timeout block above).

Add error and Array.isArray checks before processing partial messages
in both resume and task timeout handlers. Previously, if the session
was expired or the API call failed, client.session.messages() could
return an { error: ... } object instead of an array, causing .filter()
to throw and preventing the timeout message from being returned.
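
A minimal sketch of the guard described in the review and in this fix. The `SessionMessage` shape mirrors the snippet quoted in the review; `asMessageArray` is a hypothetical name, and the handling of `{ data }` and `{ error }` wrappers follows the review's description rather than the SDK's documented types.

```typescript
// Message shape copied from the snippet quoted in the review.
type SessionMessage = {
  info?: { role?: string; time?: { created?: number } };
  parts?: Array<{ type?: string; text?: string }>;
};

// Hypothetical guard: never lets a non-array reach .filter()/.sort().
function asMessageArray(result: unknown): SessionMessage[] {
  // An { error: ... } result (expired session, failed call) yields no messages.
  if (typeof result === "object" && result !== null && "error" in result && result.error) {
    return [];
  }
  // Unwrap { data } when present; otherwise use the value as-is.
  const payload =
    typeof result === "object" && result !== null && "data" in result ? result.data : result;
  return Array.isArray(payload) ? (payload as SessionMessage[]) : [];
}
```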

cubic-dev-ai bot left a comment

No issues found across 10 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

cubic-dev-ai bot left a comment

1 issue found across 10 files

Confidence score: 4/5

  • The duplicate partial-response parsing logic in src/tools/sisyphus-task/tools.ts across the timeout branches increases the chance the branches drift out of sync, potentially causing inconsistent parser fixes over time.
  • Pay close attention to src/tools/sisyphus-task/tools.ts - consolidate the repeated timeout parsing logic to prevent divergence.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/tools/sisyphus-task/tools.ts">

<violation number="1" location="src/tools/sisyphus-task/tools.ts:274">
P2: Duplicate partial-response parsing logic added in two timeout branches; should be consolidated to avoid divergence</violation>
</file>

gitguardian bot commented Jan 14, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| - | - | Generic Password | e49802c | src/agents/sherlock.ts |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace the secret and store it safely, following secret-management best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


Extract duplicate code from resume and task timeout handlers into
reusable helper functions:
- extractPartialResponseText(): handles error/array guards and message parsing
- formatPartialResponse(): consistent truncation and formatting

Reduces code duplication and prevents future divergence between the
two timeout branches.
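
A sketch of what the two consolidated helpers could look like. The names `extractPartialResponseText` and `formatPartialResponse` come from the commit message; their signatures, the "latest assistant message" selection, and the 500-character limit are assumptions.

```typescript
type SessionMessage = {
  info?: { role?: string; time?: { created?: number } };
  parts?: Array<{ type?: string; text?: string }>;
};

const MAX_PARTIAL_CHARS = 500; // assumed truncation limit

// Returns the text of the latest assistant message, or "" on any bad input.
function extractPartialResponseText(result: unknown): string {
  if (typeof result === "object" && result !== null && "error" in result && result.error) {
    return "";
  }
  const payload =
    typeof result === "object" && result !== null && "data" in result ? result.data : result;
  if (!Array.isArray(payload)) return "";

  // Newest assistant message first, then join its text parts.
  const latest = (payload as SessionMessage[])
    .filter((m) => m.info?.role === "assistant")
    .sort((a, b) => (b.info?.time?.created ?? 0) - (a.info?.time?.created ?? 0))[0];
  return (latest?.parts ?? [])
    .filter((p) => p.type === "text" && typeof p.text === "string")
    .map((p) => p.text ?? "")
    .join("\n");
}

// Shared truncation and formatting for both timeout branches.
function formatPartialResponse(text: string, maxChars = MAX_PARTIAL_CHARS): string {
  if (text.trim() === "") return "(no partial response captured)";
  const body = text.length > maxChars ? `${text.slice(0, maxChars)}... [truncated]` : text;
  return `Partial response:\n${body}`;
}
```

With both branches calling the same pair, a parser fix lands in one place, which is the divergence risk the cubic review raised.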

jkoelker (Contributor) left a comment

I've run this agent locally, and it did a pretty decent job of diagnosing the suspend issues on my laptop (I already know the underlying issue, and it found it as well).

Add skill_mcp tool documentation to Sherlock's prompt for browser-based
debugging scenarios. This enables:
- Visual bug diagnosis via screenshots
- Browser console error inspection
- UI interaction debugging
- Network request analysis

Sherlock can now use Playwright MCP when debugging frontend/UI issues.
… boundary support

- Add buildSherlockSection() to sisyphus-prompt-builder for dynamic docs
- Add debugging step (step 5) to pre-delegation planning decision tree
- Update Phase 2C to delegate to Sherlock after 2 failed fix attempts
- Add Docker/Container debugging strategies (5 strategies)
- Add System Boundary Analysis for ORM/DB/API bugs (timezone, type coercion)
- Add Dependency Scan for 3rd iteration escalation
- Add escalation path: iteration 1-2 -> 3 (scan) -> 4+ (boundaries) -> Oracle
- Update triggers to include container debugging and system boundary bugs
- Change flow from 'Sherlock then Oracle' to 'Oracle FIRST then Sherlock'
- Oracle provides system context (architecture, known gotchas, focus areas)
- Sherlock receives Oracle's context to form better hypotheses
- Prevents wasted iterations on wrong subsystems
- Example: Oracle knows 'Prisma strips timezone' -> Sherlock targets ORM immediately
- Update Phase 2C with Oracle consultation and Sherlock delegation formats
- Update buildSherlockSection with Oracle -> Sherlock usage pattern
- Replace deprecated 'debugging-master' with 'sherlock' across orchestrator
- Update agent selection table to include sherlock with Oracle context note
- Update decision matrix: 'Debug complex issue' -> Oracle context -> Sherlock
- Update delegation table: Hard debugging -> Oracle -> Sherlock flow
- Change failure threshold from 3 to 2 consecutive failures
- Add debugging flow diagram: Failure -> Oracle -> Sherlock -> Fix
- Update delegation targets to show Oracle + Sherlock pattern
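
The Oracle → Sherlock hand-off in this commit boils down to passing Oracle's findings into the delegation prompt. The sketch below is hypothetical: `buildSherlockDelegation`, `OracleContext`, and the prompt layout are illustrative and are not the PR's `buildSherlockSection()`. Only the three context fields (architecture, known gotchas, focus areas) come from the description.

```typescript
// Hypothetical shape of what Oracle hands back before delegation.
interface OracleContext {
  architecture: string;
  knownGotchas: string[];
  focusAreas: string[];
}

// Builds a delegation prompt that puts Oracle's context in front of Sherlock.
function buildSherlockDelegation(bug: string, failedAttempts: string[], ctx: OracleContext): string {
  const bullets = (items: string[]) =>
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none)";
  return [
    "## Bug",
    bug,
    "## Failed fix attempts",
    bullets(failedAttempts),
    "## Oracle context",
    `Architecture: ${ctx.architecture}`,
    "Known gotchas:",
    bullets(ctx.knownGotchas),
    "Focus areas:",
    bullets(ctx.focusAreas),
  ].join("\n");
}
```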

dang232 closed this Jan 17, 2026
dang232 reopened this Jan 17, 2026
sssgun pushed a commit to sssgun/oh-my-opencode that referenced this pull request Jan 18, 2026

Successfully merging this pull request may close these issues:

[Feature]: Debug mode like in cursor