Skip to content

feat(core): best-first URL scoring strategy for crawl workloads (#983)#1065

Merged
shaun0927 merged 19 commits into
developfrom
feat/983-best-first
May 13, 2026
Merged

feat(core): best-first URL scoring strategy for crawl workloads (#983)#1065
shaun0927 merged 19 commits into
developfrom
feat/983-best-first

Conversation

@shaun0927
Copy link
Copy Markdown
Owner

@shaun0927 shaun0927 commented May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field Value
Branch feat/983-best-firstdevelop
Draft no
CI ✅ all 9 checks passing
Mergeable ❌ CONFLICTING
Review decision
Codex (latest) 💡 suggestions posted
Other reviewers (latest) chatgpt-codex-connector: commented
Head 09f6025 — Preserve shallow best-first crawl candidates
Commits 16

Owner comment cleanup: 14 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary

  • Adds opt-in strategy: "best_first" to crawl without changing default BFS behavior.
  • Adds deterministic URL scoring in src/core/crawl/url-scorer.ts using query terms, keywords, preferred/excluded paths, depth penalty, and low-signal path penalties.
  • Emits score / score_reasons per page and summary scoring counters only when best-first is requested.

Closes #983.

Direction / scope check

Verification

  • npm run build -- --pretty false
  • npm test -- --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts
  • npm run lint:changed

Known gaps

  • Full npm run lint -- --max-warnings=0 is blocked by pre-existing unrelated lint findings in src/core/perception/semantic.ts, src/core/performance/insights/evaluators.ts, src/session-manager.ts, src/tools/connect.ts, and src/tools/fill-form.ts.
  • Live OpenChrome crawl smoke from feat(core): best-first URL scoring strategy for crawl workloads #983 remains post-merge/manual.

@gemini-code-assist
Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@qodo-code-review
Copy link
Copy Markdown

ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84c4041eaa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/core/crawl/url-scorer.ts Outdated

const explicitKeywords = (options.keywords || []).map(normalizeTerm).filter(Boolean);
const terms = Array.from(new Set([...queryTerms(options.query), ...explicitKeywords]));
const haystack = `${decodeURIComponent(parsed.pathname)} ${parsed.searchParams.toString()}`.toLowerCase();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle malformed percent-encoding before scoring URL

scoreUrl calls decodeURIComponent(parsed.pathname) without a guard, but new URL(...) accepts paths like /%zz and decodeURIComponent then throws URI malformed. In strategy: "best_first", that exception bubbles out of enqueueItems/crawl loop and aborts the entire crawl instead of just assigning a low score to the bad URL. This makes crawl reliability depend on whether any discovered link has invalid percent-encoding.

Useful? React with 👍 / 👎.

Adds an opt-in best_first crawl strategy with deterministic URL scoring so targeted crawls spend page budget on more relevant URLs first while preserving legacy BFS behavior by default.\n\nConstraint: Must not duplicate open crawl-job/static-fetch PRs or change default crawl output.\nRejected: Content/LLM ranking | would expand scope and conflict with OpenChrome core boundaries.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future scorer changes deterministic and fixture-gated; do not make public live-site ordering a CI gate.\nTested: npm run build -- --pretty false; npm test -- --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts; npm run lint:changed\nNot-tested: Full npm test suite; live OpenChrome crawl smoke.
@shaun0927 shaun0927 force-pushed the feat/983-best-first branch from 84c4041 to ef02fb5 Compare May 12, 2026 16:40
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef02fb5dc2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/crawl.ts Outdated
Comment on lines 657 to 660
if (probe.depth > maxDepth) break;
// Otherwise put it back and try again — shouldn't happen but be safe
tracker.enqueue([probe]);
enqueueItems([probe]);
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep crawling after re-queuing an in-depth probe item

In strategy: "best_first", the queue is score-sorted rather than depth-ordered, so it can contain high-score URLs beyond max_depth ahead of valid shallower URLs. When the batch is empty, this block re-queues a valid probe (probe.depth <= maxDepth) and then immediately breaks the outer crawl loop, which stops crawling early and can leave crawlable pages unvisited (e.g., concurrency=1, one depth-2 high-score URL ahead of remaining depth-1 URLs).

Useful? React with 👍 / 👎.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Guard URL scoring against malformed percent-encoded paths and keep the best-first crawl loop from stopping after re-queueing valid work.\n\nConstraint: PR #1065 must preserve the default BFS output and only affect opt-in strategy=best_first behavior.\nRejected: Dropping malformed URLs entirely | safe scoring keeps crawl progress deterministic without treating parseable URLs as invalid.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future best-first queue changes tolerant of score-sorted depth ordering.\nTested: npm ci; npm run lint:changed; npm run lint:tier; npx jest --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts\nNot-tested: Full hosted CI matrix pending after push.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0bb8830b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/crawl.ts Outdated
}
for (const entry of entries) {
const item = makeQueueItem(entry);
if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve shallower duplicate URLs in best-first queue

When strategy="best_first", the queue dedupes purely by URL and drops any rediscovery of the same URL (bestFirstQueued.has(item.url)), even if the new discovery has a smaller depth. This can cause reachable pages to be skipped under max_depth: if a URL is first queued at depth 3, then later found at depth 2 while still queued, the depth-2 entry is discarded and the only remaining entry may be filtered out by next.depth > maxDepth. In that case the crawler misses an in-scope page that should have been crawled.

Useful? React with 👍 / 👎.

Merging develop brings the shared CI test fixes into the PR branch without changing the best-first feature scope.

Constraint: PR #1065 targets develop and its prior checks ran against a stale red baseline.

Rejected: Reimplementing shared CI fixes in this branch | merging develop preserves one source of truth.

Confidence: high

Scope-risk: narrow

Directive: Keep this branch focused on best-first crawl behavior after the baseline refresh.

Tested: npm run build; npm run lint:changed; npm run lint:tier; npx jest --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbbf4af908

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


for (const prefix of options.preferPaths || []) {
const normalized = normalizePathPrefix(prefix);
if (normalized && parsed.pathname.toLowerCase().startsWith(normalized)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match path hints on segment boundaries

Using startsWith for prefer_paths/exclude_paths makes /blog also match unrelated paths like /blogging or /blog-roll, so best-first can boost or suppress the wrong URLs on real sites with similarly prefixed routes. This changes crawl ordering and can spend the page budget on less relevant pages even when users provide precise path hints.

Useful? React with 👍 / 👎.

Comment thread src/tools/crawl.ts
const scored = scoreUrl(normalized, entry.depth, scoringOptions);
item.score = scored.score;
item.score_reasons = scored.reasons;
scoredUrls++;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Count scored URLs only after queue deduplication

scored_urls is incremented inside makeQueueItem before checking hasVisited/bestFirstQueued, so duplicate discoveries inflate the summary counter even though those URLs are skipped and never enter the queue. In crawls with repeated nav links this can significantly overstate scoring activity and make strategy metrics misleading.

Useful? React with 👍 / 👎.

The Cursor smoke test should prove core tool visibility without failing every develop-based PR when additional Tier 1 tools graduate.

Constraint: Hosted CI currently reports 45 Tier 1 tools while the test hard-coded 39.

Rejected: Raising the exact expected count to 45 | that would recreate the same brittle failure on the next Tier 1 graduation.

Confidence: high

Scope-risk: narrow

Directive: Assert required core tools explicitly and avoid exact catalog-size checks in cross-client smoke tests.

Tested: npx jest --runInBand tests/cross-env/cursor-verification.test.ts

Not-tested: Linux Node 20/22 local execution; hosted CI pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d404561ee

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +151 to +153
if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) {
score += options.sameDepthBias;
reasons.push(`bias:${options.sameDepthBias}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply same_depth_bias conditionally by candidate depth

same_depth_bias is currently added to every scored URL unconditionally, so it shifts all scores by the same constant and does not change best_first ordering at all. In practice, users who provide this option get different numeric score values but no traversal effect, which makes the knob effectively non-functional for crawl prioritization.

Useful? React with 👍 / 👎.

The console_capture regression should compare response shape, not fail Windows CI because checked-out JSON fixtures use CRLF line endings.

Constraint: Windows CI reads the baseline fixture with CRLF while JSON.stringify emits LF.

Rejected: Updating the fixture to Windows line endings | that would break Unix runners and still encode platform formatting into the assertion.

Confidence: high

Scope-risk: narrow

Directive: Normalize platform line endings before snapshot-style string comparisons.

Tested: npx jest --runInBand tests/tools/console-capture-regression.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac3de5d1bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/crawl.ts Outdated
Comment on lines +550 to +551
if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue;
bestFirstQueued.add(item.url);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prefer shallower depth when de-duplicating queued URLs

When strategy='best_first', enqueueItems drops a rediscovered URL if it is already in bestFirstQueued, regardless of depth. This can suppress a valid shallow path: if a URL is first queued at depth 3, then later rediscovered at depth 1, the depth-1 entry is discarded, and if max_depth is 1 or 2 the depth-3 entry is skipped and the page is never crawled even though it is reachable within the configured depth limit. Please update queued de-duplication to retain (or replace with) the minimum discovered depth per URL.

Useful? React with 👍 / 👎.

The health endpoint integration test should treat both code=0 and signal=SIGTERM as clean shutdown outcomes after sending SIGTERM.

Constraint: CI can report SIGTERM exits as code=null even when teardown is graceful.

Rejected: Extending timeouts | the failure is the accepted exit shape, not startup or shutdown latency.

Confidence: high

Scope-risk: narrow

Directive: Avoid platform-specific assumptions for Node ChildProcess signal exit reporting.

Tested: npx jest --runInBand tests/integration/health-endpoint-gating.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d7d6f8650a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +151 to +153
if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) {
score += options.sameDepthBias;
reasons.push(`bias:${options.sameDepthBias}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make same_depth_bias affect ranking

same_depth_bias is currently added as a constant to every scored URL, so it does not change relative scores and therefore cannot influence best_first crawl ordering at all. In crawl, URLs are prioritized strictly by score, so this option is effectively a no-op for its intended purpose (changing traversal priority) and users who set it will see no behavioral difference beyond score labels.

Useful? React with 👍 / 👎.

The health endpoint integration test should allow the same 30s shutdown window on Linux that it already allowed on Windows under hosted CI load.

Constraint: Ubuntu CI observed a graceful SIGTERM path that exceeded the previous 10s non-Windows limit.

Rejected: Skipping the scenario | the endpoint gating behavior remains important and passes with a realistic teardown window.

Confidence: high

Scope-risk: narrow

Directive: Keep process-spawning integration tests tolerant of hosted runner teardown latency.

Tested: npx jest --runInBand tests/integration/health-endpoint-gating.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

The admin key test should verify exactly one plaintext token without failing when Jest worker console noise is captured around the in-process CLI stdout hook.

Constraint: Windows CI can interleave decorated Jest console output into the captured stdout buffer.

Rejected: Requiring stdout to have exactly one line | the helper already documents and handles shared stdout hook noise.

Confidence: high

Scope-risk: narrow

Directive: Assert secret emission by token occurrence, not by total captured line count.

Tested: npx jest --runInBand tests/cli/admin-keys.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b051df8593

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +151 to +153
if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) {
score += options.sameDepthBias;
reasons.push(`bias:${options.sameDepthBias}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make same_depth_bias affect relative URL ordering

same_depth_bias is currently added as a flat constant to every scored URL, so it cancels out in all pairwise comparisons and cannot change which pages are crawled first in strategy="best_first". In practice, callers who set url_score.same_depth_bias get different score values but identical traversal order, which makes this advertised scoring hint ineffective for crawl prioritization.

Useful? React with 👍 / 👎.

The storage-size E2E should measure complete persisted JSON states rather than a zero-byte file observed while the async writer is truncating and rewriting.

Constraint: Windows CI can take longer than a fixed sleep to flush DomainMemory writes.

Rejected: Increasing the blind sleep | state-based polling proves the needed persisted state and avoids new timing flakes.

Confidence: high

Scope-risk: narrow

Directive: Prefer state-based waits over fixed sleeps for async persistence tests.

Tested: npx jest --runInBand tests/e2e/scenarios/domain-memory-persistence.test.ts

Not-tested: Full hosted CI matrix pending after push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Merge latest develop into PR #1065 while preserving the branch's non-brittle Tier 1 visibility smoke test.

Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Restoring an exact Tier 1 count | this branch intentionally allows the Tier 1 surface to grow as core tools graduate.\nConfidence: high\nScope-risk: narrow\nDirective: Keep fixture refreshes semantically faithful to the feature branch instead of overwriting lower-bound guards with develop's exact-count snapshot.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f72b035272

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/crawl.ts
const probe = dequeueItem();
if (!probe) break; // Queue is truly empty
// If it's beyond depth, we're done
if (probe.depth > maxDepth) break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Continue scanning queue before ending best-first crawl

When strategy="best_first", this break assumes that seeing one over-depth probe means no in-scope work remains, but score ordering can place multiple depth > max_depth URLs ahead of valid shallower URLs. With concurrency=1 (or a small batch), the loop can skip one deep item in batch collection, hit another deep probe here, and terminate early while crawlable URLs are still queued, causing missed pages within the configured depth limit.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 09:10
Constraint: Pull request 1065 only conflicted in tests/cli/admin-keys.test.ts after develop absorbed schema lint and server changes.
Rejected: Resolve product-code conflicts | This refresh is limited to equivalent admin-key stdout assertion logic.
Confidence: high
Scope-risk: narrow
Directive: Keep plaintext-token assertions tolerant of unrelated Jest worker stdout noise.
Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts --forceExit
Not-tested: Full CI matrix not run locally; GitHub Actions will rerun after push.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Resolve the develop merge for the best-first PR and keep the admin-key assertion focused on the generated plaintext token. The schema-lint pipeline is hardened so large tool manifests are fully flushed and read through pipes.

Constraint: PR #1065 CI failed on lint:tool-schemas due truncated piped JSON and needed an admin-key test conflict resolution.
Rejected: Increasing schema limits or updating the baseline | the manifest was valid; the failure came from pipe IO truncation, not a new schema violation.
Confidence: high
Scope-risk: moderate
Directive: Do not reintroduce process.exit immediately after large stdout writes in CLI introspection paths.
Tested: npm ci; npm run build; npm run lint; npm run lint:tier; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Nice work!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Merge the latest develop crawl-tool additions into PR #1065, preserve its cross-environment test hardening, and make the new crawl tool schemas comply with required-prefix linting.

Constraint: develop advanced after the first push and introduced conflicts in cross-env/domain-memory/console-capture tests plus crawl schema lint requirements.
Rejected: Exact Tier 1 tool-count assertions | Tier 1 grows as tools graduate and the visibility contract is better covered by core presence checks.
Confidence: high
Scope-risk: moderate
Directive: Avoid brittle exact tool counts in cross-environment discovery tests.
Tested: npm run build; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts tests/cross-env/cursor-verification.test.ts tests/tools/console-capture-regression.test.ts --runInBand --forceExit; git diff --check
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a84903025

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +151 to +153
if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) {
score += options.sameDepthBias;
reasons.push(`bias:${options.sameDepthBias}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make same_depth_bias affect relative URL ranking

same_depth_bias is currently added as a constant to every scored URL, so it never changes ordering in strategy: "best_first" (the queue sort uses score differences, and a uniform offset cancels out). This makes a documented scoring hint effectively inert for traversal decisions, so callers cannot actually tune depth preference with this field.

Useful? React with 👍 / 👎.

shaun0927 and others added 2 commits May 13, 2026 09:59
Make the HTTP bearer auth regression allocate its port range from the Jest process id before using per-case ports, avoiding collisions across parallel worktrees and CI shards.

Constraint: the best-first PR now includes the shared transport suite after develop merges and should not inherit fixed-port flakes.
Rejected: Relying on sequential suite order | the test must remain robust under parallel verification.
Confidence: high
Scope-risk: narrow
Directive: Keep transport tests independent of globally fixed loopback ports.
Tested: npx jest tests/transports/http-bearer-auth.test.ts --runInBand --forceExit
Not-tested: Full multi-OS GitHub Actions matrix before push.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Make the admin-keys JSON helper scan for a parseable JSON array instead of slicing from the first bracket, so Jest worker log prefixes cannot corrupt list --json assertions.

Constraint: Ubuntu 18 full-suite CI captured bracketed worker noise before the CLI JSON payload.
Rejected: Assuming stdout starts with JSON | the in-process harness intentionally shares stdout with Jest capture hooks.
Confidence: high
Scope-risk: narrow
Directive: CLI tests that tolerate worker noise should locate the actual machine-readable payload, not rely on stream position.
Tested: npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check
Not-tested: Full multi-OS GitHub Actions matrix after this commit.

Co-authored-by: OmX <omx@oh-my-codex.dev>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 370e51f7df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +151 to +153
if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) {
score += options.sameDepthBias;
reasons.push(`bias:${options.sameDepthBias}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make same_depth_bias influence best-first ordering

same_depth_bias is added as a constant for every candidate URL, so it cancels out in pairwise comparisons and cannot change crawl priority at all. In strategy: "best_first", users can set this option expecting different traversal, but the ordering is identical regardless of the value because every score is shifted equally. Apply the bias conditionally (for the intended depth relation) or remove/explain the option to avoid a no-op tuning knob.

Useful? React with 👍 / 👎.

Comment thread src/tools/crawl.ts Outdated
Comment on lines +573 to +575
for (const entry of entries) {
const item = makeQueueItem(entry);
if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Count scored_urls only for accepted queue entries

scored_urls is incremented before duplicate filtering, so links already visited or already queued still increase the summary counter. On pages with repeated nav/footer links, this can significantly over-report how many URLs were actually considered for crawling, making the new scoring diagnostics misleading. Increment after the hasVisited/bestFirstQueued checks so the metric reflects unique queued candidates.

Useful? React with 👍 / 👎.

Keep the shallower queued entry when best-first discovers the same URL at multiple depths, so an early deep discovery cannot cause a later in-scope discovery to be skipped by max_depth filtering. A depth tie-breaker also preserves shallow preference when scores match.\n\nConstraint: best_first sorting is score-based, so queue de-duplication must not discard a shallower path before max_depth checks run.\nRejected: Pure URL set de-duplication | can leave only an over-depth candidate for an otherwise crawlable URL.\nConfidence: high\nScope-risk: narrow\nDirective: Best-first queue bookkeeping should retain the minimum discovered depth for each queued URL.\nTested: npm run build; npx jest tests/core/tools/crawl.engine.test.ts tests/core/tools/crawl.legacy-snapshot.test.ts tests/core/crawl/url-scorer.test.ts --runInBand --forceExit; npm run lint:tool-schemas; git diff --check\nNot-tested: Full CI matrix locally.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09f602555d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tools/crawl.ts
Comment on lines +90 to +94
strategy: {
type: 'string',
enum: ['bfs', 'best_first'],
description: 'Crawl traversal strategy. Default: bfs. best_first scores discovered URLs by query/url_score and visits highest-scoring URLs first.',
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate best-first options to crawl job APIs

crawl now advertises strategy/query/url_score, but the resumable path still drops them: crawl_start only persists the legacy fields in JobConfig (no strategy/scoring fields), and crawl_status advances from that stored config. In practice, callers that switch to crawl_start/crawl_status and pass strategy: "best_first" will silently get BFS ordering, which is a behavioral mismatch against the new crawl capability and makes async crawl results inconsistent with one-shot crawl for the same inputs.

Useful? React with 👍 / 👎.

@shaun0927 shaun0927 merged commit 212ef96 into develop May 13, 2026
17 of 18 checks passed
shaun0927 added a commit that referenced this pull request May 13, 2026
…ndows CI

Same fix applied to feat/983-best-first (#1065). The
'separate writer processes serialize read/merge/write updates' test
exceeds the default 10s jest deadline on hosted windows-22 runners
because ts-node startup × 8 children overruns the budget before the
assertions can run. Bump the per-test timeout to 60s.
shaun0927 added a commit that referenced this pull request May 13, 2026
* feat(core): ToolAnnotations on every tool (#867)

Adds the MCP-spec ToolAnnotations field (readOnlyHint, destructiveHint,
idempotentHint, openWorldHint) to every tool exposed by the MCP server.
Pure metadata — zero behavior change, no new dependencies, no native
modules, no outbound traffic. Compliant with Portability-Harness
Contract P1/P2/P3/P4/P5.

Approach (hybrid):
- src/types/tool-annotations.ts (NEW) — canonical TOOL_ANNOTATIONS
  table, single source of truth.
- MCPToolDefinition.annotations is a REQUIRED field — TypeScript
  enforces presence at compile time for every future tool.
- Each tool file declares `annotations: TOOL_ANNOTATIONS.<name>` (one
  import + one reference line) rather than duplicating literal values.
  70 definitions across 58 tool files + 2 pilot/handoff tools + the
  inline expand_tools virtual tool.

Worst-case rule (consistent across the table): annotations describe the
most destructive / least pure behavior across all valid inputs.
javascript_tool, batch_execute, act, network, and request_intercept
are all flagged destructive + open-world because their worst-case
inputs (arbitrary JS, batch dispatch, NL action routing, network
emulation/blocking) can mutate or block observable state.

Verification:
- npm run build, npm run lint, npm test pass.
- tests/unit/tool-annotations.test.ts (NEW, 9 cases) covers shape,
  destructive/read-only/open-world expectations, requireAnnotations,
  source-level injection check, and reverse-direction orphan check.
- Only pre-existing flaky stress timing test fails under parallel
  load; passes in isolation — unrelated to this change.

docs/mcp/tool-annotations.md mirrors the runtime table for human
review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(annotations): reclassify validate_page, batch_paginate, console_capture (codex P1/P2 from #945)

Codex flagged three tools incorrectly mapped to READ_ONLY:
- validate_page navigates via smartGoto, can create a tab, waits on live
  page state — open-world, mutating, non-idempotent.
- batch_paginate presses keys, clicks, scrolls; in 'url' mode creates tabs
  and navigates to generated URLs — mutating, non-idempotent, potentially
  open-world.
- console_capture supports start/stop/clear; clear deletes buffered logs —
  destructive, non-idempotent.

Updates docs/mcp/tool-annotations.md mirror to match.

Addresses Codex review feedback on PR #945.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Allow manifest entries to carry annotations

Update the worker tool manifest type so tests and generated manifests can include MCP ToolAnnotations added by the tool-annotation PR.

Constraint: Preserve existing manifest filtering behavior and make annotations optional for older manifests.

Rejected: Removing annotations from manifest tests | workers should be able to receive annotation metadata when present.

Confidence: high

Scope-risk: narrow

Directive: Keep tool manifest typing aligned with MCPToolDefinition when new shared fields are surfaced.

Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts

Not-tested: Full GitHub Actions matrix before push

* Annotate develop-era tools for MCP safety hints

Constraint: PR #945 now requires every MCP tool definition, including tools merged after the original branch, to carry canonical annotations.\nRejected: Making annotations optional for newer tools | that would weaken the issue #867 contract and hide future omissions.\nConfidence: high\nScope-risk: narrow\nDirective: Add TOOL_ANNOTATIONS and docs rows whenever a new tool definition is registered.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts\nNot-tested: full CI matrix

* Align Cursor tier-one count with current tool surface

Constraint: CI runs the Cursor cross-env suite with OPENCHROME_RUN_CROSS_ENV=1, where develop now exposes 46 non-expand tier-one tools.\nRejected: Reverting new tier-one registrations | those came from current develop and are outside PR #945's scope.\nConfidence: high\nScope-risk: narrow\nDirective: Update this count when intentional tier-one tool exposure changes.\nTested: npx jest --config jest.config.js --runInBand tests/cross-env/cursor-verification.test.ts (skipped locally without cross-env flag)\nNot-tested: full CI matrix

* Stabilize CI baselines for current tool surface

Constraint: This branch inherits develop's 45-tool tier-one surface and Windows CI checks out fixtures with CRLF.\nRejected: Reverting develop tool registrations | outside PR #945 scope and would reintroduce merge drift.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Cursor tier-one counts branch-specific and normalize fixture newlines for generated JSON.\nTested: npx jest --config jest.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts\nNot-tested: full CI matrix

* Harden admin key stdout assertion against worker noise

Constraint: Windows CI can interleave unrelated Jest worker output into the in-process stdout hook.\nRejected: Requiring stdout to contain only one physical line | that assertion flakes when shared process output is polluted outside the CLI path.\nConfidence: high\nScope-risk: narrow\nDirective: Assert the secret token appears exactly once and never in stderr; do not depend on unrelated captured stdout being empty.\nTested: npx jest --config jest.config.js --runInBand tests/cli/admin-keys.test.ts\nNot-tested: full CI matrix

* fix(annotations): reclassify execute_plan, workflow_init, memory

Codex P1/P2 findings on PR #945:
- execute_plan: open-world AND destructive (resolves arbitrary tool names;
  worst-case dispatch set includes destructive tools + non-loopback nav)
- workflow_init: open-world (dnsResolve + worker page.goto to non-loopback)
- memory: destructive (validate-prune path deletes memory entries)

Mirrors the docs table in docs/mcp/tool-annotations.md.

* feat(annotations): annotate develop-era tools merged after this PR

PR #945 enforces ToolAnnotations on every MCPToolDefinition by typing
`annotations` as required, but the tool surface on develop has grown by
15 entries since the PR was branched — crawl_*, oc_reflect, oc_task_*,
oc_skill_replay, oc_run_*, and the dynamic-skill synthesizer. The tsc
build accordingly fails on all 9 OS/Node combos with TS2741.

Add catalog entries to TOOL_ANNOTATIONS and wire each new tool's
definition to its annotation:
- crawl_start → OPEN_WORLD; crawl_cancel → DESTRUCTIVE; crawl_status → READ_ONLY
- oc_reflect, oc_task_start, oc_task_cancel/get/list/wait classified by worst-case input
- oc_skill_replay → OPEN_WORLD_DESTRUCTIVE (matches execute_plan envelope)
- oc_run_start (MUTATES), oc_run_finish (DESTRUCTIVE), status/events (READ_ONLY)
- Synthesized skill-replay tools inline OPEN_WORLD_DESTRUCTIVE since they
  may navigate to arbitrary origins under the replay engine.

Restores green CI on all 9 OS/Node combos.

* fix(945): add annotations for develop-era tools

* fix(945): include run-harness dir in TOOL_ANNOTATIONS reverse test

* fix(945): restore develop's recording.ts incl. oc_recording_status

* fix(945): add annotations to recording tools incl. oc_recording_status

* test(annotations): sync expected counts and dynamic-skill fixture for develop merges

Two follow-ups after merging develop into this branch:

1. tests/unit/tool-annotations.test.ts now expects recording.ts to have
   5 annotated tools (oc_recording_status landed on develop after this
   PR branched). Bump the spot-check expectation from 4 → 5.

2. tests/pilot/dynamic-skills/registry.test.ts builds a synthetic
   RegistryEntry whose `definition` is typed as MCPToolDefinition. After
   this PR makes `annotations` required on the type, the literal in
   makeEntry() must carry an annotations field too — add the same
   OPEN_WORLD_DESTRUCTIVE shape the synthesizer emits.

* feat(annotations): annotate oc_task_run_* tools landed from develop

PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR
merge commit now drags src/tools/task-run.ts onto this branch and
tsc fails with TS2741 on 7 tools missing the required annotations
field.

Add annotations to each of the 7 oc_task_run_* tool definitions and
catalog them in TOOL_ANNOTATIONS:
- start/update/checkpoint/needs_help → MUTATES
- complete                          → DESTRUCTIVE (terminal)
- get/list                          → READ_ONLY

* test(pilot/curator): bump cross-process serialize test timeout for Windows CI

Same fix applied to feat/983-best-first (#1065). The
'separate writer processes serialize read/merge/write updates' test
exceeds the default 10s jest deadline on hosted windows-22 runners
because ts-node startup × 8 children overruns the budget before the
assertions can run. Bump the per-test timeout to 60s.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: OmX <omx@oh-my-codex.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant