feat(core): best-first URL scoring strategy for crawl workloads (#983)#1065
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 84c4041eaa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| const explicitKeywords = (options.keywords || []).map(normalizeTerm).filter(Boolean); | ||
| const terms = Array.from(new Set([...queryTerms(options.query), ...explicitKeywords])); | ||
| const haystack = `${decodeURIComponent(parsed.pathname)} ${parsed.searchParams.toString()}`.toLowerCase(); |
There was a problem hiding this comment.
Handle malformed percent-encoding before scoring URL
scoreUrl calls decodeURIComponent(parsed.pathname) without a guard, but new URL(...) accepts paths like /%zz and decodeURIComponent then throws URI malformed. In strategy: "best_first", that exception bubbles out of enqueueItems/crawl loop and aborts the entire crawl instead of just assigning a low score to the bad URL. This makes crawl reliability depend on whether any discovered link has invalid percent-encoding.
Useful? React with 👍 / 👎.
Adds an opt-in best_first crawl strategy with deterministic URL scoring so targeted crawls spend page budget on more relevant URLs first while preserving legacy BFS behavior by default.\n\nConstraint: Must not duplicate open crawl-job/static-fetch PRs or change default crawl output.\nRejected: Content/LLM ranking | would expand scope and conflict with OpenChrome core boundaries.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future scorer changes deterministic and fixture-gated; do not make public live-site ordering a CI gate.\nTested: npm run build -- --pretty false; npm test -- --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts; npm run lint:changed\nNot-tested: Full npm test suite; live OpenChrome crawl smoke.
84c4041 to
ef02fb5
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ef02fb5dc2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (probe.depth > maxDepth) break; | ||
| // Otherwise put it back and try again — shouldn't happen but be safe | ||
| tracker.enqueue([probe]); | ||
| enqueueItems([probe]); | ||
| break; |
There was a problem hiding this comment.
Keep crawling after re-queuing an in-depth probe item
In strategy: "best_first", the queue is score-sorted rather than depth-ordered, so it can contain high-score URLs beyond max_depth ahead of valid shallower URLs. When the batch is empty, this block re-queues a valid probe (probe.depth <= maxDepth) and then immediately breaks the outer crawl loop, which stops crawling early and can leave crawlable pages unvisited (e.g., concurrency=1, one depth-2 high-score URL ahead of remaining depth-1 URLs).
Useful? React with 👍 / 👎.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Guard URL scoring against malformed percent-encoded paths and keep the best-first crawl loop from stopping after re-queueing valid work.\n\nConstraint: PR #1065 must preserve the default BFS output and only affect opt-in strategy=best_first behavior.\nRejected: Dropping malformed URLs entirely | safe scoring keeps crawl progress deterministic without treating parseable URLs as invalid.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future best-first queue changes tolerant of score-sorted depth ordering.\nTested: npm ci; npm run lint:changed; npm run lint:tier; npx jest --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts\nNot-tested: Full hosted CI matrix pending after push.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e0bb8830b5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
| for (const entry of entries) { | ||
| const item = makeQueueItem(entry); | ||
| if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue; |
There was a problem hiding this comment.
Preserve shallower duplicate URLs in best-first queue
When strategy="best_first", the queue dedupes purely by URL and drops any rediscovery of the same URL (bestFirstQueued.has(item.url)), even if the new discovery has a smaller depth. This can cause reachable pages to be skipped under max_depth: if a URL is first queued at depth 3, then later found at depth 2 while still queued, the depth-2 entry is discarded and the only remaining entry may be filtered out by next.depth > maxDepth. In that case the crawler misses an in-scope page that should have been crawled.
Useful? React with 👍 / 👎.
Merging develop brings the shared CI test fixes into the PR branch without changing the best-first feature scope. Constraint: PR #1065 targets develop and its prior checks ran against a stale red baseline. Rejected: Reimplementing shared CI fixes in this branch | merging develop preserves one source of truth. Confidence: high Scope-risk: narrow Directive: Keep this branch focused on best-first crawl behavior after the baseline refresh. Tested: npm run build; npm run lint:changed; npm run lint:tier; npx jest --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbbf4af908
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| for (const prefix of options.preferPaths || []) { | ||
| const normalized = normalizePathPrefix(prefix); | ||
| if (normalized && parsed.pathname.toLowerCase().startsWith(normalized)) { |
There was a problem hiding this comment.
Match path hints on segment boundaries
Using startsWith for prefer_paths/exclude_paths makes /blog also match unrelated paths like /blogging or /blog-roll, so best-first can boost or suppress the wrong URLs on real sites with similarly prefixed routes. This changes crawl ordering and can spend the page budget on less relevant pages even when users provide precise path hints.
Useful? React with 👍 / 👎.
| const scored = scoreUrl(normalized, entry.depth, scoringOptions); | ||
| item.score = scored.score; | ||
| item.score_reasons = scored.reasons; | ||
| scoredUrls++; |
There was a problem hiding this comment.
Count scored URLs only after queue deduplication
scored_urls is incremented inside makeQueueItem before checking hasVisited/bestFirstQueued, so duplicate discoveries inflate the summary counter even though those URLs are skipped and never enter the queue. In crawls with repeated nav links this can significantly overstate scoring activity and make strategy metrics misleading.
Useful? React with 👍 / 👎.
The Cursor smoke test should prove core tool visibility without failing every develop-based PR when additional Tier 1 tools graduate. Constraint: Hosted CI currently reports 45 Tier 1 tools while the test hard-coded 39. Rejected: Raising the exact expected count to 45 | that would recreate the same brittle failure on the next Tier 1 graduation. Confidence: high Scope-risk: narrow Directive: Assert required core tools explicitly and avoid exact catalog-size checks in cross-client smoke tests. Tested: npx jest --runInBand tests/cross-env/cursor-verification.test.ts Not-tested: Linux Node 20/22 local execution; hosted CI pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5d404561ee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) { | ||
| score += options.sameDepthBias; | ||
| reasons.push(`bias:${options.sameDepthBias}`); |
There was a problem hiding this comment.
Apply same_depth_bias conditionally by candidate depth
same_depth_bias is currently added to every scored URL unconditionally, so it shifts all scores by the same constant and does not change best_first ordering at all. In practice, users who provide this option get different numeric score values but no traversal effect, which makes the knob effectively non-functional for crawl prioritization.
Useful? React with 👍 / 👎.
The console_capture regression should compare response shape, not fail Windows CI because checked-out JSON fixtures use CRLF line endings. Constraint: Windows CI reads the baseline fixture with CRLF while JSON.stringify emits LF. Rejected: Updating the fixture to Windows line endings | that would break Unix runners and still encode platform formatting into the assertion. Confidence: high Scope-risk: narrow Directive: Normalize platform line endings before snapshot-style string comparisons. Tested: npx jest --runInBand tests/tools/console-capture-regression.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac3de5d1bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue; | ||
| bestFirstQueued.add(item.url); |
There was a problem hiding this comment.
Prefer shallower depth when de-duplicating queued URLs
When strategy='best_first', enqueueItems drops a rediscovered URL if it is already in bestFirstQueued, regardless of depth. This can suppress a valid shallow path: if a URL is first queued at depth 3, then later rediscovered at depth 1, the depth-1 entry is discarded, and if max_depth is 1 or 2 the depth-3 entry is skipped and the page is never crawled even though it is reachable within the configured depth limit. Please update queued de-duplication to retain (or replace with) the minimum discovered depth per URL.
Useful? React with 👍 / 👎.
The health endpoint integration test should treat both code=0 and signal=SIGTERM as clean shutdown outcomes after sending SIGTERM. Constraint: CI can report SIGTERM exits as code=null even when teardown is graceful. Rejected: Extending timeouts | the failure is the accepted exit shape, not startup or shutdown latency. Confidence: high Scope-risk: narrow Directive: Avoid platform-specific assumptions for Node ChildProcess signal exit reporting. Tested: npx jest --runInBand tests/integration/health-endpoint-gating.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7d6f8650a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) { | ||
| score += options.sameDepthBias; | ||
| reasons.push(`bias:${options.sameDepthBias}`); |
There was a problem hiding this comment.
Make
same_depth_bias affect ranking
same_depth_bias is currently added as a constant to every scored URL, so it does not change relative scores and therefore cannot influence best_first crawl ordering at all. In crawl, URLs are prioritized strictly by score, so this option is effectively a no-op for its intended purpose (changing traversal priority) and users who set it will see no behavioral difference beyond score labels.
Useful? React with 👍 / 👎.
The health endpoint integration test should allow the same 30s shutdown window on Linux that it already allowed on Windows under hosted CI load. Constraint: Ubuntu CI observed a graceful SIGTERM path that exceeded the previous 10s non-Windows limit. Rejected: Skipping the scenario | the endpoint gating behavior remains important and passes with a realistic teardown window. Confidence: high Scope-risk: narrow Directive: Keep process-spawning integration tests tolerant of hosted runner teardown latency. Tested: npx jest --runInBand tests/integration/health-endpoint-gating.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
The admin key test should verify exactly one plaintext token without failing when Jest worker console noise is captured around the in-process CLI stdout hook. Constraint: Windows CI can interleave decorated Jest console output into the captured stdout buffer. Rejected: Requiring stdout to have exactly one line | the helper already documents and handles shared stdout hook noise. Confidence: high Scope-risk: narrow Directive: Assert secret emission by token occurrence, not by total captured line count. Tested: npx jest --runInBand tests/cli/admin-keys.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b051df8593
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) { | ||
| score += options.sameDepthBias; | ||
| reasons.push(`bias:${options.sameDepthBias}`); |
There was a problem hiding this comment.
Make same_depth_bias affect relative URL ordering
same_depth_bias is currently added as a flat constant to every scored URL, so it cancels out in all pairwise comparisons and cannot change which pages are crawled first in strategy="best_first". In practice, callers who set url_score.same_depth_bias get different score values but identical traversal order, which makes this advertised scoring hint ineffective for crawl prioritization.
Useful? React with 👍 / 👎.
The storage-size E2E should measure complete persisted JSON states rather than a zero-byte file observed while the async writer is truncating and rewriting. Constraint: Windows CI can take longer than a fixed sleep to flush DomainMemory writes. Rejected: Increasing the blind sleep | state-based polling proves the needed persisted state and avoids new timing flakes. Confidence: high Scope-risk: narrow Directive: Prefer state-based waits over fixed sleeps for async persistence tests. Tested: npx jest --runInBand tests/e2e/scenarios/domain-memory-persistence.test.ts Not-tested: Full hosted CI matrix pending after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
Codex Review: Didn't find any major issues. Hooray! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Merge latest develop into PR #1065 while preserving the branch's non-brittle Tier 1 visibility smoke test. Constraint: PR #962 changed shared transport files and CI fixtures after this branch was opened.\nRejected: Restoring an exact Tier 1 count | this branch intentionally allows the Tier 1 surface to grow as core tools graduate.\nConfidence: high\nScope-risk: narrow\nDirective: Keep fixture refreshes semantically faithful to the feature branch instead of overwriting lower-bound guards with develop's exact-count snapshot.\nTested: npx jest tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts --runInBand --forceExit (cross-env suite may skip locally by platform guard); git diff --check for touched fixture files.\nNot-tested: Full GitHub Actions matrix after this refresh.\nCo-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f72b035272
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const probe = dequeueItem(); | ||
| if (!probe) break; // Queue is truly empty | ||
| // If it's beyond depth, we're done | ||
| if (probe.depth > maxDepth) break; |
There was a problem hiding this comment.
Continue scanning queue before ending best-first crawl
When strategy="best_first", this break assumes that seeing one over-depth probe means no in-scope work remains, but score ordering can place multiple depth > max_depth URLs ahead of valid shallower URLs. With concurrency=1 (or a small batch), the loop can skip one deep item in batch collection, hit another deep probe here, and terminate early while crawlable URLs are still queued, causing missed pages within the configured depth limit.
Useful? React with 👍 / 👎.
Constraint: Pull request 1065 only conflicted in tests/cli/admin-keys.test.ts after develop absorbed schema lint and server changes. Rejected: Resolve product-code conflicts | This refresh is limited to equivalent admin-key stdout assertion logic. Confidence: high Scope-risk: narrow Directive: Keep plaintext-token assertions tolerant of unrelated Jest worker stdout noise. Tested: npm test -- --runInBand tests/cli/admin-keys.test.ts --forceExit Not-tested: Full CI matrix not run locally; GitHub Actions will rerun after push. Co-authored-by: OmX <omx@oh-my-codex.dev>
Resolve the develop merge for the best-first PR and keep the admin-key assertion focused on the generated plaintext token. The schema-lint pipeline is hardened so large tool manifests are fully flushed and read through pipes. Constraint: PR #1065 CI failed on lint:tool-schemas due truncated piped JSON and needed an admin-key test conflict resolution. Rejected: Increasing schema limits or updating the baseline | the manifest was valid; the failure came from pipe IO truncation, not a new schema violation. Confidence: high Scope-risk: moderate Directive: Do not reintroduce process.exit immediately after large stdout writes in CLI introspection paths. Tested: npm ci; npm run build; npm run lint; npm run lint:tier; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check Not-tested: Full multi-OS GitHub Actions matrix before push. Co-authored-by: OmX <omx@oh-my-codex.dev>
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Merge the latest develop crawl-tool additions into PR #1065, preserve its cross-environment test hardening, and make the new crawl tool schemas comply with required-prefix linting. Constraint: develop advanced after the first push and introduced conflicts in cross-env/domain-memory/console-capture tests plus crawl schema lint requirements. Rejected: Exact Tier 1 tool-count assertions | Tier 1 grows as tools graduate and the visibility contract is better covered by core presence checks. Confidence: high Scope-risk: moderate Directive: Avoid brittle exact tool counts in cross-environment discovery tests. Tested: npm run build; npm run lint:tool-schemas; npx jest tests/cli/admin-keys.test.ts tests/cross-env/cursor-verification.test.ts tests/tools/console-capture-regression.test.ts --runInBand --forceExit; git diff --check Not-tested: Full multi-OS GitHub Actions matrix before push. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a84903025
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) { | ||
| score += options.sameDepthBias; | ||
| reasons.push(`bias:${options.sameDepthBias}`); |
There was a problem hiding this comment.
Make same_depth_bias affect relative URL ranking
same_depth_bias is currently added as a constant to every scored URL, so it never changes ordering in strategy: "best_first" (the queue sort uses score differences, and a uniform offset cancels out). This makes a documented scoring hint effectively inert for traversal decisions, so callers cannot actually tune depth preference with this field.
Useful? React with 👍 / 👎.
Make the HTTP bearer auth regression allocate its port range from the Jest process id before using per-case ports, avoiding collisions across parallel worktrees and CI shards. Constraint: the best-first PR now includes the shared transport suite after develop merges and should not inherit fixed-port flakes. Rejected: Relying on sequential suite order | the test must remain robust under parallel verification. Confidence: high Scope-risk: narrow Directive: Keep transport tests independent of globally fixed loopback ports. Tested: npx jest tests/transports/http-bearer-auth.test.ts --runInBand --forceExit Not-tested: Full multi-OS GitHub Actions matrix before push. Co-authored-by: OmX <omx@oh-my-codex.dev>
Make the admin-keys JSON helper scan for a parseable JSON array instead of slicing from the first bracket, so Jest worker log prefixes cannot corrupt list --json assertions. Constraint: Ubuntu 18 full-suite CI captured bracketed worker noise before the CLI JSON payload. Rejected: Assuming stdout starts with JSON | the in-process harness intentionally shares stdout with Jest capture hooks. Confidence: high Scope-risk: narrow Directive: CLI tests that tolerate worker noise should locate the actual machine-readable payload, not rely on stream position. Tested: npx jest tests/cli/admin-keys.test.ts --runInBand --forceExit; git diff --check Not-tested: Full multi-OS GitHub Actions matrix after this commit. Co-authored-by: OmX <omx@oh-my-codex.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 370e51f7df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (options.sameDepthBias && Number.isFinite(options.sameDepthBias)) { | ||
| score += options.sameDepthBias; | ||
| reasons.push(`bias:${options.sameDepthBias}`); |
There was a problem hiding this comment.
Make same_depth_bias influence best-first ordering
same_depth_bias is added as a constant for every candidate URL, so it cancels out in pairwise comparisons and cannot change crawl priority at all. In strategy: "best_first", users can set this option expecting different traversal, but the ordering is identical regardless of the value because every score is shifted equally. Apply the bias conditionally (for the intended depth relation) or remove/explain the option to avoid a no-op tuning knob.
Useful? React with 👍 / 👎.
| for (const entry of entries) { | ||
| const item = makeQueueItem(entry); | ||
| if (tracker.hasVisited(item.url) || bestFirstQueued.has(item.url)) continue; |
There was a problem hiding this comment.
Count scored_urls only for accepted queue entries
scored_urls is incremented before duplicate filtering, so links already visited or already queued still increase the summary counter. On pages with repeated nav/footer links, this can significantly over-report how many URLs were actually considered for crawling, making the new scoring diagnostics misleading. Increment after the hasVisited/bestFirstQueued checks so the metric reflects unique queued candidates.
Useful? React with 👍 / 👎.
Keep the shallower queued entry when best-first discovers the same URL at multiple depths, so an early deep discovery cannot cause a later in-scope discovery to be skipped by max_depth filtering. A depth tie-breaker also preserves shallow preference when scores match.\n\nConstraint: best_first sorting is score-based, so queue de-duplication must not discard a shallower path before max_depth checks run.\nRejected: Pure URL set de-duplication | can leave only an over-depth candidate for an otherwise crawlable URL.\nConfidence: high\nScope-risk: narrow\nDirective: Best-first queue bookkeeping should retain the minimum discovered depth for each queued URL.\nTested: npm run build; npx jest tests/core/tools/crawl.engine.test.ts tests/core/tools/crawl.legacy-snapshot.test.ts tests/core/crawl/url-scorer.test.ts --runInBand --forceExit; npm run lint:tool-schemas; git diff --check\nNot-tested: Full CI matrix locally.\n\nCo-authored-by: OmX <omx@oh-my-codex.dev>
|
Codex Review: Didn't find any major issues. Can't wait for the next one! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09f602555d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| strategy: { | ||
| type: 'string', | ||
| enum: ['bfs', 'best_first'], | ||
| description: 'Crawl traversal strategy. Default: bfs. best_first scores discovered URLs by query/url_score and visits highest-scoring URLs first.', | ||
| }, |
There was a problem hiding this comment.
Propagate best-first options to crawl job APIs
crawl now advertises strategy/query/url_score, but the resumable path still drops them: crawl_start only persists the legacy fields in JobConfig (no strategy/scoring fields), and crawl_status advances from that stored config. In practice, callers that switch to crawl_start/crawl_status and pass strategy: "best_first" will silently get BFS ordering, which is a behavioral mismatch against the new crawl capability and makes async crawl results inconsistent with one-shot crawl for the same inputs.
Useful? React with 👍 / 👎.
…ndows CI Same fix applied to feat/983-best-first (#1065). The 'separate writer processes serialize read/merge/write updates' test exceeds the default 10s jest deadline on hosted windows-22 runners because ts-node startup × 8 children overruns the budget before the assertions can run. Bump the per-test timeout to 60s.
* feat(core): ToolAnnotations on every tool (#867) Adds the MCP-spec ToolAnnotations field (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) to every tool exposed by the MCP server. Pure metadata — zero behavior change, no new dependencies, no native modules, no outbound traffic. Compliant with Portability-Harness Contract P1/P2/P3/P4/P5. Approach (hybrid): - src/types/tool-annotations.ts (NEW) — canonical TOOL_ANNOTATIONS table, single source of truth. - MCPToolDefinition.annotations is a REQUIRED field — TypeScript enforces presence at compile time for every future tool. - Each tool file declares `annotations: TOOL_ANNOTATIONS.<name>` (one import + one reference line) rather than duplicating literal values. 70 definitions across 58 tool files + 2 pilot/handoff tools + the inline expand_tools virtual tool. Worst-case rule (consistent across the table): annotations describe the most destructive / least pure behavior across all valid inputs. javascript_tool, batch_execute, act, network, and request_intercept are all flagged destructive + open-world because their worst-case inputs (arbitrary JS, batch dispatch, NL action routing, network emulation/blocking) can mutate or block observable state. Verification: - npm run build, npm run lint, npm test pass. - tests/unit/tool-annotations.test.ts (NEW, 9 cases) covers shape, destructive/read-only/open-world expectations, requireAnnotations, source-level injection check, and reverse-direction orphan check. - Only pre-existing flaky stress timing test fails under parallel load; passes in isolation — unrelated to this change. docs/mcp/tool-annotations.md mirrors the runtime table for human review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(annotations): reclassify validate_page, batch_paginate, console_capture (codex P1/P2 from #945) Codex flagged three tools incorrectly mapped to READ_ONLY: - validate_page navigates via smartGoto, can create a tab, waits on live page state — open-world, mutating, non-idempotent. - batch_paginate presses keys, clicks, scrolls; in 'url' mode creates tabs and navigates to generated URLs — mutating, non-idempotent, potentially open-world. - console_capture supports start/stop/clear; clear deletes buffered logs — destructive, non-idempotent. Updates docs/mcp/tool-annotations.md mirror to match. Addresses Codex review feedback on PR #945. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Allow manifest entries to carry annotations Update the worker tool manifest type so tests and generated manifests can include MCP ToolAnnotations added by the tool-annotation PR. Constraint: Preserve existing manifest filtering behavior and make annotations optional for older manifests. Rejected: Removing annotations from manifest tests | workers should be able to receive annotation metadata when present. Confidence: high Scope-risk: narrow Directive: Keep tool manifest typing aligned with MCPToolDefinition when new shared fields are surfaced. Tested: /Users/jh0927/openchrome/node_modules/.bin/tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts Not-tested: Full GitHub Actions matrix before push * Annotate develop-era tools for MCP safety hints Constraint: PR #945 now requires every MCP tool definition, including tools merged after the original branch, to carry canonical annotations.\nRejected: Making annotations optional for newer tools | that would weaken the issue #867 contract and hide future omissions.\nConfidence: high\nScope-risk: narrow\nDirective: Add TOOL_ANNOTATIONS and docs rows whenever a new tool definition is registered.\nTested: npx tsc -p tsconfig.json --pretty false; npx jest --config jest.config.js --runInBand tests/orchestration/tool-manifest.test.ts tests/unit/tool-annotations.test.ts\nNot-tested: full CI matrix * Align Cursor tier-one count with current tool surface Constraint: CI runs the Cursor cross-env suite with OPENCHROME_RUN_CROSS_ENV=1, where develop now exposes 46 non-expand tier-one tools.\nRejected: Reverting new tier-one registrations | those came from current develop and are outside PR #945's scope.\nConfidence: high\nScope-risk: narrow\nDirective: Update this count when intentional tier-one tool exposure changes.\nTested: npx jest --config jest.config.js --runInBand tests/cross-env/cursor-verification.test.ts (skipped locally without cross-env flag)\nNot-tested: full CI matrix * Stabilize CI baselines for current tool surface Constraint: This branch inherits develop's 45-tool tier-one surface and Windows CI checks out fixtures with CRLF.\nRejected: Reverting develop tool registrations | outside PR #945 scope and would reintroduce merge drift.\nConfidence: high\nScope-risk: narrow\nDirective: Keep Cursor tier-one counts branch-specific and normalize fixture newlines for generated JSON.\nTested: npx jest --config jest.config.js --runInBand tests/tools/console-capture-regression.test.ts tests/cross-env/cursor-verification.test.ts\nNot-tested: full CI matrix * Harden admin key stdout assertion against worker noise Constraint: Windows CI can interleave unrelated Jest worker output into the in-process stdout hook.\nRejected: Requiring stdout to contain only one physical line | that assertion flakes when shared process output is polluted outside the CLI path.\nConfidence: high\nScope-risk: narrow\nDirective: Assert the secret token appears exactly once and never in stderr; do not depend on unrelated captured stdout being empty.\nTested: npx jest --config jest.config.js --runInBand tests/cli/admin-keys.test.ts\nNot-tested: full CI matrix * fix(annotations): reclassify execute_plan, workflow_init, memory Codex P1/P2 findings on PR #945: - execute_plan: open-world AND destructive (resolves arbitrary tool names; worst-case dispatch set includes destructive tools + non-loopback nav) - workflow_init: open-world (dnsResolve + worker page.goto to non-loopback) - memory: destructive (validate-prune path deletes memory entries) Mirrors the docs table in docs/mcp/tool-annotations.md. * feat(annotations): annotate develop-era tools merged after this PR PR #945 enforces ToolAnnotations on every MCPToolDefinition by typing `annotations` as required, but the tool surface on develop has grown by 15 entries since the PR was branched — crawl_*, oc_reflect, oc_task_*, oc_skill_replay, oc_run_*, and the dynamic-skill synthesizer. The tsc build accordingly fails on all 9 OS/Node combos with TS2741. Add catalog entries to TOOL_ANNOTATIONS and wire each new tool's definition to its annotation: - crawl_start → OPEN_WORLD; crawl_cancel → DESTRUCTIVE; crawl_status → READ_ONLY - oc_reflect, oc_task_start, oc_task_cancel/get/list/wait classified by worst-case input - oc_skill_replay → OPEN_WORLD_DESTRUCTIVE (matches execute_plan envelope) - oc_run_start (MUTATES), oc_run_finish (DESTRUCTIVE), status/events (READ_ONLY) - Synthesized skill-replay tools inline OPEN_WORLD_DESTRUCTIVE since they may navigate to arbitrary origins under the replay engine. Restores green CI on all 9 OS/Node combos. * fix(945): add annotations for develop-era tools * fix(945): include run-harness dir in TOOL_ANNOTATIONS reverse test * fix(945): restore develop's recording.ts incl. oc_recording_status * fix(945): add annotations to recording tools incl. oc_recording_status * test(annotations): sync expected counts and dynamic-skill fixture for develop merges Two follow-ups after merging develop into this branch: 1. tests/unit/tool-annotations.test.ts now expects recording.ts to have 5 annotated tools (oc_recording_status landed on develop after this PR branched). Bump the spot-check expectation from 4 → 5. 2. tests/pilot/dynamic-skills/registry.test.ts builds a synthetic RegistryEntry whose `definition` is typed as MCPToolDefinition. After this PR makes `annotations` required on the type, the literal in makeEntry() must carry an annotations field too — add the same OPEN_WORLD_DESTRUCTIVE shape the synthesizer emits. * feat(annotations): annotate oc_task_run_* tools landed from develop PR #1083 (oc_task_run_* lifecycle) just merged into develop. The PR merge commit now drags src/tools/task-run.ts onto this branch and tsc fails with TS2741 on 7 tools missing the required annotations field. Add annotations to each of the 7 oc_task_run_* tool definitions and catalog them in TOOL_ANNOTATIONS: - start/update/checkpoint/needs_help → MUTATES - complete → DESTRUCTIVE (terminal) - get/list → READ_ONLY * test(pilot/curator): bump cross-process serialize test timeout for Windows CI Same fix applied to feat/983-best-first (#1065). The 'separate writer processes serialize read/merge/write updates' test exceeds the default 10s jest deadline on hosted windows-22 runners because ts-node startup × 8 children overruns the budget before the assertions can run. Bump the per-test timeout to 60s. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: OmX <omx@oh-my-codex.dev>
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
feat/983-best-first→develop09f6025— Preserve shallow best-first crawl candidatesOwner comment cleanup: 14 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary
strategy: "best_first"tocrawlwithout changing default BFS behavior.src/core/crawl/url-scorer.tsusing query terms, keywords, preferred/excluded paths, depth penalty, and low-signal path penalties.score/score_reasonsper page and summary scoring counters only when best-first is requested.Closes #983.
Direction / scope check
crawloutput remains free of strategy metadata unless the caller opts in.Verification
npm run build -- --pretty falsenpm test -- --runInBand tests/core/crawl/url-scorer.test.ts tests/core/tools/crawl.engine.test.tsnpm run lint:changedKnown gaps
npm run lint -- --max-warnings=0is blocked by pre-existing unrelated lint findings insrc/core/perception/semantic.ts,src/core/performance/insights/evaluators.ts,src/session-manager.ts,src/tools/connect.ts, andsrc/tools/fill-form.ts.