[pull] master from GaijinEntertainment:master#980
Merged
Conversation
…P tool Closes the false-positive blind spot in query_log. Today a row with match_count > 0 looks like a hit even if all returned cards just shared BM25 tokens with the question. The new `useful` column captures the caller's "none of these addressed my question" judgment. Default-positive bias means we never write useful=true; only the negative signal carries information. ## Schema (utils/mouse/index.das) - New v4 migration adds `useful` (Option<int>) to query_log via add_column. - v2 migration converted from `create_table(type<QueryLog>)` to raw SQL pinned to the v2-era schema. `create_table` regenerates DDL from the current struct, so as soon as v4 added a column the v2 fresh-DB path would create it too — and v4 would duplicate. Pinning v2's DDL decouples migration history from struct shape. - `log_query` now returns int64 (the inserted query_id). - New `mark_no_match(db, id)` helper. - New `LogMode` enum (All / Misses / Bad / Review). `recent_queries` now takes a mode parameter — small breaking change to the bool signature (3 callsites updated: 2 tests + cmd_log). ## Per-ask path (hot path) - `mouse__ask` response now begins with `query_id: N`. Same in CLI. - New MCP tool `mouse__bad(queryId)` and CLI `mouse bad <id>` mark a hit as no-match (UPDATE useful = 0). Idempotent. Distinct error exit codes (1 = usage, 2 = no such id) so shell callers can branch. - `tool_ask` ends with a hint reminding the agent: if NONE of the cards address the question, mouse__bad before moving on. ## Wrap-up path (safety net) - New MCP tool `mouse__log` exposes the existing `recent_queries` symmetrically to the CLI. New `--bad` and `--review` flags on `mouse log` alongside `--misses`. `--review` shows match_count > 0 AND useful IS NULL — the queue for retrospective rating. - `skills/task_wrap_up.md` Section 1 extended with the `--review` step: scan unrated hits, mark unhelpful ones via `mouse__bad`, mouse__add the real answer if this session has it. ## Skill / OVERVIEW updates - `skills/mouse.md`: new trigger-table row + Asking-section bullet for the per-ask mark-no-match rule. Frames why we don't mark positive. - `utils/mouse/OVERVIEW.md`: Operations table gains `bad` + `log` entries; query_log schema documented. ## Tests (utils/mouse/tests/test_index.das) 40/40 pass (was 34/34). Six new tests: - log_query returns increasing ids - mark_no_match sets useful=0 + starts NULL - mark_no_match on nonexistent id returns 0 - mark_no_match idempotent (re-mark is no-op) - recent_queries filters Bad - recent_queries Review excludes match_count=0 rows ## Opportunistic lint cleanups (lint surfaced from generic instantiation) - `daslib/linq.das:847` `take`: ternary → min(total, length(arr)) [PERF015] - `modules/dasSQLITE/daslib/sqlite_linq.das:38` `_first_opt`: `length(arr) > 0` → `!empty(arr)` [PERF017] Note: sqlite_linq.das has 41 other pre-existing lint warnings unrelated to this PR (PERF013/PERF017/STYLE015/STYLE016) scattered through 4800+ lines. Deliberately left for a separate focused cleanup PR. Verified end-to-end via CLI smoke (ask → log → bad → log --bad/--review) and JSON-RPC tools/list (returns mouse__ask, mouse__add, mouse__get, mouse__rebuild, mouse__bad, mouse__log). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three review comments accepted (one with split verdict): 1. cmd_log empty-state was misleading under filter modes — saying "(no queries logged yet)" while a populated log just had no rows matching --misses/--bad/--review. Now mode-aware: keep the original message only for LogMode.All; print "(no rows matching the selected filter)" otherwise. 2. tool_log displayed the raw caller-provided `mode` string verbatim — so `mode:""` (omitted) rendered as "mode=" and unknown values rendered as "mode=foo" while silently returning All. Display now derived from the parsed enum via log_mode_display(), so the surfaced mode always matches what we filtered by. Did NOT add error-on-unknown defensive guard — the JSON schema enum is the right place to enforce the contract; runtime validation duplicates it. 3. skills/mouse.md trigger-table row showed `mouse__bad(query_id)` while the MCP arg is `queryId` (camelCase, matching `rawQuery` convention). Reader copying the doc would hit a parameter-name error. One-word fix. 40/40 mouse tests still pass; lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…int)
Round-2 Copilot review caught a real bug: my get_int_arg / get_int64_arg
checked `v.value is _number` only, but daslib/json parses JSON integer
literals into `_longint` (int64) — not _number (double). So MCP callers
sending `queryId: 1` (JSON integer) got "missing or invalid 'queryId'"
while `queryId: 1.0` (JSON float) worked. Same latent bug on
tool_ask's `k` and tool_log's `limit`.
The canonical fix turned out to be deletion. daslib/json_boost.das:85-140
already defines operator?? overloads for every numeric type, bool, and
string — each backed by null_coalescing (lines 59-69) which handles BOTH
_number AND _longint. The skill at skills/json.md:105 shows the pattern:
`let first_score = js?["user"]?["scores"]?[0] ?? 0`.
So instead of fixing my four helpers (get_string_arg, get_int_arg,
get_bool_arg, get_int64_arg), I dropped them entirely (-40 lines) and
replaced 18 callsites with direct `args?[key] ?? default` form. The
explicit type annotation on each let-binding (`let k : int = ...`) pins
the ?? overload selection.
Repro before: queryId:1 → "missing or invalid 'queryId'"
Repro after: queryId:1 → "marked id=1 as no-match. ..."
Also addressed in this commit:
- resolve_log_mode comment said "First-set-wins" but the code is
fixed-priority misses>bad>review regardless of CLI flag order.
Comment updated to match reality.
- tests/json/safe.das: filled the small gap in ?? operator coverage.
Existing tests cover `?? int` (line 14) and `?? string`/`?? bool`,
but `?? int64` directly off a JSON integer literal (the path
tool_bad uses) wasn't exercised. One-line addition asserting
`js?.a ?? -1l == 1l` on a `{ "a": 1 }` payload. 4/4 tests still pass.
40/40 mouse tests still pass; lint clean; format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot caught that `mouse__bad <id>` reads as a CLI-style positional invocation, but mouse__bad is an MCP tool taking the named argument `queryId`. Reworded to "call mouse__bad with the row's id (CLI: mouse bad <id>)" — descriptive rather than syntax-literal, so the example can't be copy-pasted into an invalid invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(mouse): bad_mouse — mark no-match asks; useful flag + log MCP tool
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )