Add database indexes for common stream query patterns#616
Add database indexes for common stream query patterns#616dolaoluwa574-source wants to merge 1 commit into
Conversation
|
@dolaoluwa574-source is attempting to deploy a commit to the ritik4ever's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
@dolaoluwa574-source Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits. You can now already apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀 |
📝 WalkthroughWalkthroughAdds SQLite indexes for common ChangesSQLite stream indexes and benchmark
Sequence DiagramsequenceDiagram
participant main as main()
participant buildDb as buildDb(withIndexes)
participant coldBench as coldBench(dbPath, sql, params, iters)
participant sqlite as better-sqlite3
participant fs as fs
main->>buildDb: create scan-only temp DB
main->>buildDb: create indexed temp DB
buildDb->>sqlite: create schema, insert 100,000 rows, add indexes
main->>sqlite: EXPLAIN QUERY PLAN for each benchmark query
main->>coldBench: time each query on both DBs
coldBench->>sqlite: open read-only, prepare, execute, close
main->>fs: delete temporary database files
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/benchmark-indexes.ts`:
- Around line 8-10: The script header documents a DB_PATH mode that is not
actually supported, so either implement reading process.env.DB_PATH in
benchmark-indexes.ts or remove that usage example from the comment. Update the
script entry path handling around the benchmark-indexes.ts setup so the
documented invocation matches the behavior, and keep the usage text aligned with
the actual CLI options.
- Around line 87-111: The benchmark setup in buildDb and the transaction that
populates streams should use the same synthetic rows for both database variants
instead of generating fresh random start_at and total_amount values per call.
Extract row generation from the insertion loop into a shared dataset, then have
buildDb(false) and buildDb(true) insert that identical row set so the benchmark
in scripts/benchmark-indexes.ts compares indexes on vs off against the same
data.
- Around line 175-186: The EXPLAIN QUERY PLAN section in benchmark-indexes.ts
only prints planner output and then later reports success unconditionally, so
add a real assertion in the indexed DB loop by parsing the returned `detail`
rows from `db.prepare(...).all(...)` and verifying each indexed query shows the
expected `idx_streams_*` index name, or at minimum `SEARCH streams USING INDEX`,
before reaching the success summary. Use the existing `QUERIES` loop and
`EXPLAIN QUERY PLAN` output handling to fail fast when a query falls back to
`SCAN streams`, and only print the “all indexes are used” message after those
checks pass.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae8c1d9a-2080-4979-b907-12fb7fd50887
📒 Files selected for processing (3)
backend/src/migrations/0002_add_stream_indexes.sqlbackend/src/migrations/0002_add_stream_indexes.tsscripts/benchmark-indexes.ts
| * Usage (from repo root): | ||
| * npx ts-node scripts/benchmark-indexes.ts | ||
| * DB_PATH=backend/data/streams.db npx ts-node scripts/benchmark-indexes.ts |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Remove or implement the documented DB_PATH mode.
The header says DB_PATH=backend/data/streams.db npx ts-node scripts/benchmark-indexes.ts, but the script never reads process.env.DB_PATH. That usage string is misleading as written.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/benchmark-indexes.ts` around lines 8 - 10, The script header
documents a DB_PATH mode that is not actually supported, so either implement
reading process.env.DB_PATH in benchmark-indexes.ts or remove that usage example
from the comment. Update the script entry path handling around the
benchmark-indexes.ts setup so the documented invocation matches the behavior,
and keep the usage text aligned with the actual CLI options.
| const now = Math.floor(Date.now() / 1000); | ||
| const insert = db.prepare(` | ||
| INSERT INTO streams | ||
| (id, sender, recipient, asset_code, total_amount, | ||
| start_at, duration_sec, canceled_at, completed_at, paused_at) | ||
| VALUES (?,?,?,?,?,?,?,?,?,?) | ||
| `); | ||
|
|
||
| db.transaction(() => { | ||
| for (let i = 0; i < ROW_COUNT; i++) { | ||
| const ca = i % 20 === 0 ? now - 3600 : null; | ||
| const cp = ca === null && i % 15 === 0 ? now - 1800 : null; | ||
| const pa = ca === null && cp === null && i % 30 === 0 ? now - 600 : null; | ||
| insert.run( | ||
| `s${String(i).padStart(7, "0")}`, | ||
| stellarId("U", "0", i), // unique sender per row (high cardinality) | ||
| stellarId("R", "0", i), // unique recipient per row | ||
| ASSETS[i % ASSETS.length], | ||
| Math.round(Math.random() * 10_000 * 100) / 100, | ||
| now - Math.floor(Math.random() * 30 * 86_400), | ||
| 3_600, | ||
| ca, cp, pa | ||
| ); | ||
| } | ||
| })(); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Benchmark both databases from the same synthetic dataset.
buildDb(false) and buildDb(true) each generate fresh random start_at/total_amount values, so the benchmark is comparing different tables instead of isolating “indexes on vs off”. That makes the measured speedups noisy and can hide regressions or create false wins, especially for the range query. Generate the rows once and load the identical row set into both databases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/benchmark-indexes.ts` around lines 87 - 111, The benchmark setup in
buildDb and the transaction that populates streams should use the same synthetic
rows for both database variants instead of generating fresh random start_at and
total_amount values per call. Extract row generation from the insertion loop
into a shared dataset, then have buildDb(false) and buildDb(true) insert that
identical row set so the benchmark in scripts/benchmark-indexes.ts compares
indexes on vs off against the same data.
| // ── EXPLAIN QUERY PLAN ────────────────────────────────────────────────── | ||
| console.log("─".repeat(70)); | ||
| console.log("EXPLAIN QUERY PLAN (indexed DB)"); | ||
| console.log("─".repeat(70)); | ||
| { | ||
| const db = new Database(idxDb, { readonly: true }); | ||
| for (const q of QUERIES) { | ||
| const plan = db.prepare(`EXPLAIN QUERY PLAN ${q.sql}`).all(...q.params) as Array<{ detail: string }>; | ||
| console.log(`\n ▸ ${q.label}`); | ||
| for (const row of plan) console.log(` ${row.detail}`); | ||
| } | ||
| db.close(); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fail when EXPLAIN QUERY PLAN does not show the expected index.
Right now this only prints the planner output, then Line 223 unconditionally says all four indexes are used. If SQLite falls back to SCAN streams, the script still reports success as long as the equality timing check passes. Parse the detail rows and assert the expected idx_streams_* name (or at least SEARCH streams USING INDEX) for each indexed case before printing the success summary.
Also applies to: 223-226
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/benchmark-indexes.ts` around lines 175 - 186, The EXPLAIN QUERY PLAN
section in benchmark-indexes.ts only prints planner output and then later
reports success unconditionally, so add a real assertion in the indexed DB loop
by parsing the returned `detail` rows from `db.prepare(...).all(...)` and
verifying each indexed query shows the expected `idx_streams_*` index name, or
at minimum `SEARCH streams USING INDEX`, before reaching the success summary.
Use the existing `QUERIES` loop and `EXPLAIN QUERY PLAN` output handling to fail
fast when a query falls back to `SCAN streams`, and only print the “all indexes
are used” message after those checks pass.
Closes #361
Problem
Every filtered query against the
streamstable performs a full table scan. At small row counts this is invisible, but as the database grows the four most common access patterns become the dominant bottleneck:Run the benchmark yourself:
Testing checklist
EXPLAIN QUERY PLANoutput reviewed (see Verification section above)npm run dev:backendstarts without errors afterup()is wired intodb.tsscripts/benchmark-indexes.tsruns to completion with no assertion failuresNo breaking changes
Indexes are read-only additions to the schema. They have no effect on API contracts, existing query results, or the Soroban contract layer.
Summary by CodeRabbit
Performance
Chores