Add database indexes for common stream query patterns #616

Open
dolaoluwa574-source wants to merge 1 commit into ritik4ever:main from dolaoluwa574-source:Add-SQLite-indexes-for-common-query-patterns

@dolaoluwa574-source dolaoluwa574-source commented Jun 26, 2026

Closes #361

Problem

Every filtered query against the streams table performs a full table scan. At small row counts this is invisible, but as the database grows the four most common access patterns become the dominant bottleneck:

| Endpoint | Filter column |
| --- | --- |
| GET /api/streams?sender=… / GET /api/senders/:id/streams | sender |
| GET /api/streams?recipient=… / GET /api/recipients/:id/streams | recipient |
| GET /api/streams?status=… | canceled_at, completed_at, paused_at |
| Scheduled / active window queries | start_at |
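
For illustration, the four filter columns above could map to index definitions along these lines. This is a minimal sketch, not the contents of the migration: the index names are hypothetical (they only follow the `idx_streams_*` pattern mentioned in the review), and treating the three status columns as one composite index is an assumption.

```typescript
// Hypothetical index specs. The real names and column groupings live in
// backend/src/migrations/0002_add_stream_indexes.sql and may differ.
interface IndexSpec {
  name: string;
  columns: string[];
}

const STREAM_INDEXES: IndexSpec[] = [
  { name: "idx_streams_sender", columns: ["sender"] },
  { name: "idx_streams_recipient", columns: ["recipient"] },
  // Assumption: one composite index covers the three status timestamps.
  { name: "idx_streams_status", columns: ["canceled_at", "completed_at", "paused_at"] },
  { name: "idx_streams_start_at", columns: ["start_at"] },
];

// IF NOT EXISTS makes each statement safe to run more than once,
// which is what keeps a migration built from them idempotent.
function toCreateIndexSql(table: string, spec: IndexSpec): string {
  return `CREATE INDEX IF NOT EXISTS ${spec.name} ON ${table} (${spec.columns.join(", ")});`;
}

const statements: string[] = STREAM_INDEXES.map((spec) => toCreateIndexSql("streams", spec));
```

With better-sqlite3, the joined statements could be passed to `db.exec(statements.join("\n"))`.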

Note on range queries: measured wall-clock speedup for range predicates depends on result-set selectivity. When a range covers a large fraction of the table, the SQLite query planner may prefer a sequential scan regardless of index presence. EXPLAIN QUERY PLAN is the authoritative check — both range queries confirm USING INDEX in the output above.

Run the benchmark yourself:

npx ts-node scripts/benchmark-indexes.ts

Testing checklist

  • EXPLAIN QUERY PLAN output reviewed (see Verification section above)
  • npm run dev:backend starts without errors after up() is wired into db.ts
  • Existing API responses unchanged (indexes are transparent to callers)
  • scripts/benchmark-indexes.ts runs to completion with no assertion failures
  • Migration is idempotent: running it twice produces no error

No breaking changes

Indexes are purely additive schema changes. They add a small maintenance cost to writes, but they have no effect on API contracts, existing query results, or the Soroban contract layer.

Summary by CodeRabbit

  • Performance

    • Improved stream-related query performance, especially for lookups by sender, recipient, status, and start time.
    • Faster filtering and time-based browsing should make stream lists feel more responsive.
  • Chores

    • Added a database migration to apply the new indexes safely and repeatedly.
    • Added benchmarking checks to validate query speed improvements.

vercel Bot commented Jun 26, 2026

@dolaoluwa574-source is attempting to deploy a commit to ritik4ever's projects team on Vercel.

A member of the Team first needs to authorize it.

drips-wave Bot commented Jun 26, 2026

@dolaoluwa574-source Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while waiting for a review of this PR. Keep up the great work! 🚀

Learn more about application limits

coderabbitai Bot commented Jun 26, 2026

Review Change Stack

📝 Walkthrough

Adds SQLite indexes for common streams query patterns and a benchmark script that builds temporary databases, verifies planner output, measures cold-query timings, and enforces speedup checks.

Changes

SQLite stream indexes and benchmark

| Layer / File(s) | Summary |
| --- | --- |
| Migration definitions and runner: backend/src/migrations/0002_add_stream_indexes.sql, backend/src/migrations/0002_add_stream_indexes.ts | Defines the four streams indexes, exports up(db), and adds the direct-run SQLite entrypoint. |
| Benchmark setup and synthetic database build: scripts/benchmark-indexes.ts | Adds the benchmark script docs, imports, synthetic data generator, schema/index DDL, temp DB builder, timing helper, and query cases. |
| Benchmark execution and checks: scripts/benchmark-indexes.ts | Runs EXPLAIN QUERY PLAN, benchmarks each query against both databases, enforces the speedup threshold, and deletes temporary files. |
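
As a rough illustration of what enforcing a speedup threshold can look like, the sketch below compares median timings from the scan-only and indexed databases. It is not the script's actual timing helper: the function names and the `minSpeedup` parameter are hypothetical, and using the median rather than the mean is an assumption.

```typescript
// Median is less sensitive to a single slow outlier than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Ratio of scan-only time to indexed time; values above 1 mean the index helped.
function speedup(scanMs: number[], indexedMs: number[]): number {
  return median(scanMs) / median(indexedMs);
}

// Throws when the measured speedup falls below the (hypothetical) threshold.
function assertSpeedup(
  label: string,
  scanMs: number[],
  indexedMs: number[],
  minSpeedup: number
): number {
  const ratio = speedup(scanMs, indexedMs);
  if (ratio < minSpeedup) {
    throw new Error(`${label}: speedup ${ratio.toFixed(2)}x is below ${minSpeedup}x`);
  }
  return ratio;
}
```

For range queries with low selectivity, a fixed threshold may be too strict, so a per-query threshold is one option.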

Sequence Diagram

sequenceDiagram
  participant main as main()
  participant buildDb as buildDb(withIndexes)
  participant coldBench as coldBench(dbPath, sql, params, iters)
  participant sqlite as better-sqlite3
  participant fs as fs

  main->>buildDb: create scan-only temp DB
  main->>buildDb: create indexed temp DB
  buildDb->>sqlite: create schema, insert 100,000 rows, add indexes
  main->>sqlite: EXPLAIN QUERY PLAN for each benchmark query
  main->>coldBench: time each query on both DBs
  coldBench->>sqlite: open read-only, prepare, execute, close
  main->>fs: delete temporary database files

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A bunny hopped by the SQLite tree,
and found new indexes glinting merrily.
“Sender, recipient, and start_at too—
now my little queries zoom right through!” 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | It is concise and accurately describes the main change: adding SQLite indexes for common stream query patterns. |
| Linked Issues check | ✅ Passed | The PR adds all four requested indexes via migration and includes EXPLAIN QUERY PLAN plus benchmark checks for usage and speedup. |
| Out of Scope Changes check | ✅ Passed | The benchmark script and migration are directly tied to the requested indexing work and do not introduce unrelated changes. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/benchmark-indexes.ts`:
- Around line 8-10: The script header documents a DB_PATH mode that is not
actually supported, so either implement reading process.env.DB_PATH in
benchmark-indexes.ts or remove that usage example from the comment. Update the
script entry path handling around the benchmark-indexes.ts setup so the
documented invocation matches the behavior, and keep the usage text aligned with
the actual CLI options.
- Around line 87-111: The benchmark setup in buildDb and the transaction that
populates streams should use the same synthetic rows for both database variants
instead of generating fresh random start_at and total_amount values per call.
Extract row generation from the insertion loop into a shared dataset, then have
buildDb(false) and buildDb(true) insert that identical row set so the benchmark
in scripts/benchmark-indexes.ts compares indexes on vs off against the same
data.
- Around line 175-186: The EXPLAIN QUERY PLAN section in benchmark-indexes.ts
only prints planner output and then later reports success unconditionally, so
add a real assertion in the indexed DB loop by parsing the returned `detail`
rows from `db.prepare(...).all(...)` and verifying each indexed query shows the
expected `idx_streams_*` index name, or at minimum `SEARCH streams USING INDEX`,
before reaching the success summary. Use the existing `QUERIES` loop and
`EXPLAIN QUERY PLAN` output handling to fail fast when a query falls back to
`SCAN streams`, and only print the “all indexes are used” message after those
checks pass.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ae8c1d9a-2080-4979-b907-12fb7fd50887

📥 Commits

Reviewing files that changed from the base of the PR and between 47bb804 and 87c480a.

📒 Files selected for processing (3)
  • backend/src/migrations/0002_add_stream_indexes.sql
  • backend/src/migrations/0002_add_stream_indexes.ts
  • scripts/benchmark-indexes.ts

Comment on lines +8 to +10
* Usage (from repo root):
* npx ts-node scripts/benchmark-indexes.ts
* DB_PATH=backend/data/streams.db npx ts-node scripts/benchmark-indexes.ts

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove or implement the documented DB_PATH mode.

The header says DB_PATH=backend/data/streams.db npx ts-node scripts/benchmark-indexes.ts, but the script never reads process.env.DB_PATH. That usage string is misleading as written.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/benchmark-indexes.ts` around lines 8 - 10, The script header
documents a DB_PATH mode that is not actually supported, so either implement
reading process.env.DB_PATH in benchmark-indexes.ts or remove that usage example
from the comment. Update the script entry path handling around the
benchmark-indexes.ts setup so the documented invocation matches the behavior,
and keep the usage text aligned with the actual CLI options.
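
If the DB_PATH mode is kept, one way to implement it is a small resolver that decides between benchmarking an existing database and building synthetic ones. This is a sketch under assumptions: `BenchmarkTarget` and `resolveTarget` are hypothetical names, and the script's real entry-point handling may be structured differently.

```typescript
// Either benchmark an existing database file or fall back to synthetic temp DBs.
type BenchmarkTarget =
  | { mode: "existing"; dbPath: string }
  | { mode: "synthetic" };

// Takes the environment as a parameter so the logic can be tested without
// touching the real process environment.
function resolveTarget(env: Record<string, string | undefined>): BenchmarkTarget {
  const raw = env["DB_PATH"];
  const dbPath = raw === undefined ? "" : raw.trim();
  // An unset or blank DB_PATH means "build the synthetic databases".
  return dbPath.length > 0 ? { mode: "existing", dbPath } : { mode: "synthetic" };
}
```

In the script this would be called as `resolveTarget(process.env)`. The simpler alternative is to delete the DB_PATH usage line from the header.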

Comment on lines +87 to +111
const now = Math.floor(Date.now() / 1000);
const insert = db.prepare(`
  INSERT INTO streams
    (id, sender, recipient, asset_code, total_amount,
     start_at, duration_sec, canceled_at, completed_at, paused_at)
  VALUES (?,?,?,?,?,?,?,?,?,?)
`);

db.transaction(() => {
  for (let i = 0; i < ROW_COUNT; i++) {
    const ca = i % 20 === 0 ? now - 3600 : null;
    const cp = ca === null && i % 15 === 0 ? now - 1800 : null;
    const pa = ca === null && cp === null && i % 30 === 0 ? now - 600 : null;
    insert.run(
      `s${String(i).padStart(7, "0")}`,
      stellarId("U", "0", i), // unique sender per row (high cardinality)
      stellarId("R", "0", i), // unique recipient per row
      ASSETS[i % ASSETS.length],
      Math.round(Math.random() * 10_000 * 100) / 100,
      now - Math.floor(Math.random() * 30 * 86_400),
      3_600,
      ca, cp, pa
    );
  }
})();

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Benchmark both databases from the same synthetic dataset.

buildDb(false) and buildDb(true) each generate fresh random start_at/total_amount values, so the benchmark is comparing different tables instead of isolating “indexes on vs off”. That makes the measured speedups noisy and can hide regressions or create false wins, especially for the range query. Generate the rows once and load the identical row set into both databases.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/benchmark-indexes.ts` around lines 87 - 111, The benchmark setup in
buildDb and the transaction that populates streams should use the same synthetic
rows for both database variants instead of generating fresh random start_at and
total_amount values per call. Extract row generation from the insertion loop
into a shared dataset, then have buildDb(false) and buildDb(true) insert that
identical row set so the benchmark in scripts/benchmark-indexes.ts compares
indexes on vs off against the same data.
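
A possible shape for the shared dataset is sketched below: rows are generated once from a seeded generator and the same array is then inserted into both databases. The names `StreamRow`, `mulberry32`, and `generateRows` are hypothetical, and the seeded PRNG is an addition for reproducibility across runs; generating once with `Math.random()` and reusing the array would also satisfy the comment.

```typescript
interface StreamRow {
  index: number;
  totalAmount: number;
  startAt: number;
  canceledAt: number | null;
  completedAt: number | null;
  pausedAt: number | null;
}

// Small seeded PRNG (mulberry32) so repeated runs see identical data.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Mirrors the status rules of the quoted snippet. Note that with those rules
// pausedAt is always null: any i divisible by 30 is also divisible by 15.
function generateRows(count: number, now: number, seed: number): StreamRow[] {
  const rand = mulberry32(seed);
  const rows: StreamRow[] = [];
  for (let i = 0; i < count; i++) {
    const canceledAt = i % 20 === 0 ? now - 3600 : null;
    const completedAt = canceledAt === null && i % 15 === 0 ? now - 1800 : null;
    const pausedAt =
      canceledAt === null && completedAt === null && i % 30 === 0 ? now - 600 : null;
    rows.push({
      index: i,
      totalAmount: Math.round(rand() * 10000 * 100) / 100,
      startAt: now - Math.floor(rand() * 30 * 86400),
      canceledAt,
      completedAt,
      pausedAt,
    });
  }
  return rows;
}

// Generate once, then hand the same array to both database builders.
const NOW = 1700000000;
const rows = generateRows(1000, NOW, 42);
```

The array could then be passed to both builds, for example `buildDb(false, rows)` and `buildDb(true, rows)`, assuming `buildDb` is changed to accept the rows as a second argument.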

Comment on lines +175 to +186
// ── EXPLAIN QUERY PLAN ──────────────────────────────────────────────────
console.log("─".repeat(70));
console.log("EXPLAIN QUERY PLAN (indexed DB)");
console.log("─".repeat(70));
{
  const db = new Database(idxDb, { readonly: true });
  for (const q of QUERIES) {
    const plan = db.prepare(`EXPLAIN QUERY PLAN ${q.sql}`).all(...q.params) as Array<{ detail: string }>;
    console.log(`\n ▸ ${q.label}`);
    for (const row of plan) console.log(` ${row.detail}`);
  }
  db.close();

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Fail when EXPLAIN QUERY PLAN does not show the expected index.

Right now this only prints the planner output, then Line 223 unconditionally says all four indexes are used. If SQLite falls back to SCAN streams, the script still reports success as long as the equality timing check passes. Parse the detail rows and assert the expected idx_streams_* name (or at least SEARCH streams USING INDEX) for each indexed case before printing the success summary.

Also applies to: 223-226

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/benchmark-indexes.ts` around lines 175 - 186, The EXPLAIN QUERY PLAN
section in benchmark-indexes.ts only prints planner output and then later
reports success unconditionally, so add a real assertion in the indexed DB loop
by parsing the returned `detail` rows from `db.prepare(...).all(...)` and
verifying each indexed query shows the expected `idx_streams_*` index name, or
at minimum `SEARCH streams USING INDEX`, before reaching the success summary.
Use the existing `QUERIES` loop and `EXPLAIN QUERY PLAN` output handling to fail
fast when a query falls back to `SCAN streams`, and only print the “all indexes
are used” message after those checks pass.
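
The assertion described above could be a small pure function over the `detail` rows, roughly as follows. This is a sketch: `usesIndex` and `assertUsesIndex` are hypothetical names, and the expected index name passed in is whatever the migration actually defines.

```typescript
interface PlanRow {
  detail: string;
}

// True when some plan row reports "USING INDEX <name>" or "USING COVERING INDEX <name>".
// If expectedIndex is given, the reported name must match it exactly.
// Automatic (transient) indexes are deliberately not matched.
function usesIndex(plan: PlanRow[], expectedIndex?: string): boolean {
  return plan.some((row) => {
    const match = /USING (?:COVERING )?INDEX (\S+)/.exec(row.detail);
    if (match === null) return false;
    return expectedIndex === undefined || match[1] === expectedIndex;
  });
}

// Fails fast with the planner output in the message, e.g. when SQLite
// falls back to "SCAN streams".
function assertUsesIndex(label: string, plan: PlanRow[], expectedIndex?: string): void {
  if (!usesIndex(plan, expectedIndex)) {
    const seen = plan.map((row) => row.detail).join(" | ");
    throw new Error(`${label}: expected ${expectedIndex ?? "an index"}, got: ${seen}`);
  }
}
```

Inside the existing `QUERIES` loop this would be called right after printing the plan, so the success summary is only reached when every check has passed.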

Development

Successfully merging this pull request may close these issues.

Add SQLite indexes for common query patterns