Open
126 commits
344e933
Fix audit search filters and help output
doublemover Dec 30, 2025
b33547c
Add minimal API server
doublemover Dec 30, 2025
6234117
Add CLI score explainability
doublemover Dec 30, 2025
47fb8f7
Add VS Code editor integration
doublemover Dec 30, 2025
c8bad9c
Add streaming API endpoints
doublemover Dec 31, 2025
54b894d
Enhance AST control flow and alias metadata
doublemover Dec 31, 2025
e286363
Fix search tooling wiring and SQL flow metadata
doublemover Dec 31, 2025
0df95ad
Document SQL flow metadata
doublemover Dec 31, 2025
c57b6e8
Refactor search CLI into modules
doublemover Dec 31, 2025
30d9b2d
Unify doc comment extraction
doublemover Dec 31, 2025
09fe254
Add index validation tooling
doublemover Dec 31, 2025
3134d97
Add LSP tooling plumbing and best-effort clangd
doublemover Jan 1, 2026
c101c23
Refactor tooling types before call links
doublemover Jan 1, 2026
c754466
Extract tooling providers for cross-file inference
doublemover Jan 1, 2026
deaa0c1
Improve TypeScript tooling coverage and tsconfig support
doublemover Jan 1, 2026
fae2519
Add tsconfig-aware TypeScript tooling provider
doublemover Jan 1, 2026
177e614
Add clangd/sourcekit tooling providers and fix summary report
doublemover Jan 1, 2026
31ac4c7
Add stub LSP server for tooling enrichment tests
doublemover Jan 1, 2026
6db6786
Add AST-backed TypeScript chunking
doublemover Jan 1, 2026
beb3813
Add python AST worker pool and search perf upgrades
doublemover Jan 1, 2026
66b447d
taking the shoes for a run
doublemover Jan 1, 2026
37253ad
Update benchmarks, setup flow, and plans
doublemover Jan 2, 2026
d572137
Complete Phase 67 bugfixes
doublemover Jan 2, 2026
2557ab8
Auto-raise heap for bench runs
doublemover Jan 2, 2026
dbd6684
Auto-raise heap in bench runner
doublemover Jan 2, 2026
23a1c8a
Update complete plan decisions
doublemover Jan 2, 2026
3fe2f17
Add benchmark profiling and fix indexing regressions
doublemover Jan 2, 2026
0b5eff0
Add OSS reference summaries and roadmap phases
doublemover Jan 2, 2026
a297fdc
Fix npm scoped external docs and add regression test
doublemover Jan 2, 2026
07e04ad
Update documentation parity and MCP server docs
doublemover Jan 2, 2026
3ca20e2
Adopt vscode-jsonrpc for MCP/LSP plumbing
doublemover Jan 2, 2026
9fa285a
chore: gate flaky crossfile test and split model bench
doublemover Jan 3, 2026
13815a5
test: add retries and failure logs for script coverage
doublemover Jan 3, 2026
e77985e
test: gate LSP enrichment failure and track in docs
doublemover Jan 3, 2026
020edff
test: gate fixture parity flake and track failures
doublemover Jan 3, 2026
5c11dd9
docs: move completed phases out of plan
doublemover Jan 3, 2026
91090c4
indexer: reduce artifacts and reuse lint caching
doublemover Jan 3, 2026
6bb4b72
process: complete phase 74 execa migration
doublemover Jan 3, 2026
d81815a
Complete phase 77 dependency hygiene and phase 78 fixes
doublemover Jan 3, 2026
c10ba1f
Add tree-sitter backbone
doublemover Jan 3, 2026
92ef704
Align tooling registry and SQL parser
doublemover Jan 3, 2026
6a553fa
Batch git blame and embeddings
doublemover Jan 3, 2026
0ef347c
Add file_meta artifact for file data
doublemover Jan 3, 2026
f2a99be
Default SQLite storage and compress artifacts
doublemover Jan 3, 2026
c1d49ef
Fix file relations scope in incremental build
doublemover Jan 3, 2026
6178e6c
Guard tree-sitter language load errors
doublemover Jan 3, 2026
5b3b4bf
Guard tree-sitter parse failures
doublemover Jan 3, 2026
b70839f
Align embed dims and clarify tree-sitter warnings
doublemover Jan 3, 2026
144382b
Update ESLint init options
doublemover Jan 3, 2026
ecc7548
Add crash logging for index builds
doublemover Jan 3, 2026
b648593
Add line-rate metrics to bench build progress
doublemover Jan 3, 2026
b77602f
Auto-clear stale locks for bench builds
doublemover Jan 3, 2026
cc38c07
Add HTML/CSS parsing and embedded chunks
doublemover Jan 3, 2026
15ac4bf
Enhance embedded HTML chunking for JSON/XML/YAML
doublemover Jan 3, 2026
7dadf1e
Improve bench/index progress and sqlite build path
doublemover Jan 3, 2026
1ae0240
Fix sqlite file meta ingestion and import link tests
doublemover Jan 3, 2026
5b0ecea
Add codebase review notes for correctness pass
doublemover Jan 3, 2026
1c53b84
Update codebase review notes for search pass
doublemover Jan 3, 2026
d510189
Add minhash parity test
doublemover Jan 3, 2026
65e685c
Update codebase review notes for tooling and shared
doublemover Jan 3, 2026
f4f34f0
Update codebase review notes for language handlers
doublemover Jan 3, 2026
cfa211a
Add file path prefiltering and punctuation tokens
doublemover Jan 3, 2026
7f06a53
Add structural search, GTAGS ingest, and service mode
doublemover Jan 3, 2026
003a676
Fix html/css chunking and update tests for file_meta
doublemover Jan 3, 2026
09e1bf1
Fix embedded config chunks and stabilize ingest tests
doublemover Jan 3, 2026
5ddb04a
Use Windows path resolution for structural search
doublemover Jan 3, 2026
a298eb6
Improve structural search resolution messages
doublemover Jan 3, 2026
8904ea7
Finish search prefilter phase
doublemover Jan 3, 2026
096a7c2
Close phase 83 and cover repo map
doublemover Jan 3, 2026
6a80ba3
Add benchmark matrix runner
doublemover Jan 3, 2026
ab0717a
Default matrix runs to typical tier
doublemover Jan 3, 2026
52030fc
Fix bench index existence checks
doublemover Jan 4, 2026
326602b
Clear stale bench locks reliably
doublemover Jan 4, 2026
ffe49f7
Add Kotlin performance guardrails
doublemover Jan 4, 2026
9c2d38f
Fix large JSON crashes in bench matrix
doublemover Jan 4, 2026
39e3008
Update docs and sqlite bundle rebuilds
doublemover Jan 4, 2026
19fd2cd
Remove deps fixes and archive roadmap
doublemover Jan 4, 2026
1e88da0
Update README.md
doublemover Jan 4, 2026
1a1a15c
Add global profiles and config validation
doublemover Jan 5, 2026
0c5a67c
Add backend auto policy
doublemover Jan 5, 2026
373e7d4
Refine parser selection and TS repo resolution
doublemover Jan 5, 2026
f4c1210
Add tokenization guardrails and adaptive dictionary config
doublemover Jan 5, 2026
945dbdb
Add core API helpers and core API tests
doublemover Jan 5, 2026
4d50b0e
Use core search in API/MCP servers with cached indexes
doublemover Jan 5, 2026
b43f2a0
Add RRF fusion and BM25 defaults with docs
doublemover Jan 5, 2026
52aed78
Add retrieval evaluation harness and quality gate
doublemover Jan 5, 2026
f5cf794
Complete phase 9 fielded indexing
doublemover Jan 5, 2026
a481b3a
Complete phase 10 large artifact strategy
doublemover Jan 5, 2026
6a633f4
Complete phase 11 query intent classification
doublemover Jan 5, 2026
75b0363
Complete phase 12 context expansion
doublemover Jan 5, 2026
4b89103
Complete phase 13 structural search integration
doublemover Jan 5, 2026
e772a48
Complete phase 14 filter index artifact
doublemover Jan 5, 2026
90503a9
Complete phase 15 command surface simplification
doublemover Jan 5, 2026
aaf2fcb
Refactor module layout into index/retrieval/integrations
doublemover Jan 5, 2026
639ac40
Mark phase 16 complete in plan
doublemover Jan 5, 2026
229ee45
Add microbench suite and benchmark methodology docs
doublemover Jan 5, 2026
7d8e32b
Harden worker tokenization payloads
doublemover Jan 5, 2026
3dd9b76
Handle oversized JSON artifacts safely
doublemover Jan 5, 2026
fc69734
Polish CLI aliases and Kotlin perf notes
doublemover Jan 5, 2026
296493e
Adjust benchmark tiers and harden tree-sitter
doublemover Jan 5, 2026
2388642
Update tree-sitter wasm usage and tooling
doublemover Jan 5, 2026
651228d
Improve indexing pipeline and bench tooling
doublemover Jan 6, 2026
45270a3
mermy
doublemover Jan 6, 2026
8774e8b
Update README.md
doublemover Jan 6, 2026
eeb6941
making a huge fuckin mess
doublemover Jan 6, 2026
aceba98
Merge branch 'main' of https://github.com/doublemover/PairOfCleats
doublemover Jan 6, 2026
badb347
GIGAPLAN 0.5
doublemover Jan 6, 2026
961590f
SAMUEL ITS ME, I NEED 100 BILLION TOKENS
doublemover Jan 11, 2026
4b60a6b
hey I have an idea let's make a huge fucking mess
doublemover Jan 12, 2026
397baf1
Add Sublime Text plugin foundation
doublemover Jan 12, 2026
f14f558
Add Sublime Text search UX
doublemover Jan 12, 2026
c656d46
Add Sublime index lifecycle commands
doublemover Jan 12, 2026
1664e77
ITS CLETE
doublemover Jan 12, 2026
5f04487
rt?
doublemover Jan 12, 2026
9473eff
Auto-shard large artifacts and enhance throughput totals
doublemover Jan 13, 2026
6d77700
Squashed updates since 9473eff
doublemover Jan 14, 2026
1737412
making a mess
doublemover Jan 14, 2026
efac140
making a mess
doublemover Jan 14, 2026
c4e1b2f
Merge pull request #17 from doublemover/p7
doublemover Jan 14, 2026
2a0545f
making a mess
doublemover Jan 14, 2026
3ca66e0
Merge branch 'main' into p32
doublemover Jan 14, 2026
30c23ce
Merge pull request #18 from doublemover/p32
doublemover Jan 14, 2026
f2828ce
Merge pull request #19 from doublemover/p16
doublemover Jan 14, 2026
18a4483
Add LanceDB ANN backend support
doublemover Jan 14, 2026
48ff654
Merge remote changes
doublemover Jan 14, 2026
64b9082
making a mess
doublemover Jan 14, 2026
14 changes: 14 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,14 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file

version: 2
updates:
  # Enable version updates for npm
  - package-ecosystem: "npm"
    # Look for `package.json` and `lock` files in the `root` directory
    directory: "/"
    # Check the npm registry for updates every day (weekdays)
    schedule:
      interval: "daily"
28 changes: 24 additions & 4 deletions .github/workflows/ci.yml
@@ -26,8 +26,28 @@ jobs:
      - name: Lint
        run: npm run lint

      - name: Verify
        run: npm run verify
      - name: Short tests (no bench)
        run: npm run test-all-no-bench

      - name: Fixture smoke
        run: npm run fixture-smoke
  windows:
    runs-on: windows-latest
    env:
      PAIROFCLEATS_EMBEDDINGS: stub
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: npm

      - name: Install deps
        run: npm ci

      - name: Windows regression lane
        run: |
          node tests/worker-pool-windows.js
          node tests/search-windows-path-filter.js
          node tests/fixture-parity.js --fixtures sample
29 changes: 29 additions & 0 deletions .gitignore
@@ -4,8 +4,37 @@ index-code/
index-prose/
ci-artifacts/
tests/.cache/
tests/.logs/
benchmarks/repos/
benchmarks/cache/
benchmarks/results/
docs/benchmarks.json
docs/phase3-parity-report.json
*.db
*.db-shm
*.db-wal
__pycache__/
*.py[cod]
*.pyo
*$py.class
.Python
.pytest_cache/
.mypy_cache/
.ruff_cache/
.pytype/
.coverage
coverage.xml
htmlcov/
.tox/
.nox/
.venv/
venv/
ENV/
env/
env.bak/
venv.bak/
*.egg
*.egg-info/
.eggs/
pip-wheel-metadata/
.pairofcleats/
176 changes: 92 additions & 84 deletions .pairofcleats.json
@@ -1,101 +1,109 @@
{
"dictionary": {
"languages": [
"en"
],
"includeSlang": true,
"enableRepoDictionary": false,
"dir": "",
"files": [],
"slangDirs": [],
"slangFiles": []
},
"cache": {
"root": ""
},
// Enable sqlite index artifacts for search backends.
// Speed impact: adds sqlite build time when stage4 runs.
"sqlite": {
"use": true,
"dbDir": "",
"annMode": "extension",
"compactOnIncremental": false,
"vectorExtension": {
"provider": "sqlite-vec",
"dir": "",
"path": "",
"table": "dense_vectors_ann",
"column": "embedding",
"encoding": "float32",
"options": ""
}
// Toggle sqlite index usage/artifact generation.
// Speed impact: enabling adds some indexing time and disk usage.
"use": true
},
// Search defaults for query-time behavior.
// Speed impact: no direct impact on indexing speed.
"search": {
// Prefer ANN search by default when multiple backends exist.
// Speed impact: no impact on indexing; affects query latency/recall.
"annDefault": true,
"sqliteFtsNormalize": false,
"queryCache": {
"enabled": false,
"maxEntries": 200,
"ttlMs": 0
}
},
"triage": {
"recordsDir": "",
"storeRawPayload": false,
"promoteFields": [
"recordType",
"source",
"recordId",
"service",
"env",
"team",
"owner",
"vulnId",
"cve",
"packageName",
"packageEcosystem",
"severity",
"status",
"assetId"
],
"contextPack": {
"maxHistory": 5,
"maxEvidencePerQuery": 5
}
// Dense vector combination strategy for search.
// Speed impact: minor impact on embedding/storage cost during indexing.
"denseVectorMode": "merged"
},
// Index build pipeline options.
// Speed impact: many flags here change CPU/IO per file.
"indexing": {
"concurrency": 4,
"importConcurrency": 4,
"astDataflow": true,
"controlFlow": true,
"riskAnalysis": true,
"riskAnalysisCrossFile": true,
"typeInference": false,
"typeInferenceCrossFile": false,
"workerPool": {
"enabled": true,
"maxWorkers": 8
},
// Sparse postings generation settings.
// Speed impact: heavier postings settings increase indexing time/size.
"postings": {
// Build phrase n-gram postings.
// Speed impact: increases indexing time and index size.
"enablePhraseNgrams": true,
// Smallest phrase n-gram length.
// Speed impact: lower values add more n-grams and cost.
"phraseMinN": 2,
// Largest phrase n-gram length.
// Speed impact: higher values increase indexing time and size.
"phraseMaxN": 4,
// Build chargram postings for fuzzy matching.
// Speed impact: noticeable extra CPU and disk usage.
"enableChargrams": true,
// Smallest chargram length.
// Speed impact: lower values increase chargram volume and cost.
"chargramMinN": 3,
"chargramMaxN": 5
}
},
"sql": {
"dialect": "",
"dialectByExt": {
".psql": "postgres",
".pgsql": "postgres",
".mysql": "mysql",
".sqlite": "sqlite"
// Largest chargram length.
// Speed impact: higher values increase chargram volume and cost.
"chargramMaxN": 5,
// Choose which fields contribute chargrams.
// Speed impact: more fields increase indexing work.
"chargramSource": "fields",
// Cap token length eligible for chargrams.
// Speed impact: higher caps increase CPU on long identifiers.
"chargramMaxTokenLength": 48,
// Track postings per field (name, path, body, etc).
// Speed impact: slight overhead for richer scoring.
"fielded": true
},
// When to scan imports ("pre" or "post" indexing).
// Speed impact: small; "post" avoids extra upfront work.
"importScan": "post",
// Enable AST dataflow analysis.
// Speed impact: moderate CPU cost on large codebases.
"astDataflow": true,
// Enable control-flow analysis.
// Speed impact: moderate CPU cost on large codebases.
"controlFlow": true,
// Enable risk analysis rules.
// Speed impact: moderate CPU cost; can be heavy on huge repos.
"riskAnalysis": true,
// Enable cross-file risk correlation.
// Speed impact: heavy extra work on large repos.
"riskAnalysisCrossFile": true,
// Enable type inference.
// Speed impact: moderate to heavy CPU cost.
"typeInference": true,
// Enable cross-file type inference.
// Speed impact: heavy extra work on large repos.
"typeInferenceCrossFile": true,
// Collect git blame/churn metadata per file.
// Speed impact: heavy IO/CPU; can dominate indexing time.
"gitBlame": false,
// Run linting pass for diagnostics.
// Speed impact: extra CPU per file.
"lint": false,
// Compute complexity metrics.
// Speed impact: extra CPU per file.
"complexity": true,
// Python AST parsing options.
// Speed impact: small to moderate CPU on Python files.
"pythonAst": {
// Enable Python AST parsing.
// Speed impact: small to moderate on Python-heavy repos.
"enabled": true
},
// Tree-sitter parsing options.
// Speed impact: moderate CPU, improved chunking accuracy.
"treeSitter": {
// Enable tree-sitter parsing.
// Speed impact: moderate CPU on supported languages.
"enabled": true
}
},
"models": {
"id": "Xenova/all-MiniLM-L12-v2",
"dir": ""
},
"tooling": {
"autoInstallOnDetect": false,
"installScope": "cache",
"allowGlobalFallback": true,
"dir": ""
// Runtime process limits for the indexer.
// Speed impact: higher heap reduces GC stalls on big repos.
"runtime": {
// Max Node heap size in MB for the indexer process.
// Speed impact: too low slows indexing; higher reduces GC overhead.
"maxOldSpaceMb": 98048
}
}
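
The `postings` comments in this config describe phrase n-grams and chargrams only by their indexing cost. As a rough illustration of what `phraseMinN`/`phraseMaxN` and `chargramMinN`/`chargramMaxN`/`chargramMaxTokenLength` might control, here is a minimal JavaScript sketch. The function names `chargrams` and `phraseNgrams` are hypothetical and not taken from the PairOfCleats source, and reading `chargramMaxTokenLength` as "skip longer tokens" (rather than truncate them) is an assumption.

```javascript
// Hypothetical helper, not the project's implementation.
// Returns every character n-gram of `token` for minN <= n <= maxN.
// Assumption: tokens longer than maxTokenLength are skipped entirely.
function chargrams(token, { minN = 3, maxN = 5, maxTokenLength = 48 } = {}) {
  if (token.length > maxTokenLength) return [];
  const grams = [];
  for (let n = minN; n <= maxN; n++) {
    for (let i = 0; i + n <= token.length; i++) {
      grams.push(token.slice(i, i + n));
    }
  }
  return grams;
}

// Hypothetical helper, not the project's implementation.
// Returns every run of n consecutive tokens (minN <= n <= maxN),
// joined with a single space.
function phraseNgrams(tokens, { minN = 2, maxN = 4 } = {}) {
  const phrases = [];
  for (let n = minN; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      phrases.push(tokens.slice(i, i + n).join(" "));
    }
  }
  return phrases;
}

console.log(chargrams("index"));
// → [ 'ind', 'nde', 'dex', 'inde', 'ndex', 'index' ]
console.log(phraseNgrams(["build", "sqlite", "index"]));
// → [ 'build sqlite', 'sqlite index', 'build sqlite index' ]
```

A token of length L yields L - n + 1 grams for each size n, so lowering the minimum or raising the maximum increases the number of postings, which is consistent with the "Speed impact" notes in the config.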
1 change: 1 addition & 0 deletions .rgignore
@@ -0,0 +1 @@
benchmarks/repos/
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,16 @@
# Changelog

All notable changes to PairOfCleats are documented in this file.

## Unreleased
### Breaking
- None.

### Added
- None.

### Fixed
- None.

## v0.2.0 - 2026-01-11
- Initial internal release.