Give your coding agents a pair of cleats, so they can sprint through your codebase.
PairOfCleats builds a hybrid semantic index for a repo (code + docs) and exposes a CLI/MCP server for fast, filterable search. It is designed for agent workflows, with artifacts stored outside the repo by default so they can be shared across runs, containers, and CI while keeping working trees clean.
The index captures rich structure and metadata: language-aware chunking across code, configs, and docs; docstrings/signatures/annotations; call/import/usage relations; control-flow and dataflow summaries; type inference (intra-file with optional cross-file); git-aware churn metadata; and embeddings for semantic search. Search combines BM25 token/phrase scoring, MinHash similarity, dense vectors, and optional SQLite backends (including FTS5 and ANN via sqlite-vec) with filters and human/JSON output. The tooling also includes incremental indexing, cache management, dictionary bootstrapping, CI artifact restore/build, optional language tooling detection/installation, and triage workflows for ingesting vulnerability records plus generating context packs.
Active development. Current execution status lives in COMPLETE_PLAN.md; ROADMAP.md is historical.
- Node.js 18+
- Optional: Python 3 for AST-based metadata on `.py` files (falls back to heuristics; worker pool via `indexing.pythonAst.*`)
- Optional: SQLite backend (via `better-sqlite3`)
- Optional: SQLite vector extension (`sqlite-vec`) for ANN acceleration
- `npm run setup`
  - Guided prompts for install, dictionaries, models, extensions, tooling, and indexes.
  - Add `--non-interactive` for CI or automated runs.
  - Add `--with-sqlite` to build SQLite indexes.
  - Add `--incremental` to reuse per-file cache bundles.
- `npm run bootstrap` (fast, no prompts)
  - Add `--with-sqlite` to build SQLite indexes.
  - Add `--incremental` to reuse per-file cache bundles.
- `npm run watch-index` (FS events by default; add `--watch-poll` to enable polling)
- `npm run api-server` (local HTTP JSON API for status/search)
- `npm run indexer-service` (multi-repo sync + queue; see docs/service-mode.md)
- Cache is outside the repo by default; set `cache.root` in `.pairofcleats.json` to override.
- CLI commands auto-detect repo roots; use `--repo <path>` to override.
- Local CLI entrypoint: `node bin/pairofcleats.js <command>` (mirrors `npm run` scripts).
- Languages: JavaScript/TypeScript, Python, Swift, Rust, C/C++/ObjC, Go, Java, C#, Kotlin, Ruby, PHP, Lua, SQL (dialects), Perl, Shell
- LSP enrichment (clangd/sourcekit-lsp) is best-effort; clangd uses compile_commands.json when available and can be required via `tooling.clangd.requireCompilationDatabase`
- Config formats: JSON, TOML, INI/CFG/CONF, XML, YAML, Dockerfile, Makefile, GitHub Actions YAML
- Docs: Markdown, RST, AsciiDoc
- Chunking:
  - Code declarations (functions, classes, methods, types)
  - Config sections (keys/blocks)
  - Doc headings/sections
- Ignore files: `.pairofcleatsignore` (gitignore-style) and `.gitignore`
- Large file guardrails: `indexing.maxFileBytes` (default 5 MB; set to `0` to disable)
- Metadata per chunk:
  - docstrings, signatures, params, decorators/annotations
  - modifiers + visibility + inheritance
  - code relations (calls/imports/exports/usages)
  - interprocedural call summaries (args + return hints)
  - dataflow (reads/writes/mutations/aliases) + control-flow summaries
  - risk signals (sources/sinks/flows + tags, with cross-file call correlation)
  - type inference (intra-file, optional cross-file)
  - git metadata (last author/date, churn = added+deleted lines), JS complexity/lint, headline + neighbor context
- Triage records (findings + decisions) indexed outside the repo
- Index artifacts:
  - token postings (always)
  - phrase/chargram postings (configurable via `indexing.postings.*`)
  - MinHash signatures
  - dense vectors (merged + doc/code variants; MiniLM)
  - repo map (symbols + signatures + file paths)
  - incremental per-file cache bundles
  - optional ctags ingest (`npm run ctags-ingest`) (docs/ctags.md)
  - optional SCIP ingest (`npm run scip-ingest`) (docs/scip.md)
  - optional LSIF ingest (`npm run lsif-ingest`) (docs/lsif.md)
  - optional GNU Global ingest (`npm run gtags-ingest`) (docs/gtags.md)
- Symbol source precedence: docs/symbol-sources.md
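
The churn figure in the per-chunk git metadata is described as added plus deleted lines. A minimal sketch of that arithmetic, assuming input in the tab-separated format printed by `git log --numstat`; the helper name `churnByFile` is hypothetical and this is not PairOfCleats' own implementation:

```javascript
// Hypothetical helper: sums added + deleted lines per file from
// `git log --numstat`-style text. Illustrative only.
function churnByFile(numstatText) {
  const churn = new Map();
  for (const line of numstatText.split('\n')) {
    // Each numstat row is "<added>\t<deleted>\t<path>".
    const match = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
    if (!match) continue; // skip commit headers and blank lines
    const [, added, deleted, file] = match;
    // Binary files report "-" for both counts; count them as zero churn here.
    const delta =
      (added === '-' ? 0 : Number(added)) +
      (deleted === '-' ? 0 : Number(deleted));
    churn.set(file, (churn.get(file) || 0) + delta);
  }
  return churn;
}

const sample = [
  '3\t1\tsrc/index.js',
  '-\t-\tassets/logo.png',
  '',
  '10\t4\tsrc/index.js',
].join('\n');

console.log(Object.fromEntries(churnByFile(sample)));
```

Treating binary files as zero churn is a choice made for this sketch; the indexer may handle them differently.
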
- BM25 token/phrase search + n-grams/chargrams
- MinHash similarity fallback
- Dense vectors (optional, ANN-aware when enabled)
- Query syntax: `-term` excludes tokens, `"exact phrase"` boosts phrase matches, `-"phrase"` excludes phrases
- File/path regex and substring filters use a chargram prefilter before exact matching.
- Symbol-aware ranking boosts for declarations/exports (configurable via `search.symbolBoost.*`, default def=1.2, export=1.1).
- Modes: `code`, `prose`, `both`, `records`, `all`
- Backends:
  - `memory` (file-backed JSON)
  - `sqlite` (same scoring, shared artifacts)
  - `sqlite-fts` (SQLite-only FTS5 scoring)
- Structural search CLI for rule packs (Semgrep/ast-grep/Comby): docs/structural-search.md
- Common filters (ext/kind/author/visibility) use precomputed indexes for speed.
- Filters (high-signal subset):
  - `--type`, `--signature`, `--param`, `--decorator`, `--inferred-type`, `--return-type`
  - `--throws`, `--reads`, `--writes`, `--mutates`, `--awaits`
  - `--alias`
  - `--risk`, `--risk-tag`, `--risk-source`, `--risk-sink`, `--risk-category`, `--risk-flow`
  - `--branches`, `--loops`, `--breaks`, `--continues`
  - `--async`, `--generator`, `--returns`
  - `--author`, `--chunk-author`, `--modified-after`, `--modified-since`, `--churn [min]` (git numstat added+deleted), `--lint`, `--calls`, `--import`, `--uses`, `--extends`
  - `--path` / `--file` (substring or `/regex/`), `--ext`, `--lang`, `--branch`
  - `--case`, `--case-file`, `--case-tokens` (case-sensitive matching)
  - `--meta`, `--meta-json` (records metadata filters)
- Output:
  - human-readable (color), `--json` (full), or `--json-compact` (lean tooling payload)
  - full JSON includes `score` (selected), `scoreType`, `sparseScore`, `annScore`, and `scoreBreakdown` (sparse/ann/phrase/symbol/selected)
  - `--explain` / `--why` prints a score breakdown in human output (selected/sparse/ANN/phrase)
- Optional query cache (`search.queryCache.*` in `.pairofcleats.json`)
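
The query syntax in this list (`-term`, `"exact phrase"`, `-"phrase"`) can be illustrated with a small parser. This is a sketch under assumptions: the function name `parseQuery` and the shape of its result are hypothetical, and the real tokenizer may treat escaping and punctuation differently.

```javascript
// Hypothetical parser for the documented query syntax. Illustrative only.
function parseQuery(query) {
  const parsed = {
    terms: [],
    phrases: [],
    excludedTerms: [],
    excludedPhrases: [],
  };
  // An optional leading "-", then either a quoted phrase or a bare token.
  const pattern = /(-?)(?:"([^"]*)"|(\S+))/g;
  for (const [, negated, phrase, term] of query.matchAll(pattern)) {
    if (phrase !== undefined) {
      (negated ? parsed.excludedPhrases : parsed.phrases).push(phrase);
    } else {
      (negated ? parsed.excludedTerms : parsed.terms).push(term);
    }
  }
  return parsed;
}

console.log(parseQuery('auth "token refresh" -legacy -"test fixture"'));
```

Splitting the query into four buckets keeps exclusion (a hard filter) separate from phrase matching (a ranking boost), which matches how the list describes the two behaviors.
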
- Ingest findings into cache-backed records:
  - `node tools/triage/ingest.js --source dependabot --in dependabot.json --meta service=api --meta env=prod`
  - `node tools/triage/ingest.js --source aws_inspector --in inspector.json --meta service=api --meta env=prod`
  - `node tools/triage/ingest.js --source generic --in record.json --meta service=api --meta env=prod`
- Build the records index: `node build_index.js --mode records --incremental`
- Search records with metadata filters: `node search.js "CVE-2024-0001" --mode records --meta service=api --meta env=prod --json`
- Create decision records: `node tools/triage/decision.js --finding <recordId> --status accept --justification "..."`
- Generate a context pack: `node tools/triage/context-pack.js --record <recordId> --out context.json`
- Docs: `docs/triage-records.md`
- Default English wordlist: `npm run download-dicts -- --lang en` (setup/bootstrap runs this)
- Cache dir: `<cache>/dictionaries` (override with `dictionary.dir` or `PAIROFCLEATS_DICT_DIR`)
- Update dictionaries with ETag/Last-Modified: `npm run download-dicts -- --update`
- Add custom lists: `npm run download-dicts -- --url mylist=https://example.com/words.txt`
- Slang support: drop `.txt` files into the `slang/` folder in the dictionary cache
- Repo-specific dictionary (opt-in): `npm run generate-repo-dict -- --min-count 3`
  - enable via `{ "dictionary": { "enableRepoDictionary": true } }`
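
Updating with ETag/Last-Modified relies on standard HTTP conditional requests: the client sends back the validators it saved, and a `304 Not Modified` response means the cached file is still current. A minimal sketch of that exchange, assuming a hypothetical `cached` record with `etag` and `lastModified` fields (how the download script actually stores validators is not documented here):

```javascript
// Hypothetical helper: builds conditional-request headers from saved validators.
function conditionalHeaders(cached) {
  const headers = {};
  if (cached && cached.etag) headers['If-None-Match'] = cached.etag;
  if (cached && cached.lastModified) {
    headers['If-Modified-Since'] = cached.lastModified;
  }
  return headers;
}

// 304 Not Modified means the server confirmed the cached copy is current.
function isCacheCurrent(status) {
  return status === 304;
}

console.log(
  conditionalHeaders({
    etag: '"abc123"',
    lastModified: 'Wed, 01 Jan 2025 00:00:00 GMT',
  })
);
```
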
- Models live under `<cache>/models` by default
- Download: `npm run download-models`
- Override in `.pairofcleats.json`: `{ "models": { "id": "Xenova/all-MiniLM-L12-v2", "dir": "C:/cache/pairofcleats/models" } }`
- Env overrides: `PAIROFCLEATS_MODELS_DIR`, `PAIROFCLEATS_MODEL`
- Build: `npm run build-sqlite-index`
- Uses split DBs (`index-code.db` + `index-prose.db`) for concurrency
- `search.js` auto-uses SQLite when `sqlite.use` is not disabled and the DBs exist, unless `search.sqliteAutoChunkThreshold` is raised above its default of 0, which keeps small repos on file-backed indexes
- FTS5 scoring (optional): set `sqlite.scoreMode` to `fts`
- ANN extension (optional): set `sqlite.annMode = "extension"` and install `sqlite-vec`
  - ANN is on by default when `search.annDefault` is true; use `--no-ann` or set `search.annDefault: false` to disable
  - Install: `npm run download-extensions`
  - Verify: `npm run verify-extensions`
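
The SQLite keys named in this list can sit together in one `.pairofcleats.json`. The sketch below builds such a fragment as a small Node script so the emitted JSON can be checked before pasting. It assumes `sqlite.use` accepts a boolean (the list only says it can be disabled), and the values are examples rather than recommendations.

```javascript
// Sketch of a .pairofcleats.json fragment built from the keys named above.
// Assumption: `sqlite.use` is a boolean. Values are examples only.
const config = {
  sqlite: {
    use: true,            // leave SQLite enabled
    scoreMode: 'fts',     // opt in to FTS5 scoring
    annMode: 'extension', // use the sqlite-vec extension for ANN
  },
  search: {
    annDefault: true,            // ANN on unless --no-ann is passed
    sqliteAutoChunkThreshold: 0, // documented default
  },
};

console.log(JSON.stringify(config, null, 2));
```

Running `npm run config-validate -- --config .pairofcleats.json` afterwards is a reasonable way to confirm the fragment against the project's schema.
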
- Guided setup: `npm run setup` (prompts)
- CI/automation: `npm run setup -- --non-interactive --json` (summary JSON on stdout)
- Manual steps:
  - Install dependencies: `npm install`
  - Optional extras:
    - Dictionaries: `npm run download-dicts -- --lang en`
    - Models: `npm run download-models`
    - SQLite ANN extension: `npm run download-extensions`
    - Verify extension: `npm run verify-extensions`
    - Detect tooling: `npm run tooling-detect`
    - Install tooling: `npm run tooling-install -- --scope cache`
    - Tooling targets: tsserver, typescript-language-server, clangd, sourcekit-lsp, rust-analyzer, gopls, jdtls, kotlin-language-server, kotlin-lsp, omnisharp, csharp-ls, ruby-lsp, solargraph, phpactor, intelephense, lua-language-server, bash-language-server, sqls
    - Git hooks: `npm run git-hooks -- --install`
    - Validate config: `npm run config-validate -- --config .pairofcleats.json`
  - Build indexes:
    - File-backed + SQLite (default): `node build_index.js` (add `--incremental` if desired; add `--no-sqlite` to skip SQLite)
    - SQLite only: `npm run build-sqlite-index`
    - Validate: `npm run index-validate`
Run: npm run api-server or node bin/pairofcleats.js server
Endpoints:
- `GET /health`
- `GET /status?repo=<path>`
- `POST /search` (JSON payload mirrors CLI filters)
- `GET /status/stream` (SSE)
- `POST /search/stream` (SSE)
- Docs: `docs/api-server.md`
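
A client for `POST /search` might be assembled as follows. This is a sketch only: the base URL and port are placeholders, and the payload field names (`query`, `mode`) are assumptions based on the note that the payload mirrors CLI filters. `docs/api-server.md` is the place to confirm the real schema.

```javascript
// Hypothetical request builder for POST /search.
// Assumptions: base URL/port are placeholders; payload field names are guesses.
function buildSearchRequest(baseUrl, payload) {
  return {
    url: new URL('/search', baseUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

const request = buildSearchRequest('http://127.0.0.1:3000', {
  query: 'parse config',
  mode: 'code',
});

// With a server running, Node 18+ could send this with:
//   const response = await fetch(request.url, request.init);
console.log(request.url);
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.
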
- VS Code extension (CLI shell-out) under `extensions/vscode`
- Command: `PairOfCleats: Search`
- Uses `pairofcleats search --json-compact` with file/line hints
- Docs: `docs/editor-integration.md`
Run: npm run mcp-server
Tools:
- `index_status`
- `config_status`
- `build_index`
- `search`
- `triage_ingest`
- `triage_decision`
- `triage_context_pack`
- `download_models`
- `download_dictionaries`
- `download_extensions`
- `verify_extensions`
- `build_sqlite_index`
- `compact_sqlite_index`
- `cache_gc`
- `clean_artifacts`
- `bootstrap`
- `report_artifacts`
- `search` defaults to compact JSON payloads (set `output: "full"` for full JSON).
- Progress: long-running tools emit `notifications/progress` with `{ id, tool, message, stream, phase }`.
- Errors: `tools/call` responses set `isError=true` and return a JSON payload with `message` plus optional `code`, `stdout`, `stderr`, `hint`.
- Docs: `docs/mcp-server.md`
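
On the client side, the error contract described in this list could be handled roughly like this. In MCP, a `tools/call` result carries a `content` array and an `isError` flag; the sketch assumes the JSON payload is serialized into the first text content item, which `docs/mcp-server.md` should confirm. The helper name `readToolError` is hypothetical.

```javascript
// Hypothetical helper: extracts the error payload from a tools/call result.
// Assumption: the JSON payload is the text of the first text content item.
function readToolError(result) {
  if (!result || result.isError !== true) return null; // not an error
  const textItem = (result.content || []).find((item) => item.type === 'text');
  if (!textItem) return { message: 'Unknown tool error' };
  try {
    const payload = JSON.parse(textItem.text);
    if (payload && typeof payload === 'object') return payload;
    return { message: String(payload) };
  } catch {
    // Fall back to the raw text when it is not valid JSON.
    return { message: textItem.text };
  }
}

const failed = {
  isError: true,
  content: [
    {
      type: 'text',
      text: JSON.stringify({ message: 'index missing', hint: 'run build_index' }),
    },
  ],
};

console.log(readToolError(failed));
```
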
All-in-one (runs everything it can):
- `npm run test-all`
- `npm run test-all-no-bench` (skips the benchmark run)
- `npm run test-all -- --skip-bench` (same as above)

Core:
- `npm run verify`
- `npm run fixture-smoke`
- `npm run fixture-parity`
- `npm run fixture-eval`
- `npm run search-explain-test`

Fidelity:
- `npm run language-fidelity-test`
- `npm run format-fidelity-test`
- `npm run type-inference-crossfile-test`

SQLite + extensions:
- `npm run sqlite-incremental-test`
- `npm run sqlite-compact-test`
- `npm run sqlite-ann-extension-test`
- `npm run download-extensions-test`

Tooling + caches:
- `npm run download-dicts-test`
- `npm run setup-test`
- `npm run tooling-detect-test`
- `npm run tooling-install-test`
- `npm run query-cache-test`
- `npm run index-validate-test`
- `npm run clean-artifacts-test`
- `npm run uninstall-test`
- `npm run cache-gc-test`
- `npm run git-hooks-test`

Triage:
- `npm run triage-test`

Reports + MCP:
- `npm run repometrics-dashboard-test`
- `npm run summary-report-test`
- `npm run mcp-server-test`
- `npm run api-server-test`
- `npm run api-server-stream-test`
- `npm run vscode-extension-test`

Meta:
- `npm run script-coverage-test`
- `npm run docs-consistency-test`
- `npm run bench` / `npm run bench-ann` / `npm run bench-language`
- Report cache sizes: `npm run report-artifacts` (add `-- --all` for all repos)
- Validate index artifacts: `npm run index-validate`
- Cache GC (age/size): `npm run cache-gc -- --max-gb 10` or `--max-age-days 30`
- Clean repo artifacts: `npm run clean-artifacts` (add `-- --all` to clear repo caches; keeps models/dictionaries/extensions)
- Uninstall caches + models + extensions: `npm run uninstall`
- Compact SQLite indexes: `npm run compact-sqlite-index`
- Dependency policy: versions are pinned in `package.json`; update via `npm install` and commit `package-lock.json`.
- Repometrics dashboard: `npm run repometrics-dashboard`
- Model comparison: `npm run compare-models`
- Combined summary report: `npm run summary-report` (add `-- --json` for JSON output)
- Tooling detect/install: `npm run tooling-detect`, `npm run tooling-install`
- Git hooks (post-commit/post-merge): `npm run git-hooks -- --install`
- CI artifacts: `node tools/ci-build-artifacts.js --out ci-artifacts`, `node tools/ci-restore-artifacts.js --from ci-artifacts`
- `COMPLETE_PLAN.md` - single source of truth for all phases
- `docs/ast-feature-list.md` - metadata schema + per-language coverage
- `docs/language-fidelity.md` - parsing validation checklist
- `docs/parser-backbone.md` - parser and inference strategy
- `docs/language-handler-imports.md` - registry import tradeoffs
- `docs/editor-integration.md` - editor contract + VS Code extension
- `docs/api-server.md` - local HTTP JSON API surface
- `docs/mcp-server.md` - MCP tool surface and behavior
- `docs/sqlite-index-schema.md` - SQLite schema for artifacts
- `docs/sqlite-incremental-updates.md` - incremental update flow
- `docs/sqlite-compaction.md` - compaction details
- `docs/sqlite-ann-extension.md` - SQLite ANN extension setup
- `docs/model-comparison.md` - model evaluation harness
- `docs/language-benchmarks.md` - language benchmark repos and workflow
- `docs/query-cache.md` - query cache behavior
- `docs/repometrics-dashboard.md` - repometrics output and usage
- `docs/setup.md` - unified setup flow and flags
- `docs/structural-search.md` - structural search CLI
- `docs/rule-packs.md` - rule pack registry
- `docs/gtags.md` - GNU Global ingest
- `docs/service-mode.md` - multi-repo service workflow
- `docs/external-backends.md` - backend evaluation notes
- `docs/triage-records.md` - triage ingestion + context packs
- `docs/config-schema.json` - config schema for `.pairofcleats.json`
- `docs/references/README.md` - OSS references and takeaways

- `<cache>/repos/<repoId>/index-code`
- `<cache>/repos/<repoId>/index-prose`
- `<cache>/repos/<repoId>/index-records`
- `<cache>/repos/<repoId>/incremental/<mode>`
- `<cache>/repos/<repoId>/repometrics`
- `<cache>/repos/<repoId>/triage/records`
- `<cache>/repos/<repoId>/triage/context-packs`
- `<cache>/repos/<repoId>/index-sqlite/index-code.db`
- `<cache>/repos/<repoId>/index-sqlite/index-prose.db`
- `<cache>/dictionaries`
- `<cache>/models`
- `<cache>/extensions`
- `<cache>/tooling`

Default cache root:
- Windows: `%LOCALAPPDATA%\PairOfCleats`
- Linux/macOS: `$XDG_CACHE_HOME/pairofcleats` or `~/.cache/pairofcleats`
- Override with `cache.root`, `PAIROFCLEATS_CACHE_ROOT`, or `PAIROFCLEATS_HOME`
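
The default cache root rules can be sketched as a small resolver. The precedence among the three overrides is an assumption (they are only listed here, not ordered), as are the function name `resolveCacheRoot` and the Windows fallback used when `LOCALAPPDATA` is unset.

```javascript
const path = require('node:path');

// Hypothetical resolver for the cache root. Assumptions: override precedence
// (config, then PAIROFCLEATS_CACHE_ROOT, then PAIROFCLEATS_HOME) and the
// Windows fallback when LOCALAPPDATA is missing.
function resolveCacheRoot({ config = {}, env = {}, platform, homeDir }) {
  const override =
    (config.cache && config.cache.root) ||
    env.PAIROFCLEATS_CACHE_ROOT ||
    env.PAIROFCLEATS_HOME;
  if (override) return override;

  if (platform === 'win32') {
    const localAppData =
      env.LOCALAPPDATA || path.win32.join(homeDir, 'AppData', 'Local');
    return path.win32.join(localAppData, 'PairOfCleats');
  }

  // Linux/macOS: prefer XDG_CACHE_HOME, else ~/.cache.
  const base = env.XDG_CACHE_HOME || path.posix.join(homeDir, '.cache');
  return path.posix.join(base, 'pairofcleats');
}

console.log(
  resolveCacheRoot({ env: {}, platform: 'linux', homeDir: '/home/dev' })
);
```

Passing `env`, `platform`, and `homeDir` in explicitly, rather than reading `process.env` and `os.homedir()` inside the function, keeps the sketch testable on any machine.
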