Semantic + hybrid code search that complements ripgrep and ast-grep.
```
┌──────────────────┬───────────────┬───────────────────┐
│ ripgrep (rg)     │ ast-grep (sg) │ sgrep             │
├──────────────────┼───────────────┼───────────────────┤
│ Exact text/regex │ AST patterns  │ Semantic + hybrid │
│ "findUser"       │ $fn($args)    │ "auth validation" │
└──────────────────┴───────────────┴───────────────────┘
```
Coding agents (Amp, Claude Code, Cursor) waste tokens on failed grep attempts when searching for concepts rather than exact strings. sgrep understands what you mean, not just what you type.
```sh
# ❌ Agent tries 10+ grep patterns, burns 2000 tokens
rg "authenticate" && rg "auth" && rg "login" && rg "session" ...

# ✅ One semantic query, 50 tokens
sgrep "how does user authentication work"
```

Homebrew:

```sh
brew tap XiaoConstantine/tap
brew install sgrep
```

Install script:

```sh
curl -fsSL https://raw.githubusercontent.com/XiaoConstantine/sgrep/main/install.sh | bash
```

Go install:

```sh
go install github.com/XiaoConstantine/sgrep/cmd/sgrep@latest
```

From source:

```sh
git clone https://github.com/XiaoConstantine/sgrep.git
cd sgrep

# Default build (uses libSQL with DiskANN vector search)
go build -o sgrep ./cmd/sgrep

# Alternative: sqlite-vec backend
go build -tags=sqlite_vec -o sgrep ./cmd/sgrep
```

Requirements: llama.cpp (for the embedding server)

```sh
brew install llama.cpp  # macOS
# or build from source: https://github.com/ggerganov/llama.cpp
```

As a library:

```sh
go get github.com/XiaoConstantine/sgrep@latest
```

```sh
# One-time setup: downloads embedding model (~130MB)
sgrep setup

# Index your codebase (auto-starts embedding server)
sgrep index .

# Semantic search (quick)
sgrep "error handling for database connections"

# Hybrid + ColBERT (recommended - best accuracy)
sgrep --hybrid --colbert "JWT token validation logic"
sgrep --hybrid --colbert "how are API rate limits implemented"

# Hybrid with custom weights
sgrep --hybrid --colbert "authentication middleware" --semantic-weight 0.5 --bm25-weight 0.5

# Watch mode (background indexing)
sgrep watch .
```

The embedding server starts automatically when needed and stays running as a daemon.
Hybrid search combines semantic understanding with lexical matching (BM25) for improved accuracy. This helps when:
- Searching for specific technical terms (e.g., "JWT", "OAuth", "mutex")
- The query contains exact function/variable names
- Semantic search alone misses exact keyword matches
```sh
# Default: semantic-only search
sgrep "authentication"

# Hybrid: semantic (60%) + BM25 (40%) - default weights
sgrep --hybrid "authentication"

# Custom weights: more emphasis on exact matches
sgrep --hybrid --semantic-weight 0.4 --bm25-weight 0.6 "parseAST"
```

Note: Hybrid search requires building with FTS5 support (see From Source). The FTS5 index is created automatically on the first hybrid search - no re-indexing needed.
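The weighted blend behind hybrid mode can be sketched in a few lines of Go. This is an illustrative sketch, not sgrep's internals: it assumes both scores have already been normalized to [0, 1] before blending with the default 0.6/0.4 weights.

```go
package main

import "fmt"

// hybridScore blends a semantic similarity and a BM25 score using the
// documented default weights (0.6 semantic / 0.4 BM25). Both inputs are
// assumed to be pre-normalized to [0, 1]; that normalization step is an
// assumption of this sketch, not a description of sgrep's internals.
func hybridScore(semantic, bm25, wSem, wBM25 float64) float64 {
	return wSem*semantic + wBM25*bm25
}

func main() {
	// A chunk with a strong exact-term hit but a weaker semantic match
	fmt.Printf("%.2f\n", hybridScore(0.55, 0.95, 0.6, 0.4)) // 0.71
}
```

Raising `--bm25-weight` shifts the blend toward exact-term hits, which is why identifier-style queries like `parseAST` benefit from it.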
sgrep uses a multi-stage retrieval pipeline:
```
Query: "authentication middleware"
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 1: Hybrid Retrieval (--hybrid)                         │
│   ┌───────────┐            ┌───────────┐                     │
│   │ Semantic  │            │   BM25    │                     │
│   │ (DiskANN) │            │  (FTS5)   │                     │
│   │    60%    │            │    40%    │                     │
│   └─────┬─────┘            └─────┬─────┘                     │
│         └────────────┬───────────┘                           │
│                      ↓                                       │
│               Top 50 candidates                              │
└──────────────────────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 2: ColBERT Late Interaction (--colbert)                │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Token-level similarity: MaxSim(query_tokens, doc_tokens) │ │
│ │ Scores all 50 candidates with fine-grained matching      │ │
│ └──────────────────────────────────────────────────────────┘ │
│                      ↓                                       │
│                Re-scored candidates                          │
└──────────────────────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 3: Cross-Encoder Reranking (--rerank)                  │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Full attention: query ⊗ document → relevance score       │ │
│ │ Reranks top 20 ColBERT results (~300-700ms)              │ │
│ └──────────────────────────────────────────────────────────┘ │
│                      ↓                                       │
│                Final ranked results                          │
└──────────────────────────────────────────────────────────────┘
```
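Stage 2's MaxSim operator is simple to state: for every query token embedding, take the best similarity against any document token, then sum those maxima. A toy Go sketch (2-dimensional vectors, dot product as similarity; not sgrep's actual code):

```go
package main

import "fmt"

// dot is the similarity between two token embeddings in this toy sketch.
func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// maxSim implements ColBERT-style late interaction: for each query
// token, find its best-matching document token and sum those maxima.
func maxSim(query, doc [][]float64) float64 {
	total := 0.0
	for _, q := range query {
		best := dot(q, doc[0])
		for _, d := range doc[1:] {
			if v := dot(q, d); v > best {
				best = v
			}
		}
		total += best
	}
	return total
}

func main() {
	query := [][]float64{{1, 0}, {0, 1}}
	doc := [][]float64{{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}}
	fmt.Printf("%.1f\n", maxSim(query, doc)) // 1.7
}
```

Because each query token is matched independently, a query like "authentication middleware" rewards chunks that contain good matches for both words, even when they are far apart in the chunk.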
| Mode | Command | MRR | Latency | Best For |
|---|---|---|---|---|
| Semantic only | `sgrep "query"` | 0.61 | ~30ms | Quick searches |
| Hybrid + ColBERT | `sgrep --hybrid --colbert "query"` | 0.70 | ~200ms | Best accuracy for code |
| Hybrid | `sgrep --hybrid "query"` | 0.62 | ~50ms | Exact term matching |
| Cascade (all 3 stages) | `sgrep --hybrid --colbert --rerank "query"` | 0.60 | ~500ms | General text (not code) |
Recommended for code: use `--hybrid --colbert`. ColBERT provides +13% MRR over plain hybrid.
Note: Cross-encoder reranking adds a third stage but currently hurts code search accuracy (MRR drops from 0.70 to 0.60). This is because available cross-encoder models (mxbai-rerank) are trained on general text, not code. Cross-encoder may help for non-code search tasks.
```sh
# Best accuracy (recommended)
sgrep --hybrid --colbert "authentication middleware"

# Quick search (semantic only)
sgrep "error handling"

# With custom weights
sgrep --hybrid --colbert --semantic-weight 0.5 --bm25-weight 0.5 "JWT token"
```

```sh
# Basic setup (embedding model only, ~130MB)
sgrep setup

# With cross-encoder reranking (~1.6GB additional)
sgrep setup --with-rerank
```

Note: ColBERT scoring uses the same embedding model; no additional setup required. Cross-encoder reranking requires a separate model download.
sgrep automatically handles meta-queries about your repository:
```sh
# These queries use document-level embeddings
sgrep "what does this repo do"
sgrep "project overview"
sgrep "purpose of this codebase"
```

Document-level embeddings (the mean of chunk embeddings per file) are computed during indexing, enabling README.md and other overview files to rank highly for repository-level questions.
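The mean pooling described above can be sketched as follows (toy 3-dimensional vectors for illustration; the real embeddings are 768-dimensional):

```go
package main

import "fmt"

// docEmbedding mean-pools a file's chunk embeddings into a single
// file-level vector, which is what lets overview files like README.md
// match repository-level queries.
func docEmbedding(chunks [][]float64) []float64 {
	mean := make([]float64, len(chunks[0]))
	for _, c := range chunks {
		for i, v := range c {
			mean[i] += v
		}
	}
	for i := range mean {
		mean[i] /= float64(len(chunks))
	}
	return mean
}

func main() {
	chunks := [][]float64{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}
	fmt.Printf("%.2f\n", docEmbedding(chunks)) // [0.33 0.33 0.33]
}
```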
Default output is minimal for token efficiency:
```
$ sgrep "authentication middleware"
auth/middleware.go:45-67
auth/jwt.go:12-38
handlers/login.go:89-112
```

Use `-c` for context (still concise):

```
$ sgrep -c "authentication middleware"
auth/middleware.go:45-67
  func AuthMiddleware(next http.Handler) http.Handler {
    token := r.Header.Get("Authorization")
  ...

auth/jwt.go:12-38
  func ValidateJWT(token string) (*Claims, error) {
  ...
```

JSON output for programmatic use:

```
$ sgrep --json "authentication"
[{"file":"auth/middleware.go","start":45,"end":67,"score":0.92}]
```

The search hierarchy for agents:
- sgrep - Find the right files/functions by intent
- ast-grep - Match structural patterns in those files
- ripgrep - Exact text search for specific symbols
Example workflow:
```sh
# Step 1: Semantic search to find relevant code
sgrep "rate limiting implementation"
# → api/ratelimit.go:20-80

# Step 2: AST pattern to find all similar usages
sg -p 'rateLimiter.Check($ctx, $key)'

# Step 3: Exact search for specific constant
rg "RATE_LIMIT_MAX"
```

All data is stored in `~/.sgrep/`:
```
~/.sgrep/
├── models/
│   └── nomic-embed-text-v1.5.Q8_0.gguf  # Embedding model (~130MB)
├── repos/
│   ├── a1b2c3/            # Hash of /path/to/repo1
│   │   ├── index.db       # libSQL database with DiskANN vectors
│   │   └── metadata.json  # Repo path, index time
│   └── d4e5f6/            # Hash of /path/to/repo2
│       └── ...
├── server.pid             # Embedding server PID
└── server.log             # Embedding server logs
```
Use `sgrep list` to see all indexed repositories.
sgrep supports two vector storage backends:
| Backend | Build Command | Storage Efficiency | Best For |
|---|---|---|---|
| libSQL (default) | `go build ./cmd/sgrep` | ~5-10 KB/vector | Large repos, production |
| sqlite-vec | `go build -tags=sqlite_vec ./cmd/sgrep` | ~780 KB/vector | Development, compatibility |
libSQL advantages:
- Uses DiskANN for approximate nearest neighbor search
- 93-177x more space-efficient than sqlite-vec
- Native F32_BLOB column type for vectors
- Compress neighbors option for index compression
| Command | Description |
|---|---|
| `sgrep [query]` | Semantic search (default) |
| `sgrep index [path]` | Index a directory |
| `sgrep watch [path]` | Watch and auto-index |
| `sgrep list` | List all indexed repos |
| `sgrep status` | Show index status |
| `sgrep clear` | Clear index |
| `sgrep setup` | Download embedding model, verify llama-server |
| `sgrep setup --with-rerank` | Also download reranker model (~636MB) |
| `sgrep server start` | Manually start embedding server |
| `sgrep server stop` | Stop embedding server |
| `sgrep server status` | Show server status |
| `sgrep install-claude-code` | Install Claude Code plugin |
Install the sgrep plugin for Claude Code with one command:

```sh
sgrep install-claude-code
```

This creates a plugin at `~/.claude/plugins/sgrep` that:
- Auto-indexes your project when Claude Code starts
- Keeps the index updated in watch mode as you code
- Ships skill documentation that teaches Claude when to use sgrep vs ripgrep
After installation, restart Claude Code to activate. The plugin works automatically—Claude will use sgrep for semantic searches like "how does authentication work" while using ripgrep for exact matches.
| Flag | Description |
|---|---|
| `-n, --limit N` | Max results (default: 10) |
| `-c, --context` | Show code context |
| `--json` | JSON output for agents |
| `-q, --quiet` | Minimal output (paths only) |
| `--threshold F` | L2 distance threshold (default: 1.5, lower = stricter) |
| `-t, --include-tests` | Include test files in results (excluded by default) |
| `--all-chunks` | Show all matching chunks (disable deduplication) |
| `--hybrid` | Enable hybrid search (semantic + BM25) |
| `--colbert` | Enable ColBERT late interaction scoring (recommended with `--hybrid`) |
| `--semantic-weight F` | Weight for semantic score in hybrid mode (default: 0.6) |
| `--bm25-weight F` | Weight for BM25 score in hybrid mode (default: 0.4) |
| `--rerank` | Enable cross-encoder reranking (requires `sgrep setup --with-rerank`) |
| `-d, --debug` | Show debug timing information |
Environment variables:

```sh
SGREP_HOME=~/.sgrep                   # Data storage location
SGREP_ENDPOINT=http://localhost:8080  # Override embedding server URL
SGREP_PORT=8080                       # Embedding server port
SGREP_DIMS=768                        # Vector dimensions
```

- Setup: `sgrep setup` downloads the embedding model and verifies llama-server
- Indexing: Files are chunked using AST-aware splitting (Go, TS, Python) or size-based fallback
- Embedding: Each chunk is embedded via llama.cpp (local, $0 cost, auto-started)
- Storage: Vectors stored in libSQL with DiskANN indexing
- Search: Query embedded → DiskANN approximate nearest neighbor → load matching documents
Smart skip for large repos: When indexing repos with >1000 files, sgrep automatically filters out test files, generated code (*.pb.go, *.generated.go), and vendored directories to speed up indexing.
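A sketch of such a filter (the exact patterns are assumptions based on the description above, not sgrep's actual filter list):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkip mirrors the "smart skip" heuristic for large repos: drop
// test files, generated code, and vendored directories. The patterns
// here are illustrative assumptions, not sgrep's actual filter list.
func shouldSkip(path string) bool {
	if strings.Contains(path, "/vendor/") || strings.HasPrefix(path, "vendor/") {
		return true
	}
	for _, suffix := range []string{"_test.go", ".pb.go", ".generated.go"} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

func main() {
	for _, f := range []string{
		"api/ratelimit.go",
		"api/ratelimit_test.go",
		"proto/user.pb.go",
		"vendor/lib/util.go",
	} {
		fmt.Println(f, shouldSkip(f))
	}
}
```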
```
┌──────────────────────────────────────────────────────────────┐
│                            sgrep                             │
├──────────────────────────────────────────────────────────────┤
│  Query: "error handling"                                     │
│         ↓                                                    │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐       │
│  │  llama.cpp  │───▶│   DiskANN   │───▶│   libSQL    │       │
│  │  Embedding  │    │ + BM25/FTS5 │    │  Documents  │       │
│  │   (~15ms)   │    │   (~10ms)   │    │   (~5ms)    │       │
│  └─────────────┘    └──────┬──────┘    └─────────────┘       │
│         ▲                  │                                 │
│         │                  ▼  (with --colbert)               │
│         │           ┌─────────────┐                          │
│         │           │   ColBERT   │                          │
│         │           │ Late-Interx │                          │
│         │           │   (~150ms)  │                          │
│         │           └──────┬──────┘                          │
│         │                  │                                 │
│         │                  ▼  (with --rerank)                │
│         │           ┌─────────────┐                          │
│         │           │Cross-Encoder│                          │
│         │           │  Reranker   │                          │
│         │           │ (~300-700ms)│                          │
│         │           └─────────────┘                          │
│         │                                                    │
│         │  Auto-started by sgrep (16 parallel slots)         │
│         │  (daemon mode, continuous batching)                │
│                                                              │
│  Recommended: --hybrid --colbert (~200ms, MRR 0.70)          │
└──────────────────────────────────────────────────────────────┘
```
When --hybrid is enabled, sgrep combines semantic and lexical search:
```
Query: "authentication middleware"
                     ↓
┌────────────────────────────────────────────┐
│   ┌────────────┐          ┌────────────┐   │
│   │  Semantic  │          │    BM25    │   │
│   │ (Vectors)  │          │   (FTS5)   │   │
│   │    60%     │          │    40%     │   │
│   └─────┬──────┘          └─────┬──────┘   │
│         │                       │          │
│         └───────────┬───────────┘          │
│                     ↓                      │
│               ┌────────────┐               │
│               │   Hybrid   │               │
│               │  Ranking   │               │
│               └────────────┘               │
└────────────────────────────────────────────┘
```
- Semantic: Understands intent ("auth" matches "authentication", "login", "session")
- BM25: Exact term matching with TF-IDF weighting (boosts exact "authentication" matches)
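The BM25 side can be illustrated with the textbook Okapi formula. This is a sketch for intuition only: SQLite's FTS5 implements its own BM25 variant internally, and the parameters used here (k1 = 1.2, b = 0.75) are the common defaults, not necessarily sgrep's settings.

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term computes the Okapi BM25 contribution of a single query term,
// the lexical half of hybrid ranking: rare terms get a high idf, and
// repeated occurrences saturate via k1 while b normalizes for doc length.
func bm25Term(tf, docLen, avgDocLen, docFreq, totalDocs float64) float64 {
	const k1, b = 1.2, 0.75
	idf := math.Log((totalDocs-docFreq+0.5)/(docFreq+0.5) + 1)
	return idf * (tf * (k1 + 1)) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

func main() {
	// "authentication" appears 3 times in a 200-token chunk and in
	// 5 of 1572 chunks overall: a rare, repeated term scores high.
	fmt.Printf("%.2f\n", bm25Term(3, 200, 150, 5, 1572))
}
```

This is why hybrid mode rescues queries where the semantic embedding alone underweights a rare exact token like "JWT" or "mutex".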
Benchmarked on the maestro codebase (102 files, 1572 chunks, 768-dim vectors):

| Metric | sgrep | ripgrep |
|---|---|---|
| Latency (avg) | 31ms | 10ms |
| Token usage | 57% less | baseline |
| Attempts needed | 1 | 3-7 |
Embedding server optimization:
The llama.cpp server is configured for maximum throughput:
- 16 parallel slots with continuous batching (`-cb`)
- Dynamic thread count based on CPU cores
- GPU acceleration (Metal on Mac, CUDA on Linux)
The embedding model (nomic-embed-text) has a 2048-token context limit. sgrep handles this by:
- Default chunk size: 1000 tokens (with AST-aware splitting)
- Safety truncation at 1500 tokens in embedder
- Large functions/types split into parts automatically
Use sgrep as an embedded library in your Go application:
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/XiaoConstantine/sgrep"
)

func main() {
	ctx := context.Background()

	// Create client for a codebase
	client, err := sgrep.New("/path/to/codebase")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Index the codebase (required before searching)
	if err := client.Index(ctx); err != nil {
		log.Fatal(err)
	}

	// Search for code by semantic intent
	results, err := client.Search(ctx, "authentication logic", 10)
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		fmt.Printf("%s:%d-%d (score: %.2f)\n", r.FilePath, r.StartLine, r.EndLine, r.Score)
	}
}
```

For more control, use the `pkg/` subpackages directly:

- `pkg/index` - Indexing and file watching
- `pkg/search` - Search with caching
- `pkg/embed` - Embedding generation
- `pkg/store` - Vector storage
- `pkg/chunk` - Code chunking with AST awareness
Apache-2.0