Semantic + hybrid code search that complements ripgrep and ast-grep.
```
┌──────────────────┬───────────────┬───────────────────┐
│ ripgrep (rg)     │ ast-grep (sg) │ sgrep             │
├──────────────────┼───────────────┼───────────────────┤
│ Exact text/regex │ AST patterns  │ Semantic + hybrid │
│ "findUser"       │ $fn($args)    │ "auth validation" │
└──────────────────┴───────────────┴───────────────────┘
```
Coding agents (Amp, Claude Code, Cursor) waste tokens on failed grep attempts when searching for concepts rather than exact strings. sgrep understands what you mean, not just what you type.
```sh
# ❌ Agent tries 10+ grep patterns, burns 2000 tokens
rg "authenticate" && rg "auth" && rg "login" && rg "session" ...

# ✅ One semantic query, 50 tokens
sgrep "how does user authentication work"
```

Homebrew:

```sh
brew tap XiaoConstantine/tap
brew install sgrep
```

Install script:

```sh
curl -fsSL https://raw.githubusercontent.com/XiaoConstantine/sgrep/main/install.sh | bash
```

Go install:

```sh
go install github.com/XiaoConstantine/sgrep/cmd/sgrep@latest
```

From source:

```sh
git clone https://github.com/XiaoConstantine/sgrep.git
cd sgrep

# Default build (uses libSQL with DiskANN vector search)
go build -o sgrep ./cmd/sgrep

# Alternative: sqlite-vec backend
go build -tags=sqlite_vec -o sgrep ./cmd/sgrep
```

Requirements: llama.cpp (for the embedding server)

```sh
brew install llama.cpp  # macOS
# or build from source: https://github.com/ggerganov/llama.cpp
```

As a library:

```sh
go get github.com/XiaoConstantine/sgrep@latest
```

```sh
# One-time setup: downloads embedding model (~130MB)
sgrep setup

# Index your codebase (auto-starts embedding server)
sgrep index .

# Semantic search (quick)
sgrep "error handling for database connections"

# Hybrid + ColBERT (recommended - best accuracy)
sgrep --hybrid --colbert "JWT token validation logic"
sgrep --hybrid --colbert "how are API rate limits implemented"

# Hybrid with custom weights
sgrep --hybrid --colbert "authentication middleware" --semantic-weight 0.5 --bm25-weight 0.5

# Watch mode (background indexing)
sgrep watch .
```

The embedding server starts automatically when needed and stays running as a daemon.
Hybrid search combines semantic understanding with lexical matching (BM25) for improved accuracy. This helps when:
- Searching for specific technical terms (e.g., "JWT", "OAuth", "mutex")
- The query contains exact function/variable names
- Semantic search alone misses exact keyword matches
```sh
# Default: semantic-only search
sgrep "authentication"

# Hybrid: semantic (60%) + BM25 (40%) - default weights
sgrep --hybrid "authentication"

# Custom weights: more emphasis on exact matches
sgrep --hybrid --semantic-weight 0.4 --bm25-weight 0.6 "parseAST"
```

Note: Hybrid search requires building with FTS5 support (see From Source). The FTS5 index is created automatically on the first hybrid search - no re-indexing needed.
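The weighted blend behind hybrid mode can be sketched in a few lines of Go. This is an illustrative sketch, not sgrep's internals: it assumes both scores have already been normalized to [0, 1] before blending with the default 0.6/0.4 weights.

```go
package main

import "fmt"

// hybridScore blends a semantic similarity and a BM25 score using the
// documented default weights (0.6 semantic / 0.4 BM25). Both inputs are
// assumed to be pre-normalized to [0, 1]; that normalization step is an
// assumption of this sketch, not a description of sgrep's internals.
func hybridScore(semantic, bm25, wSem, wBM25 float64) float64 {
	return wSem*semantic + wBM25*bm25
}

func main() {
	// A chunk with a strong exact-term hit but a weaker semantic match
	fmt.Printf("%.2f\n", hybridScore(0.55, 0.95, 0.6, 0.4)) // 0.71
}
```

Raising `--bm25-weight` shifts the blend toward exact-term hits, which is why identifier-style queries like `parseAST` benefit from it.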
sgrep uses a multi-stage retrieval pipeline:
```
Query: "authentication middleware"
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 1: Hybrid Retrieval (--hybrid)                         │
│   ┌───────────┐            ┌───────────┐                     │
│   │ Semantic  │            │   BM25    │                     │
│   │ (DiskANN) │            │  (FTS5)   │                     │
│   │    60%    │            │    40%    │                     │
│   └─────┬─────┘            └─────┬─────┘                     │
│         └────────────┬───────────┘                           │
│                      ↓                                       │
│               Top 50 candidates                              │
└──────────────────────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 2: ColBERT Late Interaction (--colbert)                │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Token-level similarity: MaxSim(query_tokens, doc_tokens) │ │
│ │ Scores all 50 candidates with fine-grained matching      │ │
│ └──────────────────────────────────────────────────────────┘ │
│                      ↓                                       │
│                Re-scored candidates                          │
└──────────────────────────────────────────────────────────────┘
                       ↓
┌──────────────────────────────────────────────────────────────┐
│ Stage 3: Cross-Encoder Reranking (--rerank)                  │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Full attention: query ⊗ document → relevance score       │ │
│ │ Reranks top 20 ColBERT results (~300-700ms)              │ │
│ └──────────────────────────────────────────────────────────┘ │
│                      ↓                                       │
│                Final ranked results                          │
└──────────────────────────────────────────────────────────────┘
```
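Stage 2's MaxSim operator is simple to state: for every query token embedding, take the best similarity against any document token, then sum those maxima. A toy Go sketch (2-dimensional vectors, dot product as similarity; not sgrep's actual code):

```go
package main

import "fmt"

// dot is the similarity between two token embeddings in this toy sketch.
func dot(a, b []float64) float64 {
	s := 0.0
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// maxSim implements ColBERT-style late interaction: for each query
// token, find its best-matching document token and sum those maxima.
func maxSim(query, doc [][]float64) float64 {
	total := 0.0
	for _, q := range query {
		best := dot(q, doc[0])
		for _, d := range doc[1:] {
			if v := dot(q, d); v > best {
				best = v
			}
		}
		total += best
	}
	return total
}

func main() {
	query := [][]float64{{1, 0}, {0, 1}}
	doc := [][]float64{{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}}
	fmt.Printf("%.1f\n", maxSim(query, doc)) // 1.7
}
```

Because each query token is matched independently, a query like "authentication middleware" rewards chunks that contain good matches for both words, even when they are far apart in the chunk.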
| Mode | Command | MRR | Latency | Best For |
|---|---|---|---|---|
| Semantic only | `sgrep "query"` | 0.61 | ~30ms | Quick searches |
| Hybrid + ColBERT | `sgrep --hybrid --colbert "query"` | 0.70 | ~200ms | Best accuracy for code |
| Hybrid | `sgrep --hybrid "query"` | 0.62 | ~50ms | Exact term matching |
| Cascade (all 3 stages) | `sgrep --hybrid --colbert --rerank "query"` | 0.60 | ~500ms | General text (not code) |
Recommended for code: use `--hybrid --colbert`. ColBERT provides +13% MRR over plain hybrid.
Note: Cross-encoder reranking adds a third stage but currently hurts code search accuracy (MRR drops from 0.70 to 0.60). This is because available cross-encoder models (mxbai-rerank) are trained on general text, not code. Cross-encoder may help for non-code search tasks.
```sh
# Best accuracy (recommended)
sgrep --hybrid --colbert "authentication middleware"

# Quick search (semantic only)
sgrep "error handling"

# With custom weights
sgrep --hybrid --colbert --semantic-weight 0.5 --bm25-weight 0.5 "JWT token"
```

```sh
# Basic setup (embedding model only, ~130MB)
sgrep setup

# With cross-encoder reranking (~1.6GB additional)
sgrep setup --with-rerank
```

Note: ColBERT scoring uses the same embedding model; no additional setup required. Cross-encoder reranking requires a separate model download.
sgrep automatically handles meta-queries about your repository:
```sh
# These queries use document-level embeddings
sgrep "what does this repo do"
sgrep "project overview"
sgrep "purpose of this codebase"
```

Document-level embeddings (the mean of chunk embeddings per file) are computed during indexing, enabling README.md and other overview files to rank highly for repository-level questions.
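The mean pooling described above can be sketched as follows (toy 3-dimensional vectors for illustration; the real embeddings are 768-dimensional):

```go
package main

import "fmt"

// docEmbedding mean-pools a file's chunk embeddings into a single
// file-level vector, which is what lets overview files like README.md
// match repository-level queries.
func docEmbedding(chunks [][]float64) []float64 {
	mean := make([]float64, len(chunks[0]))
	for _, c := range chunks {
		for i, v := range c {
			mean[i] += v
		}
	}
	for i := range mean {
		mean[i] /= float64(len(chunks))
	}
	return mean
}

func main() {
	chunks := [][]float64{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}
	fmt.Printf("%.2f\n", docEmbedding(chunks)) // [0.33 0.33 0.33]
}
```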
Default output is minimal for token efficiency:
```
$ sgrep "authentication middleware"
auth/middleware.go:45-67
auth/jwt.go:12-38
handlers/login.go:89-112
```

Use `-c` for context (still concise):

```
$ sgrep -c "authentication middleware"
auth/middleware.go:45-67
  func AuthMiddleware(next http.Handler) http.Handler {
    token := r.Header.Get("Authorization")
  ...

auth/jwt.go:12-38
  func ValidateJWT(token string) (*Claims, error) {
  ...
```

JSON output for programmatic use:

```
$ sgrep --json "authentication"
[{"file":"auth/middleware.go","start":45,"end":67,"score":0.92}]
```

The search hierarchy for agents:
- sgrep - Find the right files/functions by intent
- ast-grep - Match structural patterns in those files
- ripgrep - Exact text search for specific symbols
Example workflow:
```sh
# Step 1: Semantic search to find relevant code
sgrep "rate limiting implementation"
# → api/ratelimit.go:20-80

# Step 2: AST pattern to find all similar usages
sg -p 'rateLimiter.Check($ctx, $key)'

# Step 3: Exact search for specific constant
rg "RATE_LIMIT_MAX"
```

All data is stored in `~/.sgrep/`:
```
~/.sgrep/
├── models/
│   └── nomic-embed-text-v1.5.Q8_0.gguf  # Embedding model (~130MB)
├── repos/
│   ├── a1b2c3/            # Hash of /path/to/repo1
│   │   ├── index.db       # libSQL database with DiskANN vectors
│   │   └── metadata.json  # Repo path, index time
│   └── d4e5f6/            # Hash of /path/to/repo2
│       └── ...
├── server.pid             # Embedding server PID
└── server.log             # Embedding server logs
```
Use `sgrep list` to see all indexed repositories.
sgrep supports two vector storage backends:
| Backend | Build Command | Storage Efficiency | Best For |
|---|---|---|---|
| libSQL (default) | `go build ./cmd/sgrep` | ~5-10 KB/vector | Large repos, production |
| sqlite-vec | `go build -tags=sqlite_vec ./cmd/sgrep` | ~780 KB/vector | Development, compatibility |
libSQL advantages:
- Uses DiskANN for approximate nearest neighbor search
- 93-177x more space-efficient than sqlite-vec
- Native F32_BLOB column type for vectors
- Compress neighbors option for index compression
| Command | Description |
|---|---|
| `sgrep [query]` | Semantic search (default) |
| `sgrep index [path]` | Index a directory |
| `sgrep watch [path]` | Watch and auto-index |
| `sgrep list` | List all indexed repos |
| `sgrep status` | Show index status |
| `sgrep clear` | Clear index |
| `sgrep setup` | Download embedding model, verify llama-server |
| `sgrep setup --with-rerank` | Also download reranker model (~636MB) |
| `sgrep server start` | Manually start embedding server |
| `sgrep server stop` | Stop embedding server |
| `sgrep server status` | Show server status |
| `sgrep install-claude-code` | Install Claude Code plugin |
Install the sgrep plugin for Claude Code with one command:

```sh
sgrep install-claude-code
```

This creates a plugin at `~/.claude/plugins/sgrep` that:
- Auto-indexes your project when Claude Code starts
- Keeps the index updated in watch mode as you code
- Ships skill documentation that teaches Claude when to use sgrep vs ripgrep
After installation, restart Claude Code to activate. The plugin works automatically—Claude will use sgrep for semantic searches like "how does authentication work" while using ripgrep for exact matches.
| Flag | Description |
|---|---|
| `-n, --limit N` | Max results (default: 10) |
| `-c, --context` | Show code context |
| `--json` | JSON output for agents |
| `-q, --quiet` | Minimal output (paths only) |
| `--threshold F` | L2 distance threshold (default: 1.5, lower = stricter) |
| `-t, --include-tests` | Include test files in results (excluded by default) |
| `--all-chunks` | Show all matching chunks (disable deduplication) |
| `--hybrid` | Enable hybrid search (semantic + BM25) |
| `--colbert` | Enable ColBERT late interaction scoring (recommended with `--hybrid`) |
| `--semantic-weight F` | Weight for semantic score in hybrid mode (default: 0.6) |
| `--bm25-weight F` | Weight for BM25 score in hybrid mode (default: 0.4) |
| `--rerank` | Enable cross-encoder reranking (requires `sgrep setup --with-rerank`) |
| `-d, --debug` | Show debug timing information |
Environment variables:

```sh
SGREP_HOME=~/.sgrep                   # Data storage location
SGREP_ENDPOINT=http://localhost:8080  # Override embedding server URL
SGREP_PORT=8080                       # Embedding server port
SGREP_DIMS=768                        # Vector dimensions
```

- Setup: `sgrep setup` downloads the embedding model and verifies llama-server
- Indexing: Files are chunked using AST-aware splitting (Go, TS, Python) or size-based fallback
- Embedding: Each chunk is embedded via llama.cpp (local, $0 cost, auto-started)
- Storage: Vectors stored in libSQL with DiskANN indexing
- Search: Query embedded → DiskANN approximate nearest neighbor → load matching documents
Smart skip for large repos: When indexing repos with >1000 files, sgrep automatically filters out test files, generated code (*.pb.go, *.generated.go), and vendored directories to speed up indexing.
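A sketch of such a filter (the exact patterns are assumptions based on the description above, not sgrep's actual filter list):

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkip mirrors the "smart skip" heuristic for large repos: drop
// test files, generated code, and vendored directories. The patterns
// here are illustrative assumptions, not sgrep's actual filter list.
func shouldSkip(path string) bool {
	if strings.Contains(path, "/vendor/") || strings.HasPrefix(path, "vendor/") {
		return true
	}
	for _, suffix := range []string{"_test.go", ".pb.go", ".generated.go"} {
		if strings.HasSuffix(path, suffix) {
			return true
		}
	}
	return false
}

func main() {
	for _, f := range []string{
		"api/ratelimit.go",
		"api/ratelimit_test.go",
		"proto/user.pb.go",
		"vendor/lib/util.go",
	} {
		fmt.Println(f, shouldSkip(f))
	}
}
```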
```
┌──────────────────────────────────────────────────────────────┐
│                            sgrep                             │
├──────────────────────────────────────────────────────────────┤
│  Query: "error handling"                                     │
│         ↓                                                    │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐       │
│  │  llama.cpp  │───▶│   DiskANN   │───▶│   libSQL    │       │
│  │  Embedding  │    │ + BM25/FTS5 │    │  Documents  │       │
│  │   (~15ms)   │    │   (~10ms)   │    │   (~5ms)    │       │
│  └─────────────┘    └──────┬──────┘    └─────────────┘       │
│         ▲                  │                                 │
│         │                  ▼  (with --colbert)               │
│         │           ┌─────────────┐                          │
│         │           │   ColBERT   │                          │
│         │           │ Late-Interx │                          │
│         │           │   (~150ms)  │                          │
│         │           └──────┬──────┘                          │
│         │                  │                                 │
│         │                  ▼  (with --rerank)                │
│         │           ┌─────────────┐                          │
│         │           │Cross-Encoder│                          │
│         │           │  Reranker   │                          │
│         │           │ (~300-700ms)│                          │
│         │           └─────────────┘                          │
│         │                                                    │
│         │  Auto-started by sgrep (16 parallel slots)         │
│         │  (daemon mode, continuous batching)                │
│                                                              │
│  Recommended: --hybrid --colbert (~200ms, MRR 0.70)          │
└──────────────────────────────────────────────────────────────┘
```
When --hybrid is enabled, sgrep combines semantic and lexical search:
```
Query: "authentication middleware"
                     ↓
┌────────────────────────────────────────────┐
│   ┌────────────┐          ┌────────────┐   │
│   │  Semantic  │          │    BM25    │   │
│   │ (Vectors)  │          │   (FTS5)   │   │
│   │    60%     │          │    40%     │   │
│   └─────┬──────┘          └─────┬──────┘   │
│         │                       │          │
│         └───────────┬───────────┘          │
│                     ↓                      │
│               ┌────────────┐               │
│               │   Hybrid   │               │
│               │  Ranking   │               │
│               └────────────┘               │
└────────────────────────────────────────────┘
```
- Semantic: Understands intent ("auth" matches "authentication", "login", "session")
- BM25: Exact term matching with TF-IDF weighting (boosts exact "authentication" matches)
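The BM25 side can be illustrated with the textbook Okapi formula. This is a sketch for intuition only: SQLite's FTS5 implements its own BM25 variant internally, and the parameters used here (k1 = 1.2, b = 0.75) are the common defaults, not necessarily sgrep's settings.

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term computes the Okapi BM25 contribution of a single query term,
// the lexical half of hybrid ranking: rare terms get a high idf, and
// repeated occurrences saturate via k1 while b normalizes for doc length.
func bm25Term(tf, docLen, avgDocLen, docFreq, totalDocs float64) float64 {
	const k1, b = 1.2, 0.75
	idf := math.Log((totalDocs-docFreq+0.5)/(docFreq+0.5) + 1)
	return idf * (tf * (k1 + 1)) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

func main() {
	// "authentication" appears 3 times in a 200-token chunk and in
	// 5 of 1572 chunks overall: a rare, repeated term scores high.
	fmt.Printf("%.2f\n", bm25Term(3, 200, 150, 5, 1572))
}
```

This is why hybrid mode rescues queries where the semantic embedding alone underweights a rare exact token like "JWT" or "mutex".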
Benchmarked on the maestro codebase (102 files, 1572 chunks, 768-dim vectors):

| Metric | sgrep | ripgrep |
|---|---|---|
| Latency (avg) | 31ms | 10ms |
| Token usage | 57% less | baseline |
| Attempts needed | 1 | 3-7 |
Embedding server optimization:
The llama.cpp server is configured for maximum throughput:
- 16 parallel slots with continuous batching (`-cb`)
- Dynamic thread count based on CPU cores
- GPU acceleration (Metal on Mac, CUDA on Linux)
The embedding model (nomic-embed-text) has a 2048-token context limit. sgrep handles this by:
- Default chunk size: 1000 tokens (with AST-aware splitting)
- Safety truncation at 1500 tokens in embedder
- Large functions/types split into parts automatically
Use sgrep as an embedded library in your Go application:
```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/XiaoConstantine/sgrep"
)

func main() {
	ctx := context.Background()

	// Create client for a codebase
	client, err := sgrep.New("/path/to/codebase")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Index the codebase (required before searching)
	if err := client.Index(ctx); err != nil {
		log.Fatal(err)
	}

	// Search for code by semantic intent
	results, err := client.Search(ctx, "authentication logic", 10)
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range results {
		fmt.Printf("%s:%d-%d (score: %.2f)\n", r.FilePath, r.StartLine, r.EndLine, r.Score)
	}
}
```

For more control, use the `pkg/` subpackages directly:

- `pkg/index` - Indexing and file watching
- `pkg/search` - Search with caching
- `pkg/embed` - Embedding generation
- `pkg/store` - Vector storage
- `pkg/chunk` - Code chunking with AST awareness
Apache-2.0