# LocalRAG

> This example was extracted from AGPA, my fully autonomous general-purpose agent (closed-source, ~150k LOC).

A local Retrieval-Augmented Generation (RAG) system for .NET that uses BERT embeddings and multiple search strategies for efficient semantic search and information retrieval.

## Overview

LocalRAG provides a complete RAG implementation that runs entirely on your local machine, with no external API dependencies. It combines BERT-based embeddings with multiple search strategies to provide fast and accurate semantic search capabilities.
## Features

- BERT-based Text Embeddings: Uses ONNX Runtime for high-performance BERT inference
- Multiple Search Strategies:
  - Locality-Sensitive Hashing (LSH) for efficient similarity search
  - Full-Text Search (FTS5) integration via SQLite
  - Memory-based vector indexing for real-time queries
- SQLite Database: Persistent storage for embeddings and metadata
- Configurable Processing: Adjustable chunking, overlap, and threading parameters
- Asynchronous API: Non-blocking operations for better performance
- Windows Forms Demo: Example application demonstrating usage
## Prerequisites

- .NET 10.0 SDK or later
- Windows, Linux, or macOS
- BERT ONNX model (see setup instructions below)
- BERT vocabulary file (vocab.txt)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/LocalRAG.git
  cd LocalRAG
  ```

- Restore NuGet packages:

  ```bash
  dotnet restore
  ```

- Download a BERT model in ONNX format:
  - Visit Hugging Face ONNX Models
  - Models under Apache 2.0; see Hugging Face for details
  - Download a BERT model (e.g., `bert-base-uncased` or `bert-large-uncased`)
  - Place the `.onnx` file in the `onnxBERT/` directory
  - Download the corresponding `vocab.txt` file
  - Place it in the `Vocabularies/` directory

- Build the project:

  ```bash
  dotnet build
  ```

## Quick Start

```csharp
using LocalRAG;

// Configure the RAG system
var config = new RAGConfiguration
{
    ModelPath = "onnxBERT/model.onnx",
    VocabularyPath = "Vocabularies/vocab.txt",
    DatabasePath = "Database/embeddings.db"
};

// Initialize the database
using var database = new EmbeddingDatabaseNew(config);

// Add documents
await database.AddRequestToEmbeddingDatabaseAsync(
    requestId: "doc1",
    theRequest: "What is machine learning?",
    embed: true
);

await database.UpdateTextResponse(
    requestId: "doc1",
    message: "Machine learning is a subset of artificial intelligence...",
    embed: true
);

// Search for similar content
var results = await database.SearchEmbeddingsAsync(
    searchText: "artificial intelligence",
    topK: 5,
    minimumSimilarity: 0.75f
);

foreach (var result in results)
{
    Console.WriteLine($"Similarity: {result.Similarity:F3}");
    Console.WriteLine($"Request: {result.Request}");
    Console.WriteLine($"Response: {result.TextResponse}");
}
```

## Configuration

The RAGConfiguration class provides various settings:
```csharp
public class RAGConfiguration
{
    // File paths
    public string DatabasePath { get; set; }    // SQLite database location
    public string ModelPath { get; set; }       // ONNX model file
    public string VocabularyPath { get; set; }  // BERT vocab file

    // Embedding settings
    public int MaxSequenceLength { get; set; } = 512;
    public int WordsPerString { get; set; } = 40;
    public double OverlapPercentage { get; set; } = 15;

    // LSH settings
    public int NumberOfHashFunctions { get; set; } = 8;
    public int NumberOfHashTables { get; set; } = 10;

    // Performance settings
    public int InterOpNumThreads { get; set; } = 32;
    public int IntraOpNumThreads { get; set; } = 2;
    public int MaxCacheItems { get; set; } = 10000;
}
```

## Core Components

- EmbedderClassNew: Handles BERT embedding generation using ONNX Runtime
- EmbeddingDatabaseNew: Main database interface with SQLite storage
- MemoryHashIndex: In-memory hash-based indexing for fast lookups
- FeedbackDatabaseValues: Data model for stored documents and embeddings
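The chunking settings in RAGConfiguration (WordsPerString, OverlapPercentage) control how documents are split before embedding. A minimal sketch of sliding-window chunking with percentage overlap, shown here in Python for brevity; `chunk_words` is a hypothetical helper, not part of LocalRAG's API:

```python
def chunk_words(words, words_per_chunk=40, overlap_pct=15):
    """Split a word list into windows of words_per_chunk words, where
    consecutive windows share overlap_pct percent of their words."""
    overlap = int(words_per_chunk * overlap_pct / 100)  # words shared between windows
    step = words_per_chunk - overlap                    # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + words_per_chunk])
        if start + words_per_chunk >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# A 100-word document with the default settings (40 words, 15% overlap):
chunks = chunk_words([f"w{i}" for i in range(100)])
# the window advances 34 words at a time, so chunks start at words 0, 34, and 68
```

Overlap keeps a sentence that straddles a chunk boundary fully inside at least one chunk, at the cost of embedding some words twice.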
## How It Works

- Text is preprocessed (tokenized, stop words removed)
- BERT generates embeddings via ONNX Runtime
- Embeddings are indexed using LSH for fast retrieval
- Multiple search strategies are combined for optimal results
- Results are ranked by similarity score
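The LSH indexing and similarity-ranking steps can be sketched with random-hyperplane hashing and cosine scoring. This is a toy illustration assuming sign-of-projection LSH with a single hash table, not LocalRAG's actual implementation:

```python
import math
import random

random.seed(42)
DIM = 8         # toy embedding dimension (BERT-base actually produces 768)
NUM_PLANES = 8  # hash functions per table, cf. NumberOfHashFunctions

# Each hash function is a random hyperplane; the bit records which side a vector falls on.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]

def lsh_key(vec):
    """Concatenate one sign bit per hyperplane into a bucket key."""
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Index: bucket key -> list of (doc_id, vector)
docs = {f"doc{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(50)}
index = {}
for doc_id, vec in docs.items():
    index.setdefault(lsh_key(vec), []).append((doc_id, vec))

# Query: only the query's bucket is scanned, then the candidates are ranked exactly.
query = docs["doc7"]  # a vector guaranteed to collide with itself
candidates = index.get(lsh_key(query), [])
ranked = sorted(candidates, key=lambda pair: cosine(query, pair[1]), reverse=True)
```

Nearby vectors tend to fall on the same side of each hyperplane, so they land in the same bucket; exact cosine similarity is then computed only for that bucket's candidates instead of the whole corpus. Multiple hash tables (NumberOfHashTables) reduce the chance that a true neighbor is missed by one unlucky hyperplane split.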
## Demo Application

The DemoApp project provides a Windows Forms application demonstrating LocalRAG usage:

```bash
cd DemoApp
dotnet run
```

The demo shows:
- Adding documents with embeddings
- Searching for similar content
- Retrieving conversation history
- Formatting search results
## Performance Considerations

- First Run: Initial embedding generation may be slow
- Caching: Frequently accessed embeddings are cached in memory
- Threading: Adjust `InterOpNumThreads` and `IntraOpNumThreads` based on your CPU
- Database Size: SQLite performs well up to several million embeddings
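The two thread settings correspond to ONNX Runtime's standard inter-op/intra-op session options. For reference, the same knobs in the Python onnxruntime API (a configuration fragment; the thread counts are illustrative and the model path must exist):

```python
import onnxruntime as ort

opts = ort.SessionOptions()
opts.inter_op_num_threads = 4  # threads for running independent graph nodes in parallel
opts.intra_op_num_threads = 2  # threads used inside a single operator (e.g., a matmul)
session = ort.InferenceSession("onnxBERT/model.onnx", sess_options=opts)
```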
## Troubleshooting

### Model file not found

Ensure the ONNX model file exists at the configured `ModelPath`. Download it from Hugging Face if needed.

### Out of memory

Reduce `MaxCacheItems` or `MaxSequenceLength` in the configuration.

### Slow performance
- Use a smaller BERT model (base vs. large)
- Increase thread count if you have more CPU cores
- Enable GPU support via ONNX Runtime GPU packages
## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
## License

This project is licensed under the Apache License 2.0 - see LICENSE.txt for details.
## Acknowledgments

- Built with ONNX Runtime
- Uses FastBertTokenizer for tokenization
- BERT models from Hugging Face
## Roadmap

- GPU acceleration support
- More embedding models (Sentence Transformers, etc.)
- Vector database integration options
- REST API interface
- Multi-language support
## Support

For questions and issues, please open an issue on GitHub.