- [01/20/2026] SimpleMem is now available on PyPI! π¦ You can now install SimpleMem directly via
pip install simplememfor easier setup and integration. View Package Usage Guide β - [01/19/2026] Added Local Memory Storage for SimpleMem Skill! πΎ SimpleMem Skill now supports local memory storage, enabling seamless memory retention and management directly within Claude Skills.
- [01/18/2026] SimpleMem now supports Claude Skills! π Use SimpleMem in claude.ai to remember long-term information and project history across conversations. Register at mcp.simplemem.cloud, add the domain to Claude's network whitelist, configure with your token in the skill file, and import the skill!
- [01/14/2026] SimpleMem MCP Server is now LIVE and Open Source! π Experience SimpleMem as a cloud-hosted memory service at mcp.simplemem.cloud. Easily integrate with your favorite chat platforms (LM Studio, Cherry Studio) and AI agents (Cursor, Claude Desktop) using the Streamable HTTP MCP protocol. The MCP implementation features production-ready optimizations including multi-tenant user isolation, faster response times, and enhanced security. View MCP Documentation β
- [01/08/2026] We've set up a Discord server and WeChat group to make it easier to collaborate and exchange ideas on this project. Welcome to join the Group to share your thoughts, ask questions, or contribute your ideas! π₯ Join our Discord and WeChat Group Now!
- [01/05/2026] SimpleMem paper was released on arXiv!
- π Overview
- π― Key Contributions
- π Performance Highlights
- π¦ Installation
- β‘ Quick Start
- π MCP Server
- π Evaluation
- π File Structure
- π Citation
- π License
- π Acknowledgments
SimpleMem achieves superior F1 score (43.24%) with minimal token cost (~550), occupying the ideal top-left position.
SimpleMem is an efficient memory framework based on semantic lossless compression that addresses the fundamental challenge of efficient long-term memory for LLM agents. Unlike existing systems that either passively accumulate redundant context or rely on expensive iterative reasoning loops, SimpleMem maximizes information density and token utilization through a three-stage pipeline:
|
Semantic Structured Compression Distills unstructured interactions into compact, multi-view indexed memory units |
Online Semantic Synthesis Intra-session process that instantly integrates related context into unified abstract representations to eliminate redundancy |
Intent-Aware Retrieval Planning Infers search intent to dynamically determine retrieval scope and construct precise context efficiently |
The SimpleMem Architecture: (1) Semantic Structured Compression filters low-utility dialogue and converts informative windows into compact, context-independent memory units. (2) Online Semantic Synthesis consolidates related fragments during writing, maintaining a compact and coherent memory topology. (3) Intent-Aware Retrieval Planning infers search intent to adapt retrieval scope and query forms, enabling parallel multi-view retrieval and token-efficient context construction.
Speed Comparison Demo
SimpleMem vs. Baseline: Real-time speed comparison demonstration
LoCoMo-10 Benchmark Results (GPT-4.1-mini)
| Model | β±οΈ Construction Time | π Retrieval Time | β‘ Total Time | π― Average F1 |
|---|---|---|---|---|
| A-Mem | 5140.5s | 796.7s | 5937.2s | 32.58% |
| LightMem | 97.8s | 577.1s | 675.9s | 24.63% |
| Mem0 | 1350.9s | 583.4s | 1934.3s | 34.20% |
| SimpleMem β | 92.6s | 388.3s | 480.9s | 43.24% |
π‘ Key Advantages:
- π Highest F1 Score: 43.24% (+26.4% vs. Mem0, +75.6% vs. LightMem)
- β‘ Fastest Retrieval: 388.3s (32.7% faster than LightMem, 51.3% faster than Mem0)
- π Fastest End-to-End: 480.9s total processing time (12.5Γ faster than A-Mem)
SimpleMem applies a semantic density gating mechanism via LLM-based qualitative assessment to filter redundant interaction content. The system reformulates raw dialogue streams into compact memory units β self-contained facts with resolved coreferences and absolute timestamps. Each unit is indexed through three complementary representations for flexible retrieval:
| π Layer | π Type | π― Purpose | π οΈ Implementation |
|---|---|---|---|
| Semantic | Dense | Conceptual similarity | Vector embeddings (1024-d) |
| Lexical | Sparse | Exact term matching | BM25-style keyword index |
| Symbolic | Metadata | Structured filtering | Timestamps, entities, persons |
β¨ Example Transformation:
- Input: "He'll meet Bob tomorrow at 2pm" [β relative, ambiguous]
+ Output: "Alice will meet Bob at Starbucks on 2025-11-16T14:00:00" [β
absolute, atomic]Unlike traditional systems that rely on asynchronous background maintenance, SimpleMem performs synthesis on-the-fly during the write phase. Related memory units are synthesized into higher-level abstract representations within the current session scope, allowing repetitive or structurally similar experiences to be denoised and compressed immediately.
β¨ Example Synthesis:
- Fragment 1: "User wants coffee"
- Fragment 2: "User prefers oat milk"
- Fragment 3: "User likes it hot"
+ Consolidated: "User prefers hot coffee with oat milk"This proactive synthesis ensures the memory topology remains compact and free of redundant fragmentation.
Instead of fixed-depth retrieval, SimpleMem leverages the reasoning capabilities of the LLM to generate a comprehensive retrieval plan. Given a query, the planning module infers latent search intent to dynamically determine retrieval scope and depth:
The system then executes parallel multi-view retrieval across semantic, lexical, and symbolic indexes, and merges results through ID-based deduplication:
|
πΉ Simple Queries
|
πΈ Complex Queries
|
π Result: 43.24% F1 score with 30Γ fewer tokens than full-context methods.
π¬ High-Capability Models (GPT-4.1-mini)
| Task Type | SimpleMem F1 | Mem0 F1 | Improvement |
|---|---|---|---|
| MultiHop | 43.46% | 30.14% | +43.8% |
| Temporal | 58.62% | 48.91% | +19.9% |
| SingleHop | 51.12% | 41.3% | +23.8% |
βοΈ Efficient Models (Qwen2.5-1.5B)
| Metric | SimpleMem | Mem0 | Notes |
|---|---|---|---|
| Average F1 | 25.23% | 23.77% | Competitive with 99Γ smaller model |
- Ensure you are using Python 3.10 in your active environment, not just installed globally.
- An OpenAI-compatible API key must be configured before running any memory construction or retrieval, otherwise initialization may fail.
- When using non-OpenAI providers (e.g., Qwen or Azure OpenAI), verify both the model name and
OPENAI_BASE_URLinconfig.py. - For large dialogue datasets, enabling parallel processing can significantly reduce memory construction time.
- π Python 3.10
- π OpenAI-compatible API (OpenAI, Qwen, Azure OpenAI, etc.)
# π₯ Clone repository
git clone https://github.com/aiming-lab/SimpleMem.git
cd SimpleMem
# π¦ Install dependencies
pip install -r requirements.txt
# βοΈ Configure API settings
cp config.py.example config.py
# Edit config.py with your API key and preferences# config.py
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = None # or custom endpoint for Qwen/Azure
LLM_MODEL = "gpt-4.1-mini"
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-0.6B" # State-of-the-art retrievalAt a high level, SimpleMem works as a long-term memory system for LLM-based agents. The workflow consists of three simple steps:
- Store information β Dialogues or facts are processed and converted into structured, atomic memories.
- Index memory β Stored memories are organized using semantic embeddings and structured metadata.
- Retrieve relevant memory β When a query is made, SimpleMem retrieves the most relevant stored information based on meaning rather than keywords.
This design allows LLM agents to maintain context, recall past information efficiently, and avoid repeatedly processing redundant history.
If you encounter issues while setting up or running SimpleMem for the first time, check the following common cases:
- Ensure your API key is correctly set in
config.py - For OpenAI-compatible providers (Qwen, Azure, etc.), verify that
OPENAI_BASE_URLis configured correctly - Restart your Python environment after updating the key
- SimpleMem requires Python 3.10
- Check your version using:
python --version
from main import SimpleMemSystem
# π Initialize system
system = SimpleMemSystem(clear_db=True)
# π¬ Add dialogues (Stage 1: Semantic Structured Compression)
system.add_dialogue("Alice", "Bob, let's meet at Starbucks tomorrow at 2pm", "2025-11-15T14:30:00")
system.add_dialogue("Bob", "Sure, I'll bring the market analysis report", "2025-11-15T14:31:00")
# β
Finalize atomic encoding
system.finalize()
# π Query with intent-aware retrieval (Stage 3: Intent-Aware Retrieval Planning)
answer = system.ask("When and where will Alice and Bob meet?")
print(answer)
# Output: "16 November 2025 at 2:00 PM at Starbucks"For large-scale dialogue processing, enable parallel mode:
system = SimpleMemSystem(
clear_db=True,
enable_parallel_processing=True, # β‘ Parallel memory building
max_parallel_workers=8,
enable_parallel_retrieval=True, # π Parallel query execution
max_retrieval_workers=4
)π‘ Pro Tip: Parallel processing significantly reduces latency for batch operations!
SimpleMem is available as a cloud-hosted memory service via the Model Context Protocol (MCP), enabling seamless integration with AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients.
π Cloud Service: mcp.simplemem.cloud
| Feature | Description |
|---|---|
| Streamable HTTP | MCP 2025-03-26 protocol with JSON-RPC 2.0 |
| Multi-tenant Isolation | Per-user data tables with token authentication |
| Hybrid Retrieval | Semantic search + keyword matching + metadata filtering |
| Production Optimized | Faster response times with OpenRouter integration |
{
"mcpServers": {
"simplemem": {
"url": "https://mcp.simplemem.cloud/mcp",
"headers": {
"Authorization": "Bearer YOUR_TOKEN"
}
}
}
}π For detailed setup instructions and self-hosting guide, see MCP Documentation
# π― Full LoCoMo benchmark
python test_locomo10.py
# π Subset evaluation (5 samples)
python test_locomo10.py --num-samples 5
# πΎ Custom output file
python test_locomo10.py --result-file my_results.jsonUse the exact configurations in config.py:
- π High-capability: GPT-4.1-mini, Qwen3-Plus
- βοΈ Efficient: Qwen2.5-1.5B, Qwen2.5-3B
- π Embedding: Qwen3-Embedding-0.6B (1024-d)
If you use SimpleMem in your research, please cite:
@article{simplemem2025,
title={SimpleMem: Efficient Lifelong Memory for LLM Agents},
author={Liu, Jiaqi and Su, Yaofeng and Xia, Peng and Zhou, Yiyang and Han, Siwei and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
journal={arXiv preprint arXiv:2601.02553},
year={2025},
url={https://github.com/aiming-lab/SimpleMem}
}This project is licensed under the MIT License - see the LICENSE file for details.
We would like to thank the following projects and teams:
- π Embedding Model: Qwen3-Embedding - State-of-the-art retrieval performance
- ποΈ Vector Database: LanceDB - High-performance columnar storage
- π Benchmark: LoCoMo - Long-context memory evaluation framework
