Persistent memory infrastructure for AI agents. Temporal tree architecture, entity-aware retrieval, and structured context that scales with your agents.
Same method names. Temporal tree memory.
# Before: mem0
from mem0 import MemoryClient
client = MemoryClient(api_key="m0-...")
# After: Aivery (only the import and the API key change)
from aivery import MemoryClient
client = MemoryClient(api_key="aivery-...")
Current systems treat memory as a flat list. Retrieval is a cosine search across everything you've ever stored — with no sense of time, relationships, or context.
With 1,500+ memories per agent, a cosine top-K search at K=50 can return only about 3% of the corpus (50 of 1,500). The right memory is unreachable 70% of the time.
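The coverage figure is simple arithmetic. A minimal sketch, where the helper name `coverage` is ours and not part of any API:

```python
def coverage(k: int, corpus_size: int) -> float:
    """Fraction of the corpus that a single top-k search can return."""
    return min(k, corpus_size) / corpus_size


# 50 of 1,500 memories is roughly 3% of the corpus.
print(f"{coverage(50, 1500):.1%}")   # 3.3%
# Widening to 200 candidates raises that to roughly 13%.
print(f"{coverage(200, 1500):.1%}")  # 13.3%
```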
Existing systems can't answer "what was she thinking last March?" or "how has her plan changed since she started this job?" Time is invisible.
Questions that connect two facts, such as "who introduced Caroline to her startup contact?", need structural memory, not keyword matching.
Every memory is placed into a tree at write time. Related memories become parent/child. Contradictions fork. Time flows from root to leaf — retrieval becomes branch traversal.
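One way to picture this is the sketch below. It is an illustration under our own assumptions, not Aivery's implementation: `MemoryNode`, `add_child`, `contradicts_parent`, and `branches` are hypothetical names, and the sample memories are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    text: str
    timestamp: int                    # hypothetical: any monotonically increasing clock
    contradicts_parent: bool = False  # hypothetical flag marking a fork
    children: list[MemoryNode] = field(default_factory=list)

    def add_child(self, text: str, timestamp: int,
                  contradicts_parent: bool = False) -> MemoryNode:
        # Time flows from root to leaf, so a child may not predate its parent.
        if timestamp < self.timestamp:
            raise ValueError("child must not be older than its parent")
        node = MemoryNode(text, timestamp, contradicts_parent)
        self.children.append(node)
        return node


def branches(node: MemoryNode, prefix: tuple = ()):
    """Yield every root-to-leaf path as a tuple of memory texts."""
    path = prefix + (node.text,)
    if not node.children:
        yield path
    for child in node.children:
        yield from branches(child, path)


root = MemoryNode("Caroline wants to change jobs", 1)
plan = root.add_child("Caroline plans to join a startup", 2)
plan.add_child("Caroline accepted the startup offer", 3)
# A contradicting memory becomes a second branch instead of overwriting the first.
root.add_child("Caroline decided to stay in her current job", 4,
               contradicts_parent=True)

for path in branches(root):
    print(" -> ".join(path))
```

Each printed line is one branch read in time order, which is the sense in which retrieval can become a traversal rather than a search over a flat list.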
Each query activates entity clusters. Memories tagged with multiple queried entities rank first (intersection ordering). Hot branches surface recent context; cold branches preserve history.
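Intersection ordering can be sketched in a few lines. This is a simplified illustration, not the production ranking: the function name, the dictionary fields, and the choice to break ties by recency are all assumptions made for the example.

```python
def rank_by_entity_intersection(memories, query_entities):
    """Order memories by how many queried entities they mention.

    Ties are broken by recency (an assumption for this sketch).
    Memories that mention none of the queried entities are dropped.
    """
    query = set(query_entities)
    scored = []
    for memory in memories:
        overlap = len(query & set(memory["entities"]))
        if overlap:
            scored.append((overlap, memory["timestamp"], memory))
    # Sort on the numeric fields only, highest overlap and newest first.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [memory for _, _, memory in scored]


memories = [
    {"id": "m1", "entities": ["Caroline"], "timestamp": 3},
    {"id": "m2", "entities": ["Caroline", "startup"], "timestamp": 1},
    {"id": "m3", "entities": ["startup"], "timestamp": 2},
    {"id": "m4", "entities": ["Dana"], "timestamp": 4},
]

ranked = rank_by_entity_intersection(memories, ["Caroline", "startup"])
print([m["id"] for m in ranked])  # ['m2', 'm1', 'm3']
```

The memory tagged with both queried entities ranks first even though it is the oldest, which is the behavior a two-fact question depends on.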
Wide candidate retrieval (up to 200) is reranked by a cross-encoder that reads query and memory together — bridging the semantic gap between question-form queries and statement-form memories.
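The two-stage pattern looks roughly like the sketch below. It is a toy under stated assumptions: `retrieve_then_rerank` and the data layout are hypothetical, the two-dimensional vectors are made up, and `word_overlap` is only a stand-in for a real cross-encoder, which would be a learned model scoring each query and memory pair.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(query_text, query_vec, memories, pair_score,
                         wide_k=200, top_k=50):
    # Stage 1: cheap vector search keeps the wide_k nearest candidates.
    candidates = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]),
                        reverse=True)[:wide_k]
    # Stage 2: the pair scorer reads the query and each memory text together.
    reranked = sorted(candidates, key=lambda m: pair_score(query_text, m["text"]),
                      reverse=True)
    return reranked[:top_k]


def word_overlap(query_text, memory_text):
    """Stand-in for a cross-encoder: counts shared lowercase words."""
    return len(set(query_text.lower().split()) & set(memory_text.lower().split()))


memories = [
    {"text": "Caroline met her startup contact through Dana", "vec": [0.2, 0.9]},
    {"text": "Caroline likes hiking", "vec": [0.9, 0.1]},
    {"text": "Dana works in finance", "vec": [0.8, 0.3]},
]
query = "who introduced Caroline to her startup contact"
query_vec = [1.0, 0.0]

# With a narrow first stage the relevant memory never reaches the reranker.
narrow = retrieve_then_rerank(query, query_vec, memories, word_overlap, wide_k=2, top_k=1)
# With a wider first stage the reranker can promote it to the top.
wide = retrieve_then_rerank(query, query_vec, memories, word_overlap, wide_k=3, top_k=1)
print(narrow[0]["text"])
print(wide[0]["text"])
```

The reranker can only reorder what the first stage hands it, so the width of that first stage bounds what the whole pipeline can find.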
New memories are written immediately and validated asynchronously. Duplicates are merged, contradictions are flagged, and the tree is kept clean, all without blocking your agent's response time.
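A minimal sketch of the write-now, validate-later pattern using `asyncio` follows. The `MemoryStore` class and its methods are hypothetical, and the background step only removes case-insensitive duplicates; contradiction detection is left out to keep the example short.

```python
import asyncio


class MemoryStore:
    def __init__(self):
        self.memories = {}     # id -> text, readable as soon as write() returns
        self.duplicates = []   # ids removed by background validation
        self._pending = asyncio.Queue()
        self._next_id = 0

    def write(self, text: str) -> int:
        """Store the memory right away and queue it for later validation."""
        memory_id = self._next_id
        self._next_id += 1
        self.memories[memory_id] = text
        self._pending.put_nowait(memory_id)
        return memory_id

    async def validate_pending(self):
        """Background worker: drop memories whose normalized text was already seen."""
        seen = set()
        while True:
            memory_id = await self._pending.get()
            key = self.memories[memory_id].strip().lower()
            if key in seen:
                del self.memories[memory_id]
                self.duplicates.append(memory_id)
            else:
                seen.add(key)
            self._pending.task_done()


async def main():
    store = MemoryStore()
    worker = asyncio.create_task(store.validate_pending())
    store.write("Caroline started a new job")
    store.write("caroline started a new job ")
    store.write("Caroline moved to Lisbon")
    visible_before = len(store.memories)  # all writes are visible before validation runs
    await store._pending.join()           # wait for the background pass to finish
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return visible_before, store


visible_before, store = asyncio.run(main())
print(visible_before, len(store.memories), store.duplicates)
```

The caller never waits on validation: `write` returns as soon as the memory is stored, and cleanup happens on the worker's schedule.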
From personal chat memory to enterprise agent infrastructure — a complete stack built on the same temporal tree foundation.
A chat interface with real persistent memory. Think remembers what you tell it, how your plans evolve, and who matters to you — across every session.
The memory API for developers building AI agents and applications. Nine endpoints covering write, retrieve, context, validate, share, and more.
The agent runtime that connects memory to reasoning. Cortex wraps any LLM with a memory-aware loop — context retrieval, tool dispatch, and response generation in one place.
Bulk import your documents, notes, and data into memory. Upload a file, paste text, or POST to the API — Path extracts, validates, and structures everything automatically.
Continuous memory sync from your existing data sources. Pluggable connectors for filesystems, GitHub, Slack, and more keep memory current without manual effort.
Evaluated on LOCOMO — the industry's most comprehensive conversational memory benchmark. 1,540 questions across single-hop, temporal, multi-hop, and open-domain categories.
Scores are LLM-judge scores, run on the same harness with the same judge model (GPT-4.1-mini). Reranker identity contributes +0.002 in isolation: the gains come from temporal tree structure and retrieval width, not the reranker choice. Wide-K retrieves 200 candidates, from which Cohere selects the top 50.
Read the full paper →
mem0 scores 0.137 on temporal questions. Aivery scores 0.623. When questions involve time (sequences, changes, relative dates), temporal tree structure is decisive.
Flat retrieval scores 0.29 on multi-hop. Tree + entity heatmap reaches 0.68. Full feature stack reaches 0.792. Intersection-count ordering — not reranking — enables cross-entity reasoning.
Expanding from K=50 to K=200 before reranking adds +0.123 — the single largest ablation step. Retrieval coverage is a first-class architectural concern, not a hyperparameter.
Personal plans for individuals, API plans for developers and teams. All plans include access to the core temporal tree architecture.
Unlimited agents, dedicated Fabric cluster, on-prem or VPC deployment, custom SLAs, and a co-development roadmap. Built for teams that can't afford to forget.
Start free. No credit card required. Five minutes to your first persistent memory.