Now in early access

AI that actually remembers.

Persistent memory infrastructure for AI agents. Temporal tree architecture, entity-aware retrieval, and structured context that scales with your agents.

+48.6 pts temporal reasoning vs mem0
+25.0% multi-hop reasoning vs mem0
0.823 LLM judge score (LOCOMO)
Memory · Caroline · conv_003
Caroline has been exploring career changes
2024-03-01 · root · topic:career
Applied for senior role at tech startup downtown
2024-03-14 · update · entities:caroline,startup
Received offer — $145k, 4 weeks vacation
2024-03-22 · refinement
Considering negotiating for remote flexibility
2024-03-23 · fork
Also interviewing at two other companies
2024-03-15 · parallel · entities:caroline
Retrieval confidence: 0.87
Drop-in replacement

Already using mem0?
One line to switch.

Same method names. Temporal tree memory.

Before
from mem0 import MemoryClient
client = MemoryClient(api_key="m0-...")
After
from aivery import MemoryClient
client = MemoryClient(api_key="aivery-...")

Memory is the missing layer in AI infrastructure

Current systems treat memory as a flat list. Retrieval is a cosine search across everything you've ever stored — with no sense of time, relationships, or context.

🔍

Flat retrieval fails at scale

With 1,500+ memories per agent, cosine similarity at K=50 covers only 3% of the corpus. The right memory is unreachable 70% of the time.
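The 3% figure follows from simple arithmetic on K and corpus size. A minimal sketch, assuming coverage just means the share of stored memories that a top-K search can return at all (the function name is ours, and the 70% miss rate above is a measured claim that this arithmetic does not derive):

```python
def candidate_coverage(k: int, corpus_size: int) -> float:
    """Fraction of the corpus that a single top-k retrieval can return."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return min(k, corpus_size) / corpus_size


for k in (50, 200):
    # K=50 prints 3.3%, K=200 prints 13.3%
    print(f"K={k}: {candidate_coverage(k, 1500):.1%} of 1,500 memories")
```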

No sense of time

Existing systems can't answer "what was she thinking last March?" or "how has her plan changed since she started this job?" Time is invisible.

🕸

Multi-hop reasoning breaks

Questions that connect two facts, such as "who introduced Caroline to her startup contact?", need structural memory, not keyword matching.

Memory with structure, not just storage

01

Temporal tree placement

Every memory is placed into a tree at write time. Related memories become parent/child. Contradictions fork. Time flows from root to leaf — retrieval becomes branch traversal.
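To make the idea concrete, here is a rough sketch of a temporal tree and branch traversal. It is an illustration only, not Aivery's implementation: the `MemoryNode` class, the `branches` helper, and the choice of which memory hangs under which parent are all assumptions, with sample text borrowed from the Caroline example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class MemoryNode:
    text: str
    when: date
    kind: str = "root"  # e.g. "root", "update", "refinement", "fork", "parallel"
    children: list[MemoryNode] = field(default_factory=list)

    def add_child(self, text: str, when: date, kind: str) -> MemoryNode:
        """Attach a related memory below this one and return it."""
        child = MemoryNode(text, when, kind)
        self.children.append(child)
        return child


def branches(node: MemoryNode, prefix: tuple[MemoryNode, ...] = ()) -> list[list[MemoryNode]]:
    """Return every root-to-leaf path through the tree."""
    path = prefix + (node,)
    if not node.children:
        return [list(path)]
    result = []
    for child in node.children:
        result.extend(branches(child, path))
    return result


# Parent/child placement below is assumed for illustration.
root = MemoryNode("Caroline has been exploring career changes", date(2024, 3, 1))
applied = root.add_child("Applied for senior role at tech startup downtown", date(2024, 3, 14), "update")
offer = applied.add_child("Received offer", date(2024, 3, 22), "refinement")
offer.add_child("Considering negotiating for remote flexibility", date(2024, 3, 23), "fork")
root.add_child("Also interviewing at two other companies", date(2024, 3, 15), "parallel")

paths = branches(root)
for path in paths:
    print(" -> ".join(f"{n.when} [{n.kind}]" for n in path))
```

Each printed line is one branch read from root to leaf, so dates only move forward along it.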

02

Entity heatmap activation

Each query activates entity clusters. Memories tagged with multiple queried entities rank first (intersection ordering). Hot branches surface recent context; cold branches preserve history.
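Intersection ordering can be sketched in a few lines. This is a simplified stand-in under our own assumptions: memories are plain `(text, entities)` pairs, the function name is hypothetical, and the hot/cold recency weighting described above is left out.

```python
def rank_by_entity_overlap(query_entities, memories):
    """Order memories by how many queried entities they are tagged with.

    Memories that share no entity with the query are dropped. Ties keep
    their original order because Python's sort is stable.
    """
    wanted = set(query_entities)
    scored = [(len(wanted & set(entities)), text) for text, entities in memories]
    hits = [item for item in scored if item[0] > 0]
    return [text for _, text in sorted(hits, key=lambda item: item[0], reverse=True)]


# Illustrative data; entity tags follow the Caroline example above.
memories = [
    ("Also interviewing at two other companies", {"caroline"}),
    ("Applied for senior role at tech startup downtown", {"caroline", "startup"}),
    ("Melanie's apartment search", {"melanie"}),
]

ranked = rank_by_entity_overlap(["caroline", "startup"], memories)
print(ranked)
```

The memory tagged with both queried entities comes first, and the Melanie memory is not returned at all.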

03

Reranking + context stitching

Wide candidate retrieval (up to 200) is reranked by a cross-encoder that reads query and memory together — bridging the semantic gap between question-form queries and statement-form memories.
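The two-stage shape (retrieve wide, then rerank narrow) looks roughly like the sketch below. Both scoring functions are toy stand-ins that we made up for illustration: `token_overlap` plays the role of the cheap first-stage retriever and `toy_pair_score` sits where a real cross-encoder would go. Neither reflects Aivery's actual scoring.

```python
def retrieve_then_rerank(query, memories, cheap_score, pair_score, wide_k=200, final_k=50):
    """Take the top wide_k by a cheap score, then keep the top final_k by a pairwise score."""
    candidates = sorted(memories, key=lambda m: cheap_score(query, m), reverse=True)[:wide_k]
    return sorted(candidates, key=lambda m: pair_score(query, m), reverse=True)[:final_k]


def token_overlap(query, memory):
    """Toy first-stage score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(memory.lower().split()))


def toy_pair_score(query, memory):
    """Toy second-stage score (Jaccard similarity), standing in for a cross-encoder."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    return len(q & m) / len(q | m) if q | m else 0.0


memories = [
    "melanie is looking for an apartment",
    "caroline applied to the startup",
    "caroline received an offer from the startup",
]

top = retrieve_then_rerank(
    "what offer did caroline receive", memories,
    token_overlap, toy_pair_score, wide_k=2, final_k=1,
)
print(top)
```

Swapping `toy_pair_score` for a real reranker only changes the second `sorted` call. The width of the first stage is what decides whether the right memory is in the candidate set at all.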

04

Async validation pipeline

New memories are written immediately and validated asynchronously. Duplicates are merged, contradictions are flagged, and the tree stays clean, all without blocking your agent's response time.
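A write-now, validate-later flow can be sketched with `asyncio`. This is a simplified illustration under stated assumptions: the `MemoryStore` class is hypothetical, duplicates are removed rather than merged, and exact matching on normalized text replaces real semantic deduplication and contradiction checks.

```python
import asyncio


class MemoryStore:
    def __init__(self):
        self.memories = {}   # memory_id -> text
        self.removed = []    # ids dropped as duplicates
        self._pending = asyncio.Queue()
        self._next_id = 0

    def write(self, text):
        """Store the memory right away and queue it for later validation."""
        memory_id = self._next_id
        self._next_id += 1
        self.memories[memory_id] = text
        self._pending.put_nowait(memory_id)
        return memory_id

    async def validator(self):
        """Background task: drop later copies of text that is already stored."""
        while True:
            memory_id = await self._pending.get()
            key = " ".join(self.memories[memory_id].lower().split())
            is_duplicate = any(
                other_id < memory_id and " ".join(other.lower().split()) == key
                for other_id, other in self.memories.items()
            )
            if is_duplicate:
                del self.memories[memory_id]
                self.removed.append(memory_id)
            self._pending.task_done()


async def main():
    store = MemoryStore()
    worker = asyncio.create_task(store.validator())
    store.write("Received offer")
    store.write("received  offer")       # same text after normalization
    store.write("Considering remote")
    visible_before = len(store.memories)  # all writes are readable immediately
    await store._pending.join()           # wait for background validation
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return visible_before, sorted(store.memories), store.removed


result = asyncio.run(main())
print(result)
```

The caller never waits on validation: `write` returns as soon as the memory is stored, and cleanup happens in the background task.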

Entity heatmap · query: "Caroline's job offer"
🔥 hot (activated branches): Caroline · career | startup · offer | salary · negotiation | remote work
🌿 warm (related entities): Melanie · friend | apartment search
❄ cold (preserved history): previous job · retail | college · 2019
Reranking · top 50 of 200 candidates
Received offer $145k: 0.94
Considering remote: 0.81
Applied to startup: 0.73

One memory layer, four surfaces

From personal chat memory to enterprise agent infrastructure — a complete stack built on the same temporal tree foundation.

⚙️
Developer API

Aivery Fabric

The memory API for developers building AI agents and applications. Nine endpoints covering write, retrieve, context, validate, share, and more.

POST /memory/context — structured context block for your LLM
POST /memory/validate — semantic dedup before storage
Hybrid vector + lexical retrieval, org-scoped
Plug-in reranker: Cohere or self-hosted ONNX
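As a rough idea of what calling the context endpoint might look like, the sketch below builds (but does not send) a request using only the Python standard library. Only the `POST /memory/context` path comes from the list above. The host, the bearer-token header, and the `query` and `user_id` fields are placeholders we assumed for illustration, so check the API reference for the real schema.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not a real Aivery URL


def build_context_request(api_key, query, user_id):
    """Build a POST /memory/context request. Payload fields are hypothetical."""
    payload = {"query": query, "user_id": user_id}
    return urllib.request.Request(
        url=f"{BASE_URL}/memory/context",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_context_request("aivery-...", "Caroline's job offer", "caroline")
print(request.get_method(), request.full_url)
```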
🤖
Agent Runtime

Aivery Cortex

The agent runtime that connects memory to reasoning. Cortex wraps any LLM with a memory-aware loop — context retrieval, tool dispatch, and response generation in one place.

📥
Ingestion

Aivery Path

Bulk import your documents, notes, and data into memory. Upload a file, paste text, or POST to the API — Path extracts, validates, and structures everything automatically.

🔄
Sync

Aivery Wind

Continuous memory sync from your existing data sources. Pluggable connectors for filesystems, GitHub, Slack, and more keep memory current without you lifting a finger.

Benchmarked against leading memory systems

Evaluated on LOCOMO — the industry's most comprehensive conversational memory benchmark. 1,540 questions across single-hop, temporal, multi-hop, and open-domain categories.

System                          LLM Judge   vs mem0
Aivery · Sonnet · wide-K        0.823       +0.184
Aivery · GPT-4.1 · wide-K       0.800       +0.162
mem0                            0.638       baseline
Flat retrieval + Cohere         0.557
Flat retrieval (no reranker)    0.555

All systems were scored by LLM Judge on the same harness with the same judge model (GPT-4.1-mini). Reranker identity contributes +0.002 in isolation: the gains come from temporal tree structure and retrieval width, not the reranker choice. Wide-K means 200 candidates are retrieved and Cohere selects the top 50.

Read the full paper →
+48.6 pts

Temporal reasoning

mem0 scores 0.137 on temporal questions. Aivery scores 0.623. When questions involve time — sequences, changes, relative dates — temporal tree structure is decisive.
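For readers checking the numbers, the gap between the two temporal scores can be read two ways. The sketch below computes both from the figures quoted above. The helper name is ours.

```python
def gains(baseline, score):
    """Return (gain in percentage points, ratio of score to baseline)."""
    absolute_points = round((score - baseline) * 100, 1)
    ratio = round(score / baseline, 2)
    return absolute_points, ratio


points, ratio = gains(0.137, 0.623)
print(f"+{points} points absolute, {ratio}x the baseline score")
```

The 48.6 figure is the absolute difference in judge score. Measured relative to mem0's 0.137, the same result is roughly a 4.5x improvement.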

+25.0%

Multi-hop reasoning

Flat retrieval scores 0.29 on multi-hop. Tree + entity heatmap reaches 0.68. Full feature stack reaches 0.792. Intersection-count ordering — not reranking — enables cross-entity reasoning.

+12.3 pts

Wide retrieval uplift

Expanding from K=50 to K=200 before reranking adds +0.123 — the single largest ablation step. Retrieval coverage is a first-class architectural concern, not a hyperparameter.

Start free.
Scale as you grow.

Personal plans for individuals, API plans for developers and teams. All plans include access to the core temporal tree architecture.

Personal
For individuals exploring AI-assisted memory
$0 forever
500 memories
Think chat interface
1 agent
Core retrieval
Get started
Pro+
For local-first users and advanced builders
$29 /mo
$249/yr — save 28%
Unlimited memories
Local model support
Local agent mode
Path bulk import
Priority support
Get started
Growth
For small teams and startups building with agents
$149 /mo
$1,490/yr — save 17%
3 users, 2 agents
500K memories, 20GB
Fabric API
Cortex agent runtime
Path bulk import
1 Wind connector
Standard support
Get started
Limited availability

Founding Team

$500/mo
Price locks in permanently when you join. Retails at $2,500/mo after founding slots close.
10 users, 5 agents
5M memories, 100GB
Full Fabric API
Cortex + Path + Wind
3 Wind connectors
Dedicated support + roadmap access
25
founding slots
available

Enterprise

Unlimited agents, dedicated Fabric cluster, on-prem or VPC deployment, custom SLAs, and a co-development roadmap. Built for teams that can't afford to forget.

Talk to us

Give your agents real memory.

Start free. No credit card required. Five minutes to your first persistent memory.