
Rift: a memory MCP that captures decisions across every model I use

The first version of Rift was a folder of Markdown files.

I was switching between clients, Linc projects, and my own projects all day. I wanted the simplest possible memory layer: plain text, local files, easy search, no API dependency, and no tool that owned the shape of my notes. I could point Claude Code or Codex at the folder and say: read this before helping me.

It worked for a while.

Then the real problem showed up. The best decisions were not always the ones I remembered to write down. A lot of important work happened inside Claude Code, Codex, ChatGPT, Claude.ai, Grok, Gemini, and web UIs, then disappeared as soon as the session ended. I had a knowledge base, but my actual work was leaking around it.

The short version of the frustration is this: every model I use forgets what every other model and I decided yesterday.

You can fight that with discipline. Keep DECISIONS.md. Re-paste context every morning. Summarize every conversation manually. I tried. The cost is invisible, which makes it worse. You stop doing ambitious things because the ramp-up cost is too high, then forget that you stopped.

[Diagram of the Rift MCP contract: rift_status, rift_context_pack, rift_search, rift_conversations_search, rift_save]
Five tools, one corpus, every harness. The product is the memory contract.

What Rift is

Rift is a local-first memory and context server for LLM work.

Local-first means the corpus starts on my machine. MCP means Model Context Protocol, a standard way for an AI assistant to call tools and retrieve context. In plain language, Rift is not another chat app. It is a local server that lets different models ask the same memory system what matters.

The three core ideas are:

  1. Capture work automatically across the tools I already use.
  2. Turn messy conversations into structured memory: decisions, constraints, todos, examples, and evidence.
  3. Expose that memory through a small contract, not by dumping raw transcripts back into the model.

The last point is the product. The corpus is the substrate.

What the Markdown wiki got right

The wiki phase was not a mistake.

It proved three constraints I still believe. Readable source files matter because memory you cannot inspect becomes superstition. Local access matters because the best context is often private client work, unfinished strategy, or half-formed ideas I would not paste into a random SaaS tool. Model-agnostic storage matters because I do not want one vendor's memory feature deciding which assistant remembers my life.

But Markdown alone made me the ingestion system. I had to decide what to write, where to put it, when to update it, how to name it, and what to paste back later.

That is not memory. That is admin.

The missing layer was capture and triage. Capture means the work lands in memory without me remembering to journal. Triage means the system separates durable signal from noise before a model sees it.

This is also why Rift connects to my broader local-first setup. Local files give each project a contract. Rift tries to capture the conversations and decisions that happen around those files.

The five-tool contract

Rift exposes a small MCP surface. Each tool has one job.

rift_status

This returns the capability map: what memory classes exist, which tool to use for which job, and what the default behavior should be.

I want agents to call this first because it teaches them how to use the rest of the system. Without a meta-tool, every model has to guess the right entry point, and models guess wrong often enough that the extra tool earns its place.
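To make that concrete, here is one hypothetical shape for the capability map. The field names and tool descriptions are illustrative, not Rift's actual schema:

```python
# A hypothetical shape for the rift_status capability map.
# Field names and descriptions are illustrative, not Rift's actual schema.
RIFT_STATUS = {
    "memory_classes": ["decision", "constraint", "todo", "example", "evidence"],
    "tools": {
        "rift_context_pack": "default: bounded bundle for the active task",
        "rift_search": "scoped search across docs, conversations, digests",
        "rift_conversations_search": "conversation search with source/intent filters",
        "rift_save": "explicit write for a discrete fact or decision",
    },
    "defaults": {"entry_point": "rift_context_pack", "max_bytes": 4096},
}

def recommended_tool(status: dict) -> str:
    # An agent that reads the map first learns where to start,
    # instead of guessing an entry point.
    return status["defaults"]["entry_point"]
```

The point is that the map answers "which tool for which job" before the agent spends a single call finding out the hard way.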

rift_context_pack

This is the default tool.

It returns a compact, pre-budgeted bundle for the active task: prior decisions, project constraints, relevant examples, and rule documents. It is bounded by max_bytes, with a default around 4KB and a hard ceiling. The goal is not perfect recall. The goal is enough context to act without filling the whole context window.
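The budgeting rule can be sketched in a few lines. This is a minimal version that assumes pre-ranked string items, not Rift's real implementation:

```python
def build_context_pack(items, max_bytes=4096):
    # Greedily pack pre-ranked memory items until the byte budget is spent.
    # The pack never exceeds max_bytes: bounded recall beats a flooded prompt.
    pack, used = [], 0
    for item in items:
        size = len(item.encode("utf-8")) + 1  # +1 for the joining newline
        if used + size > max_bytes:
            break  # budget exhausted; skip the rest rather than truncate mid-item
        pack.append(item)
        used += size
    return "\n".join(pack)
```

Because the items are priority-ordered, whatever falls off the end was the least relevant part anyway.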

rift_search

This is search across watched files, project docs, conversations, and digests.

It can scope to documents, conversations, projects, clients, or everything. It also uses the current working directory to narrow results when that makes sense. That matters because "search my entire life" is rarely the right starting point. Usually the agent needs the project it is already inside.
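A sketch of the working-directory scoping, with made-up project roots; the mapping and scope names are mine, not Rift's:

```python
from pathlib import Path

# Hypothetical sketch: map the agent's working directory onto a search
# scope, so "search everything" becomes the fallback, not the default.
# The roots and scope names below are illustrative.
PROJECT_ROOTS = {
    Path("/work/clients/acme"): {"scope": "client", "name": "acme"},
    Path("/work/rift"): {"scope": "project", "name": "rift"},
}

def infer_scope(cwd: str) -> dict:
    cwd_path = Path(cwd)
    for root, scope in PROJECT_ROOTS.items():
        if cwd_path == root or root in cwd_path.parents:
            return scope
    return {"scope": "everything", "name": None}
```

Anywhere under a known root inherits that root's scope; everything else falls back to the full corpus.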

This is conversation-specific search with filters.

It can filter by source, like claude_code, codex_cli, chatgpt_web, or gemini_cli. It can filter by intent, like decision, build, brainstorm, troubleshoot, research, or learn. It also has hot and cold tiers. Hot is fast, recent, and summary-shaped. Cold is slower and closer to raw content.

That tier split is important. Most agent questions need recent summarized decisions, not a full replay of an old session.
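The routing rule might look like this; `needs_raw` and `max_age_days` are hypothetical filter names, not Rift's API:

```python
def route_query(filters: dict) -> str:
    # Minimal sketch of hot/cold routing. `needs_raw` and `max_age_days`
    # are hypothetical filter names, not Rift's actual API.
    if filters.get("needs_raw"):
        return "cold"  # raw content lives in the cold tier
    if filters.get("max_age_days", 0) <= 30:
        return "hot"   # recent, summary-shaped questions stay fast
    return "cold"
```

The default path is hot: unless the caller explicitly asks for raw or old material, it gets a fast summary-shaped answer.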

rift_save

This is the escape hatch.

Auto-capture is good, but sometimes a decision needs to be saved explicitly. rift_save writes a discrete fact, decision, or note into memory with the right tags instead of relying on the pipeline to infer that a sentence mattered.
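A minimal sketch of what an explicit save could record; the field names are mine, not Rift's schema:

```python
import time

def rift_save(store: list, text: str, kind: str, tags: list) -> dict:
    # Hypothetical sketch: the caller, not the capture pipeline,
    # asserts that this sentence mattered. Field names are illustrative.
    record = {
        "kind": kind,             # e.g. "decision", "fact", "note"
        "text": text,
        "tags": tags,
        "saved_at": int(time.time()),
        "source": "explicit",     # distinguishes manual saves from auto-capture
    }
    store.append(record)
    return record
```

Marking the record as explicit matters later: a deliberately saved decision should outrank something the pipeline merely inferred.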

[Diagram of the hot/cold tier split: conversations_hot for fast summary access, structured docs and cold conversations for deeper retrieval]
The tier split keeps normal retrieval fast and bounded, while leaving a path to inspect older source material.

Why raw transcripts are the wrong default

Every memory product is tempted to return raw conversations because raw feels complete.

It usually fails in three ways.

First, token budget collapse. A long coding session can be enormous. Give a model two old sessions and you can spend the whole context window before doing any new work.

Second, signal-to-noise rot. Raw transcripts contain tool output, wrong turns, partial plans, repeated explanations, and old mistakes. The decision is in there, but so is everything that led to the decision.

Third, re-confusion. If you show a model its old wrong answer next to the corrected answer, you sometimes re-anchor it on the wrong thing. Old reasoning is not neutral context.

Structured memory avoids that. A DTO is a structured object with predictable fields. The reason it matters is that the model receives the decision, source, timestamp, and tags instead of a messy transcript. It can still expand a source when needed, but the default is the conclusion, not the whole deliberation.
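As a sketch, a decision DTO might look like this; the field names are illustrative, not Rift's actual schema:

```python
from dataclasses import dataclass, field

# A sketch of a structured memory object. The model receives the
# conclusion plus provenance, not the deliberation. Field names
# are illustrative, not Rift's actual schema.
@dataclass
class DecisionDTO:
    decision: str
    source: str             # e.g. "claude_code", "chatgpt_web"
    timestamp: str          # ISO 8601
    tags: list = field(default_factory=list)
    source_ref: str = ""    # pointer to the raw session, expanded only on demand
```

The `source_ref` field is what keeps expansion deliberate: the transcript is reachable, but never the default payload.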

Auto-capture is the hard part

Per-tool memory is not enough.

ChatGPT remembers what happened in ChatGPT. Claude project memory remembers that project. Cursor remembers some of what happened in Cursor. None of them reliably remember what was decided in the other place.

That is the bug Rift is trying to fix. The capture layer sits underneath the harnesses. The model I use today does not need to be the model that captured the decision yesterday.

The sources I care about today are claude_code, claude_web, codex_cli, chatgpt_web, gemini_cli, gemini_web, and grok_web. Whatever I use, the work lands in the same corpus. Whatever I switch to tomorrow can inherit the useful parts.

Manual capture is not just inconvenient. It fails exactly when the work is busiest and most worth remembering.

How I actually use it

The most common use is project entry.

When I open a project after a few days away, the agent can call rift_context_pack with the current working directory and return the decisions, constraints, and examples relevant to that project. I do not have to re-read my own notes before asking for help.

The second use is "what did we decide about X?" That used to mean tab archaeology across chats, Slack, docs, and my own memory. With Rift, it should be a search query with a few scoped results.

The third use is mid-task grounding. Before an agent does something irreversible, I want it to pull the relevant constraints. Not because I enjoy ceremony, but because "do not touch production data" and "do not promote raw chats" are the kind of rules that should slow the model down automatically.

This post itself used Rift during drafting. The "harness over model" thread, the structured-memory framing, and the contract language all came from prior conversations Rift had captured. I still rewrote the post by hand, but I did not have to rediscover the argument from scratch.

What does not work yet

Cross-device sync does not exist. Rift is local-first today, and my phone is not part of the loop.

Team memory is not solved. Single-player memory is already hard enough. Shared team memory adds permissioning, source trust, disagreement, decay, and social problems that I am not pretending to have solved.

Mac ergonomics are ahead of everything else because that is where I work. Linux is possible. Windows is rough.

Memory drift is still a live problem. Old decisions become stale. Some tools accept a since parameter and apply explicit decay rules, but contradiction and eviction are not finished. A memory system that remembers too much becomes another form of noise.

The UI is also not the product yet. I need enough UI to debug the corpus, inspect sources, and understand what the model received. But the main surface is still the MCP contract.

What I would copy first

If you want to build a small version, do not start with a UI.

Start with one source and one capture script. Save conversations or notes into a local folder or small database. Extract only three memory types at first: decisions, constraints, and todos. Then expose one function that returns a bounded context pack for the current project.
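A minimal triage pass over a transcript might look like this; the keyword rules are a crude stand-in for real extraction (which would use a model), but they show the shape: messy text in, typed records out.

```python
import re

# Crude keyword triage for the three starter memory types.
# A stand-in for model-based extraction, not a real pipeline.
PATTERNS = {
    "decision": re.compile(r"\b(decided|we'll use|going with)\b", re.I),
    "constraint": re.compile(r"\b(must not|never|do not touch)\b", re.I),
    "todo": re.compile(r"\b(todo|follow up|later)\b", re.I),
}

def extract_memories(transcript: str) -> list:
    records = []
    for line in transcript.splitlines():
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                records.append({"kind": kind, "text": line.strip()})
                break  # first matching type wins
    return records
```

Even rules this dumb beat no triage, because the output is typed records a context pack can budget, not a wall of transcript.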

Bound everything. Every context pack needs a byte limit. Every search needs a top-k. Every raw source expansion should be deliberate. Without limits, you will flood the model and call it memory.

Then add sources one by one. Do not build the perfect schema before capture works. Capture first, contract second, UI third.

The broader frame is simple: models change, harnesses change, but the durable asset is what you decided, why, and when. Rift is not the assistant. It is the memory layer underneath the assistants, so I do not have to re-explain my work every time I switch tools.
