
Everything local: the setup that gives AI enough context to be useful

The first tax of every useful AI session is context.

Not the model. Not the prompt. Context.

What project is this? What did we decide last week? Which files are dangerous? What tools can the agent use? What evidence supports this recommendation? What should not be changed? Which client constraint sounds minor but matters a lot?

For a while I paid that tax manually. I would open a coding agent or a fresh chat, type three paragraphs of setup, hit send, then realize halfway through the answer that I had forgotten the important constraint from the previous Thursday. So I would dig through Slack, find the missing sentence, paste it back, and restart the real work.

Do that across client work, personal projects, product experiments, content drafts, outreach systems, and codebases, and a strange thing happens: the model is fast, but the work still feels slow because you keep loading the model's head by hand.

That is why I started treating local-first as infrastructure. Not nostalgia. Not a productivity aesthetic. Infrastructure.

Andrej Karpathy wrote about an LLM-maintained Markdown wiki: raw sources, a wiki, and a schema file that tells the model how to keep it organized. I liked the idea immediately because it was boring in the right way: local files, plain text, inspectable memory, no SaaS knowledge base deciding the shape of my work.

My problem was adjacent but more operational. I do not only need a knowledge base. I need the work itself to carry enough context that any model can show up useful without a ten-minute onboarding ritual.

The setup in one view

The current version has five layers.

  1. Two local roots: /Users/clement/clients/ and /Users/clement/projects/.
  2. Project state files in active workspaces: TODO.md, PROJECT_STATE.md, sometimes WORKFLOW.md.
  3. Agent rules in repos: AGENTS.md for Codex, CLAUDE.md for Claude Code.
  4. Machine-level MCPs so shared tools are available across harnesses.
  5. Rift on top, to capture decisions and retrieve useful memory later.

MCP means Model Context Protocol. In plain language, it lets an AI assistant call tools: search memory, query analytics, inspect a design file, or read a local service. The reason it matters here is that context stops being something I paste manually and becomes something the model can request from the right place.
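As a concrete illustration (the server name, file path, and env var here are invented, and the exact schema varies by harness), a Cursor-style `mcp.json` entry looks roughly like this:

```json
{
  "mcpServers": {
    "rift": {
      "command": "node",
      "args": ["/Users/clement/mcp/rift/index.js"],
      "env": { "RIFT_HOME": "/Users/clement/rift" }
    }
  }
}
```

The harness reads this once, and after that the model can call whatever tools the server exposes without me pasting anything.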

Nothing in this setup is magical. That is the point. The useful parts are files, folders, contracts, and tools that survive when a chat closes.

Two roots remove one dumb decision

My filesystem starts with one boring split:

/Users/clement/clients/   # client work
/Users/clement/projects/  # personal projects, product experiments, code repos

That is the whole mental model. Paid work lives under clients. My own projects live under projects. Both roots can be opened in Obsidian, searched by command-line tools, and read by agents with local access.

This sounds small because it is small. But small defaults compound. I do not want to remember where a client strategy doc, raw export, agent rule, or product experiment lives. I want to know that the current working directory already tells the agent what kind of world it is inside.

If I am in /Users/clement/clients/linc, the agent should behave like it is inside the Linc operating system. If I am in /Users/clement/projects/fresh, it should behave like it is inside the Fresh product workspace. Same machine, different local context.

The project contract lives next to the work

Every active workspace gets a few plain Markdown files.

TODO.md is the live state of motion: now, next, later, blocked, done. It is not a project-management ceremony. It is a short note for future me and for the model.
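A minimal sketch of what that looks like (the section names are just my convention, and the items are placeholders):

```markdown
# TODO

## Now
- Draft the migration plan

## Next
- Wire analytics into the weekly report

## Blocked
- Waiting on client access to the ad account

## Done
- Set up the repo and agent rules
```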

PROJECT_STATE.md is the current snapshot: what the project is, what matters now, recent decisions, risks, and constraints that should not be forgotten.

WORKFLOW.md appears in more operational workspaces. It explains where things belong: raw inputs, exports, research, docs, client messages, metrics, snapshots. This matters more than it sounds because real work produces claims, and claims need a trail.

The pattern I want is:

raw export or conversation
  -> analysis
  -> decision
  -> client message or product change
  -> todo
  -> later reference

Provenance is the technical word for "where did this come from?" The reason it matters is practical: when a client asks why I recommended something three months later, the answer should not depend on my memory of a Slack thread.

Agent rules are onboarding docs for machines

AGENTS.md is what Codex reads. CLAUDE.md is what Claude Code reads.

I think of them as the house rules for a workspace: what the project is, what commands matter, what files are dangerous, what should be verified before calling work done, and what should happen at the end of a session.

In simple repos, the two files can be almost identical. In more mature workspaces, I let them differ. Codex is often my planner and reviewer. Claude Code is often my builder and deep implementation loop. The point is not that one is always better. The point is that each model should know its role inside the workspace.

That same role split shows up in claude-relais, where Claude scopes and judges while another agent does bounded implementation. It also shows up earlier in the double-tap, where I ask a different model to challenge a plan before any file is touched.

The value of AGENTS.md and CLAUDE.md is repetition. I do not want to re-teach the same rules in every chat. The repo should teach them for me.
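A compact sketch of the shape (the project details and commands are invented for illustration):

```markdown
# AGENTS.md

## What this is
Marketing site and blog. Read PROJECT_STATE.md before doing anything.

## Commands
- `npm run dev` to run locally; `npm test` before calling work done.

## Danger zone
- Do not edit files under `data/exports/`; they are raw evidence.

## End of session
- Update TODO.md and note any new decisions in PROJECT_STATE.md.
```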

Machine-level tools should not be rediscovered every day

Some tools belong to a project. Others belong to the machine.

In my setup, the durable tool layer sits near the root:

/Users/clement/.codex/config.toml
/Users/clement/.claude.json
/Users/clement/.cursor/mcp.json
/Users/clement/.env.mcp
/Users/clement/mcp/

That is where shared tools and environment variables live: Rift, Google Search Console, Google Analytics, Google Ads, HubSpot, Ahrefs, Webflow, Paper, and custom local MCPs. The exact list changes. The rule does not: global tools live globally, project-specific tools live with the project, and secrets do not get copied into random repos.
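For the secrets part, one pattern that works across harnesses (assuming `.env.mcp` holds plain `KEY=value` lines) is to export the file into the environment before launching an agent, instead of copying values into repos:

```shell
# Auto-export every KEY=value line from the shared secrets file,
# so MCP servers launched afterwards inherit the variables.
if [ -f ~/.env.mcp ]; then
  set -a          # mark all subsequently assigned variables for export
  . ~/.env.mcp
  set +a
fi
```

Any MCP server started from that shell sees the keys without them ever landing in a project directory.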

The win is not "more tools." Too many tools make agents worse. The win is a stable tool layer that follows the work. If I ask about prior decisions, the agent should know Rift exists. If I ask about search traffic, it should know analytics tools exist. If I ask about a client repo, it should have local files plus the right shared tools.

Handy made input local too

The obvious version of local-first is files and code. The less obvious version is input.

I wrote about Handy and local LLMs separately, but it belongs in this system. Handy lets me press a hotkey, talk, transcribe locally, clean the transcript with a local Qwen model, and paste the result into whatever app I am using.

That is useful because dictation is faster than typing. But the more interesting thing is capture. Every recording, raw transcript, and processed transcript can be mirrored into a local corpus. My input layer stops being only convenience and starts becoming source material.

A cloud dictation product can make me faster. A local one can make me faster and leave the corpus behind.

That is the recurring pattern: rent intelligence where it makes sense, keep the substrate when the work produces value.

Rift covers the conversations

Files cover project state. They do not cover everything that happens inside conversations.

That is the gap Rift is meant to fill. Rift is my local-first memory layer for AI work. The goal is not to keep every chat transcript as if every token is sacred. The goal is to capture the durable parts that normally disappear: decisions, todos, constraints, examples, evidence trails, and project context.

The current shape has five ideas:

  1. Auto-capture across Claude Code, Codex, ChatGPT, Claude.ai, Gemini, Grok, and other surfaces.
  2. Ingestion, which means turning messy inputs into records the system can work with.
  3. Indexing, which means cataloging records so they can be found later.
  4. Embeddings, which are numeric fingerprints of meaning that help search by idea, not only exact words.
  5. MCP tools that let models ask for a bounded context pack instead of raw history.

The last point is the most important. Raw transcripts are noisy. A long agent session includes tool output, wrong turns, partial plans, and fixes. Useful memory is not "replay the whole thing." Useful memory is "what did we decide, what evidence supported it, and what should the agent know before acting?"

Fresh is the same pattern pointed at content

This setup also feeds Fresh, the content system I am building.

Work already creates editorial material: customer objections, product tradeoffs, sharp decisions, surprising metrics, internal debates, screenshots, mistakes, and lessons from delivery. Most teams lose those because they happen inside Slack, calls, docs, chats, and throwaway analysis.

My local-first setup is the personal version of that idea. Capture the work. Preserve the evidence. Retrieve the useful pieces. Analyze the patterns later.

The personal workflow and the content system are the same bet at different scales: the source material is more valuable than the generated draft.

What I would copy first

I would not copy the whole stack.

Start with the boring version:

mkdir -p ~/clients ~/projects

Then add four files to every active project:

TODO.md
PROJECT_STATE.md
AGENTS.md
CLAUDE.md

Keep them short. TODO.md should answer what is next, what is blocked, and what just happened. PROJECT_STATE.md should explain what the project is, what matters now, and what decisions should not be forgotten. AGENTS.md and CLAUDE.md should explain what the assistant should read first, what is safe to edit, how to verify work, and what rules matter.
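Those defaults are easy to script. A throwaway sketch (the seeded headings are just a starting point, adjust to taste):

```shell
# Scaffold a new workspace with the four context files.
# Usage: new_project ~/projects/fresh
new_project() {
  dir="${1:?usage: new_project <path>}"
  mkdir -p "$dir"
  for f in TODO.md PROJECT_STATE.md AGENTS.md CLAUDE.md; do
    # Seed each file with a bare heading; never overwrite existing ones.
    [ -f "$dir/$f" ] || printf '# %s\n' "${f%.md}" > "$dir/$f"
  done
}
```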

Only add MCPs when the need repeats. Do not install twenty tools because it feels advanced. Add one when you keep needing the same capability across sessions.

The advanced version is Rift, local models, corpus capture, evidence trails, and content mining. The basic version is just putting context next to the work.

The model is still rented. The context should be yours.
