
Why Git Context Beats Memory Agents in Client Work

Why auditable git-tracked context beats agent memory you cannot easily inspect, review, or scope when many client repos run in parallel for paid work.

Jakub Czechowski

Builds websites and e-commerce at JC Web Studio, runs StackCompass – a publication on content architecture and stack decisions – and co-organizes CMS Conf, a conference on content systems.

8 min read

A salesperson for Hermes would tell me my problem is forgetting. The 2026 promise is clean: a personal agent that remembers everything across sessions, builds a model of how I work, and stops me re-explaining my stack every morning. Elsewhere in the category, mem0 reports roughly a 26% relative improvement on a memory benchmark. Memory is no longer an extra feature. It is the baseline expectation for an agent that wants to be taken seriously.

I run JC Web Studio. In any given week I have a WooCommerce plugin for one client with acme_ prefixes open in one window, a globex_-prefixed WordPress plugin for another in the next, a couple of Astro 5 marketing sites, and a Laravel app that pays a chunk of the bills. The promise of an agent that just knows all of that, without me typing it, is genuinely attractive. I am turning down one specific version of that promise: opaque memory written by the model, shared across sessions, and difficult to audit or scope. Not because memory is a bad idea, but because I already have it, and mine lives in git.

The Real Promise of a Memory Agent Is “You Lack Context”

Strip the marketing and Hermes makes one assumption about me: that my working context is trapped in my head, ephemeral, and re-typed each session. For many people that is true. A memory agent’s value scales with how disorganized your context is. If every project lives only in your short-term recall, an agent that quietly accumulates a model of your habits is a real upgrade - and the gain may exceed what a benchmark captures.

That assumption is wrong about me specifically. My context is not trapped. It is written into plain files an orchestrator loads on demand: a version-controlled skills-vault for cross-project methods, per-repo .cursor/rules, a CLAUDE.md at each project root, and a PRD per client project. The shared layer carries general workflow conventions; client facts stay local to the relevant repo. The sales story therefore lands on someone who already solved the memory problem through a different mechanism. Opaque agent memory competes with my git history, and git history wins on the axis I care about most: I can read it.
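To make the layering concrete, here is a minimal sketch of how an orchestrator could gather context for one repo. It is an illustration under stated assumptions, not the actual loader: the function name collect_context, the PRD.md filename, and the idea that the shared vault holds flat *.md files are all hypothetical.

```python
from pathlib import Path


def collect_context(repo_root: Path, shared_vault: Path) -> list[Path]:
    """Return the context files to load for a single repo.

    Assumed (hypothetical) layout: cross-project methods live as *.md
    files in `shared_vault`; client facts live only under `repo_root`.
    """
    files: list[Path] = []

    # Shared layer: general workflow methods, no client-specific facts.
    if shared_vault.is_dir():
        files.extend(sorted(shared_vault.glob("*.md")))

    # Repo-local layer: project conventions and the PRD (filename assumed).
    for name in ("CLAUDE.md", "PRD.md"):
        candidate = repo_root / name
        if candidate.is_file():
            files.append(candidate)

    # Repo-local editor rules, if the directory exists.
    rules_dir = repo_root / ".cursor" / "rules"
    if rules_dir.is_dir():
        files.extend(sorted(p for p in rules_dir.iterdir() if p.is_file()))

    return files
```

The function only ever looks inside the repo it is given plus the shared vault, so another client's files are never candidates. Every path it returns is a file tracked in git.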

“Memory” in My Workflow Is Four Roles, Not One Brain

I deliberately split work across four agents, and the split only works because none of them owns the context. Claude Chat does planning and long-form thinking. Claude Code does implementation and refactors. Codex CLI handles rescue work and asynchronous jobs I kick off and walk away from. Cursor does iterative code edits where I steer keystroke by keystroke. The Pragmatic Engineer’s March 2026 analysis points in the same direction: Claude Code and Cursor are complementary, not competing. How I split AI-assisted work across tools follows the same pattern. The split is sustainable only because the context they share is a file, not a private store inside each tool.

This is the part the memory promise misses. Cursor has its own memories, but the part I actually lean on is the codebase index. The semantic search that makes Cursor edits feel smart is reading my code, not a model of me. When I want the agent to know my conventions, I do not wait for it to infer them over twenty sessions. I write them into .cursor/rules once, commit, and every agent on every machine reads the same truth. The index finds code; .cursor/rules and CLAUDE.md state conventions. Both read artifacts in the repo. Neither builds a private model of me. The orchestrator routes the right skill file in at the right time. That is retrieval over an artifact I authored, not recall from an artifact a model authored about me.

File-Based Context Beats Learned Memory Because I Can See the Diff

Here is the failure mode that decides it for me. Agent auto-memory - the pattern where an LLM writes notes about your project between sessions, similar to Cursor’s optional background memory feature - is itself an LLM writing prose. It drifts. It records a convention I used once as if it were a rule. It misses the thing I actually care about. Worst of all, it poisons silently. Feed it one wrong fact early, and that error compounds across later sessions with no diff showing where the problem started.

A CLAUDE.md has real, honest costs and I will not pretend otherwise. At large sizes it stops being loaded reliably. There is no semantic search over it. It is a flat document, and the agent reads what is near the top better than what is buried. It goes stale the moment my conventions change and I forget to update it. Maintaining it is manual labor that no learned-memory system would show me as a line item.

But it is markdown in git. When the agent does something wrong, I open the file and see the line that told it to. I review context changes in a PR the same way I review code. The symlink-to-AGENTS.md pattern lets one file serve every tool without duplication. The cost of upkeep buys me auditability, and in client delivery auditability is not a nice-to-have. When a globex_-prefixed plugin ships a bug, I need to reconstruct why the agent made a decision. A diff answers that. An opaque embedding store does not.
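The symlink pattern mentioned above can be sketched in a few lines. This assumes AGENTS.md is the canonical file and CLAUDE.md is the alias pointing at it; the helper name link_tool_files and its parameters are hypothetical, and it relies only on os.symlink from the standard library. On Windows, creating symlinks may need extra privileges.

```python
import os
from pathlib import Path


def link_tool_files(
    repo_root: Path,
    canonical: str = "AGENTS.md",
    aliases: tuple[str, ...] = ("CLAUDE.md",),
) -> list[Path]:
    """Point each tool-specific filename at one canonical context file."""
    source = repo_root / canonical
    if not source.is_file():
        raise FileNotFoundError(f"{source} must exist before linking")

    created: list[Path] = []
    for alias in aliases:
        link = repo_root / alias
        if link.is_symlink() or link.exists():
            continue  # never overwrite an existing file or link
        # Relative target, so the link still resolves after the repo
        # is cloned to a different path.
        os.symlink(canonical, link)
        created.append(link)
    return created
```

Because git stores the link itself rather than a copy, there is still only one file whose history needs reviewing.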

Across Many Client Repos, Cross-Session Memory Is a Context-Bleed Liability

This is where the math flips hard for my situation. A persistent memory shared across all my work, without strict project scoping, is structurally a context-bleed risk. The agent that “remembers” learned that acme_ is one client’s prefix, and on a bad day applies it inside the globex_ client’s repo. It remembers a checkout customization one WooCommerce client paid for and helpfully suggests it to another who did not. That is not a far-fetched disaster scenario. It is the predictable result of one memory store spanning competing clients with similar problem shapes.

My file-based approach has a useful default boundary: the repo. One client’s CLAUDE.md is absent from another client’s working tree, while the shared skills layer contains methods rather than client-specific facts. This is not an absolute security guarantee; an orchestrator can still load the wrong file. But the scope is explicit, reviewable, and aligned with how client confidentiality and billing actually work. A global personal agent may offer profiles and memory layers, but unless those boundaries are equally visible and enforceable, I cannot audit them with the same confidence. For personal life-ops, dissolving boundaries can be the point. For paid client delivery across parallel repos, explicit boundaries are part of the product. This is the same discipline behind treating reusable skills as workflow infrastructure: the constraint is the feature.
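The prefix bleed described above is also cheap to check for mechanically. The following is a rough sketch of such a guard, not something the workflow above is stated to include: the function find_foreign_prefixes, the default of scanning only .php files, and the idea of passing in a list of all client prefixes are assumptions for illustration.

```python
import re
from pathlib import Path


def find_foreign_prefixes(
    repo_root: Path,
    own_prefix: str,
    all_prefixes: list[str],
    suffixes: tuple[str, ...] = (".php",),
) -> list[tuple[Path, int, str]]:
    """Report identifiers that use another client's prefix in this repo.

    Returns (relative path, 1-based line number, matched identifier).
    """
    foreign = [p for p in all_prefixes if p != own_prefix]
    if not foreign:
        return []

    # Match a foreign prefix at a word boundary, then the rest of the name.
    alternatives = "|".join(re.escape(p) for p in foreign)
    pattern = re.compile(rf"\b(?:{alternatives})\w+")

    hits: list[tuple[Path, int, str]] = []
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path.relative_to(repo_root), lineno, match.group(0)))
    return hits
```

Run against the globex_ repo with own_prefix="globex_", any acme_ identifier shows up with a file and line number. That gives a reviewer something concrete to point at, which an opaque store does not.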

Where a Memory Agent Would Actually Beat My Setup

I would be dishonest selling git context as universally superior. There is a real shape of work where Hermes wins, and it is the inverse of mine.

Solo, long-running personal projects with no confidentiality boundary and no second reviewer: that is where the upkeep cost of CLAUDE.md can outweigh its audit benefit. If it is just me in one codebase, the cost of a mistaken memory is lower and the value of a reviewable diff drops. A memory agent that accrues understanding over months of one project could genuinely save the typing.

Non-code life-ops is the stronger case. Tracking that I prefer morning deep work, that a recurring client emails on Fridays, that I keep abandoning the same kind of side project: that is exactly the fuzzy, cross-session pattern a learned store handles better than any file I would bother maintaining. I am not going to write a LIFE.md and diff my own habits in a PR. For the unstructured texture of a working life, opaque memory’s weakness barely costs me, because there is nothing to audit and no client to protect.

The pattern is simple: learned memory wins where boundaries matter less and the context is too fuzzy to maintain by hand. My client work is the opposite: hard boundaries and structured conventions worth writing down.

When I Would Reconsider

I would switch the day learned memory becomes diffable. If Hermes exposed its store as version-controlled, human-readable artifacts that I could review in a PR and scope to a directory, the distinction I am defending would collapse. That would be my system with better ergonomics, and I would take it. The same goes for semantic search over my own markdown: retrieval can solve the size problem without surrendering ownership, provided I can still inspect every source and change. That is the convergence I am betting on. It follows the same rule as my reviewable AI quality gates: own the source of truth, let tools index it.

The Question Is Not “Memory or Not”, It Is “Who Owns the Context”

Framing this as “stateful agent vs. stateless agent” is the wrong axis. I am not stateless. I am aggressively stateful. The difference is that my state consists of artifacts I authored and can read, diff, and scope to a directory. Model-authored memory may contain useful context, but when I cannot fully inspect its sources or boundaries, I cannot give it the same authority.

For a solo developer on one long project, ceding that ownership for convenience is a fair trade. For a studio running many client repos in parallel, an incorrectly remembered fact can become a billing or confidentiality problem. Memory is not the question. Custody is. Until a memory agent gives me context that is readable, reviewable, and scoped to the right repo, I will keep mine in git, where I can see exactly what it thinks it knows.