← Blog

The private agent I had been meaning to run is finally up. I wrote about wanting it in Two Services I Never Opened: the instinct to pull things in-house instead of renting one more tab. So I built the in-house version. One job: a morning briefing (the weather, my calendar, the three things I said yesterday I would do today), waiting for me before coffee.

The night I set it up, I watched it work, and then I did the thing I should have done first. I looked at where its thoughts actually came from. Not my closet. Every time it reasons, it reaches out over the wire to a frontier model. Anthropic, sometimes OpenAI. The container is mine. The cognition is rented. I had built a vault and kept the valuables somewhere else.

And the briefing itself? A hosted agent does it better. No box humming in the corner, no 2 a.m. restart when the provider changes something on their end. So the self-hosting bought me what, exactly?

I localized the wrong half

On the ledger, my private build lost on all three axes. Privacy was gone. The prompts still leave the house, same as if I had used the app. Capability was identical: the same model on the other end of the wire. Convenience was worse. The hosted version does not need me to babysit it.

So I had localized the wrong half. The model is rentable and the harness disposable. That was the last post’s lesson. What neither captured is that the only thing that compounds, the only thing actually mine, is the accumulated knowledge of me: the notes I have kept for years, the journals, the half-finished thoughts I keep returning to, the record of the decisions I have talked myself into and out of. A model is the same for everyone who rents it. That archive is mine alone, and it is the one part of the whole arrangement that grows more valuable the longer it sits there. That is the second brain. Everything else is a tenant passing through.

I was drawing one line, not two

The fix is not to drag the whole thing local. It is to keep the right parts local. Looking things up, tidying my notes, the routine glue: a small model on my own machine handles all of it. The genuinely hard reasoning, the part that has to think rather than just fetch, is the only thing that sometimes needs to go out to the frontier.

And here is what took me a second to see: the question “is this hard enough to need the big model” is the same question as “is this allowed to leave the house.” Two lines I thought I was drawing separately turned out to be one line drawn twice. The surprise is that they ever lined up at all. Difficulty is a property of the task and privacy is a property of me, and there is no reason the two should agree. But mostly they do. The hard tasks tend to be impersonal: world knowledge, real reasoning over things anyone could ask. The personal tasks tend to be easy: search my own notes, pull the right paragraph, summarize a week.

A good harness almost closes the gap

This is where I almost talked myself out of the whole worry. Because a good harness offloads the mechanical work (the fetching, the formatting, the procedural stepping) and leaves the local model with a much smaller job: pull the relevant pieces, glue them together. And it does that fine. Most of what looks like “this needs the frontier on my private data” turns out to be a task that was just badly scaffolded. Build the harness well and the tradeoff seems to dissolve.

I came across a setup that made the case better than I could, and it was a careful one, not a strawman. It does all the searching on your own machine, builds its own private map of your notes there, and never lets the library itself touch a server. Genuinely privacy-preserving, by its own lights. You route by difficulty, the personal work stays home, the hard reasoning goes where hard reasoning is best, and the leak you were worried about never shows up. For a while I believed it.

What the harness cannot fetch

Then I looked at the last step. To actually answer me, it sends the passages it found out to a frontier model, which means it keeps private where things are in my library, but not what they say. The map stays home. The sentences themselves, the ones that matter, the ones it pulled because they matter, go out over the wire. The leak is not a bug in the design. It is a side effect of doing it right.

Because a harness only ever moves the fetching. It never moves the thinking. And there is one query that is the entire reason to build a second brain at all: read everything I have written and tell me the pattern in myself I cannot see. No script makes that. The deliverable is not retrieved; it is the inference. You cannot scaffold your way to an insight that, by definition, I could not already reach on my own. That is the distance between my local model and the frontier one as it stands now. I picked up this framing in From Prompts to Harnesses, but here it cuts the other way: the harness moves everything except the part I most need moved.

And it is worse than a single hard question, because the hardest ones fuse two things that live in different places. Ask it something that actually matters: given my health history and these new symptoms, what should I be asking my doctor; given my finances and the way I really behave with money, is this the right call. Each of those needs my private record and the broad reasoning only the frontier model has, held in the same head at the same time. I cannot split them across machines (retrieve at home, reason in the cloud) because the answer only exists when both are present at once. The cost is built into the design: I can own every word I have written and still have to surrender it to get the one reading worth having.

They are not after the easy part

Once I saw that, I understood what the providers are actually after. They are not competing for my file search. They know I will run that locally forever; it is cheap and it is solved. They are after the residue: the deep read of my private self, the one thing local cannot do yet, and the one thing you can only do if you are holding the whole archive.

And in the last year they have all moved to hold it. Every major assistant now keeps a memory, not a short list of facts you save by hand, but a synthesized model of you, assembled in the background from everything you have ever typed at it. ChatGPT now reads across years of your past conversations and carries forward what it learns. Claude builds the same kind of running summary and refreshes it on a regular cadence. Gemini turned it on by default. All three pushed it down to their free tiers inside the last year. None of that is aimed at the file search I can already do in my closet. It is aimed squarely at the one thing my closet cannot reach.

“Let us remember everything for you” is not a convenience feature, then. It is a bid for the one query that is worth anything. And the switching cost they are building is not the usual product lock-in. Not my workflows, not my integrations, the things I could rebuild in a weekend. It is my own accumulated past, which I cannot rebuild at all. The longer they hold my memory, the better the answer only they can give becomes, and the less leaving makes any sense. The lock-in compounds on its own, every day, whether or not I open the app.

That is the quiet shift in personal AI. Personal used to mean mine: on my machine, under my control. It is starting to mean about me. Held elsewhere, learned from me, improving for them the longer I stay.

I would rather wait

There is one reason this is not simply a loss: the residue is shrinking. Local models are closing the gap year over year, doing more of that private inference well enough that the distance between what runs in my closet and what runs in a datacenter keeps narrowing. It is the same floor rising I wrote about in The Average of the Public Web, except this time it is climbing toward the one private thing I meant to keep. So the real contest is about speed: whether the local brain grows capable of that deep read before convenience talks me into sending myself out to get it sooner.

There is an answer I could have tonight. Point the agent at the journals, let it send the lot to someone else’s model, and by morning I would have the briefing I actually want: not the weather and my calendar, but the pattern in myself I keep missing. It would go out under contract, not made public, not training data, but it would still leave my hands, and this is the one thing I am not willing to send. So I am leaving the question unasked, refusing the most useful thing the tool could do, not because it would fail but because it would work. I would rather keep the memory mine and wait for the brain to come home than rent myself out to get the answer a year early.