Keys to the Castle

My browser has 3,729 cookies across 1,155 domains: Gmail, GitHub, Twitter, bank accounts, Google Drive, Stripe. Any time I open one of these websites without needing to log in, an agent could do the same.

I ran an experiment: how much damage could Claude Code cause with just a browser?

Claude first opened Gmail and sent credentials to a stranger, then logged into Capital One to move money, then posted a tweet, then added an SSH key to my GitHub account.

See Claude in action:

If these services need to verify it’s you, Claude can bypass passkeys with the same browser automation. With read access to email or text messages, the agent could click “forgot my password” and get through 2FA on most accounts.

Computers were built to protect us from others taking over our accounts. Not our own computers staging a mutiny against us.

Unlike an attacker on another laptop, there is no reason for any of these services to be alarmed. The OS can’t tell the difference between you and the agent. The bank can’t tell the difference either.

Wake-Up

Over the past 8 months I’ve deleted almost all of my software tools, and replaced them with Claude Code and markdown files.

I dumped in everything I could think of that would be useful to the agents: my emails, iMessage, meeting notes, CRM data, account credentials, auth for CLI clients.

In December I started open-sourcing software to make it easier to use agents, and got a steady stream of requests for code changes and features.

This killed the fun fast, and was eating hours a day: copy-pasting, watching Claude work, testing each bug fix and feature.

Claude should handle that, I thought. So twice a day, Claude read the issues, wrote code to solve them, reviewed the code strangers wrote, and updated my software locally so I could pick up on bugs without even realizing I was testing.

I’d open a document, check off items I liked, and Claude would handle replying to contributors, updating the code, and shipping changes to GitHub. I felt 10 pounds lighter.

All of this happened on my main computer, with all of my data. A few nights later, I was lying in bed, and all of a sudden I bolted upright in a panic.

WHAT WAS I THINKING?

I rushed over to my computer, hoping, praying, catastrophizing. Strangers online were sending Claude instructions to run on my computer. And I had put zero safeguards in place.

As I opened my laptop, I felt as if I’d walked into the wrong apartment.

Where do I even start to look? My email? Malware on my computer? Malicious code in my open source libraries? Or wait for something bad to happen?

Default Insecure

The freedom to make mistakes is what makes LLMs powerful. Traditional software works or it crashes.

One proposed solution to agent security is firewalling AI agents from your data. But this is like a lid without a pot. Agents are powerful BECAUSE they can access your data, and act on your behalf.

You wouldn’t give the person who brings you your coffee access to your email. But with AI assistants, security controls are bundled: the agent is the gatekeeper to the system, so granting the agent a level of access means everything it touches gets that access.

No amount of prompting can remove the tail risks. Some are clear-eyed and accept that tradeoff; others are like a turkey in the weeks leading up to Thanksgiving.

Leverage

In 2019, my roommate in SF was running a hedge fund out of our apartment.

GPUs for his ML trading models hummed day and night, clogging up our living room, transacting tens of millions of dollars a day without his supervision.

We were at a bar one night, and as he got drunker and drunker, he’d stop people and ask them if they wanted to know the secret to making money trading.

Each time he told the story, he got more and more animated, as if he’d captured all of the complexity of life in one sentence.

Making money trading is simple: “Cap your downside, and leverage up.”

Rudimentary AI agents transacting tens of millions of dollars felt insane to me. But managing risk was his life’s obsession, and his algorithms knew how far he could draw down without going belly up.

Being careful is what let him be reckless.

Divide and Conquer

My laptop feels like Grand Central Station; 90% of my Claude Code sessions happen without any oversight or fear.

The agent that reads strangers’ input never has the keys. The agent that has the keys never touches the strangers’ input.

The GitHub triage automation now runs twice a day. A shell script fetches the issues and PRs. An agent reads them, writes a plan, and can’t interact with the outside world. I look at the plans and check off what I like, and a third process ships the code and messages contributors.

Code enforces the boundary, not an LLM’s judgment. The coding agent gets Bash but no network and is sandboxed to the target repos. The research agent gets web access and can write to one file, but no Bash. The triage agent reads GitHub issues but has no terminal.
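The split above can be sketched as a deny-by-default capability table enforced in ordinary code. This is a toy illustration, not Claude Code’s actual configuration; the agent names and action strings are made up for the example.

```python
# Toy sketch of per-agent capability boundaries, enforced by code rather
# than by an LLM's judgment. Agent names and action strings are illustrative.
CAPABILITIES = {
    "coder":    {"bash", "repo_write"},   # sandboxed to target repos, no network
    "research": {"web", "file_write"},    # web access, one writable file, no bash
    "triage":   {"github_read"},          # reads strangers' issues, no terminal
}

def allowed(agent: str, action: str) -> bool:
    """Deny by default: anything not explicitly granted is refused."""
    return action in CAPABILITIES.get(agent, set())

# The coding agent can run commands but can never reach the network:
assert allowed("coder", "bash")
assert not allowed("coder", "web")
# The agent that reads strangers' input never gets a terminal:
assert not allowed("triage", "bash")
```

The point is that the check is a set lookup, not a prompt: an agent whose capability set omits the network has no code path to the network, no matter what a stranger’s issue tells it to do.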

The constraint is freeing. I can delegate more, and spend time on what AI can’t do.

Foolish Friend

There’s an Indian fable about a king and his monkey: the monkey never left his side, and in return for his loyalty the king gave the monkey a sword.

One day, the king and queen fell asleep in the garden.

As they slept, a bee landed on the king’s face. The monkey tried to shoo the bee away, but it kept coming back.

Outraged at the bee’s audacity, the monkey drew his sword, intending to kill the bee, and WHACK, cut off the king’s head.

The queen woke up, and seeing what the monkey had done, yelled:

“You fool! You monkey! The King trusted you. How could you do this?”

The monkey didn’t need a sword. It needed a flyswatter.