Sand Castle and the Coming Wave of Autonomous Software Factories

Matt Pocock — "I Open-Sourced My Own AFK Software Factory" — YouTube, E5-QK3CDVQM

For six months, Matt Pocock has been running coding agents while he sleeps. Not one agent. Many. In parallel. Picking up backlog tasks, implementing features, running QA — all without him in the loop. The secret is not better prompts or a smarter model. It is sandboxing, and the orchestration layer he built to make it trivial.

The tool is Sand Castle, a TypeScript library for running AI coding agents inside isolated sandboxes. And it exposes the simplest possible API: a single function that takes an agent, a sandbox, and a prompt. From that one primitive, Pocock has constructed what he calls a "mini software factory" — a pipeline of planner, implementer, reviewer, and merger agents that turn GitHub issues into merged code while he drinks tea.

Why Sandboxing Changes Everything

The central blocker for autonomous agents is not intelligence. It is trust. Agents that can write code can also delete your home directory, exfiltrate secrets, or send proprietary code to a third party. The naive fix — "YOLO mode," where you disable all permission checks — is exactly how you wake up to a destroyed system.

Pocock tried Docker sandboxes, but found them brittle for unattended execution. Every other tool he evaluated was trying to sell him a managed third-party service. What he wanted was a simple TypeScript function he could own end to end. So he built it.

Sand Castle runs agents inside Docker containers you define. That means you install whatever you want — system dependencies, the GitHub CLI, Claude Code, your own tooling — and the agent operates inside a disposable environment where the worst it can break is the container. The host machine stays untouched.
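To make "the worst it can break is the container" concrete, here is a minimal sketch of how an orchestrator might launch one disposable container per agent run. It only builds the argument list for the standard Docker CLI. `--rm`, `-v`, `-w`, and `--network` are real `docker run` flags. The `SandboxSpec` type, the image name, the mount layout, and the `claude -p` invocation are assumptions for illustration, not Sand Castle's internals.

```typescript
// Hypothetical sandbox description; not Sand Castle's actual types.
interface SandboxSpec {
  image: string; // an image you built, e.g. with the GitHub CLI and Claude Code installed
  worktree: string; // host path of the only directory the agent may modify
  network?: "bridge" | "none"; // LLM-backed agents need network access to reach their API
}

// Build the argument list for `docker run`; nothing is executed here.
function dockerRunArgs(spec: SandboxSpec, command: string[]): string[] {
  return [
    "run",
    "--rm", // remove the container when the agent exits
    "-v", `${spec.worktree}:/workspace`, // mount only the checkout, never $HOME
    "-w", "/workspace", // start the agent inside the checkout
    "--network", spec.network ?? "bridge",
    spec.image,
    ...command,
  ];
}

// Usage: the image name, path, and agent command are illustrative assumptions.
const args = dockerRunArgs(
  { image: "my-agent-sandbox:latest", worktree: "/tmp/worktrees/issue-42" },
  ["claude", "-p", "Implement issue #42"],
);
console.log(["docker", ...args].join(" "));
```

The sketch mounts only the issue's worktree, not the home directory, so files like SSH keys stay out of the agent's reach. The `--rm` flag means every run starts fresh from the image you defined.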

The Four-Agent Assembly Line

Where Sand Castle gets interesting is not the sandbox. It is the orchestration. Pocock open-sourced not just the library but the workflow templates he uses in his own repos. The most elaborate is a four-stage pipeline built on a deceptively simple primitive: `sandcastle.run(agent, sandbox, prompt)`.

Here is how it works in practice:

Stage	Agent Role	What It Does
Planner	Claude Code	Reads open GitHub issues tagged `sandcastle`, filters for unblocked items, and outputs a JSON plan of what can be worked on right now.
Implementer	Claude Code (per task)	For each planned issue, checks out a branch, implements the feature, writes tests, runs type-checking — all inside its own sandbox.
Reviewer	Another Claude Code instance	Runs a diff review against project-specific coding standards, catches mistakes the implementer inevitably makes.
Merger	A senior "merger developer" agent	Resolves merge conflicts between parallel branches, runs final checks, and merges everything back to `main`.

The result is a system where multiple agents commit code simultaneously, review each other, and merge their own work. Pocock describes it as having a team that never sleeps — and never asks for permission.
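The table's stages chain naturally around the single primitive. The sketch below assumes a hypothetical `RunFn` with the same `(agent, sandbox, prompt)` shape as `sandcastle.run`. The sandbox names, the `CODING_STANDARDS.md` file, and the prompt wording are invented for illustration, and the planner's output is treated as already parsed. The sketch shows the part that matters. Implementers and reviewers fan out in parallel with `Promise.all`, and a single merger runs only after every branch is done.

```typescript
// Hypothetical stand-in for the sandcastle.run(agent, sandbox, prompt) primitive.
type RunFn = (agent: string, sandbox: string, prompt: string) => Promise<string>;

interface PlannedTask {
  id: number;
  title: string;
  branch: string;
}

// Stage 1 (the planner) is assumed to have produced `plan` already.
async function runFactory(run: RunFn, plan: PlannedTask[]): Promise<string> {
  // Stages 2 and 3: one implementer plus one reviewer per task, tasks in parallel.
  const branches = await Promise.all(
    plan.map(async (task) => {
      const sandbox = `sandbox-${task.id}`; // one isolated sandbox per task
      await run("claude-code", sandbox,
        `Implement issue #${task.id} "${task.title}" on branch ${task.branch}.`);
      await run("claude-code", sandbox,
        `Review the diff on ${task.branch} against CODING_STANDARDS.md and fix issues.`);
      return task.branch;
    }),
  );
  // Stage 4: a single merger runs only after every branch has been reviewed.
  return run("claude-code", "sandbox-merge",
    `Merge into main, resolving conflicts between: ${branches.join(", ")}`);
}

// Usage with a fake runner that records calls, so no Docker or API key is needed.
const calls: string[] = [];
const fakeRun: RunFn = async (_agent, sandbox, prompt) => {
  calls.push(`${sandbox}: ${prompt}`);
  return "ok";
};

runFactory(fakeRun, [
  { id: 1, title: "Add Vitest", branch: "sandcastle/1" },
  { id: 2, title: "Add CLI", branch: "sandcastle/2" },
]).then(() => console.log(calls.length)); // 5: two implements, two reviews, one merge
```

Injecting the runner keeps the control flow testable with a fake. You can swap in a real sandboxed runner without changing anything else in the pipeline.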

Prompts as First-Class Code

One of the most elegant details is how prompts are handled. Each stage reads from a markdown prompt file. The planner prompt, for instance, instructs the agent to fetch labeled issues, examine comments and labels, identify blockers, and output a JSON plan wrapped in <plan> tags. The implementer prompt takes a task ID, issue title, and branch name as arguments and tells the agent exactly how to work.
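The `<plan>` handoff and the argument-taking implementer prompt imply a small amount of glue code between stages. Here is one way it might look. It assumes the planner emits a JSON array inside `<plan>` tags and that the prompt files use `{{NAME}}` placeholders. The field names (`id`, `title`, `branch`) and the placeholder syntax are assumptions, not the template repo's actual format.

```typescript
// Hypothetical plan shape; the real template's field names may differ.
interface PlanItem {
  id: number;
  title: string;
  branch: string;
}

// Extract the JSON between the last <plan> and the following </plan>.
// Using the last block tolerates an agent that echoes the instructions first.
function extractPlan(reply: string): PlanItem[] {
  const openTag = "<plan>";
  const start = reply.lastIndexOf(openTag);
  const end = start === -1 ? -1 : reply.indexOf("</plan>", start);
  if (end === -1) throw new Error("planner produced no <plan> block");
  const parsed: unknown = JSON.parse(reply.slice(start + openTag.length, end));
  if (!Array.isArray(parsed)) throw new Error("plan must be a JSON array");
  return parsed as PlanItem[];
}

// Replace {{NAME}} placeholders with values; unknown placeholders are left as-is.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (whole: string, key: string) =>
    key in values ? values[key] : whole);
}

// Usage: a made-up planner reply and implementer prompt.
const reply = `Looked at 2 issues; #5 is blocked by #4.
<plan>[{"id": 4, "title": "Add Vitest", "branch": "sandcastle/4"}]</plan>`;
const implementerTemplate =
  "Work on task {{TASK_ID}}: {{ISSUE_TITLE}}. Commit to branch {{BRANCH}}.";

for (const task of extractPlan(reply)) {
  // prints: Work on task 4: Add Vitest. Commit to branch sandcastle/4.
  console.log(fillTemplate(implementerTemplate, {
    TASK_ID: String(task.id),
    ISSUE_TITLE: task.title,
    BRANCH: task.branch,
  }));
}
```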

Pocock even borrowed a trick from Claude skills. Prefixing a backtick-quoted command with an exclamation mark (e.g., !`git diff source..branch`) causes the prompt resolver to run that command and splice its output into the prompt. This means review prompts can pull the current diff at resolution time without hardcoding state.
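A resolver for the exclamation-mark syntax can be a single regular-expression replacement. This sketch assumes the single-backtick form !`command`. It injects the command runner so the resolver can be exercised without a shell. In Node you could pass `(cmd) => execSync(cmd, { encoding: "utf8" })` from `node:child_process`. The branch names and the canned diff are illustrative.

```typescript
// Hypothetical command runner; inject a real one (e.g. execSync) in practice.
type Exec = (command: string) => string;

// Replace every !`command` with that command's output (trailing whitespace trimmed).
function resolveCommands(prompt: string, exec: Exec): string {
  return prompt.replace(/!`([^`]+)`/g, (_match: string, command: string) =>
    exec(command).trimEnd());
}

// Usage: a review prompt that pulls the diff at resolution time.
const reviewPrompt = [
  "Review this diff against our coding standards:",
  "!`git diff main..sandcastle/4`",
].join("\n");

// Fake executor returning canned output instead of running git.
const fakeExec: Exec = (cmd) =>
  cmd.startsWith("git diff") ? "+export const add = (a: number, b: number) => a + b;\n" : "";

console.log(resolveCommands(reviewPrompt, fakeExec));
```

Because resolution happens when the prompt is loaded, each reviewer sees the diff as it exists at that moment. Nothing is baked into the markdown file.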

GitHub Issues as the Backlog Manager

Rather than building a custom queue, Sand Castle uses GitHub issues as the source of truth. Any issue tagged with the `sandcastle` label becomes a candidate for autonomous execution. This is a small but telling design choice: Pocock did not rebuild project management. He co-opted what engineers already use.
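Because the backlog is just labeled GitHub issues, the candidate list can come straight from the GitHub CLI. Running `gh issue list --label sandcastle --state open --json number,title,body` prints a JSON array, and the sketch below filters that array for unblocked work. The "Blocked by #N" convention is an assumption for illustration. In Pocock's setup, the planner agent reads comments and labels and decides for itself what is blocked.

```typescript
// Shape of each entry from `gh issue list --json number,title,body`.
interface Issue {
  number: number;
  title: string;
  body: string;
}

// Hypothetical convention: blockers are written as "Blocked by #N" in the body.
function blockersOf(issue: Issue): number[] {
  const ids: number[] = [];
  const re = /blocked by #(\d+)/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(issue.body)) !== null) ids.push(Number(m[1]));
  return ids;
}

// An issue is ready when none of its blockers is still open (i.e. in this list).
function unblocked(openIssues: Issue[]): Issue[] {
  const openNumbers = new Set(openIssues.map((i) => i.number));
  return openIssues.filter((i) => blockersOf(i).every((b) => !openNumbers.has(b)));
}

// Usage with made-up CLI output: #5 waits on open #4; #6's blocker #2 is closed.
const ghOutput = `[
  {"number": 4, "title": "Add Vitest", "body": "Set up unit tests."},
  {"number": 5, "title": "Add CI script", "body": "Blocked by #4"},
  {"number": 6, "title": "Add CLI", "body": "Blocked by #2"}
]`;

console.log(unblocked(JSON.parse(ghOutput)).map((i) => i.number)); // [ 4, 6 ]
```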

The demo in the video is vivid. Pocock creates a GitHub issue requesting a basic TypeScript template with Vitest, type-checking, a CLI built with Commander, and a CI script. He runs `npm run sandcastle`. Within minutes, a planner agent has read the issue, an implementer has scaffolded the code, a reviewer has signed off, and a merger has committed it to `main` and closed the issue, all without human intervention.

Agent Agnosticism Is the Point

Sand Castle is deliberately unopinionated about which agent you run. Claude Code is the default, but swapping in Codex or any future CLI agent is a one-line change. This matters because the agent landscape is shifting fast. Owning your orchestration means you are not locked into Anthropic's subscription model, OpenAI's pricing, or anyone else's roadmap. The orchestration is yours. The sandbox is yours. The prompts are yours.
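One way to make the agent a one-line swap is to hide each CLI behind a tiny interface. This sketch assumes the agents are driven through their non-interactive modes: `claude -p` for Claude Code and `codex exec` for Codex. Check those flags against the versions installed in your image. `AgentCli` and `implementerCommand` are hypothetical names.

```typescript
// Hypothetical interface: an agent is just "prompt in, CLI invocation out".
interface AgentCli {
  name: string;
  command(prompt: string): string[];
}

// Assumed non-interactive invocations; verify against your installed versions.
const claudeCode: AgentCli = {
  name: "claude-code",
  command: (prompt) => ["claude", "-p", prompt],
};

const codex: AgentCli = {
  name: "codex",
  command: (prompt) => ["codex", "exec", prompt],
};

// The orchestration only ever depends on the AgentCli interface...
function implementerCommand(agent: AgentCli, issue: number): string[] {
  return agent.command(`Implement GitHub issue #${issue} and run the type checker.`);
}

// ...so switching agents is the one-line change below.
const agent = codex; // was: claudeCode
console.log(implementerCommand(agent, 42).slice(0, 2)); // [ 'codex', 'exec' ]
```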

As Pocock puts it: "That's the power of owning your own process."

Why This Matters for Diffie

The immediate read for Diffie, an AI browser testing tool, is about agent orchestration for quality assurance. If Sand Castle can turn GitHub issues into merged code autonomously, the same primitive can turn bug reports into verified fixes. A Diffie user could tag an issue `diffie-auto` and have an agent spin up a sandboxed browser, reproduce the visual regression, generate a test, and open a PR, all while the team is asleep.

But the deeper lesson is about owning the workflow layer. Pocock passed on every managed sandbox service because adopting one meant handing the interface to a third party. For Diffie's ideal customer profile (ICP), frontend engineers at fast-moving startups, the value is not just testing. It is trustworthy automation. Engineers will pay for tools that remove toil without removing control. Sand Castle proves that the winning architecture is a simple, composable primitive (sandbox + prompt + agent) that users can wire into their own pipelines.

For your GTM motion specifically: Diffie should position itself as the testing primitive in an agent-driven workflow, not as a standalone dashboard. Integrate with GitHub issues. Expose a CLI. Let teams tag regressions and have Diffie agents verify fixes inside ephemeral environments. The customers who will love this most are the ones already experimenting with AI agents — the same audience Pocock is speaking to.

The future of software development is not one super-agent. It is small, specialized agents running in parallel, reviewable, reversible, and entirely under your control. Sand Castle is the orchestration backbone. Diffie could be the quality backbone. That is the frame worth selling.