Full, running clones of the SaaS tools your agent acts in. Reproduce any bug as a clone id, reset in seconds, and ship without the release-day fire drill.
Every asymmetric session is the same shape — spin a clone, seed it to the exact state your test needs, point your agent at it, read the result straight from the database, then reset for the next run.
Creates a clone, allocates ports, runs migrations, and waits until healthy. You get a live API endpoint back.
Load a deterministic fixture for repeatable evals, or generate realistic data with an LLM. Either leaves the clone in a known starting state.
Point your agent at the clone's HTTP API or MCP server. Every action lands in the clone's own database.
Read the database directly with query to score exactly what the agent did. Stream logs to debug.
reset drops and re-seeds to the identical starting state for the next trial. destroy tears it down and frees the ports.
A dummy workspace you maintain by hand can't be reset, shared, or reproduced. A pile of mocks is never faithful enough to trust. asymmetric is the middle ground — real enough your agent can't tell, disposable enough to throw away.
Real NestJS backends on real Postgres with real JWT auth. Stateful, referentially consistent, returning authentic errors — not canned responses.
Every clone has a stable id, its own database, and a recorded seed. Hand a teammate the id to reproduce a bug exactly — no "which channels, which users, what config." reset rebuilds it byte-for-byte.
The agent's work is just rows in a database you own. Read them with query to grade a run objectively — not by scraping output.
Everything runs through Docker on your machine. No cloud account, no external dependencies. A connected control-plane mode is planned.
A clone is the unit you build and compose. An environment is the unit you point an agent at and score.
One running instance of a SaaS product: a real backend, its own database, real auth. The unit you build and compose.
A named composition of clones on shared infrastructure, defined declaratively so you can version-control your agent's target. Environments are the primitive: as their entropy rises, agent behavior shifts — and that's the data almost no one has yet. Read the thesis →
Same commands drive local Docker today and a remote control plane later. Swap the engine, keep the loop.
The spin → seed → inspect → reset toolchain you drive. One binary, the whole loop.
The shared CloneProvider interface, types, and named errors. The CLI depends on the contract — not on any clone.
LocalDockerProvider today (shells to docker compose). CloudProvider planned. Swap the engine, keep the commands.
One shared Postgres + one shared Redis on an asym-shared network; each clone isolated by its own database clone_<id> and Redis prefix. A fleet stays cheap to run, fully separate.
Honest about what's roadmap — but this is the direction. The compounding flywheel: every run makes the next agent better.
Several real products sharing one identity graph — the same user, org, and account consistent across Slack, Jira, GitHub… so an agent operates a whole connected workspace, not one app. This shared cross-clone identity is the moat.
As a transaction — the real eval & RL primitive: snapshot → rollout → restore. Branch a world, run an agent, roll it back instantly.
Record every agent trajectory against an environment. It becomes your eval dataset, and later your RL training data. Every run makes the next agent better.
A real environment for your agent in one command. Reproduce bugs as a clone id, reset to a known state, and ship without the release-day fire drill.