4 Jul 2026 · 5 min read

From a chat message to a merged pull request

How chat-driven AI coding agents work in practice: describe a change and get a reviewable draft PR, with a plan, your approval, and a build step before anything merges. What they're good at, and what still needs a human.

You describe a change in plain language in a chat window. A few minutes later you get a branch and a reviewable draft pull request waiting for your approval.

That round trip is the whole promise of an AI coding agent, and it is worth understanding exactly how the pieces fit together before you trust one against your repositories.

The loop: describe, plan, approve, execute

The interaction starts with a message: "add rate limiting to the login endpoint" or "the CSV export drops the last row, fix it." An AI coding agent treats that as an intent, not a command to run blindly.

First it plans. It reads the relevant files, restates what it thinks you want, and proposes concrete steps: which files change, what the diff will roughly look like, what could go wrong. This is where most misunderstandings surface, before any code is written.

Then you approve or correct the plan. If the agent misread the spec, you say so in the same thread and it revises. Nothing has touched your codebase yet.

Only then does it execute. The agent creates a branch, makes the edits, runs the build and tests, and opens a draft pull request. You get a normal git diff to read, the same review surface you already use every day.
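The ordering above can be sketched as a small state machine. This is a minimal illustration, not any particular agent's implementation: the `Task` and `Stage` names are hypothetical, and `execute` only returns a placeholder string where a real agent would create the branch, run the build, and open the draft PR.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    DESCRIBED = auto()
    PLANNED = auto()
    APPROVED = auto()
    EXECUTED = auto()


@dataclass
class Task:
    """Hypothetical record of one chat request moving through the loop."""
    request: str
    stage: Stage = Stage.DESCRIBED
    plan: list[str] = field(default_factory=list)

    def propose_plan(self, steps: list[str]) -> None:
        # Planning, and re-planning after a correction, never touches the codebase.
        if self.stage is Stage.EXECUTED:
            raise RuntimeError("task already executed")
        self.plan = list(steps)
        # A revised plan always needs a fresh approval.
        self.stage = Stage.PLANNED

    def approve(self) -> None:
        if self.stage is not Stage.PLANNED:
            raise RuntimeError("nothing to approve: no plan proposed")
        self.stage = Stage.APPROVED

    def execute(self) -> str:
        # Execution is refused unless a human approved the current plan.
        if self.stage is not Stage.APPROVED:
            raise PermissionError("plan has not been approved")
        self.stage = Stage.EXECUTED
        return "draft PR opened"  # placeholder for the real branch/build/PR work
```

The point of the sketch is the guard in `execute`: there is no path from a message to a change that skips the approval step, and revising the plan sends the task back to awaiting approval.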

Why draft PRs, not direct commits

A draft pull request is the important design choice here. The agent proposes; you dispose.

Direct commits to a working branch would let an autonomous process write to your history with no checkpoint in between. A draft PR inserts exactly the pause you want: you read the diff, run it locally if you like, request changes, and merge only when you are satisfied.

This keeps the human in the loop where it counts. The agent does the mechanical work of finding files, writing the change, and wiring up tests. You keep the decision about whether the change is correct and whether it ships. Your existing review culture, CI checks, and branch protections all still apply, because the output is an ordinary PR.
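As one concrete illustration, assume the repository is hosted on GitHub, whose REST API accepts a `draft` flag when a pull request is created. The sketch below only builds the request and sends nothing; the function name and its arguments are hypothetical, and other hosting platforms expose draft PRs differently.

```python
import json


def draft_pr_request(owner, repo, branch, title, summary, base="main"):
    """Build (url, json_body) for a create-pull-request call. Sends nothing."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls"
    body = {
        "title": title,
        "head": branch,    # the branch the agent pushed its edits to
        "base": base,      # the branch the change would eventually merge into
        "body": summary,   # the approved plan plus build and test results
        "draft": True,     # open as a draft, not as ready for review
    }
    return url, json.dumps(body)
```

Because the result is an ordinary pull request, anything already attached to PRs in the repository (required reviews, CI checks, branch protections) applies without special handling for the agent.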

What it's good at

In practice, an AI coding agent earns its keep on well-scoped, verifiable work:

Small features with a clear acceptance criterion.
Bug fixes where you can point at the symptom and a failing case.
Refactors that are mechanical but tedious, like renaming across a module or extracting a function.
Boilerplate: new endpoints, migrations, test scaffolding, config files.
Cross-repo chores: bumping a shared dependency or applying the same edit across several repositories.
Keeping a build green: chasing down a lint failure or a broken test after an upstream change.

The common thread is that success is checkable. When there is a build to pass or a test to satisfy, the agent has a signal to work against and you have a signal to review against.

What still needs a human

Being honest about the limits is what makes the tool usable.

An AI coding agent is weak exactly where judgment dominates. Architecture calls — how services should be split, which abstraction to commit to for the next two years — are yours. Ambiguous specs produce confident but wrong work; if you cannot state the acceptance criterion, the agent cannot meet it.

Security-critical changes deserve a human owner. An agent can draft an auth change, but you should read every line of it and reason about the threat model yourself. And taste — API ergonomics, naming, what to leave out — is still a human's job. The agent is a fast, tireless implementer, not a substitute for the engineer who understands the product.

Treat its output as a strong first draft from a capable junior engineer, and review it that way.

Guardrails that make it safe

The difference between a useful agent and a liability is the guardrails around it.

Scoped credentials. The agent gets only the access a given task needs, not a standing key to everything.
Isolated workspaces per task. Each job runs in its own checkout, so a bad run cannot corrupt another task or your local tree.
A verify step before the PR. The build and tests run first; if they fail, you see that in the draft rather than discovering it after merge.
Review gates. A human approves the plan up front and the diff at the end. Merge stays a deliberate human action.
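The first three guardrails can be sketched together in a few lines. This is a simplified illustration under stated assumptions, not a production sandbox: `run_verified` is a hypothetical helper, the "workspace" is just a throwaway temporary directory (a real agent would clone the repository into it), and the build and test commands are whatever the caller supplies.

```python
import os
import subprocess
import sys
import tempfile


def run_verified(commands, secrets, allowed):
    """Run each command in a fresh directory with only the allowed secrets.

    Returns (ok, log); ok is True only if every command exits with status 0.
    """
    # Scoped credentials: pass through only the names this task was granted.
    env = {name: value for name, value in secrets.items() if name in allowed}
    env["PATH"] = os.environ.get("PATH", "")
    log = []
    # Isolated workspace: a throwaway directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as workdir:
        for cmd in commands:
            result = subprocess.run(
                cmd, cwd=workdir, env=env, capture_output=True, text=True
            )
            log.append((cmd, result.returncode))
            # Verify step: stop at the first failure so it can be reported.
            if result.returncode != 0:
                return False, log
    return True, log


if __name__ == "__main__":
    # Stand-in for a real build command; it simply exits successfully.
    build = [sys.executable, "-c", "print('build ok')"]
    print(run_verified([build], {"CI_TOKEN": "example"}, {"CI_TOKEN"}))
```

The fourth guardrail, review gates, is deliberately absent from the code: approving the plan and merging the diff remain human actions, so there is nothing to automate.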

None of these are exotic. They are the same disciplines a careful team already uses, applied to a non-human contributor.

Trying it

Figaro is one implementation of this loop. You message it in Telegram, it plans a change across your repositories, waits for your approval, then executes in an isolated worker with scoped secrets and a build step, and opens a draft pull request for you to review.

It has been run end to end this way — a message in, a reviewable draft PR out, with a human approving before anything merges. It does not merge to production on its own and it does not replace your engineers; it hands you a diff and lets you decide.