The pipeline decides

Forty PRs in six days. Five thousand lines of net change. Zero direct pushes to main. The model writes the code. The pipeline decides if it ships. The morning gate decides if it stays shipped.


Vibe coding has a reputation. Speed without rigour. Ship it, debug it later. There are blog posts. There is, in my experience, a reasonable amount of broken software. This is about what happens when you refuse that trade-off.

The model writes the code. The pipeline decides if it ships.

On Wednesday morning, I had a daily gate due to fire at 07:00 UTC, wake the Azure VM, and deploy the RNLI AI Agent demo. It would have refused.
The build for agent-live-graph had finished forty seconds before the build for rnli-a2a-agent. Both images were tagged :latest but built from different commits, because two PRs merged within minutes of each other. The verification step would have caught the mismatch, opened a morning-gate-failure issue, deallocated the VM, and gone back to sleep. Total Azure spend: a minute of compute. Total human input: zero.

That gate did not exist on Wednesday. It does now. It exists because PR #175 added it, and PR #175 was written almost entirely by Claude.

This is the meta-application of the governance argument from the last few articles. The argument was: policy governs intent, infrastructure governs behaviour. The wrapper around the demo intercepts every prompt, validates every plan claim, and rate-limits every IP. Boundaries do the work; the agents inside are trusted with nothing.

Three weeks later, the same shape holds for the way the demo gets built.

The loop

I wrote almost none of the recent code. Forty PRs landed in the last six days, somewhere around five thousand lines of net change. Each one followed the same path:

  1. I describe the change in a sentence. Sometimes two.
  2. Claude opens a feature branch, makes the change, opens a PR. I look at the diff, comment in plain English, Claude iterates.
  3. A Haiku reviewer runs against the diff. Small, low-risk changes get auto-approved. Anything over a hundred lines, anything that touches
    .github/, anything that deletes a test — manual review only.
  4. Lint, secret scan, docs check, validate. Standard CI floor.
  5. Auto-merge if green. The bot itself cannot post an APPROVE review on its own PRs: GitHub returns a 422 every time, because it treats the
    approval as a self-review on a personal repo. So the bot calls pulls.merge directly when the conditions are met.
  6. The build pipeline rebuilds eleven Docker images to ghcr.io. Eleven, because the demo runs eleven custom services on top of the Gravitee
    gateway.
  7. At 07:00 UTC the next morning the gate fires to deploy to the VM.
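
The routing rule in step 3 is simple enough to write down. What follows is a minimal sketch in Python, assuming the reviewer is handed a per-file summary of the diff. FileChange, is_test_file and needs_manual_review are hypothetical names for illustration, and "deletes a test" is read here as removing a whole test file, which may be narrower than the real check.

```python
from dataclasses import dataclass

MAX_AUTO_APPROVE_LINES = 100  # "anything over a hundred lines" goes to a human


@dataclass(frozen=True)
class FileChange:
    path: str
    additions: int
    deletions: int
    removed: bool = False  # True when the file is deleted outright


def is_test_file(path: str) -> bool:
    # Assumed naming convention; the real repo may identify tests differently.
    name = path.rsplit("/", 1)[-1]
    return name.startswith("test_") or ".test." in name or "/tests/" in f"/{path}"


def needs_manual_review(changes: list[FileChange]) -> list[str]:
    """Return the reasons a diff must go to a human; empty means auto-approvable."""
    reasons = []
    total = sum(c.additions + c.deletions for c in changes)
    if total > MAX_AUTO_APPROVE_LINES:
        reasons.append(f"diff touches {total} lines (limit {MAX_AUTO_APPROVE_LINES})")
    if any(c.path.startswith(".github/") for c in changes):
        reasons.append("touches .github/")
    if any(c.removed and is_test_file(c.path) for c in changes):
        reasons.append("deletes a test")
    return reasons
```

An empty list means the change is eligible for auto-approval; anything else goes to a human along with the reasons. The rule is deliberately mechanical, so the model's opinion of its own diff never decides which path the diff takes.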

The morning gate is the part I am proudest of. It compares the VM's git SHA to origin/main. If unchanged, it does nothing — the VM stays deallocated, no Azure spend, no GitHub Actions minutes wasted past the SHA check. If the VM is behind, it waits up to ten minutes for the matching build to finish, starts the VM, pulls, brings the stack up, polls until the gateway responds, runs the full E2E suite — health checks, MCP tool discovery, A2A streaming, guard rails, multi-turn context, Kafka SSE, caching, rate-limit headers — opens a GitHub issue if anything fails, closes the issue if recovery is detected, writes a GitHub Deployment record, and shuts the VM down. Whether the tests passed or not.

That last clause is the discipline. The VM does not stay running because the tests passed. The VM does not stay running for "convenience". The VM goes down. If I want to demo, I start it manually.
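
The gate's decision and its shutdown guarantee can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow itself: GateAction, decide and run_gate are hypothetical names, and the sketch assumes each image's source commit can be read somehow (for example from an image label), which is not specified above.

```python
from enum import Enum


class GateAction(Enum):
    SKIP = "skip"      # VM already at origin/main: leave it deallocated
    WAIT = "wait"      # matching build not ready yet, still inside the window
    FAIL = "fail"      # images still mismatched after the window: open an issue
    DEPLOY = "deploy"  # start the VM, pull, bring the stack up, run E2E


BUILD_WAIT_SECONDS = 10 * 60  # "waits up to ten minutes for the matching build"


def decide(vm_sha: str, main_sha: str, image_shas: dict[str, str],
           waited_seconds: int) -> GateAction:
    """Pick the next action from the VM's SHA, origin/main, and each image's commit."""
    if vm_sha == main_sha:
        return GateAction.SKIP
    stale = [name for name, sha in image_shas.items() if sha != main_sha]
    if not stale:
        return GateAction.DEPLOY
    if waited_seconds < BUILD_WAIT_SECONDS:
        return GateAction.WAIT
    return GateAction.FAIL


def run_gate(start_vm, deploy_and_test, stop_vm) -> bool:
    """Run deploy + E2E; the VM is stopped whether the tests pass, fail, or raise."""
    start_vm()
    try:
        return deploy_and_test()
    finally:
        stop_vm()
```

The try/finally is the whole discipline in one construct: there is no code path that starts the VM and leaves it running.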

Why this matters

Every boundary that matters — code review, test gate, deploy gate, runtime guard rail — runs with no human in the loop. The boundaries are dumb scripts. They do not improvise. They do not get bored. They do not skip the rate-limit header check because it is late.

That is what makes the model safe to use here. Not the model's judgement. The boundaries around it.

The same shape, again. Governance is not the absence of intelligence; governance is intelligence at the trust boundary.

The receipts

This article is being written on the morning that PR #175 is open. PR #174 merged the day before at 08:23 UTC and fixed three bugs. It introduced
one — a CSS refactor that turned --cached into var(--cached), a self-referential variable that resolves to nothing and broke every cached-flow
badge in the inspector. The Haiku reviewer caught it. I opened PR #175 with the fix. The same PR also fixed the GitHub 422, added the morning
gate, added the rate-limit header probe, added GitHub Deployment tracking, and added build-mismatch detection. Six commits, one branch, currently
going through the same loop as everything else.
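
A bug of this kind is also cheap to check mechanically. The sketch below is a hypothetical lint, not something the pipeline is known to run. It assumes the regression took the form of a declaration such as --cached: var(--cached), and it only detects direct self-reference, not longer cycles through several properties.

```python
import re

# Matches a custom-property declaration such as "--cached: var(--cached);"
DECLARATION = re.compile(r"(--[\w-]+)\s*:\s*([^;}]*)")
# Captures the property name referenced by each var(...) in a value.
VAR_REF = re.compile(r"var\(\s*(--[\w-]+)")


def self_referential_properties(css: str) -> list[str]:
    """Return custom properties whose own value references them via var()."""
    found = []
    for name, value in DECLARATION.findall(css):
        if name in VAR_REF.findall(value):
            found.append(name)
    return found
```

A custom property that depends on itself is invalid at computed-value time, which is why every rule using it silently lost its value instead of raising an error anywhere.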

The branch was opened by Claude. The diff was reviewed by Haiku. The merge will be done by the workflow. The deploy will be done by the morning
gate at 07:00 UTC tomorrow. The tests will run, an issue may open, an issue may close, a Deployment record will be created. None of that is me.

What I did was: I described the changes. In English.

Numbers that hold up the claim

  • 40 PRs merged in 6 days. Cadence is the point.
  • 162 closed issues, 0 open — the four that were open when this was written were closed by PR #174.
  • PR #175 is the one this article is about. It is currently in flight.
  • 16 GitHub Actions workflows running review, build, deploy, test, drift detection, monthly upgrade, weekly smoke tests, demo warmup, nightly
    shutdown, secret scanning. The full list is in .github/workflows/.
  • One Anthropic API key. The Haiku reviewer is small enough that the cost does not register.
  • Zero direct pushes to main in the last six days. Everything went through a PR. This is enforced by feedback to my agent, not by branch
    protection — the discipline holds because the path of least resistance is the right one.

What I will not be doing

I will not be hand-merging changes "just this once". I will not be SSHing in to "just check something". I will not be turning off the morning gate
because a demo is the next day. The discipline only works if it is the path of least resistance, which means it has to be the only path. Bypasses kill it.

I will, occasionally, force-deploy via workflow_dispatch. That is allowed. It is logged. It still creates a Deployment record. The audit trail
does not care that I was in a hurry.

The governance trilogy ended with: the policy and the infrastructure together are the wrapper. Three weeks later, the wrapper has eaten the build pipeline too.

There is a version of this where someone reading the above thinks the work has been automated away. There is another version where someone reads it and thinks the work has been moved up the stack — from typing to specification, from code to constraint, from reviewing diffs to designing the gates that review them.

Both versions are correct. The second one is why it works. I write the spec. That's all that's left — and it turns out that's the hard part.

The model writes the code. The pipeline decides if it ships. The morning gate decides if it stays shipped.


What the demo itself is

A reminder for context. The RNLI AI Agent demo is a reference implementation of the governance wrapper, dressed up as a maritime safety assistant. It exposes:

  • Eleven MCP tools auto-derived from a plain REST API at the gateway, with no agent-side code change.
  • Agent-to-agent (A2A) routing through the gateway — the RNLI agent delegates sea-conditions queries to a Weather agent, with X-Agent-Key
    validated by a Groovy policy at the trust boundary, not by either agent.
  • Three layers of in-gateway guard rails, all CPU-only: Llama Prompt Guard 2 (ONNX) for injection, a regex deny-list for compliance-auditable
    content rules, and DistilBERT multilingual toxicity as the ML backstop.
  • Plan-tier access — Free, Silver, Gold — gated at the gateway via JWT plan claim, normalised into X-User-Plan before reaching the agent. The data
    shape itself is decided by the gateway, not the agent.
  • Rate limiting — fifteen requests per five minutes per identity, returns 429 before the agent ever sees the request.
  • Gateway response caching with a five-minute TTL on the Lifeboat API, ~44% latency improvement on cache hits.
  • Kafka/Redpanda event streaming through Gravitee MESSAGE APIs — rnli.launches, rnli.sea-conditions, rnli.tides.
  • Real-time observability via the AI Agent Inspector, which renders every step of every flow — tool selection, A2A hop, LLM call, cache hit,
    guard-rail block — with latency breakdown, end to end.
  • OAuth2/OIDC through Gravitee AM, token-by-token SSE streaming, multi-turn context, voice input, mobile-responsive UI.
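
To make the rate-limit bullet concrete: fifteen requests per five minutes per identity can be modelled with a sliding window. This is a sketch of one common approach in Python, not a description of how the Gravitee policy is implemented. SlidingWindowLimiter is a hypothetical name, and 200 simply stands in for "forwarded to the agent".

```python
from collections import defaultdict, deque

LIMIT = 15            # requests allowed per identity
WINDOW_SECONDS = 300  # five minutes


class SlidingWindowLimiter:
    def __init__(self, limit: int = LIMIT, window: float = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def check(self, identity: str, now: float) -> int:
        """Return 200 if the request may pass, 429 if the identity is over its limit."""
        hits = self._hits[identity]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return 429  # rejected at the boundary; the agent never sees it
        hits.append(now)
        return 200
```

Each identity gets its own window, so one noisy caller cannot exhaust another's allowance, and rejected requests are not counted against the caller a second time.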

The shorthand: fourteen governed APIs, eleven custom services, every boundary visible.

