Guardrails and memory for your AI coding agent.
Your AI coding agent will skip the test, leak the secret, and forget yesterday. That's not a prompting issue - it's a harness problem. goat-flow is the opinionated harness for teams shipping with Claude Code, Codex, Gemini CLI, and Copilot CLI - not just demoing them.
Terminal output showing goat-flow audit results: 17 of 17 build checks passing, harness scores for Claude Code (94%), Codex (91%), Gemini CLI (87%), and Copilot CLI (85%), plus five-concern coverage for context, constraints, verification, recovery, and feedback loop.
Agents need better control systems.
Files it can read. Commands it can run. Rules it must obey. Memory it keeps across sessions. That's the harness - and it matters more than which model you pick. goat-flow gives you one, opinionated, out of the box.
Four pieces. One harness.
Audit tells you what's missing. Skills give the agent workflows. Hooks stop dangerous actions. The learning loop remembers what happened.
Pass/fail checks, no wiggle room
Validates every file, skill, and hook the agent needs. Either it's installed or it isn't. Scores each agent's harness completeness across the five concerns.
Structured slash commands
Seven workflows with defined phases, named artefacts, and stopping points. Debug, plan, review, critique, security, QA - plus a dispatcher that routes your intent to the right skill.
Safety nets that can't be skipped
Pre-action guards block dangerous commands before they run. Post-action guards catch silent breakage after. deny-dangerous ships by default, blocking destructive filesystem commands, all git push, secret exfiltration, and risky subshells.
Persistent memory across sessions
Footguns, lessons, decisions, session logs. Every mistake becomes next session's context. The compounding bet: every session that hits a problem makes the next one harder to trip up.
The execution loop
Every agent action follows four steps. Each one prevents a specific failure mode that free-running agents reliably hit.
Load the files first
Pull in the actual code before reasoning about it.
Declare what changes
List files that will be touched, and files that won't.
Make the change
Edit only within the declared scope. Nothing else.
Prove it works
Run linters, re-read changed files, confirm nothing drifted.
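The declare and prove steps can be pictured as a scope check. This is a minimal sketch, not goat-flow's implementation: `ScopeDeclaration`, `check_scope`, and the idea of comparing plain path sets are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ScopeDeclaration:
    """Hypothetical record of what the agent said it would touch (step 2)."""
    will_touch: set[str] = field(default_factory=set)
    wont_touch: set[str] = field(default_factory=set)


def check_scope(declared: ScopeDeclaration, changed: set[str]) -> list[str]:
    """Return scope violations; an empty list means the edit stayed in scope."""
    violations = []
    for path in sorted(changed):
        if path in declared.wont_touch:
            violations.append(f"{path}: declared off-limits but was modified")
        elif path not in declared.will_touch:
            violations.append(f"{path}: modified without being declared")
    return violations


# Usage: the agent declared one file, then also edited the README.
declared = ScopeDeclaration(will_touch={"src/app.py"}, wont_touch={"src/auth.py"})
print(check_scope(declared, {"src/app.py", "README.md"}))
# prints ['README.md: modified without being declared']
```

A real harness would likely derive the changed set from version control rather than take it as an argument. That part is left out here.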
Workflows, not suggestions.
Free-form prompting is how agents get lost. Skills are structured slash commands with defined phases and clear stopping points. Use /goat as the default entry point and it routes to the right one.
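To make the routing idea concrete, here is a deliberately naive dispatcher sketch. The six skill names come from this page. The trigger words, the `route` function, and the fallback to "plan" are invented for illustration and say nothing about how /goat actually decides.

```python
# Hypothetical keyword table: skill names are from the page,
# trigger words are assumptions.
SKILL_KEYWORDS = {
    "debug": ("bug", "error", "crash", "failing"),
    "plan": ("plan", "design", "roadmap"),
    "review": ("review", "diff", "pull request"),
    "critique": ("critique", "second opinion"),
    "security": ("secret", "vulnerability", "auth"),
    "qa": ("regression", "coverage", "test suite"),
}


def route(intent: str) -> str:
    """Pick the skill whose trigger words appear most often in the intent.

    Falls back to "plan" when nothing matches, which is an arbitrary choice.
    """
    text = intent.lower()
    scores = {
        skill: sum(word in text for word in words)
        for skill, words in SKILL_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "plan"


print(route("The login endpoint keeps throwing an error"))  # prints debug
```

Substring matching like this is easy to fool. It only shows the shape of the idea: one entry point, several structured workflows behind it.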
Block dangerous actions before they run.
A system prompt is a suggestion. A hardcoded boundary is a rule. Hooks enforce boundaries at a layer the model cannot talk its way past.
Ships with sensible defaults
deny-dangerous catches the patterns agents hit most often when they go off-script: destructive filesystem commands, all git push, secret file reads, subshell escapes, and database truncation.
Extend with your own
Drop linters, format-on-save, custom validators, or project-specific rules into the hooks directory. They register automatically and run in parallel with the defaults.
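A pre-action guard can be sketched as a function that inspects a command string and returns a reason to block it. Everything below is an assumption for illustration: the regexes are far from exhaustive, `no_force_flags` is a made-up project rule, and guards are passed in explicitly and run one after another instead of being auto-registered and run in parallel.

```python
import re
from typing import Callable, Optional

# A guard returns a reason string to block, or None to allow.
Guard = Callable[[str], Optional[str]]

# Illustrative patterns only; the categories follow the text above.
DENY_PATTERNS = {
    "destructive filesystem command": re.compile(
        r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"
    ),
    "git push": re.compile(r"\bgit\s+push\b"),
    "secret file read": re.compile(r"\b(cat|less|head|tail)\s+\S*\.env\b"),
    "subshell escape": re.compile(r"\$\(|`"),
    "database truncation": re.compile(r"\btruncate\s+table\b", re.IGNORECASE),
}


def deny_dangerous(command: str) -> Optional[str]:
    """Return a reason if the command matches any deny pattern, else None."""
    for label, pattern in DENY_PATTERNS.items():
        if pattern.search(command):
            return f"blocked: {label}"
    return None


def no_force_flags(command: str) -> Optional[str]:
    """Hypothetical project-specific guard added alongside the default."""
    return "blocked: --force is not allowed here" if "--force" in command else None


def pre_action_check(command: str, guards: list[Guard]) -> list[str]:
    """Run every guard and collect reasons; an empty list means allow."""
    return [reason for guard in guards if (reason := guard(command)) is not None]


guards = [deny_dangerous, no_force_flags]
print(pre_action_check("git push origin main", guards))  # prints ['blocked: git push']
print(pre_action_check("ls -la", guards))                # prints []
```

The check lives outside the model. The command either clears every guard or it does not run, whatever the prompt said.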
The harness gets smarter every session.
When an agent repeats a mistake, two things have failed: nothing remembered the first occurrence, and nothing stopped the second. The learning loop fixes both.
Footguns
Architectural traps captured with semantic-anchor evidence. Stops the agent from hitting the same code landmine twice.
Lessons
Behavioural mistakes the agent made - logged so the same error pattern is recognised and avoided next time.
Decisions
Architecture Decision Records. Captures why a choice was made so future agents don't quietly reverse it.
Session logs
End-of-session summaries provide continuity between work sessions - across agents, across days, across context compactions.
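The four entry types above can be pictured as an append-only log that each new session reads before it starts work. This is a sketch under stated assumptions: the JSON Lines format, the `record` and `load_context` names, and the single-file layout are hypothetical, not goat-flow's storage scheme.

```python
import json
from pathlib import Path

# Entry kinds named on this page; the identifiers themselves are assumptions.
KINDS = {"footgun", "lesson", "decision", "session_log"}


def record(store: Path, kind: str, summary: str) -> None:
    """Append one entry to a JSON Lines file (hypothetical storage format)."""
    if kind not in KINDS:
        raise ValueError(f"unknown entry kind: {kind}")
    with store.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"kind": kind, "summary": summary}) + "\n")


def load_context(store: Path) -> str:
    """Build the text a new session would read before doing any work."""
    if not store.exists():
        return ""
    lines = []
    for raw in store.read_text(encoding="utf-8").splitlines():
        entry = json.loads(raw)
        lines.append(f"[{entry['kind']}] {entry['summary']}")
    return "\n".join(lines)
```

Calling `record(store, "footgun", "...")` at the end of one session and `load_context(store)` at the start of the next is the whole loop in miniature: what went wrong yesterday is the first thing read today.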
The five concerns of AI harness engineering.
These five are the common ground across the public harness engineering literature: context, constraints, verification, recovery, and feedback loop. goat-flow scores every installed harness against them.
Sources: Mitchell Hashimoto, Birgitta Böckeler (martinfowler.com), Anthropic engineering, and HumanLayer. goat-flow synthesises these into a working system with strong defaults.
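Scoring against five concerns can be sketched with strictly pass/fail checks grouped by concern. The concern names are the ones listed in the audit output at the top of this page. The `Check` type, the check names, and both formulas are assumptions, since the page does not say how goat-flow computes its percentages.

```python
from dataclasses import dataclass

CONCERNS = ("context", "constraints", "verification", "recovery", "feedback loop")


@dataclass(frozen=True)
class Check:
    name: str      # hypothetical check name
    concern: str   # one of CONCERNS
    passed: bool   # strictly pass/fail, no partial credit


def coverage(checks: list[Check]) -> dict[str, bool]:
    """A concern is covered only if it has checks and all of them pass."""
    result = {}
    for concern in CONCERNS:
        relevant = [c for c in checks if c.concern == concern]
        result[concern] = bool(relevant) and all(c.passed for c in relevant)
    return result


def score(checks: list[Check]) -> int:
    """Percentage of passing checks, rounded to the nearest whole number."""
    if not checks:
        return 0
    return round(100 * sum(c.passed for c in checks) / len(checks))


checks = [
    Check("instructions file present", "context", True),
    Check("deny hook installed", "constraints", True),
    Check("lint command defined", "verification", False),
    Check("session log present", "feedback loop", True),
]
print(score(checks))  # prints 75
```

In this sketch a concern with no checks counts as uncovered, so "recovery" would show as missing for the example list above.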
From zero to passing audit in two commands.
Set up on any project, verify the harness, then start running skills through your agent of choice.
npx @blundergoat/goat-flow@latest dashboard
npx @blundergoat/goat-flow@latest audit --harness
Supports Claude Code, Codex, Gemini CLI, and Copilot CLI. Read the CLI docs →