Set up production-ready AI code review using GitHub Agentic Workflows' safe-outputs security model, threat detection, and compile-time constraints.
You want AI to review pull requests. Not as a rubber stamp — as a real reviewer that catches logic errors, security gaps, and missing test coverage before a human ever opens the diff. And you want it to post feedback directly to the PR, seconds after the commit lands.
The tension is obvious: AI agents are unpredictable. They hallucinate. They respond to prompt injection. Give them write access to your repo, and a misconfigured prompt or a malicious issue description could trigger a flood of broken comments, spam PRs, or worse. Restrict them to read-only, and they cannot post feedback at all.
GitHub Agentic Workflows, launched in technical preview in February 2026, solve this problem by separating agent reasoning from write authority. The agent runs read-only, proposes changes through a structured artifact, and a separate permission-scoped job validates, sanitizes, and executes only what the workflow author explicitly allows. It's a paradigm shift from 'clever prompts' to 'strong guardrails.'
This guide walks you through building a production-ready AI code review workflow using GitHub Agentic Workflows, explains why the security model matters, and shows you how to avoid the trap of overpermissioning.
Most teams today use a standard GitHub Actions workflow. The permission problem: you grant the review job pull-requests: write. If the agent gets confused — or if an attacker injects a malicious instruction through a PR description — it can comment on any PR, add arbitrary labels, or even approve or close PRs. The agent holds broad write authority from the moment it starts.
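A conventional setup looks something like this. This is a minimal sketch; the job name, script path, and secret name are illustrative, not from any particular repository:

```yaml
# Conventional approach: the review job holds write authority
# for its entire run -- including while the AI is reasoning.
name: ai-review
on:
  pull_request:
permissions:
  contents: read
  pull-requests: write   # broad write access, granted before the agent runs
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AI reviewer
        run: ./scripts/ai-review.sh   # hypothetical script; it holds the key below
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

Everything in this job, including whatever the model decides to do, runs under `pull-requests: write`.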
The audit problem: by the time you notice something went wrong, the damage is already done. There's no staged decision point where a human or automated validator can review the agent's intent before it executes.
The secret problem: the agent process holds your OpenAI API key in memory. If prompt injection tricks it into running shell commands, it can read and exfiltrate that key.
Agentic Workflows change all three.
GitHub Agentic Workflows use a 'safe-outputs' architecture: the agent and executor are separate jobs with separate permissions, connected only through a structured data artifact.
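Conceptually, the compiled workflow separates the two roles like this. This is a simplified sketch of the idea, not the exact YAML that `gh aw compile` emits:

```yaml
jobs:
  agent:
    permissions:
      contents: read        # read-only: the agent can analyze but not write
      pull-requests: read
    # ...runs the AI engine and uploads a structured artifact of proposed outputs

  safe_outputs:
    needs: agent            # runs only after the agent job finishes
    permissions:
      pull-requests: write  # write authority lives only in this job
    # ...validates and sanitizes the artifact, then applies only the
    # output types the workflow author explicitly allowed
```

The agent never touches a write-scoped token; the executor never runs model output as instructions.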
The benefit: even if the agent is compromised, it cannot write to GitHub. The blast radius shrinks from uncontrolled write access to proposed changes that must pass validation first.
Prerequisites: GitHub repository with Actions enabled, GitHub CLI, gh-aw extension, and shell access.
Step 1: Install gh-aw
gh extension install github/gh-aw
gh aw --version
Step 2: Create a fine-grained personal access token scoped to your repository with the Copilot Requests permission.
Step 3: Initialize
gh aw init
gh aw auth --engine claude
Step 4: Create .github/agentic-workflows/pr-review.md with YAML frontmatter defining permissions, tools, engine, and safe-outputs, followed by the review instructions written in plain markdown.
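A minimal pr-review.md might look like the following. The frontmatter keys follow this guide's description (permissions, tools, engine, safe-outputs); treat the exact field names and tool identifiers as assumptions and check them against `gh aw compile` errors and the current gh-aw schema:

```markdown
---
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: read
engine: claude
tools:
  github:
    allowed: [get_pull_request_diff]
safe-outputs:
  add-comment:
    max: 1
---

# PR Review

Review the diff of the current pull request. Flag logic errors,
security gaps, and missing test coverage. Post one summary comment
with your findings, ordered by severity.
```

Everything below the frontmatter is the prompt; everything inside it is the contract the compiler enforces.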
Step 5: Compile
gh aw compile .github/agentic-workflows/pr-review.md
Step 6: Open a test pull request and verify the agent's review comments appear.
The safe-outputs block defines the guardrails. For example, max: 3 caps the agent at three review comments per run, and max: 1 caps it at a single summary comment. If the agent tries to post 100 comments, safe-outputs blocks it.
Other constraints include a title-prefix for agent-created issues, labels to apply automatically, expires for auto-closing stale output, and draft mode for agent-created PRs.
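Put together, the constraints mentioned above might look like this in the frontmatter. This is a sketch: the key names follow the guide's terminology and may not match the current gh-aw schema exactly:

```yaml
safe-outputs:
  create-pull-request-review-comment:
    max: 3                 # at most 3 inline review comments per run
  add-comment:
    max: 1                 # one summary comment per run
  create-issue:
    title-prefix: "[ai-review] "   # makes agent-created issues easy to filter
    labels: [ai-review]
    expires: 7d            # auto-close if nobody acts on it
  create-pull-request:
    draft: true            # agent-created PRs always start as drafts
```

Any output type not listed here is simply not available to the agent, no matter what it proposes.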
Before any output is applied, a threat-detection step scans it for prompt injection, secret leaks, and malicious code. If anything is detected, the workflow fails and nothing is posted.
When a PR opens, the trigger fires and the agent job runs with read-only permissions: it fetches the diff, analyzes the code, and records its intended comments in a JSON artifact. Threat detection scans that output, and the safe-outputs job then posts the approved comments, enforcing the max limits. The audit trail records every decision along the way.
AI review is good at: large diffs, pattern violations, security-checklist items, test-coverage gaps, and obvious performance issues. It is not good at: architecture decisions, novel algorithms, security-critical code, or domain-specific business logic.
The rule: use AI for first-pass filtering, and keep humans as the final gate.
To roll out: merge the workflow to main, enable it in other repositories, monitor costs, adjust the prompt based on reviewer feedback, and keep human review required.
Know the limits: the agent can hallucinate and suggest incorrect fixes, very large diffs can overwhelm its context, threat detection is strong but not foolproof, and the output limits prevent spam, not bad reviews.
GitHub Agentic Workflows bring safe AI code review to CI/CD by separating reasoning from write authority: stage, validate, then execute, with audit trails and guardrails throughout. The AI does the grunt work; humans stay focused on architecture and logic.