You want AI to review pull requests. Not as a rubber stamp — as a real reviewer that catches logic errors, security gaps, and missing test coverage before a human ever opens the diff. And you want it to post feedback directly to the PR, seconds after the commit lands.
The tension is obvious: AI agents are unpredictable. They hallucinate. They respond to prompt injection. Give them write access to your repo, and a misconfigured prompt or a malicious issue description could trigger a flood of broken comments, spam PRs, or worse. Restrict them to read-only, and they cannot post feedback at all.
GitHub Agentic Workflows, launched in technical preview in February 2026, solve this problem by separating agent reasoning from write authority. The agent runs read-only, proposes changes through a structured artifact, and a separate permission-scoped job validates, sanitizes, and executes only what the workflow author explicitly allows. It's a paradigm shift from 'clever prompts' to 'strong guardrails.'
This guide walks you through building a production-ready AI code review workflow using GitHub Agentic Workflows, explains why the security model matters, and shows you how to avoid the trap of overpermissioning.
The problem with traditional Actions + AI#
Most teams today use a standard GitHub Actions workflow. The permission problem: you grant the review job pull-requests: write. If the agent gets confused — or if an attacker injects a malicious instruction through a PR description — it can comment on any PR, add arbitrary labels, or even approve or close PRs. The agent holds broad write authority from the moment it starts.
The audit problem: by the time you notice something went wrong, the damage is already done. There's no staged decision point where a human or automated validator can review the agent's intent before it executes.
The secret problem: the agent process holds your OpenAI API key in memory. If prompt injection tricks it into running shell commands, it can read and exfiltrate that key.
Agentic Workflows change all three.
How safe-outputs flip the security model#
GitHub Agentic Workflows use a 'safe-outputs' architecture: the agent and executor are separate jobs with separate permissions, connected only through a structured data artifact.
- The agent job runs read-only. It has contents: read, pull-requests: read, nothing else. No write tokens. No secrets.
- The agent proposes changes. Instead of calling GitHub APIs directly, it records intended actions in JSON: 'create a review comment on line 42 with this message.'
- A separate validation job runs. Before anything is written to GitHub, threat-detection scans for prompt injection, secret leaks, and malicious code patterns.
- A permission-controlled execution job runs approved changes. This job holds actual write credentials — but only if safe-outputs configuration allows it.
The benefit: even if the agent is compromised, it cannot write to GitHub. The blast radius shrinks from uncontrolled write access to proposed changes that must pass validation first.
Setting up GitHub Agentic Workflows#
Prerequisites: GitHub repository with Actions enabled, GitHub CLI, gh-aw extension, and shell access.
Step 1: Install gh-aw
gh extension install github/gh-aw
gh aw --version
Step 2: Create fine-grained Personal Access Token scoped to your repository with Copilot Requests permission.
Step 3: Initialize
gh aw init
gh aw auth --engine claude
Step 4: Create .github/agentic-workflows/pr-review.md with YAML frontmatter defining permissions, tools, engine, and safe-outputs.
Step 5: Compile
gh aw compile .github/agentic-workflows/pr-review.md
Step 6: Test on a PR.
Understanding safe-outputs#
The safe-outputs block defines guardrails. Example: max: 3 limits agent to 3 review comments per run. max: 1 limits summary comments. If agent tries to post 100 comments, safe-outputs blocks it.
Other constraints: title-prefix for issues, labels to apply, expires for auto-closing, draft mode for PRs.
Threat detection#
Before output is applied, threat detection scans for prompt injection, secret leaks, and malicious code. If detected, workflow fails and nothing is posted.
Complete workflow execution#
When PR opens: trigger fires, agent job runs with read-only permissions, fetches diff, analyzes code, records intended comments in JSON. Threat detection scans output. Safe-outputs job posts approved comments, enforcing max limits. Audit trail shows every decision.
Common mistakes#
- Giving agent job write permissions — defeats purpose
- Not setting max limits — agent could spam
- Trusting agent on file importance — can hallucinate
- Running on draft PRs — flags known issues
When to use AI code review#
Good: large diffs, pattern violations, security checklist items, test coverage gaps, performance issues. Not good: architecture, novel algorithms, security-critical code, domain-specific logic.
Rule: AI for first-pass filtering, humans for final gate.
Before deploying#
- Test on fork
- Review lock file
- Verify max limits
- Confirm read-only permissions
- Test prompt injection
- Monitor initial runs
Deployment#
Merge to main, enable in other repos, monitor costs, adjust based on feedback, keep human review required.
Limitations#
Agent hallucination: may suggest incorrect fixes. Large diffs confuse context. Threat detection strong but not foolproof. Limits prevent spam, not bad reviews.
Conclusion#
GitHub Agentic Workflows bring safe AI code review to CI/CD. Separate reasoning from write authority. Stage, validate, execute. Audit trails and guardrails throughout. AI does grunt work, humans focus on architecture and logic.
Comments
Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.