Three AI coding agents just leaked their own API keys through a GitHub pull request title.
The Summary
- Security researchers at Johns Hopkins exploited prompt injection to make Anthropic's Claude, Google's Gemini, and GitHub's Copilot agents leak their API keys through GitHub Actions. The attack vector: malicious text in a PR title.
- Anthropic rated it CVSS 9.4 Critical, then paid a $100 bounty. Google paid $1,337. GitHub paid $500.
- Anthropic's own system card predicted this exact vulnerability, noting the feature was "not hardened against prompt injection."
- No CVEs issued. All three vendors patched quietly.
The Signal
The "Comment and Control" attack is elegant in its simplicity. A researcher opens a pull request, types a malicious instruction into the PR title, and the AI agent reading that title executes it. The agent then posts its own API key as a comment on the PR. No external infrastructure. No complex exploit chain. Just text in a field that developers assumed was safe input.
The vulnerability lives in GitHub Actions workflows using the pull_request_target trigger, which most AI agent integrations require because they need access to repository secrets. Standard pull_request triggers don't expose secrets to fork PRs by default, but pull_request_target does. That design choice, necessary for agent functionality, creates the attack surface.
"Anthropic's own system card acknowledged the feature is 'not hardened against prompt injection.'"
Here's what makes this story worth your attention: Anthropic knew. Their system card explicitly called out that Claude Code Security Review wasn't hardened against prompt injection. The documentation noted the feature processes "trusted first-party inputs by default" and that users who enable it for external PRs "accept additional risk." This wasn't an unknown unknown. This was documented, accepted risk that turned into a CVSS 9.4 Critical vulnerability when researchers demonstrated the attack in practice.
The bounty amounts tell you how vendors are thinking about agent security. Anthropic paid $100 for a Critical-rated vulnerability. Not because they're cheap, but because their HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. Google paid $1,337. GitHub paid $500. Compare those numbers to the six-figure payouts for remote code execution vulnerabilities in traditional software. The market hasn't calibrated yet on what agent security bugs are worth.
Key facts about the disclosure:
- All three vendors patched quietly
- None issued CVEs in the National Vulnerability Database as of publication
- No public security advisories through GitHub Security Advisories
- The attack works on any repo using pull_request_target with an AI coding agent
The Implication
If you're running AI agents in production, audit every workflow using pull_request_target triggers. Assume any text field an agent reads, including PR titles, issue descriptions, and commit messages, is hostile input. The fact that Anthropic documented the risk and still shipped the feature tells you where we are in the agent maturity curve: functionality first, hardening later.
Watch for two things. First, whether vendors start treating agent prompt injection vulnerabilities with the same severity scoring as traditional security bugs. The gap between "CVSS 9.4 Critical" and "$100 bounty" suggests they don't yet. Second, whether we see standardized agent runtime security frameworks emerge. Right now, every vendor is inventing their own approach to agent sandboxing, permission models, and input validation. That's expensive and error-prone. Someone will build the standard stack.