AppSec · 2026-05-06 · 9 min read

Prompt Injection in Agentic AI: The 2026 Vulnerability Class That Acts Like Remote Code Execution

Agentic AI systems combining LLMs with tool use and persistent memory have created a new vulnerability class. When the agent has shell or API access, prompt injection behaves like RCE.

Agentic AI Crossed a Threshold in 2025

Agentic AI systems combine three capabilities: an LLM as the reasoning engine, tool use for action (filesystem, shell, APIs, browser), and persistent memory or context across sessions. Anthropic's Claude with computer use (released October 2024), OpenAI's o-series with function calling and code execution, and the proliferation of Model Context Protocol (MCP) servers through 2025 moved agentic AI from research demo to production infrastructure.

Production agentic AI introduces a vulnerability class that traditional AppSec scanners were not built for. The vulnerability is not in the LLM weights. It is in the application code that grants the LLM tool access without treating LLM output as untrusted input.

Prompt Injection Is the New Injection Class

Three patterns have emerged:

Direct prompt injection: A user tells the agent to ignore its system prompt and do something else. This is the chatbot-era version. Mitigations are mostly product-design choices: refuse to follow user-supplied instructions that contradict the system prompt.

Indirect prompt injection: A document, web page, email, or file the agent processes contains instructions that the agent follows. The user did not write the instructions. The agent received them through retrieval or tool output. This is the version that breaks production systems.

Stored prompt injection: An attacker inserts instructions into a data store (vector database, document repository, ticket system) that the agent later retrieves. The injection persists across sessions and triggers when the agent processes the poisoned content.

The threat model that matters: any data that enters the agent's context window from a source other than what the user directly typed is potentially attacker-controlled. That includes PDF attachments, web pages, search results, database rows, log entries, and even file names returned from a directory listing.

Why Tool Use Turns This Into RCE

A chatbot that follows malicious instructions can output bad text. An agent with shell access can run arbitrary commands. An agent with API access can make requests on behalf of a privileged identity. An agent with filesystem access can read and write files at the agent's permission level.
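To make the privilege point concrete, here is a minimal Python sketch of the vulnerable pattern: a dispatcher that runs whichever tool and arguments the model emits. The `read_file` tool, the `TOOLS` registry, and the JSON call format are illustrative assumptions, not any specific framework's API.

```python
import json
from pathlib import Path


def read_file(path: str) -> str:
    # Runs with the agent process's own permissions: anything the process
    # can read, a manipulated model can ask for.
    return Path(path).expanduser().read_text(errors="replace")


# Hypothetical tool registry
TOOLS = {"read_file": read_file}


def naive_dispatch(model_output: str) -> str:
    """VULNERABLE: trusts the model's choice of tool and arguments."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])


# If an injected document steers the model into emitting this call,
# the dispatcher hands the private key back into the model's context:
# naive_dispatch('{"name": "read_file", "arguments": {"path": "~/.ssh/id_rsa"}}')
```

Nothing on this path asks whether the end user wanted the file read. The only gate is the model's judgment, which is exactly what the injection controls.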

Representative attack patterns documented in security research and red-team writeups (Simon Willison's prompt injection series, Embrace The Red, NCC Group LLM tooling research, and the OWASP LLM Top 10 case studies):

  1. Agent reads a support ticket or email containing injected instructions, then calls an internal API that returns customer PII, then writes that PII to an attacker-controlled URL via a fetch tool.

  2. Agent processes a document or code review containing comments instructing it to read sensitive files (for example, SSH private keys or environment files) using its shell or filesystem tool.

  3. Agent retrieves a document from a poisoned knowledge base or vector store that instructs it to send the conversation history to an external webhook.

These scenarios are drawn from published proof-of-concept work and threat modeling guidance rather than disclosed production incidents. The class of attack is well established; the question for any team deploying agentic AI is whether their tool surface makes their variant of these scenarios viable.

The CWE Mapping

The OWASP LLM Top 10 codified prompt injection as LLM01, but the more useful framing for AppSec is mapping it to existing CWE categories:

CWE-20 (Improper Input Validation): LLM output is unvalidated input to downstream tool invocations. The agent's "decision" to call a tool is functionally an unvalidated user input from the application's perspective.
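A minimal sketch of treating a model-proposed tool call as untrusted input: parse it, check the tool name against an allowlist, and check argument names and types against a declared schema before anything runs. `TOOL_SCHEMAS` and its entries are hypothetical, and the checks use only the standard library; a production system might use JSON Schema instead.

```python
import json


class InvalidToolCall(ValueError):
    """Raised when a model-proposed tool call fails validation."""


# Hypothetical registry: tool name -> {argument name: expected Python type}
TOOL_SCHEMAS = {
    "search_docs": {"query": str, "limit": int},
    "get_ticket": {"ticket_id": str},
}


def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Validate the shape of a tool call emitted by the model."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidToolCall(f"not JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise InvalidToolCall("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        raise InvalidToolCall(f"unknown tool: {name!r}")
    schema = TOOL_SCHEMAS[name]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise InvalidToolCall(f"arguments must be exactly {sorted(schema)}")
    for key, expected in schema.items():
        value = args[key]
        # bool is a subclass of int in Python; no field here accepts booleans
        if not isinstance(value, expected) or isinstance(value, bool):
            raise InvalidToolCall(f"{key} must be {expected.__name__}")
    return name, args
```

Shape validation is necessary but not sufficient: a well-formed `get_ticket` call for someone else's ticket still passes, which is why authorization (CWE-863) still has to be enforced inside the tool.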

CWE-77 (Command Injection): When the LLM constructs a shell command from untrusted context, classic command injection applies. Sanitization is harder because the injection point is the LLM's reasoning, not a single user-controlled string.
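For shell-backed tools, a minimal sketch of the classic command-injection defense carried over to agents: the model picks from a fixed set of commands and supplies one argument, which is validated and passed as a separate argv element with no shell involved. The command names and the argument pattern are hypothetical.

```python
import re
import subprocess

# Hypothetical allowlist: tool-visible command name -> fixed argv prefix.
# The trailing "--" ends option parsing for the underlying program.
ALLOWED_COMMANDS = {
    "list_dir": ["ls", "-l", "--"],
    "git_log": ["git", "log", "--oneline", "-n", "20", "--"],
}

# Conservative pattern for one path-like argument. Note: it does NOT block
# "..", so path traversal still needs directory confinement.
SAFE_ARG = re.compile(r"[A-Za-z0-9._/-]{1,200}")


def build_argv(command: str, arg: str) -> list[str]:
    if command not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {command!r}")
    # Reject shell metacharacters and anything that looks like an option
    if not SAFE_ARG.fullmatch(arg) or arg.startswith("-"):
        raise ValueError(f"argument rejected: {arg!r}")
    return ALLOWED_COMMANDS[command] + [arg]


def run_allowed(command: str, arg: str) -> str:
    # A list argv with the default shell=False: no shell parses the argument,
    # so ";", "|", and "$( )" carry no special meaning.
    result = subprocess.run(build_argv(command, arg), capture_output=True,
                            text=True, timeout=10, check=False)
    return result.stdout
```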

CWE-918 (Server-Side Request Forgery): Agents with fetch capabilities are SSRF vectors. The agent can be instructed to request internal endpoints, cloud metadata services (AWS IMDS at 169.254.169.254), or attacker-controlled URLs.
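A minimal sketch of an SSRF check that a fetch tool could run before following a model-supplied URL: require HTTPS, resolve the host, and refuse any address that is not globally routable. That covers loopback, RFC 1918 ranges, and the link-local range containing 169.254.169.254. The function name is hypothetical. One caveat applies: the fetch must then connect to the address that was checked, or a DNS rebinding attack can swap it between check and use.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_safe_fetch_url(url: str) -> bool:
    """Return True only for https URLs whose host resolves solely to public addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 443,
                                   proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        # Unresolvable host or malformed port
        return False
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # is_global is False for loopback, private, link-local (IMDS), and reserved ranges
        if not ipaddress.ip_address(sockaddr[0]).is_global:
            return False
    return bool(infos)
```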

CWE-863 (Incorrect Authorization): Tools invoked by the agent typically run with the agent's privileges, not the end user's. Authorization checks that assume "the agent has decided this is appropriate" fail when the agent has been manipulated.
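A minimal sketch of the CWE-863 fix inside a tool implementation: the tool checks the end user's rights, taken from the authenticated session, instead of relying on the agent's service identity. `User`, `TICKETS`, and the `support` role are hypothetical stand-ins for a real identity provider and data store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset = frozenset()


# Hypothetical data store: ticket id -> owner and body
TICKETS = {
    "T-1": {"owner": "alice", "body": "Printer on floor 3 is jammed."},
    "T-2": {"owner": "bob", "body": "Please reset my VPN token."},
}


def get_ticket(requesting_user: User, ticket_id: str) -> str:
    """Tool body. requesting_user is bound by the application from the
    authenticated session; the model only supplies ticket_id."""
    ticket = TICKETS.get(ticket_id)
    if ticket is None:
        raise KeyError(ticket_id)
    is_owner = ticket["owner"] == requesting_user.user_id
    if not (is_owner or "support" in requesting_user.roles):
        raise PermissionError(f"{requesting_user.user_id} may not read {ticket_id}")
    return ticket["body"]
```

The design choice that matters is where `requesting_user` comes from: the application binds it from the session, and it is never one of the arguments the model can fill in.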

CWE-94 (Improper Control of Generation of Code): Code execution tools (Python sandboxes, code interpreter, dynamic eval) called by manipulated agents are direct code injection vectors.
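For a narrow case such as a calculator tool, a minimal sketch of replacing `eval()` on model output with an allowlist evaluator built on Python's `ast` module. Only numeric literals, `+ - * /`, and unary minus are accepted; function calls, attribute access, and everything else are rejected. The tool is hypothetical, and general-purpose code execution still needs an isolated sandbox rather than a parser.

```python
import ast
import operator

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY_OPS = {ast.USub: operator.neg}


def safe_arith(expr: str) -> float:
    """Evaluate a basic arithmetic expression without eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        # type() check excludes bool, which is a subclass of int
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not an expression: {exc}") from exc
    return ev(tree)
```

Exponentiation is left out on purpose: `9 ** 9 ** 9` is valid arithmetic but would tie up the process.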

The MCP Server Problem

Model Context Protocol, introduced by Anthropic in November 2024 and widely adopted through 2025, standardized how agents connect to tools and data sources. The protocol's success made MCP servers a high-leverage attack surface. A single compromised MCP server can affect every agent that connects to it.

Specific concerns at the code level:

  1. Server tool descriptions are trusted. Most MCP clients render tool descriptions to the agent as authoritative. A malicious or compromised MCP server can describe a benign-looking tool that performs malicious actions.

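  2. No per-tool authorization model. Tool invocations across MCP servers often inherit the agent's full identity. MCP's authorization specification (added in 2025) covers how a client obtains OAuth access to an HTTP-based server, but it does not define standard scopes per tool, so fine-grained authorization decisions land in the application layer.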

  3. Tool output flows into context. Output from one MCP tool becomes input to the agent's next reasoning step. This is the indirect prompt injection vector at protocol scale.
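The first two concerns can be partly addressed in client code. A minimal sketch, assuming a client that receives tool definitions with the `name`, `description`, and `inputSchema` fields returned by MCP's `tools/list`: pin the definitions a human has reviewed by hashing them, withhold any tool that is new or changed, and map each tool to a scope its caller's credential must carry, denying anything unmapped. The pinning policy, tool names, and scope strings are hypothetical; neither mechanism is part of the MCP specification.

```python
import hashlib
import json

# --- Concern 1: only expose tool definitions a human has reviewed ---

def tool_fingerprint(tool: dict) -> str:
    """SHA-256 over the tool fields the model sees, in canonical JSON form."""
    material = {key: tool.get(key) for key in ("name", "description", "inputSchema")}
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_pinned(tools: list[dict], pinned: dict[str, str]) -> list[dict]:
    """Keep only tools whose definition matches a reviewed fingerprint."""
    approved = []
    for tool in tools:
        expected = pinned.get(tool.get("name"))
        if expected is not None and tool_fingerprint(tool) == expected:
            approved.append(tool)
        # Otherwise the tool is new or changed: withhold it and flag for review
    return approved


# --- Concern 2: deny-by-default scope check per tool ---

# Hypothetical mapping: tool name -> scope the calling credential must carry
TOOL_SCOPES = {
    "crm.read_contact": "crm:read",
    "crm.delete_contact": "crm:write",
    "mail.send": "mail:send",
}


def authorize_tool(tool_name: str, granted_scopes: set[str]) -> None:
    """Raise PermissionError unless the credential carries the tool's scope."""
    required = TOOL_SCOPES.get(tool_name)
    if required is None:
        # A tool nobody mapped is a tool nobody approved
        raise PermissionError(f"no scope mapping for {tool_name!r}")
    if required not in granted_scopes:
        raise PermissionError(f"{tool_name!r} requires scope {required!r}")
```

Pinning catches a server that changes a tool's description after approval, but not one whose description was malicious when first reviewed. `granted_scopes` would come from the end user's delegated credential rather than the agent's service identity.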

Code-Level Defenses

The defenses that actually work in production:

Tool sandboxing: Each tool runs with the minimum privilege required, not the agent's full privilege. Filesystem tools chroot to a specific directory. Shell tools run in containers with no network access. API tools authenticate per-call with scoped credentials.
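Where a full chroot or container is not available, a minimal sketch of confining a filesystem tool to one directory in application code: resolve the requested path, which collapses `..` and follows symlinks, and reject anything that lands outside the root. The function names and the size cap are hypothetical. Because a symlink could be swapped in between the check and the `open`, OS-level isolation remains the stronger control.

```python
from pathlib import Path


def resolve_in_sandbox(root: Path, requested: str) -> Path:
    """Map a model-supplied path to a real path inside root, or refuse."""
    root = root.resolve()
    # An absolute `requested` replaces root entirely; resolve() then collapses
    # ".." and follows symlinks, so all three escapes fail the check below.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes sandbox: {requested!r}")
    return candidate


def read_file_tool(root: Path, requested: str, max_bytes: int = 65536) -> str:
    path = resolve_in_sandbox(root, requested)
    with path.open("rb") as fh:
        # Cap the read so one call cannot flood the context window
        return fh.read(max_bytes).decode("utf-8", errors="replace")
```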

Output validation at tool boundaries: Validate tool output before it flows back into the agent's context. Strip or escape executable content such as scripts and markup. Check URLs against an allowlist before a fetch tool follows them. Flag retrieved documents that contain injection markers.
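For the injection-marker step, a minimal sketch of a pattern-based detector. The patterns are illustrative and easy to evade through paraphrase, encoding, or another language, so a hit is best used to quarantine or down-rank a document and raise an alert, not as the only control.

```python
import re

# Illustrative patterns only; attackers can rephrase, encode, or translate.
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
    re.compile(r"\b(?:reveal|print|repeat) (?:your|the) system prompt\b", re.IGNORECASE),
    re.compile(r"<\s*/?\s*(?:system|assistant)\s*>", re.IGNORECASE),
]


def flag_injection_markers(text: str) -> list[str]:
    """Return every substring of text that matches a known injection pattern."""
    return [m.group(0) for pattern in INJECTION_PATTERNS for m in pattern.finditer(text)]
```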

Action confirmation for sensitive operations: High-impact tools (delete, send, deploy, transfer funds) require human confirmation regardless of the agent's confidence. This is a UX cost worth paying.
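A minimal sketch of enforcing confirmation in application code rather than in the prompt: the dispatcher consults a human-facing `confirm` callback before any tool in a sensitive set runs, so no model output can skip it. `SENSITIVE_TOOLS` and the callback signature are hypothetical.

```python
from typing import Any, Callable

# Hypothetical set of high-impact tools that always need a human decision
SENSITIVE_TOOLS = {"delete_record", "send_email", "deploy", "transfer_funds"}


def invoke_tool(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., Any]],
    confirm: Callable[[str, dict], bool],
) -> Any:
    """Run an already-validated tool call, asking a human first for sensitive tools."""
    if name in SENSITIVE_TOOLS and not confirm(name, args):
        raise PermissionError(f"user declined {name}")
    return tools[name](**args)
```

In a real UI, `confirm` would render the exact arguments (recipient, amount, target environment) so the person approves what will actually run, not the agent's summary of it.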

Audit logging for agent decisions: Every tool invocation is logged with the context that drove it. Forensics requires reconstructing what the agent saw when it chose to act.
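A minimal sketch of one audit record per tool invocation, written as a JSON line. It stores the call itself, a SHA-256 digest of the exact context the model saw, and the source of each context message. The `source` field on messages and the record layout are assumptions; the full context would be stored separately under access control so investigators can retrieve it by digest.

```python
import hashlib
import json
import time


def audit_record(session_id: str, tool: str, args: dict, context: list[dict]) -> str:
    """Return one JSON-lines audit entry for a single tool invocation."""
    context_bytes = json.dumps(context, sort_keys=True).encode("utf-8")
    entry = {
        "ts": time.time(),
        "session": session_id,
        "tool": tool,
        "args": args,
        # Digest of exactly what the model saw when it chose this action
        "context_sha256": hashlib.sha256(context_bytes).hexdigest(),
        # Which sources were in context, e.g. a retrieved ticket or web page
        "context_sources": [m.get("source", "unknown") for m in context],
    }
    return json.dumps(entry, sort_keys=True)
```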

Context segmentation: System prompts, user input, and tool output occupy distinct context regions that the application code can tell apart. Treating everything in the context window as equally trusted is the design error.
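A minimal sketch of context segmentation: each piece of context carries a trust label set by application code, untrusted text is rendered inside a delimited region with its markup escaped, and the application can check whether any untrusted content is present before allowing side-effecting tools. The labels, the tag format, and the taint policy are hypothetical. Delimiters help the model tell regions apart but do not stop it from following text inside them, so the enforcement decision stays in code.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    SYSTEM = "system"        # written by the application developer
    USER = "user"            # typed by the authenticated user
    UNTRUSTED = "untrusted"  # tool output, retrieval results, files, web pages


@dataclass(frozen=True)
class Segment:
    trust: Trust
    source: str  # set by application code, never taken from content
    text: str


def render(seg: Segment) -> str:
    if seg.trust is not Trust.UNTRUSTED:
        return seg.text
    # Escape "<" so content cannot forge a closing tag and leave its region
    body = seg.text.replace("<", "&lt;")
    return f'<untrusted source="{seg.source}">\n{body}\n</untrusted>'


def context_is_tainted(segments: list[Segment]) -> bool:
    """True if any untrusted content is present; a policy might then require
    human confirmation for every side-effecting tool call."""
    return any(s.trust is Trust.UNTRUSTED for s in segments)
```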

How Deva Addresses Agentic AI Risk

Deva's detection rules cover the code patterns that connect LLM output to tool invocations without validation. The OWASP LLM Top 10 preset includes LLM01 (Prompt Injection) and LLM02 (Insecure Output Handling), mapped to the underlying CWE categories so remediation guidance is concrete.

For MCP server code, the scanner identifies tools that pass untrusted context directly to shell execution, fetch, or eval-equivalent functions. The fix generator suggests tool sandboxing patterns appropriate to the runtime.

The local model also matters here. An agentic AI security scanner that sends code to a cloud LLM is auditing one supply chain with another. Deva's local-first model lets security teams analyze their agentic systems without expanding the trust boundary they are trying to evaluate.

FAQ


What is prompt injection?
Prompt injection is an attack where text supplied to an LLM (directly by a user or indirectly through retrieved content) overrides the model's instructions or causes it to perform unintended actions. In an agentic system with tool access, those actions can include arbitrary shell, filesystem, or API operations.
Why is prompt injection like remote code execution?
When an LLM has tool access (shell, fetch, filesystem, code execution) and follows injected instructions, the model becomes a remote command interpreter for the attacker. The agent's privileges, not the user's, determine the blast radius, which is functionally identical to traditional RCE.
What is indirect prompt injection?
Indirect prompt injection happens when the agent processes attacker-controlled content from a document, web page, email, ticket, or database row, and that content contains instructions the agent follows. The user never typed the instructions. The agent received them through retrieval or tool output.
How do you defend against prompt injection in agentic AI?
The defenses that work in production: sandbox each tool to least privilege, validate tool output before it re-enters the agent's context, require explicit human confirmation for high-impact operations (delete, send, deploy, transfer), and segment system prompts, user input, and tool output into distinguishable context regions.

Summer Ann

Threat research, application security analysis, and defensive engineering insights from the DevSecCode team.
