Triage Security: Addressing architectural security risks in Model Context Protocol integrations

Organizations rushing to connect large language models (LLMs) to external data sources and services using the Model Context Protocol (MCP) are inadvertently expanding their exposure areas in ways that traditional security controls are not designed to manage.

Because these risks stem from the foundational architecture of both LLMs and MCP, security teams cannot resolve them through standard patching or basic configuration changes. Gianpietro Cutolo, a cloud threat researcher at Netskope, is scheduled to detail these findings during a session at the RSAC 2026 Conference in San Francisco, emphasizing the need for structural safeguards.

Core architectural challenges

The primary issue lies in how an LLM's operational behavior changes when integrated with MCP. In a standard deployment, an LLM receives a prompt and generates a text response for a user to review. Historically, the primary security risk in this dynamic was an inaccurate or hallucinated response.

MCP fundamentally alters this paradigm. Instead of merely generating text, the LLM executes actions on behalf of the user. In an MCP-enabled environment, an LLM can autonomously access enterprise data, trigger workflows, and interact with APIs.

For example, a user might ask an AI assistant, such as Claude or ChatGPT, to schedule a meeting. The model can use an MCP connector for Google Calendar to check availability, create the event, and set a reminder without manual intervention. The model itself selects which published functions to use—such as fetching emails, creating calendar events, or searching local files—and determines the exact parameters for those actions. While MCP connectors allow organizations to extend the utility of their AI services, this execution capability introduces new security requirements.

One foundational challenge is that LLMs process content and instructions through the same context window. When an MCP connector retrieves content from an external source, such as a document or email, the LLM evaluates the entire payload as input. This creates an opening for an unauthorized party to embed hidden instructions within otherwise legitimate content.

If a threat actor sends an email containing both standard text and hidden instructions, and the user asks their AI assistant to summarize that email, the MCP connector injects the entire message into the LLM's context. Unable to separate the data from the directive, the LLM may execute the hidden instruction. This could result in the model exporting local files or sending emails without the user's knowledge. The impact of this process, known as indirect prompt injection, scales significantly in environments where a single agent maintains active MCP connections to local drives, Jira tickets, and cloud storage. A single email containing unauthorized instructions could initiate coordinated actions across all connected services simultaneously.

A second risk category involves tool metadata manipulation, sometimes referred to as tool poisoning. When an LLM connects to an MCP server, it requests a list of supported tools, including their names and input requirements. This metadata feeds directly into the LLM context window. An unauthorized party can embed unsafe instructions within this tool metadata, which the LLM will again process as functional directives.

A third risk, categorized by Cutolo as a "Rug Pull," occurs when an MCP server undergoes an unauthorized modification. The current protocol lacks a native mechanism to notify an MCP client or AI agent when a server's underlying logic changes. If an established MCP server is altered via a modified update, it can begin serving unsafe tool descriptions that direct the AI agent to take unintended actions, with the client having no immediate visibility into the alteration.

Developing an architectural defense

Because these behaviors are inherent to how LLMs and MCP operate, organizations must implement defense-in-depth strategies rather than relying on software patches.

To mitigate indirect prompt injection, organizations should physically or logically separate MCP servers that handle public data including those with access to private enterprise information. Security teams should implement scanning mechanisms and detect instruction-like patterns, hidden text, and unusual formatting within any context the agent might process. Additionally, maintaining strict human-in-the-loop requirements for all sensitive actions ensures that critical operations are explicitly authorized.

For broader environmental protection, organizations should maintain a comprehensive inventory of every MCP server and rigorously enforce least-privilege permissions, ensuring each connector can only access the specific resources required for its function. Logging all MCP traffic and establishing behavioral baselines will allow security teams to detect when an AI agent's activity deviates from expected patterns. Finally, to defend against metadata manipulation, security teams should systematically scan all tool metadata for unauthorized instructions before approving the installation of any MCP server.