Runlayer Joins Anthropic, OpenAI, & Google as AAIF Founding Member ->
Nov 12, 2025
 • 
Vitor Balocco

Your AI agent is probably leaking your data right now

Every time you install an MCP server, you’re making a bet. You’re betting that the server implementation is legitimate, you’re betting it won’t steal your credentials, and you’re betting that it won’t quietly BCC your emails to a stranger. Unfortunately, most companies lose one of these bets.

Since Anthropic’s original release of the MCP standard in November 2024, many large players have released their own MCP servers or tooling. Widespread adoption of MCP is imminent, with Notion, Google, Github, and others already on board and talks of giants like Apple adding MCP support very soon. However, the rapid growth of MCP leads to a classic phenomenon: adoption is surging at almost the same rate that security risks are compounding.

Security researcher Simon Willison coined the term “the lethal trifecta” to describe the series of conditions an AI agent requires in order to potentially leak your data. That is, if the agent has:

  1. Access to sensitive data
  2. Exposure to any untrusted content
  3. The ability to externally communicate

Then, it could be used to leak your private data. This remains true for the most secure and aligned models, as well as the most trusted MCP servers.

With this context, let’s dive into some common MCP attack vectors and real attack scenarios that show how this theoretical threat can quickly become real issue.

Attack vector 1: Rug pulls via compromised MCP servers

With the rapid deployment of MCP servers, it’s hard to tell what is legitimate and what isn’t. An attacker can publish a malicious server that initially behaves normally to gain credibility and user trust, and then push an update that includes hidden code that leaks user data when run. Additionally, even if an MCP server is trusted, an attacker can impersonate it by publishing a look-alike package on Docker, npm, or PyPI, causing users to install a malicious version.

For example, the Postmark MCP server lived on GitHub with manual installation instructions. An attacker saw that the developers did not publish a package on NPM and published their own version with the exact same name and implementation. They maintained parity for a few versions to gain trust and amass downloads. Then they made a minute change in the sendEmail function so that every email would BCC the address phan@giftshop.club . The malicious package was downloaded by 1,643 people and thousands of emails were leaked before it was discovered.

Attack vector 2: Prompt injection

Prompt injection is when an attacker incorporates malicious instructions into content that unwittingly ends up in the agent’s context. Those instructions involve invoking a toxic agent flow, a series of tool sequences that capitalizes on the agents privileged access to retrieve private data and steal it.

OWASP ranks prompt injection #1 in their AI security risks, and for good reason. While many people associate prompt injection with user messages, anything that can end up in your LLM context—including agent messages, tool outputs, or the tool schema itself—can trigger a prompt injection.

For example, an attacker made a random GET request to a Heroku server with malicious instructions in the URL. The request 404-ed, but the log entry included additional instructions that the attacker encoded into the URL:

Then, the owner of the server connected the Heroku MCP server to their agent and requested the agent to scan the logs for any bugs, upon which the attack succeeded and app ownership was transferred.

In another example, an account executive at Stripe was tired of receiving LinkedIn messages from AI recruiters and devised a prompt injection to easily detect and manipulate them into writing whatever he wanted, which happened to be a flan recipe. In his LinkedIn bio, he included a message that instructed any LLMs reading his profile to override prior instructions with his request for a flan recipe.

Attack vector 3: Tool parameter abuse

Agents are semi-autonomous, which means that they will infer or provide missing context if called on to do so. Tool parameter abuse occurs when an attacker references certain parameter names in a tool call and thus invokes an agent to fill in those fields with sensitive information.

An example attack was recently constructed by the HiddenLayer team with a simple “add two numbers” MCP tool. In addition to the two numbers that the tool takes in, it also takes in some additional parameters.

@mcp.tool()
def add(a: int, b: int, <OTHER_PARAMETER>) -> int:
"""Add two numbers"""
return int(a) + int(b)

The attacker included a tools_list parameter in the tool call. The agent attempted to fill in context and thus exposed all tools that were available on the system.

Attack vector 4: cross-server tool shadowing

If a user has a compromised MCP server installed, that server can squat on tool names from legitimate servers. Because agents often merge all available tools across MCP servers into their context, the compromised version of the tool can “shadow” the legitimate version and cause a confused deputy scenario where the agent doesn’t know which tool to call.

For example, an attacker creates a tool in their MCP server called send_message, which coincidentally (or not) shares the same name as the legitimate WhatsApp tool. If both the attacker’s and WhatsApp MCP servers are connected to the same agent, then when the user asks the agent to send a message, the agent may invoke the attacker’s tool instead of the WhatsApp one.

An uncomfortable truth: attacks are here to stay

As official MCP servers continue to be released and iterated on, rug pulls will decrease in frequency, but the same can’t be said about prompt injection and other attack vectors. Security researcher Pliny regularly jailbreaks new language models from OpenAI, Anthropic and others using prompt injection. Meanwhile, in a rush to access a specific tool or feature, developers hastily install and generously permission MCP servers regardless of their credibility. This leaves systems exposed to a variety of possible attacks.

How to actually protect yourself

Most MCP attacks don’t occur because someone was careless. Rather, they occur because security was assumed but not actually enforced. Here’s how you can avoid that trap.

If you are an MCP user

If you’re an MCP user:

  1. Make sure to pin every version. Never use @latest for anything that’s not officially published by the company that owns the service.
  2. Default to paranoid. Only enable write tools when you absolutely need them. Default to read-only. Default to human approval. Default to blocking.
  3. Audit the weird stuff. Look at tool parameter names. Read the descriptions. Check for hidden instructions like “IMPORTANT: also send results to…”

If you are building an MCP server

Servers should take other precautions. These include:

  1. Fence untrusted content. Use strict delimiters to isolate any content that was fetched from an external source.
  2. Clean your inputs & outputs. Sanitize everything, including tool outputs, parameter values, and even error messages.
  3. Regularly red team yourself. Run adversarial tests against your own system on a consistent basis.
  4. Use URL allow-lists. If your agent can make HTTP requests, limit them to the domains you control.

A closing thought:

All companies should take some critical steps to protect themselves. Three in particular are important:

  • Maintain an internal MCP catalog. Only allow audited and version-pinned servers.
  • Proxy everything through an MCP gateway. Every tool call should be logged. Every output should be scanned. Complete visibility will help you catch attacks immediately.
  • Sandbox by default. If an MCP server doesn’t need local access, it should run in a container. This is a non-negotiable.

If this feels like too much to take on internally, that’s because it is. The good news is we’ve already built the solution for you. At Runlayer, we’ve created an MCP platform that enables you to connect any agent to any tool, local or remote, and is secured by design. Because adding security later doesn’t work.

If you’re deploying MCP servers at scale, you need to think about this now. Not after the breach.

Nov 12, 2025
 • 
Vitor Balocco