Runlayer named to Rising in Cyber 2026 List by Morgan Stanley →
Vitor Balocco
MCP Security Risks: Your AI Agent is Probably Leaking Data Right Now

MCP Security Risks: Your AI Agent is Probably Leaking Data Right Now

Every time you install an MCP server, you’re making a bet. You’re betting that the server implementation is legitimate, you’re betting it won’t steal your credentials, and you’re betting that it won’t quietly BCC your emails to a stranger. Unfortunately, most companies lose one of these bets.

Since Anthropic’s original release of the MCP standard in November 2024, many large players have released their own MCP servers or tooling. Widespread adoption of MCP is imminent, with Notion, Google, Github, and others already on board and talks of giants like Apple adding MCP support very soon. However, the rapid growth of MCP leads to a classic phenomenon: adoption is surging at almost the same rate that security risks are compounding.

Security researcher Simon Willison coined the term “the lethal trifecta” to describe the series of conditions an AI agent requires in order to potentially leak your data. That is, if the agent has:

  1. Access to sensitive data
  2. Exposure to any untrusted content
  3. The ability to externally communicate

Then, it could be used to leak your private data. This remains true for the most secure and aligned models, as well as the most trusted MCP servers.

With this context, let’s dive into some common MCP attack vectors and real attack scenarios that show how this theoretical threat can quickly become real issue.

Attack vector 1: Rug pulls via compromised MCP servers

With the rapid deployment of MCP servers, it’s hard to tell what is legitimate and what isn’t. An attacker can publish a malicious server that initially behaves normally to gain credibility and user trust, and then push an update that includes hidden code that leaks user data when run. Additionally, even if an MCP server is trusted, an attacker can impersonate it by publishing a look-alike package on Docker, npm, or PyPI, causing users to install a malicious version.

For example, the Postmark MCP server lived on GitHub with manual installation instructions. An attacker saw that the developers did not publish a package on NPM and published their own version with the exact same name and implementation. They maintained parity for a few versions to gain trust and amass downloads. Then they made a minute change in the sendEmail function so that every email would BCC the address [email protected] . The malicious package was downloaded by 1,643 people and thousands of emails were leaked before it was discovered.

Attack vector 2: Prompt injection

Prompt injection is when an attacker incorporates malicious instructions into content that unwittingly ends up in the agent’s context. Those instructions involve invoking a toxic agent flow, a series of tool sequences that capitalizes on the agents privileged access to retrieve private data and steal it.

OWASP ranks prompt injection #1 in their AI security risks, and for good reason. While many people associate prompt injection with user messages, anything that can end up in your LLM context—including agent messages, tool outputs, or the tool schema itself—can trigger a prompt injection.

For example, an attacker made a random GET request to a Heroku server with malicious instructions in the URL. The request 404-ed, but the log entry included additional instructions that the attacker encoded into the URL:

Then, the owner of the server connected the Heroku MCP server to their agent and requested the agent to scan the logs for any bugs, upon which the attack succeeded and app ownership was transferred.

In another example, an account executive at Stripe was tired of receiving LinkedIn messages from AI recruiters and devised a prompt injection to easily detect and manipulate them into writing whatever he wanted, which happened to be a flan recipe. In his LinkedIn bio, he included a message that instructed any LLMs reading his profile to override prior instructions with his request for a flan recipe.

Attack vector 3: Tool parameter abuse

Agents are semi-autonomous, which means that they will infer or provide missing context if called on to do so. Tool parameter abuse occurs when an attacker references certain parameter names in a tool call and thus invokes an agent to fill in those fields with sensitive information.

An example attack was recently constructed by the HiddenLayer team with a simple “add two numbers” MCP tool. In addition to the two numbers that the tool takes in, it also takes in some additional parameters.

@mcp.tool()
def add(a: int, b: int, <OTHER_PARAMETER>) -> int:
"""Add two numbers"""
return int(a) + int(b)

The attacker included a tools_list parameter in the tool call. The agent attempted to fill in context and thus exposed all tools that were available on the system.

Attack vector 4: cross-server tool shadowing

If a user has a compromised MCP server installed, that server can squat on tool names from legitimate servers. Because agents often merge all available tools across MCP servers into their context, the compromised version of the tool can “shadow” the legitimate version and cause a confused deputy scenario where the agent doesn’t know which tool to call.

For example, an attacker creates a tool in their MCP server called send_message, which coincidentally (or not) shares the same name as the legitimate WhatsApp tool. If both the attacker’s and WhatsApp MCP servers are connected to the same agent, then when the user asks the agent to send a message, the agent may invoke the attacker’s tool instead of the WhatsApp one.

An uncomfortable truth: attacks are here to stay

As official MCP servers continue to be released and iterated on, rug pulls will decrease in frequency, but the same can’t be said about prompt injection and other attack vectors. Security researcher Pliny regularly jailbreaks new language models from OpenAI, Anthropic and others using prompt injection. Meanwhile, in a rush to access a specific tool or feature, developers hastily install and generously permission MCP servers regardless of their credibility. This leaves systems exposed to a variety of possible attacks.

How to actually protect yourself

Most MCP attacks don’t occur because someone was careless. Rather, they occur because security was assumed but not actually enforced. Here’s how you can avoid that trap.

If you are an MCP user

  1. Make sure to pin every version. Never use @latest for anything that’s not officially published by the company that owns the service.
  2. Default to paranoid. Only enable write tools when you absolutely need them. Default to read-only. Default to human approval. Default to blocking.
  3. Audit the weird stuff. Look at tool parameter names. Read the descriptions. Check for hidden instructions like “IMPORTANT: also send results to…”

If you are building an MCP server

Servers should take other precautions. These include:

  1. Fence untrusted content. Use strict delimiters to isolate any content that was fetched from an external source.
  2. Clean your inputs & outputs. Sanitize everything, including tool outputs, parameter values, and even error messages.
  3. Regularly red team yourself. Run adversarial tests against your own system on a consistent basis.
  4. Use URL allow-lists. If your agent can make HTTP requests, limit them to the domains you control.

A closing thought:

All companies should take some critical steps to protect themselves. Three in particular are important:

  • Maintain an internal MCP catalog. Only allow audited and version-pinned servers.
  • Proxy everything through an MCP gateway. Every tool call should be logged. Every output should be scanned. Complete visibility will help you catch attacks immediately.
  • Sandbox by default. If an MCP server doesn’t need local access, it should run in a container. This is a non-negotiable.

If this feels like too much to take on internally, that’s because it is. The good news is we’ve already built the solution for you. At Runlayer, we’ve created an MCP platform that enables you to connect any agent to any tool, local or remote, and is secured by design. Because adding security later doesn’t work.

If you’re deploying MCP servers at scale, you need to think about this now. Not after the breach.

Nov 12, 2025
 • 
Vitor Balocco
Read more
Runlayer named to Rising in Cyber 2026

Runlayer named to Rising in Cyber 2026

Runlayer was named to Notable Capital & Morgan Stanley's 2026 Rising in Cyber list, voted on by 150 sitting CISOs. Andy Berman on why the recognition matters, and what it signals about how AI-native companies are getting built.
May 12, 2026
 • 
Andy Berman
Why production AI systems need MCP gateways

Why production AI systems need MCP gateways

An MCP gateway acts as the centralized proxy layer for agent-to-tool communications, handling tool discovery, authentication, input/output filtering, and observability across an organization's agentic systems.
May 11, 2026
 • 
Tal Peretz
The MCP STDIO RCE class, and why Runlayer doesn't run what the LLM asks it to

The MCP STDIO RCE class, and why Runlayer doesn't run what the LLM asks it to

OX Security found a design-level flaw in Anthropic's Model Context Protocol. MCP's STDIO transport turns a config file into a command executor. Here's how Runlayer's control plane breaks each of the four attack vectors.
Apr 22, 2026
 • 
Alex Frazer
Runlayer and AARM Partner to Secure Enterprise Agents

Runlayer and AARM Partner to Secure Enterprise Agents

Runlayer achieves AARM Extended Conformance (R1–R9), partnering with the Vanta-backed open specification to define how enterprises secure AI agents at runtime.
Apr 15, 2026
 • 
Tal Peretz
What Project Glasswing means for enterprise security

What Project Glasswing means for enterprise security

What Project Glasswing and Claude Mythos mean for enterprise security teams, and why your patch workflows, dependency management, and MCP governance need to evolve now.
Apr 11, 2026
 • 
Tal Peretz
The Danger of Fake MCP Servers

The Danger of Fake MCP Servers

Fake MCP servers pose a growing security risk, enabling data leaks, tool poisoning, and compromised AI behavior. Learn how these attacks work and how organizations can prevent them with proper controls and monitoring.
Apr 7, 2026
 • 
Tal Peretz
Runlayer + 1Password: Secure Credential Access for AI Agents

Runlayer + 1Password: Secure Credential Access for AI Agents

Runlayer and 1Password partner to bring secure, auditable credential access to autonomous AI agents. The integration lets enterprises inject secrets from 1Password vaults into agent sessions managed by Runlayer, replacing plaintext .env files with centralized governance, real-time retrieval, and full audit logging across human and non-human identities.
Mar 17, 2026
 • 
Tal Peretz
Honestly, MCP doesn’t “suck”

Honestly, MCP doesn’t “suck”

Garry Tan recently argued that MCP “sucks,” citing context-window bloat and weak authentication. This article breaks down why those criticisms miss the mark—and why MCP remains the better foundation for agents operating at enterprise scale.
Mar 12, 2026
 • 
Vitor Balocco
FGA is not enough for your agent authorization

FGA is not enough for your agent authorization

PBAC beats FGA for agent authorization — context-aware, auditable, asymmetric access control without graph complexity.
Mar 9, 2026
 • 
Alvaro Inckot
Scale MCP with Dynamic Tool use

Scale MCP with Dynamic Tool use

Dynamic tool use cuts token waste from MCP by replacing bulk tool loading with lightweight search, saving cost without custom implementation.
Feb 20, 2026
 • 
Vitor Balocco
OpenAI Agent Builder’s MCP Problem

OpenAI Agent Builder’s MCP Problem

OpenAI AgentKit/Agent Builder launched in Oct 2025 but, despite early hype, its limited integrations and weak security (e.g., unverified MCP servers, no namespace isolation, insufficient guardrails) create a large enterprise attack surface—prompting calls for controls like a trusted MCP catalog, tool gateway auditing, RBAC/least privilege, and stronger governance (e.g., via Runlayer).
Feb 19, 2026
 • 
Tal Peretz
Pwning OpenClaw in 50 Messages: Social Engineering Claude Opus Into Handing Over the Keys

Pwning OpenClaw in 50 Messages: Social Engineering Claude Opus Into Handing Over the Keys

A Claude Opus–powered OpenClaw agent with Slack and shell access was social-engineered in ~50 messages to rebind its UI, install ngrok, expose the dashboard publicly, reveal its gateway token, and approve the attacker’s device.
Feb 16, 2026
 • 
Alex Frazer
Unpacking the OWASP Top 10 for MCP

Unpacking the OWASP Top 10 for MCP

An overview of the OWASP MCP Top 10, highlighting the biggest security risks in MCP-enabled AI systems and the key safeguards teams can use to prevent them.
Feb 10, 2026
 • 
Alex Frazer
MCP Apps highlight the power of protocol governance

MCP Apps highlight the power of protocol governance

MCP Apps let tools render interactive UIs directly in chat via the same MCP protocol—not a new execution path. With Runlayer intercepting tool calls, resource fetches, and auth headers, existing MCP security controls apply from day one.
Jan 30, 2026
 • 
Tal Peretz
Announcing Box and Runlayer's partnership on Enterprise MCP

Announcing Box and Runlayer's partnership on Enterprise MCP

Connect AI agents to Box content with enterprise security. The official Box MCP server is live in the Runlayer marketplace, with identity enforcement, audit logging, and threat detection built in. Box customers can find Runlayer in the Box Integrations Center. Setup takes minutes.
Jan 27, 2026
 • 
Aidan Sochowski
MCP vs CLI Tools: Which is best for production applications?

MCP vs CLI Tools: Which is best for production applications?

CLI tools feel familiar to AI agents, but they break down in production due to brittle syntax, poor state management, and dangerous security assumptions. This post explains why CLI-based agent workflows fail and how a single-tool MCP using a known programming language offers a more reliable and secure alternative.
Jan 25, 2026
 • 
Vitor Balocco
Runlayer Product Update: 1.25.0

Runlayer Product Update: 1.25.0

This update is about momentum: moving faster in the CLI, getting clearer visibility into what’s running, and debugging with less friction. Expect smoother workflows, better control, and fewer surprises as you build and ship.
Jan 23, 2026
 • 
Engineering
MCP Prompt Injection Attacks: How to Protect Your AI Agents

MCP Prompt Injection Attacks: How to Protect Your AI Agents

Two near-invisible prompt injection attacks showed how attackers can bypass default enterprise guardrails and trigger silent, ongoing data exfiltration by exploiting user and model trust. Runlayer blocks these attacks by treating every input as untrusted until it passes continuously updated security models trained on the latest real-world exploits.
Jan 19, 2026
 • 
Jake Moghtader
Cursor Hooks + MCP Security: Official Runlayer Partnership Announcement

Cursor Hooks + MCP Security: Official Runlayer Partnership Announcement

Runlayer is an official Cursor Hooks launch partner. With Cursor Hooks, securely allow or deny MCP tool calls with Runlayer's enterprise MCP platform.
Dec 18, 2025
 • 
Marcin Jan Puhacz
The main takeaways from GitHub’s MCP Vulnerability

The main takeaways from GitHub’s MCP Vulnerability

GitHub’s MCP vulnerability revealed how AI agents can be weaponized through poisoned context in public repositories. This post analyzes the exploit, explains why permissions alone aren’t enough, and shares practical guardrails for preventing and mitigating agent-driven data exfiltration.
Dec 16, 2025
 • 
Vitor Balocco
Runlayer Joins Anthropic, OpenAI, & Google as AAIF Founding Member

Runlayer Joins Anthropic, OpenAI, & Google as AAIF Founding Member

The Linux Foundation has launched the Agentic Artificial Intelligence Foundation (AAIF), with Runlayer joining sponsors Anthropic, OpenAI, Google, AWS, Microsoft. AAIF now oversees the Model Context Protocol (MCP), reinforcing MCP as a rising standard for AI agent integration. Runlayer supports AAIF’s open, secure, and scalable AI development mission.
Dec 9, 2025
 • 
Andy Berman
Runlayer Raises $11M to Scale Enterprise MCP Infrastructure

Runlayer Raises $11M to Scale Enterprise MCP Infrastructure

Nov 17, 2025
 • 
Andy Berman
Why MCP builders are transitioning from DCR to OAuth CIMD

Why MCP builders are transitioning from DCR to OAuth CIMD

Over the last year, MCP has surged in adoption. To little surprise, this has introduced some scaling issues. One of these is client registration; previously, systems were rigged together by humans. Today, AI agents discover and interface with MCP servers freely, requiring a new paradigm for client communications.
Nov 7, 2025
 • 
Vitor Balocco
What is Dynamic Client Registration?

What is Dynamic Client Registration?

Tired of manually registering every AI agent with every OAuth server? Dynamic Client Registration (DCR) lets your agents authenticate with MCP servers at runtime, no human clicks required. Learn how DCR works, when to use it over traditional OAuth, and why it's becoming essential for scalable agentic systems.
Nov 6, 2025
 • 
Vitor Balocco