Runlayer named to Rising in Cyber 2026 List by Morgan Stanley →
Jake Moghtader
MCP Prompt Injection Attacks: How to Protect Your AI Agents

MCP Prompt Injection Attacks: How to Protect Your AI Agents

Last week, there were two near-invisible prompt injection attacks that slipped past the default enterprise guardrails. Last Wednesday, Varonis Threat Labs revealed a massive exploit where attackers could steal endless private information via Microsoft Copilot. Hours later, PromptArmor revealed how Claude Cowork can be manipulated to share sensitive information despite built-in prompt injection prevention.

Both of these attacks exploit trust. The AI trusts the user’s input, and the user unwittingly trusts a convincing Copilot phishing link or free resource to upload to Claude Code.

Both of these attacks were also foiled by Runlayer. Reprompt fails at the first phishing link. The Claude Cowork .docx exploit gets caught before the file upload. Here's how: Runlayer sits between users and AI systems. All inputs—even those trusted by users—aren’t trusted by Runlayer until they pass our security models that are continuously trained on the daily exploits.

Trusting built-in safeguards isn’t enough

In the past, stealing data relied on a mistake. Attackers might’ve exploited an open API gateway, dispatched a privileged script, or phished users with a deceptive OAuth link. However, these attacks were limited by nature: attackers could only access whatever was within the exploit’s domain. If an attacker gained entry to a list-documents endpoint, they could only extract document file names.

Now, attackers just need to dupe a user into including their natural language prompts. By hijacking the user’s privileged context, attackers can often access anything. A single mistake could escalate into an ongoing silent data leak.

Agent providers, like Microsoft and Anthropic, have responded to this with passable safeguards to scan for exfiltrated data or untrustworthy prompts. However, these safeguards have shortcomings. Previously, APIs could be secured by carefully defining the authorization boundary. Now, developers need to play a fuzzy war of words. Security measures are reduced to a “does this look right?” analysis that could be evaded by trial-and-error. It’s akin to how video game servers forbid profanity, but kids get clever with emojis or special characters to sneak past them. Universal security guardrails are going to be tricked.

Today’s attackers have one priority: evading this first barrier of guardrails. After that, they need to do the easy part: convincing at least one user to fall for a phishing link or legitimate-appearing file.

Last Week’s Attack 1: The infinite Microsoft Copilot exploit

Varonis Threat Labs demonstrated how attackers can bypass Copilot’s built-in mechanisms that would otherwise stop prompt injections. Dubbed Reprompt, this attack is dangerous. The user just needs to be fooled once into clicking a phishing link. It might appear legitimate, such as a link with an icon on an imitative website or email. After the user clicks the link, the attacker uses URL parameters and a chain of requests to continuously exfiltrate secret data.

Reprompt starts with a legitimate-appearing prompt. Only on further requests, which might happen after the user closes the tab, does the attacker’s server siphon sensitive data.

Given that users are easily phished, Reprompt turns a single mistake by any employee into a nonstop data leak. Even worse, this attack happens instantly. The user doesn’t stand a chance to stop it, even if they realize after-the-fact that the link wasn’t trustworthy. Soon, their company’s files, user’s data, and logged personal data are all stolen via recursive prompts dispatched by the attacker’s server.

This isn’t a hypothetical attack. Varonis demonstrated how this could happen to anyone today. Microsoft has since patched the exploit, but it’s only a matter of time before attackers find a way to sidestep the patch.

Last Week’s Attack 2: Fooling Claude Cowork with a .docx

PromptArmor showcased a similar exploit with Claude Cowork. Claude Cowork is a local agentic application that can do tasks on the user’s Desktop. Users will frequently give Claude Cowork access to their confidential local files, including business data (e.g. a Dropbox folder, a stash of financial statements etc). Additionally, users might often upload files that they found online to expedite their work. For example, a user might download a free NDA template or research paper. Today’s attackers are smart they could imitate these same resources.

Claude transforms any file upload into a flattened Markdown file. For example, a Microsoft Word file might appear legitimate when opened on Microsoft Word, but Claude will treat any text—including 1 pt font or invisible white text—as normal text. It’s a common technique known to pranksters and lazy students; now, attackers could use it to exfiltrate personal file data. By including their own Anthropic API key, attackers could then fool Claudex5 code execution to share files with their Claude instance. Soon, a bad actor is chatting with Claude to extract bank details or customer PII.

The most dangerous aspect of this attack is that Claude Cowork users are often non-technical. They routinely upload files that they find on the Internet. Some of these files are written by attackers.

These attacks rely three shared tenets

Both attacks share three common tenets:

  1. It’s easy to trick users into clicking or uploading a resource that appears legitimate to the naked eye, but actually isn’t. When dispatched via spoofed emails or downloaded from innocuous websites, these resources can conceal their secret intent. AI agents trust authenticated users, so attackers just need to gain trust of any user.
  2. Copilot and Claude Cowork have thin guardrails that attackers bypass with clever instructions. For example, Copilot forbids exfiltrating plaintext sensitive data and Cowork refuses to execute arbitrary curl requests, but both platforms failed to apply these rules in edge cases.
  3. Users could be stolen from without ever realizing it. Both of these attacks don’t leave a noticeable trace as they rely on a single mistake, not an ongoing oversight.

Attackers are clever. They spend weeks engineering prompts that are designed to sidestep public, well-documented guardrails from model and agent providers.

Even the most security savvy individuals can be tricked by these attacks. They can easily trust assets that appear “legit enough”. Today’s exploits don’t live on sketchy websites with nonstop pop-up ads; they’re instead concealed on everyday websites, documents, and newsletters.

The only way to protect against these bad actors is with a security layer that aggressively thwarts attacks by being trained on the latest exploits and their respective variants.

Runlayer detected both of these attacks

Runlayer blocked both Reprompt and the Claude Cowork exploit—without any additional fine-tuning.

The product checks every input and output for industry-wide attack patterns. Unlike Copilot and Claude Cowork’s built-in safeguards, which make concessions in favor of streamlined UX, Runlayer is aggressive at identifying attacks by looking for the latest attack patterns.

Runlayer employs four strategies to keep our security analysis models up-to-date:

  1. Continuous Monitoring: We track security disclosures from industry researchers and threat intelligence sources, such as this week’s attacks from Varonis and PromptArmor.
  2. Rapid Response: When new attack patterns emerge, we generate diverse training examples and update our models quickly.
  3. Generalization Testing: We test against novel variants—not just disclosed examples—to ensure robust detection.
  4. Low False Positive Rate: Our testing includes benign examples to ensure legitimate tool calls aren’t blocked.

With Runlayer, you have that extra layer that prevents emerging threats from stealing your data or causing infrastructure damage to your systems. If you are interested in following today’s evolving exploits, track it on our security threat coverage tracker.

Jan 19, 2026
 • 
Jake Moghtader
Read more
Runlayer and Anthropic MCP Tunnels: connecting Claude to systems behind your firewall

Runlayer and Anthropic MCP Tunnels: connecting Claude to systems behind your firewall

Runlayer and Anthropic collaborated on MCP Tunnels, which invert the connection direction so your network reaches out to Anthropic instead of exposing inbound ports, removing the security wall that blocks Claude from accessing internal systems like Jira, databases, and telemetry.
May 20, 2026
 • 
Andy Berman
Don’t build your own MCP gateway

Don’t build your own MCP gateway

Senior engineers look at an MCP gateway and see a reverse proxy with auth and logs. That instinct is wrong. MCP attack vectors shift constantly, performance breaks at scale in specific ways, and threat detection requires MCP-specific signals that generic tools miss.
May 18, 2026
 • 
Alex Frazer
Fine-Grained Permissions and Identity Management for AI Agents

Fine-Grained Permissions and Identity Management for AI Agents

MCP adoption has exploded inside enterprises, with shadow servers and over-provisioned agents creating an attack surface most security teams haven't caught up to. Traditional IAM, OAuth, and RBAC weren't built for non-deterministic agents that delegate to other agents.
May 18, 2026
 • 
Tal Peretz
Runlayer named to Rising in Cyber 2026

Runlayer named to Rising in Cyber 2026

Runlayer was named to Notable Capital & Morgan Stanley's 2026 Rising in Cyber list, voted on by 150 sitting CISOs. Andy Berman on why the recognition matters, and what it signals about how AI-native companies are getting built.
May 12, 2026
 • 
Andy Berman
Why production AI systems need MCP gateways

Why production AI systems need MCP gateways

An MCP gateway acts as the centralized proxy layer for agent-to-tool communications, handling tool discovery, authentication, input/output filtering, and observability across an organization's agentic systems.
May 11, 2026
 • 
Tal Peretz
The MCP STDIO RCE class, and why Runlayer doesn't run what the LLM asks it to

The MCP STDIO RCE class, and why Runlayer doesn't run what the LLM asks it to

OX Security found a design-level flaw in Anthropic's Model Context Protocol. MCP's STDIO transport turns a config file into a command executor. Here's how Runlayer's control plane breaks each of the four attack vectors.
Apr 22, 2026
 • 
Alex Frazer
Runlayer and AARM Partner to Secure Enterprise Agents

Runlayer and AARM Partner to Secure Enterprise Agents

Runlayer achieves AARM Extended Conformance (R1–R9), partnering with the Vanta-backed open specification to define how enterprises secure AI agents at runtime.
Apr 15, 2026
 • 
Tal Peretz
What Project Glasswing means for enterprise security

What Project Glasswing means for enterprise security

What Project Glasswing and Claude Mythos mean for enterprise security teams, and why your patch workflows, dependency management, and MCP governance need to evolve now.
Apr 11, 2026
 • 
Tal Peretz
The Danger of Fake MCP Servers

The Danger of Fake MCP Servers

Fake MCP servers pose a growing security risk, enabling data leaks, tool poisoning, and compromised AI behavior. Learn how these attacks work and how organizations can prevent them with proper controls and monitoring.
Apr 7, 2026
 • 
Tal Peretz
Runlayer + 1Password: Secure Credential Access for AI Agents

Runlayer + 1Password: Secure Credential Access for AI Agents

Runlayer and 1Password partner to bring secure, auditable credential access to autonomous AI agents. The integration lets enterprises inject secrets from 1Password vaults into agent sessions managed by Runlayer, replacing plaintext .env files with centralized governance, real-time retrieval, and full audit logging across human and non-human identities.
Mar 17, 2026
 • 
Tal Peretz
Honestly, MCP doesn’t “suck”

Honestly, MCP doesn’t “suck”

Garry Tan recently argued that MCP “sucks,” citing context-window bloat and weak authentication. This article breaks down why those criticisms miss the mark—and why MCP remains the better foundation for agents operating at enterprise scale.
Mar 12, 2026
 • 
Vitor Balocco
FGA is not enough for your agent authorization

FGA is not enough for your agent authorization

PBAC beats FGA for agent authorization — context-aware, auditable, asymmetric access control without graph complexity.
Mar 9, 2026
 • 
Alvaro Inckot
Scale MCP with Dynamic Tool use

Scale MCP with Dynamic Tool use

Dynamic tool use cuts token waste from MCP by replacing bulk tool loading with lightweight search, saving cost without custom implementation.
Feb 20, 2026
 • 
Vitor Balocco
OpenAI Agent Builder’s MCP Problem

OpenAI Agent Builder’s MCP Problem

OpenAI AgentKit/Agent Builder launched in Oct 2025 but, despite early hype, its limited integrations and weak security (e.g., unverified MCP servers, no namespace isolation, insufficient guardrails) create a large enterprise attack surface—prompting calls for controls like a trusted MCP catalog, tool gateway auditing, RBAC/least privilege, and stronger governance (e.g., via Runlayer).
Feb 19, 2026
 • 
Tal Peretz
Pwning OpenClaw in 50 Messages: Social Engineering Claude Opus Into Handing Over the Keys

Pwning OpenClaw in 50 Messages: Social Engineering Claude Opus Into Handing Over the Keys

A Claude Opus–powered OpenClaw agent with Slack and shell access was social-engineered in ~50 messages to rebind its UI, install ngrok, expose the dashboard publicly, reveal its gateway token, and approve the attacker’s device.
Feb 16, 2026
 • 
Alex Frazer
Unpacking the OWASP Top 10 for MCP

Unpacking the OWASP Top 10 for MCP

An overview of the OWASP MCP Top 10, highlighting the biggest security risks in MCP-enabled AI systems and the key safeguards teams can use to prevent them.
Feb 10, 2026
 • 
Alex Frazer
MCP Apps highlight the power of protocol governance

MCP Apps highlight the power of protocol governance

MCP Apps let tools render interactive UIs directly in chat via the same MCP protocol—not a new execution path. With Runlayer intercepting tool calls, resource fetches, and auth headers, existing MCP security controls apply from day one.
Jan 30, 2026
 • 
Tal Peretz
Announcing Box and Runlayer's partnership on Enterprise MCP

Announcing Box and Runlayer's partnership on Enterprise MCP

Connect AI agents to Box content with enterprise security. The official Box MCP server is live in the Runlayer marketplace, with identity enforcement, audit logging, and threat detection built in. Box customers can find Runlayer in the Box Integrations Center. Setup takes minutes.
Jan 27, 2026
 • 
Aidan Sochowski
MCP vs CLI Tools: Which is best for production applications?

MCP vs CLI Tools: Which is best for production applications?

CLI tools feel familiar to AI agents, but they break down in production due to brittle syntax, poor state management, and dangerous security assumptions. This post explains why CLI-based agent workflows fail and how a single-tool MCP using a known programming language offers a more reliable and secure alternative.
Jan 25, 2026
 • 
Vitor Balocco
Runlayer Product Update: 1.25.0

Runlayer Product Update: 1.25.0

This update is about momentum: moving faster in the CLI, getting clearer visibility into what’s running, and debugging with less friction. Expect smoother workflows, better control, and fewer surprises as you build and ship.
Jan 23, 2026
 • 
Engineering
Cursor Hooks + MCP Security: Official Runlayer Partnership Announcement

Cursor Hooks + MCP Security: Official Runlayer Partnership Announcement

Runlayer is an official Cursor Hooks launch partner. With Cursor Hooks, securely allow or deny MCP tool calls with Runlayer's enterprise MCP platform.
Dec 18, 2025
 • 
Marcin Jan Puhacz
The main takeaways from GitHub’s MCP Vulnerability

The main takeaways from GitHub’s MCP Vulnerability

GitHub’s MCP vulnerability revealed how AI agents can be weaponized through poisoned context in public repositories. This post analyzes the exploit, explains why permissions alone aren’t enough, and shares practical guardrails for preventing and mitigating agent-driven data exfiltration.
Dec 16, 2025
 • 
Vitor Balocco
Runlayer Joins Anthropic, OpenAI, & Google as AAIF Founding Member

Runlayer Joins Anthropic, OpenAI, & Google as AAIF Founding Member

The Linux Foundation has launched the Agentic Artificial Intelligence Foundation (AAIF), with Runlayer joining sponsors Anthropic, OpenAI, Google, AWS, Microsoft. AAIF now oversees the Model Context Protocol (MCP), reinforcing MCP as a rising standard for AI agent integration. Runlayer supports AAIF’s open, secure, and scalable AI development mission.
Dec 9, 2025
 • 
Andy Berman
Runlayer Raises $11M to Scale Enterprise MCP Infrastructure

Runlayer Raises $11M to Scale Enterprise MCP Infrastructure

Nov 17, 2025
 • 
Andy Berman
MCP Security Risks: Your AI Agent is Probably Leaking Data Right Now

MCP Security Risks: Your AI Agent is Probably Leaking Data Right Now

MCP adoption is accelerating across major platforms, but security risks—like malicious servers, prompt injection, and tool-level exploits—are growing just as fast. This post breaks down real attack scenarios that show how easily data can leak when MCP implementations are trusted by default. It also outlines practical defenses for users and builders, plus why companies need audited MCP catalogs, gateway proxies, and sandboxing to stay secure at scale.
Nov 12, 2025
 • 
Vitor Balocco
Why MCP builders are transitioning from DCR to OAuth CIMD

Why MCP builders are transitioning from DCR to OAuth CIMD

Over the last year, MCP has surged in adoption. To little surprise, this has introduced some scaling issues. One of these is client registration; previously, systems were rigged together by humans. Today, AI agents discover and interface with MCP servers freely, requiring a new paradigm for client communications.
Nov 7, 2025
 • 
Vitor Balocco
What is Dynamic Client Registration?

What is Dynamic Client Registration?

Tired of manually registering every AI agent with every OAuth server? Dynamic Client Registration (DCR) lets your agents authenticate with MCP servers at runtime, no human clicks required. Learn how DCR works, when to use it over traditional OAuth, and why it's becoming essential for scalable agentic systems.
Nov 6, 2025
 • 
Vitor Balocco