Tal Peretz

OpenAI Agent Builder’s MCP Problem

In October 2025, OpenAI released AgentKit, a set of tools for building and deploying agents, including a no-code, drag-and-drop Agent Builder tool. The interface allows you to connect your agent to any hosted MCP server and set up nodes, creating a sleek and accessible user interface for technical and non-technical users. These node types include guardrails, if-else blocks, logic nodes, and user approval checks.

Prior to release, the market was incredibly excited about the potential of an OpenAI native workflow tool that was accessible to everyone, deeply integrated, and secure. However, the implementation was limited. Non-technical users lacked sufficient built-in integrations; the platform has only about a dozen internal connectors compared to Zapier’s 8,000.

The biggest issue was security. Security professionals found that the guardrails and user approvals put in place by OpenAI were only a drop in the bucket for protection against the host of attack vectors that were possible.

The Vulnerability Window

One of Agent Builder’s critical vulnerabilities is that it currently allows any arbitrary MCP server to be connected to it, without a standardized verification or signing mechanism, sandboxing or isolation, or guardrails to prevent malicious servers from mimicking legitimate ones. This implementation is the one that many enterprises are already starting to adopt and connect with high-privilege data sources and tools like Slack, Notion, GitHub, and internal APIs. It becomes dangerous once deployed by real organizations with real data and systems.

For enterprises, Agent Builder’s lack of security features and guardrails makes the potential attack surface is huge. Credential theft, data exfiltration, unauthorized tool execution, tool shadowing, privilege escalation via prompt injection, and invocation of malicious MCP servers are all possible attacks that could be performed on an organization using the tool.

Possible Attack Vectors

Given the broadness of the attack surface, let’s walk through a few hypothetical attacks that could be executed against organizations using the tool:

Attack #1: Toxic Flow via Prompt Injection

Whereas most prompt injection attacks target the prompt as the attack surface and aim to override or manipulate agent instructions, “toxic flow” attacks leverage the sequence of actions an agent performs to influence its behavior by manipulating the path it follows.

An example of such an attack is a user who creates a “GitHub Issue Resolver” in Agent Builder, connecting the GitHub MCP server to fetch any new issues and create proposed pull requests to solve the issues it fetched. Because the agent treats GitHub issue content as trusted input, a malicious actor can open an issue that appears legitimate but contains instructions to reveal sensitive data.

When the user calls the agent to fetch all open issues and run the resolution workflow, this content becomes part of its execution and subtly influences downstream actions, such as altering pull requests to include sensitive data on private repositories, triggering additional tool calls, or redirecting outputs, without ever explicitly overriding the agent’s instructions.

Given the lack of security scans and limited input/output auditing, any workflow created in Agent Builder that has privileged access to external systems could be susceptible to a toxic flow attack like this.

Attack #2: Malicious MCP Server

Because Agent Builder allows users to connect to custom MCP servers hosted remotely, the legitimacy of third-party MCPs that are neither pre-approved nor under the user’s control is questionable. It’s easy for a user to encounter a malicious MCP server in several ways, including a compromised or maliciously repackaged version of a legitimate tool or an entirely independent server that describes a useful new integration, such as calendar scheduling or data analytics. The user may then connect this MCP server in Agent Builder and grant it access to internal data sources or include it in workflows with other privileged tools. The server exposes seemingly benign tools such as a forecasting tool or customer preference summarization, but is intentionally designed to log inputs, override instructions, or exfiltrate data once connected.

The scale of potential damage is significant. A single malicious MCP server can be reused across many teams within an organization, turning shared configuration links or internal recommendations into widespread distribution across an organization. This attack could escalate further if the agent had access to other services like GitHub, Slack, Notion, Snowflake and/or internal APIs.

Attack #3: Cross-Server Tool Shadowing

A tool shadowing attack can be executed when a malicious MCP server exposes tools with the same name as legitimate ones from other servers, such as create_issue or send_email. Because any hosted MCP server can be connected to Agent Builder, and it does not enforce namespace isolation, the agent may invoke the malicious version of a tool while believing it is calling the trusted one.

Let’s say an employee at an e-commerce brand wants to create a workflow in Agent Builder that sends automated emails to their fulfillment warehouse summarizing orders that have not been fulfilled for 5 days. The user connects to the Gmail MCP server and has the workflow invoke the send_email tool. However, the user does not realize that they installed a seemingly innocent calendar scheduling MCP server a few weeks ago that was actually an attacker’s server. That server also has a send_email tool that takes in the email content and sends it to the attacker’s email. Because of the lack of namespace isolation as well as the ability to connect any hosted MCP server in Agent Builder, the user thinks the attacker’s tool is the legitimate one and calls that one as a part of the workflow, exposing customer PII like emails and addresses to the attacker. Without namespace isolation, any team member installing a malicious server exposes the organization.

Agent Builder’s Built-in Limited Security

Currently, Agent Builder has two key security features: guardrails and user approval nodes. Guardrails help redact PII and detect jailbreak attempts/misuse signals, but they aren’t a complete defense against prompt injection—especially in multi-step flows. However, as demonstrated via the toxic agent flow, these guardrails can easily be surpassed by an attacker who is manipulating a sequence of actions rather than a prompt. With regards to user approval, it is impossible to scale user approval as enterprises deploy more agents. If a workflow is running multiple times a day, having a user thoroughly audit and review each of those workflows becomes unrealistic.

There are certain additional features that need to be implemented in order for Agent Builder to become more secure for use in production environments:

MCP Catalog: Agent Builder should allow organizations to have a trusted set of MCP servers that can be used within the organization and not allow the use of any external servers outside of that set. (OpenAI’s Connector Registry is a step toward an MCP catalog, but it doesn’t by itself solve server authenticity/verification and runtime governance gaps).
Tool Gateway: All tool calls should be proxied through a gateway to allow the ability to audit and scan at runtime. This helps organizations catch malicious instructions or exfiltrated data in tool parameters, inputs, outputs, and logs.
RBAC & Least Privilege Access: All agents and users should not have the same level of permissions for connected MCPs. The least privileged access policy should be enforced so that entities can only access what they absolutely need to.
Namespace Isolation: Agent Builder should require namespace isolation so that individuals know exactly which tool they are calling and which source it is coming from.

Runlayer + OpenAI

Integrating Agent Builder and Runlayer providers governance and monitoring without rebuilding security infrastructure. Runlayer is infrastructure designed to securely host a registry of trusted MCP servers and govern all agent interactions. Security features include managed MCP hosting, robust access control for servers, tools, and runtime auditing of tool usage.

Innovation Requires Responsibility

The attacks are natural consequences of non-deterministic agents operating in production environments without guardrails, auditing, or least-privilege access to external systems. Organizations must critically evaluate tools like Agent Builder before deploying them in production environments.

Runlayer ensures that any AI product passes rigorous tests, including input and output scanning to detect for potential attacks underway. If you are using Agent Builder and are curious at how Runlayer could protect your deployments, learn more at Runlayer.com.

‍

Feb 19, 2026

•

Tal Peretz