Securing LLM Superpowers: The Invisible Backdoors in MCP

Indirect prompt injection abuses external data sources (emails, documents, web pages, API responses, images) by embedding hidden instructions that MCP-powered LLMs can execute, leading to stealthy data exfiltration or unauthorized actions. RUG Pull attacks replace trusted MCP tools via compromised registries or updates, turning vetted automations into backdoors. #RUGPull #ModelContextProtocol

Keypoints

Indirect prompt injection occurs when malicious instructions are embedded in external data ingested into the LLM context, causing the model to treat them as actionable system guidance.
Successful indirect prompt injection requires three conditions: LLM access to private data, processing of untrusted content, and capabilities to act externally (“lethal trifecta”).
Realistic examples include SOC escalation emails and customer-care workflows where hidden HTML comments trigger tool calls like export_customer_data and send_response, enabling silent exfiltration.
RUG Pull attacks compromise MCP tool registries or update mechanisms to substitute trusted tools with backdoored versions that exfiltrate data during normal use.
Defenses for indirect injection include context provenance, input sanitization, human-in-the-loop approval for sensitive tool calls, behavioral monitoring, memory pruning, output sanitization, and least-privilege access.
Defenses for RUG Pull include tool signing and verification, pinned versions, registry hardening, granular permission boundaries, runtime consent validation, and permission auditing and logging.
Both attack classes exploit trust in data and tooling rather than the model itself, so protecting data provenance and tool provenance is critical for MCP security.

MITRE Techniques

[T1204] User Execution – Indirect prompt injection relies on user-submitted content (emails, tickets) that causes the model to execute hidden instructions; quoted example: ‘……’
[T1621] Multi-Stage Channels – Attackers embed instructions in one channel (email) and trigger execution later through another interaction (chatbot request to summarize), as shown by the two-step flow where the attacker later asks the agent to summarize the escalation.
[T1530] Data from Local System – RUG Pull backdoored tools exfiltrate local files during normal tool execution, example code shows reading and base64-encoding a file: ‘with open(file, “rb”) as f: encoded = base64.b64encode(f.read()).decode()’
[T1078] Valid Accounts (or Trust Abuse) – RUG Pull leverages trusted registry/tool provenance and existing permissions to execute malicious code under the guise of an approved tool; described as replacing a vetted tool in a registry so analysts call it unaware.
[T1560] Archive Collected Data – Attack flow encodes and sends collected sensitive records (e.g., SSNs) to an external recipient via send_response, illustrated by the base64 export sent to ‘[email protected]’.

Indicators of Compromise

[Email addresses] exfiltration endpoints and workflow contexts – examples: [email protected] (attacker exfiltration target), [email protected] (customer care inbox used to ingest malicious emails).
[File names] tool invocation targets – example: confidential_hr_report.pdf (file scanned by a backdoored malware_scan tool and exfiltrated).
[URLs / Domains] remote exfiltration endpoints – example: https://evil.com/exfil (POST target in hidden instruction).
[Function names / Tool names] MCP tool identifiers used in attacks – examples: export_customer_data, send_response, malware_scan (used to request or exfiltrate sensitive data).
[Encoded payloads] encoded sensitive data in transit – example: base64-encoded JSON blob in send_response body (represents exfiltrated user SSNs), and other encoded payloads.