Indirect prompt injection abuses external data sources (emails, documents, web pages, API responses, images) by embedding hidden instructions that MCP-powered LLMs can execute, leading to stealthy data exfiltration or unauthorized actions. RUG Pull attacks replace trusted MCP tools via compromised registries or updates, turning vetted automations into backdoors. #RUGPull #ModelContextProtocol
Keypoints
- Indirect prompt injection occurs when malicious instructions are embedded in external data ingested into the LLM context, causing the model to treat them as actionable system guidance.
- Successful indirect prompt injection requires three conditions: LLM access to private data, processing of untrusted content, and capabilities to act externally (“lethal trifecta”).
- Realistic examples include SOC escalation emails and customer-care workflows where hidden HTML comments trigger tool calls like export_customer_data and send_response, enabling silent exfiltration.
- RUG Pull attacks compromise MCP tool registries or update mechanisms to substitute trusted tools with backdoored versions that exfiltrate data during normal use.
- Defenses for indirect injection include context provenance, input sanitization, human-in-the-loop approval for sensitive tool calls, behavioral monitoring, memory pruning, output sanitization, and least-privilege access.
- Defenses for RUG Pull include tool signing and verification, pinned versions, registry hardening, granular permission boundaries, runtime consent validation, and permission auditing and logging.
- Both attack classes exploit trust in data and tooling rather than the model itself, so protecting data provenance and tool provenance is critical for MCP security.
MITRE Techniques
- [T1204] User Execution – Indirect prompt injection relies on user-submitted content (emails, tickets) that causes the model to execute hidden instructions; quoted example: ‘……’
- [T1621] Multi-Stage Channels – Attackers embed instructions in one channel (email) and trigger execution later through another interaction (chatbot request to summarize), as shown by the two-step flow where the attacker later asks the agent to summarize the escalation.
- [T1530] Data from Local System – RUG Pull backdoored tools exfiltrate local files during normal tool execution, example code shows reading and base64-encoding a file: ‘with open(file, “rb”) as f: encoded = base64.b64encode(f.read()).decode()’
- [T1078] Valid Accounts (or Trust Abuse) – RUG Pull leverages trusted registry/tool provenance and existing permissions to execute malicious code under the guise of an approved tool; described as replacing a vetted tool in a registry so analysts call it unaware.
- [T1560] Archive Collected Data – Attack flow encodes and sends collected sensitive records (e.g., SSNs) to an external recipient via send_response, illustrated by the base64 export sent to ‘[email protected]’.
Indicators of Compromise
- [Email addresses] exfiltration endpoints and workflow contexts – examples: [email protected] (attacker exfiltration target), [email protected] (customer care inbox used to ingest malicious emails).
- [File names] tool invocation targets – example: confidential_hr_report.pdf (file scanned by a backdoored malware_scan tool and exfiltrated).
- [URLs / Domains] remote exfiltration endpoints – example: https://evil.com/exfil (POST target in hidden instruction).
- [Function names / Tool names] MCP tool identifiers used in attacks – examples: export_customer_data, send_response, malware_scan (used to request or exfiltrate sensitive data).
- [Encoded payloads] encoded sensitive data in transit – example: base64-encoded JSON blob in send_response body (represents exfiltrated user SSNs), and other encoded payloads.
Read more: https://www.netskope.com/blog/securing-llm-superpowers-the-invisible-backdoors-in-mcp