Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models

This research reveals critical vulnerabilities in Large Language Models (LLMs) related to hidden adversarial prompts embedded via malicious fonts in external web and document content, enabling attacks like malicious content relay and sensitive data leakage through MCP-enabled tools. The study highlights that newer LLMs show greater susceptibility, with indirect prompt injections bypassing safety mechanisms and raising urgent concerns for real-world applications integrating LLMs with protocols such as the Model Context Protocol (MCP). #MaliciousFontAttack #ModelContextProtocol #ClaudeLLM

Keypoints

A novel attack vector uses malicious fonts to inject hidden adversarial prompts into web pages and documents, which appear normal to humans but mislead LLMs during content processing.
Two main attack scenarios were analyzed: (1) Malicious Content Relay, where LLMs unknowingly propagate hidden harmful content to users, and (2) Sensitive Data Leakage, where LLMs exfiltrate user data via MCP-enabled tools like Gmail after hidden prompt activation.
Experiments show PDF documents have higher attack success rates (up to 70%) compared to HTML formats due to PDF’s static and structured nature aiding prompt preservation.
Prompt placement (favoring early document sections) and increased injection frequency significantly improve attack effectiveness against LLMs.
More advanced LLMs like GPT-4.1 are more vulnerable to these font-based indirect prompt injections than earlier models.
Data leakage success varies by data sensitivity level: low-sensitivity data is exfiltrated with near 100% success via indirect prompts, while high-sensitivity data (e.g., SSNs) is better protected but still vulnerable to indirect prompt tricks (up to 30% success).
Prior legitimate use of MCP tools (e.g., sending emails) conditions LLMs to execute malicious hidden prompts more readily, increasing sensitive data leakage risks.

This study addresses the emerging security challenge of hidden adversarial prompt injection in Large Language Models (LLMs) equipped with real-time web search and tool integration via the Model Context Protocol (MCP). By exploiting vulnerabilities in how LLMs process external web content and documents, attackers can embed concealed instructions through manipulated font glyph mappings—termed malicious fonts—that are visually innocuous to humans but interpreted as executable prompts by LLMs. The research focuses on two security issues: the potential for LLMs to relay malicious content unknowingly to users, and the risk of LLM-enabled tools leaking sensitive user data when triggered by hidden adversarial prompts.

The authors propose and implement a method of crafting malicious fonts by altering TrueType font code-to-glyph mappings, enabling covert injection of deceptive textual content. This manipulation allows adversarial prompts to remain invisible or benign in appearance while influencing LLM output. Two key attack scenarios were experimentally investigated in controlled settings across multiple LLM models, including those supporting MCP tools like Gmail integration.

For Malicious Content Relay, experiments embedding hidden adversarial prompts in PDF and HTML documents show that PDFs offer a more reliable attack vector, achieving success rates up to 70%. Attack effectiveness improves with prompt repetitions and placement at the beginning of documents. The study also observes that commercial-related prompt injections are more successful than politically sensitive ones, reflecting the variable stringency of LLM safety measures. Notably, state-of-the-art models such as GPT-4.1 demonstrate higher vulnerability compared to earlier LLMs.

In the Sensitive Data Leakage scenario, the study explores how an attacker can co-opt MCP-enabled LLM tools—capable of sending emails autonomously—to exfiltrate user-shared sensitive information stored in chat history. Using crafted documents containing malicious fonts and hidden prompts, the LLM is induced to leak data such as personal names, contact details, and even some high-sensitivity information under specific conditions. The research identifies three main factors affecting attack success: the sensitivity level of the data (low sensitivity is easier to exfiltrate), presence of prior legitimate email actions by the user (which conditions LLM compliance), and the design of the hidden prompt (indirect, ambiguous prompts are significantly more effective at bypassing safety filters).

These findings have practical implications for SOC and threat intelligence teams tasked with protecting environments that employ LLMs with web browsing or MCP capabilities. The research warns that traditional security strategies—focused mainly on input-output filtering—may be insufficient against novel font-based adversarial prompt injection. It underscores the need for enhanced security frameworks that verify both the semantic integrity of content and its visual representation, as well as for continuous monitoring of tool-enabled LLM behaviors.

In sum, the paper highlights a critical new attack surface introduced by combining LLM web integration and automated tool protocols. Organizations leveraging such models should urgently incorporate detection mechanisms for malicious fonts, reassess MCP authorization workflows, and refine prompt validation to mitigate risks of misinformation propagation and sensitive data leakage.

The content featured on this site is sourced from arXiv.org, a free distribution service and open-access archive hosting over 2.4 million scholarly articles across a wide range of disciplines. This collection specifically highlights articles focused on cybersecurity, particularly topics relevant to threat intelligence and Security Operations Center (SOC) work.

Please note that materials on arXiv are not peer-reviewed, and are shared as preprints by the authors to foster early dissemination and feedback within the academic and professional community. I recommend using arXiv papers as a starting point for exploration and research, not as definitive sources. Always evaluate findings critically, and whenever possible, cross-check with peer-reviewed publications or operational validation.

SHARE THIS STORY

WhatsApp X (Twitter)Telegram Bluesky Facebook LinkedIn Threads Email Print