The article analyzes how LLM input transformations—tokenization, embeddings, positional encodings, and self-attention—create architectural attack surfaces that enable prompt injection, adversarial suffixes, gradient-based embedding manipulation, and attention hijacking. It reviews attack vectors (filter bypass, GCG-style gradient attacks, chunking/context-window exploits, attention hijacking) and defensive mitigations including randomized smoothing, suffix filtering, adversarial training, and vendor controls like instruction hierarchies. #OpenAI #Anthropic
Keypoints
- LLM input is transformed through tokenization, embeddings, positional encoding, and attention, and vulnerabilities can arise at each stage.
- Tokenization boundaries and subword tokenization (e.g., BPE) enable filter evasion by fragmenting blocked keywords like “powershell”.
- Gradient-based attacks (e.g., GCG) can optimize token sequences to shift embeddings and bypass safety guardrails across models.
- Context window limits and chunking introduce attacks where important alerts or instructions can be pushed out of model memory or processed incorrectly.
- Adversarial suffixes can hijack self-attention by producing Key vectors that dominate attention scores, steering model outputs despite existing safety rules.
- Mitigations include randomized smoothing (e.g., SmoothLLM), suffix filtering, and adversarial training, but these remain partial defenses as attackers adapt.
- Major providers (OpenAI, Anthropic, Google) deploy layered defenses like instruction hierarchies and constitutional classifiers, highlighting an evolving defensive landscape.
MITRE Techniques
- None mentioned – ‘The article does not explicitly reference MITRE ATT&CK technique names or identifiers.’
Indicators of Compromise
- [File name ] log and execution context – C:WindowsSystem32powershell.exe, powershell.exe
- [Event ID ] security log example used to illustrate tokenization and log parsing – EventID: 4688
- [Network port ] example used in a context-window attack scenario – Port 22 (e.g., “Port 22 Open”)
Read more: https://www.sentinelone.com/labs/inside-the-llm-understanding-ai-the-mechanics-of-modern-attacks/