Inside the LLM | Understanding AI & the Mechanics of Modern Attacks

The article analyzes how LLM input transformations—tokenization, embeddings, positional encodings, and self-attention—create architectural attack surfaces that enable prompt injection, adversarial suffixes, gradient-based embedding manipulation, and attention hijacking. It reviews attack vectors (filter bypass, GCG-style gradient attacks, chunking/context-window exploits, attention hijacking) and defensive mitigations including randomized smoothing, suffix filtering, adversarial training, and vendor controls like instruction hierarchies. #OpenAI #Anthropic

Keypoints

  • LLM input is transformed through tokenization, embeddings, positional encoding, and attention, and vulnerabilities can arise at each stage.
  • Tokenization boundaries and subword tokenization (e.g., BPE) enable filter evasion by fragmenting blocked keywords like “powershell”.
  • Gradient-based attacks (e.g., GCG) can optimize token sequences to shift embeddings and bypass safety guardrails across models.
  • Context window limits and chunking enable attacks in which important alerts or instructions are pushed out of the model's effective context or processed incorrectly.
  • Adversarial suffixes can hijack self-attention by producing Key vectors that dominate attention scores, steering model outputs despite existing safety rules.
  • Mitigations include randomized smoothing (e.g., SmoothLLM), suffix filtering, and adversarial training, but these remain partial defenses as attackers adapt.
  • Major providers (OpenAI, Anthropic, Google) deploy layered defenses like instruction hierarchies and constitutional classifiers, highlighting an evolving defensive landscape.
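The attention-hijacking keypoint above can be sketched numerically: in scaled dot-product attention, a single token whose Key vector is both large in norm and aligned with the Query will dominate the softmax and soak up nearly all attention mass. The toy dimensions, values, and token layout below are illustrative assumptions, not taken from the article:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy self-attention: one query attends over the keys of a 4-token context.
# Tokens 0-2 stand in for the system prompt and user input; token 3 plays
# the role of an adversarial-suffix token whose key vector is aligned with
# the query and has a large norm, so its dot-product score dominates.
d = 8
rng = np.random.default_rng(0)
query = rng.normal(size=d)
keys = rng.normal(size=(4, d)) * 0.1   # benign tokens: small scores
keys[3] = query * 5.0                  # adversarial key: aligned and large

scores = keys @ query / np.sqrt(d)     # scaled dot-product attention
weights = softmax(scores)
print(weights)                         # nearly all mass lands on token 3
```

With exp() applied to a score an order of magnitude above the rest, the benign tokens' weights collapse toward zero, which is the sense in which a crafted suffix can "steer" what the model attends to.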

MITRE Techniques

  • None mentioned – the article does not explicitly reference MITRE ATT&CK technique names or identifiers.

Indicators of Compromise

  • [File name] log and execution context – C:\Windows\System32\powershell.exe, powershell.exe
  • [Event ID] security log example used to illustrate tokenization and log parsing – EventID: 4688
  • [Network port] example used in a context-window attack scenario – Port 22 (e.g., “Port 22 Open”)
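The filter-evasion mechanic behind the "powershell" indicator can be sketched as follows: a per-token keyword filter (a hypothetical example, not a filter named in the article) checks each token in isolation, so a blocked keyword split into subword-style fragments slips through even though the detokenized string the model sees still contains it:

```python
# Hypothetical naive per-token keyword filter for illustration only.
BLOCKED = {"powershell"}

def naive_token_filter(tokens):
    """Flag input only if a blocked keyword appears as a single token."""
    return any(t.lower() in BLOCKED for t in tokens)

# Fragments resembling how a BPE-style tokenizer might split the word:
fragments = ["pow", "ers", "hell"]
recombined = "".join(fragments)

print(naive_token_filter(["run", "powershell", "-enc"]))   # caught
print(naive_token_filter(["run"] + fragments + ["-enc"]))  # evaded
print(recombined)  # the reassembled text still reads "powershell"
```

This is why tokenization-boundary defenses typically normalize and re-scan the detokenized string rather than matching on individual tokens.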


Read more: https://www.sentinelone.com/labs/inside-the-llm-understanding-ai-the-mechanics-of-modern-attacks/