Inside the LLM | Understanding AI & the Mechanics of Modern Attacks

The article analyzes how LLM input transformations—tokenization, embeddings, positional encodings, and self-attention—create architectural attack surfaces that enable prompt injection, adversarial suffixes, gradient-based embedding manipulation, and attention hijacking. It reviews attack vectors (filter bypass, GCG-style gradient attacks, chunking/context-window exploits, attention hijacking) and defensive mitigations including randomized smoothing, suffix filtering, adversarial training, and vendor controls like instruction hierarchies. #OpenAI #Anthropic

Keypoints

  • LLM input is transformed through tokenization, embeddings, positional encoding, and attention, and vulnerabilities can arise at each stage.
  • Tokenization boundaries and subword tokenization (e.g., BPE) enable filter evasion by fragmenting blocked keywords like “powershell”.
  • Gradient-based attacks (e.g., GCG) can optimize token sequences to shift embeddings and bypass safety guardrails across models.
  • Context window limits and chunking enable attacks in which important alerts or instructions are pushed out of the model's effective context or processed incorrectly.
  • Adversarial suffixes can hijack self-attention by producing Key vectors that dominate attention scores, steering model outputs despite existing safety rules.
  • Mitigations include randomized smoothing (e.g., SmoothLLM), suffix filtering, and adversarial training, but these remain partial defenses as attackers adapt.
  • Major providers (OpenAI, Anthropic, Google) deploy layered defenses like instruction hierarchies and constitutional classifiers, highlighting an evolving defensive landscape.
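The attention-hijacking keypoint above can be sketched numerically: in scaled dot-product attention, a single token whose Key vector is both large in norm and aligned with the Query will dominate the softmax and soak up nearly all attention mass. The toy dimensions, values, and token layout below are illustrative assumptions, not taken from the article:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy self-attention: one query attends over the keys of a 4-token context.
# Tokens 0-2 stand in for the system prompt and user input; token 3 plays
# the role of an adversarial-suffix token whose key vector is aligned with
# the query and has a large norm, so its dot-product score dominates.
d = 8
rng = np.random.default_rng(0)
query = rng.normal(size=d)
keys = rng.normal(size=(4, d)) * 0.1   # benign tokens: small scores
keys[3] = query * 5.0                  # adversarial key: aligned and large

scores = keys @ query / np.sqrt(d)     # scaled dot-product attention
weights = softmax(scores)
print(weights)                         # nearly all mass lands on token 3
```

With exp() applied to a score an order of magnitude above the rest, the benign tokens' weights collapse toward zero, which is the sense in which a crafted suffix can "steer" what the model attends to.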

MITRE Techniques

  • None mentioned – the article does not explicitly reference MITRE ATT&CK technique names or identifiers.

Indicators of Compromise

  • [File name] log and execution context – C:\Windows\System32\powershell.exe, powershell.exe
  • [Event ID] security log example used to illustrate tokenization and log parsing – EventID: 4688
  • [Network port] example used in a context-window attack scenario – Port 22 (e.g., “Port 22 Open”)
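The filter-evasion mechanic behind the "powershell" indicator can be sketched as follows: a per-token keyword filter (a hypothetical example, not a filter named in the article) checks each token in isolation, so a blocked keyword split into subword-style fragments slips through even though the detokenized string the model sees still contains it:

```python
# Hypothetical naive per-token keyword filter for illustration only.
BLOCKED = {"powershell"}

def naive_token_filter(tokens):
    """Flag input only if a blocked keyword appears as a single token."""
    return any(t.lower() in BLOCKED for t in tokens)

# Fragments resembling how a BPE-style tokenizer might split the word:
fragments = ["pow", "ers", "hell"]
recombined = "".join(fragments)

print(naive_token_filter(["run", "powershell", "-enc"]))   # caught
print(naive_token_filter(["run"] + fragments + ["-enc"]))  # evaded
print(recombined)  # the reassembled text still reads "powershell"
```

This is why tokenization-boundary defenses typically normalize and re-scan the detokenized string rather than matching on individual tokens.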


Read more: https://www.sentinelone.com/labs/inside-the-llm-understanding-ai-the-mechanics-of-modern-attacks/