Stop Multimodal Prompt Injection: JPEG, Re-Encode & Dual-LLM Fixes

Adversaries can embed executable instructions into images and audio so multimodal models read hidden directives from pixels and waveforms, bypassing text-only sanitization and leaving no visible logs. These techniques—typographic (FigStep), steganographic, semantic, and audio methods like WhisperInject—transfer across models, achieve high success rates in tests, and can be executed in the physical world. #FigStep #WhisperInject

Key Points

  • Vision and audio inputs can carry hidden directives that multimodal models treat as regular instructions.
  • Typographic, steganographic, and semantic image attacks bypass OCR and text-based filters.
  • Audio attacks like the mute technique and WhisperInject replace or suppress transcriptions while remaining inaudible.
  • Payloads transfer across models and combine to increase success rates, including robust over-the-air delivery.
  • These attacks leave no searchable text logs, defeating existing monitoring and requiring a universal kill switch.
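The "Re-Encode" fix named in the title can be sketched briefly. The idea is that lossy JPEG re-compression perturbs pixel values, so bit-level steganographic payloads hidden in an uploaded image do not survive the round trip. The function below is an illustrative sketch using Pillow, not code from the article; the name `reencode_image` and the quality setting are assumptions.

```python
# Sketch of a JPEG re-encode mitigation (illustrative, not from the
# article): lossy DCT quantization destroys least-significant-bit (LSB)
# steganographic payloads embedded in untrusted images.
from io import BytesIO

from PIL import Image


def reencode_image(data: bytes, quality: int = 85) -> bytes:
    """Decode an untrusted image and re-encode it as lossy JPEG.

    Quantization perturbs pixel values, so hidden bit patterns in the
    original file are scrambled before the image reaches the model.
    """
    img = Image.open(BytesIO(data)).convert("RGB")
    out = BytesIO()
    img.save(out, format="JPEG", quality=quality)
    return out.getvalue()
```

Note this only disrupts steganographic payloads; typographic attacks like FigStep render instructions as visible text, which survives re-encoding and needs the other defenses the article covers.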

Read More: https://www.toxsec.com/p/multimodal-prompt-injection-attacks-images-audio