Apple Intelligence AI Guardrails Bypassed in New Attack

Researchers demonstrated at RSAC a reliable method to bypass Apple Intelligence’s on-device LLM safeguards by combining two adversarial techniques: Neural Exec prompt injection and a Unicode right-to-left override to encode outputs. The combined attack achieved a 76% success rate, forcing the model to manipulate private app data, before Apple deployed fixes in iOS 26.4 and macOS 26.4. #AppleIntelligence #RSAC

Keypoints

  • Researchers combined Neural Exec prompt injection with a Unicode right-to-left override to evade both input and output filters.
  • The attack could force the local Apple Intelligence LLM to produce offensive content or manipulate private third-party app data.
  • The method succeeded in 76% of 100 tested prompts.
  • The researchers estimate that apps with 100,000–1,000,000 installs may be vulnerable, across roughly 200 million capable devices.
  • Apple was notified in October 2025 and issued protections in iOS 26.4 and macOS 26.4; no evidence of malicious exploitation has been observed.
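To illustrate the output-encoding half of the technique (this is a generic sketch of Unicode right-to-left override obfuscation, not the researchers' actual exploit): the U+202E control character tells RTL-aware renderers to display the characters that follow in reverse order, so a string can read as the payload on screen while a naive substring filter scanning the raw text never sees it.

```python
# Illustrative sketch: hiding a string from a naive substring filter
# using the Unicode right-to-left override (U+202E). The flagged
# phrase below is a hypothetical example, not from the research.
RLO = "\u202e"

payload = "delete user data"
encoded = RLO + payload[::-1]  # renders as "delete user data" in RTL-aware UIs

# A naive output filter checking the raw text misses the payload...
print(payload in encoded)            # False
# ...even though every character is present, just stored reversed.
print(encoded[1:][::-1] == payload)  # True
```

This is why filtering must normalize or strip Unicode directional controls before matching, rather than inspecting the raw character sequence.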

Read More: https://www.securityweek.com/apple-intelligence-ai-guardrails-bypassed-in-new-attack/