ShadowLeak demonstrates how prompt injections can exploit LLMs through malicious content embedded in emails or documents. Despite mitigations, attackers can still exfiltrate sensitive data by exploiting workflows and AI features. #ShadowLeak #PromptInjection
Keypoints
- ShadowLeak begins with indirect prompt injections embedded in content from untrusted sources.
- Prompt injections manipulate LLMs into performing harmful actions by exploiting their desire to follow user instructions.
- Mitigations often respond after exploits are discovered, not preventatively, making vulnerability persistent.
- OpenAI and other providers have blocked channels like links and markdown to prevent data exfiltration.
- Attackers can bypass safeguards by using features like browser.open to exfiltrate data directly to malicious websites.