Meta’s Rule of Two limits an AI agent to at most two of three dangerous capabilities at once (untrusted input, sensitive data, and external communication), breaking the prompt injection chain before exfiltration can complete. The content also highlights three known limitations in Meta’s own model: cross-session leakage, risky two-way overlaps, and human approval that can degrade into blind clicking. #Meta #RuleofTwo #SimonWillison #Chromium #OWASP
Key points
- Meta’s Rule of Two forbids any agent from holding all three dangerous properties in one session.
- The model blocks prompt injection by breaking the chain from untrusted input to sensitive data to exfiltration.
- Meta’s approach is a hard architectural constraint, not a prompt-classification detector.
- The rule has known limitations, including cross-session state bleed and unsafe two-way overlaps.
- Human approval can fail when users rubber-stamp warnings without proper review.
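The following is an added example, not code from Meta or from the linked article: it models the Rule of Two as an architectural gate on tool calls, so a session that already holds two of the three properties is refused the third. The tool catalogue and the treatment of carried-over memory as sensitive data (a mitigation for the cross-session leakage limitation) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Cap(Flag):
    """The three properties the Rule of Two says must never all co-occur."""
    NONE = 0
    UNTRUSTED_INPUT = auto()   # reads content an attacker may control (web, mail, files)
    SENSITIVE_DATA = auto()    # can access secrets, private files, credentials
    EXTERNAL_COMM = auto()     # can send data out (HTTP, email, messaging)


ALL_THREE = Cap.UNTRUSTED_INPUT | Cap.SENSITIVE_DATA | Cap.EXTERNAL_COMM

# Hypothetical tool catalogue (constructed): the capability each tool grants a session.
TOOL_CAPS = {
    "fetch_url": Cap.UNTRUSTED_INPUT,
    "read_inbox": Cap.UNTRUSTED_INPUT | Cap.SENSITIVE_DATA,  # one tool may grant two
    "read_vault": Cap.SENSITIVE_DATA,
    "send_email": Cap.EXTERNAL_COMM,
    "http_post": Cap.EXTERNAL_COMM,
}


@dataclass
class Session:
    held: Cap = Cap.NONE


def rule_of_two_allows(held: Cap, requested: Cap) -> bool:
    """Hard constraint: the union of held and requested capabilities may never be all three."""
    return (held | requested) != ALL_THREE


def request_tool(session: Session, tool: str) -> bool:
    """Gate a tool call: grant its capability on success, refuse if it completes the chain."""
    requested = TOOL_CAPS[tool]
    if not rule_of_two_allows(session.held, requested):
        return False          # the call is structurally denied, not classified as malicious
    session.held |= requested
    return True


def new_session_from(previous: Session) -> Session:
    """Assumed mitigation for cross-session leakage: memory carried over from a session
    that held sensitive data is itself counted as sensitive data in the new session."""
    return Session(held=previous.held & Cap.SENSITIVE_DATA)
```

Because the check is on the union of capabilities, the order in which tools are called does not matter: whichever call would make the session hold all three is the one refused.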
Read More: https://www.toxsec.com/p/metas-rule-of-two