Meta,’s Rule of Two: The Fix for Agent Prompt Injection (do not use quotation, all characters should be english)

Meta,’s Rule of Two: The Fix for Agent Prompt Injection (do not use quotation, all characters should be english)
Meta’s Rule of Two limits an AI agent to only two of three dangerous capabilities at once: untrusted input, sensitive data, and external communication, breaking the prompt injection chain before exfiltration can complete. The content also highlights three known limitations in Meta’s own model, including cross-session leakage, risky two-way overlaps, and human approval that can degrade into blind clicking. #Meta #RuleofTwo #SimonWillison #Chromium #OWASP

Keypoints

  • Meta’s Rule of Two forbids any agent from holding all three dangerous properties in one session.
  • The model blocks prompt injection by breaking the chain from untrusted input to sensitive data to exfiltration.
  • Meta’s approach is a hard architectural constraint, not a prompt-classification detector.
  • The rule has known limitations, including cross-session state bleed and unsafe two-way overlaps.
  • Human approval can fail when users rubber-stamp warnings without proper review.

Read More: https://www.toxsec.com/p/metas-rule-of-two