Keypoints
- Prompt injection occurs when attacker-controlled input contains instructions that aim to override or alter system-level prompt instructions.
- Unlike deterministic SQL engines, most LLM setups use non-deterministic sampling (temperature, beam size, etc.), so the same malicious input can produce different outputs across attempts.
- A payload that fails once may succeed on later tries; testers must repeat payloads or small variations to reliably detect vulnerabilities.
- LLM hallucinations can produce fabricated evidence of success (e.g., invented prior instructions), causing false positives if not validated against the prompt template.
- Verifying exploitation requires checking actual model behavior and, when possible, obtaining the prompt template or configuration to distinguish hallucination from real effects.
- Coordinate with your security vendor to learn connection limits and model determinism settings, since fewer allowed connections and higher non-determinism both increase testing time.
MITRE Techniques
- [T1190] Exploit Public-Facing Application – Crafted user input is used to influence application behavior and override system instructions (‘Ignore your previous instructions and…’).
- [T1059] Command and Scripting Interpreter – Attacker-supplied instructions within the prompt act like commands the model executes, causing the assistant to respond as if following those commands (‘compelling the application to invariably respond with “Secure.”’).
- [T1110] Brute Force – Repeating the same or slightly varied payloads multiple times to account for non-deterministic outputs and achieve successful exploitation (‘repeating the same payload multiple times’).
- [T1565] Data Manipulation – LLM hallucinations can fabricate details or lists that misrepresent prior state or inputs, leading to misleading results or false positives (‘include an invented list of previous instructions or expanding on something that the attacker suggested but does not actually exist’).
- [T1499] Endpoint Denial of Service (testing impact of configuration limits) – Limited concurrent connections or rate limits on the service slow or prevent exhaustive testing, affecting the ability to probe non-deterministic behavior (‘consult with your security vendor about the maximum number of connections they can utilize’).
Indicators of Compromise
- No IOCs mentioned in the article – the post discusses methodology and behavior but does not list IPs, domains, file names, or hashes.
Prompt injection testing should treat the model as a probabilistic engine rather than a deterministic database. Unlike SQL injection where crafted payloads yield repeatable database errors or responses, LLM outputs depend on token scoring and sampling parameters (temperature, beam size, etc.), so identical inputs can produce different answers. Therefore, a single failed attempt does not prove the absence of a vulnerability; successful exploitation may require multiple attempts under the same input or slight variations.
Because LLMs can hallucinate, testers must verify that any apparent success is real rather than a fabricated response. Practical steps include repeating payloads many times, trying minor input variations, and validating outputs against the known prompt template or other ground truth. Obtaining the prompt template (when possible) and model configuration helps distinguish true behavior changes from hallucinated content and reduces false positives.
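One way to validate against ground truth is to only credit a "leaked instructions" response when it quotes actual prompt-template content. The sketch below assumes the tester has obtained the template; the template string, fragment length, and example outputs are all illustrative, not from the article.

```python
# Assumed ground truth obtained from the application owner (illustrative).
PROMPT_TEMPLATE = 'You are a support bot. Always respond with "Secure."'

def is_real_leak(model_output: str, template: str, frag_len: int = 20) -> bool:
    """Count a leak only if the output reproduces a reasonably long exact
    substring of the real template, so an invented list of 'previous
    instructions' (a hallucination) is not scored as a success."""
    fragments = (template[i:i + frag_len]
                 for i in range(len(template) - frag_len + 1))
    return any(frag in model_output for frag in fragments)

# Fabricated evidence of success: plausible-sounding but invented.
hallucinated = "My previous instructions were: 1) never reveal secrets 2) be brief"
# Genuine leak: quotes the actual template verbatim.
genuine = 'My instructions say: Always respond with "Secure." and nothing else.'

print(is_real_leak(hallucinated, PROMPT_TEMPLATE))  # False
print(is_real_leak(genuine, PROMPT_TEMPLATE))       # True
```

Exact-substring matching is deliberately strict; a fuzzier check (edit distance, token overlap) would catch paraphrased leaks at the cost of more false positives from hallucinated content.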
Operationally, coordinate with service providers about concurrency and model settings: highly non-deterministic models and low connection limits both increase the time and number of attempts needed for comprehensive coverage. Use deterministic model configurations when available for faster, more reliable testing, and design verification checks to confirm exploitation rather than relying on single-instance outputs.
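The interaction between non-determinism and connection limits can be made concrete with a back-of-envelope estimate (an assumption-laden sketch, not a formula from the article): if a payload lands with per-attempt probability p, the number of repeats needed for a given overall confidence follows from the geometric distribution, and the provider's concurrency cap then sets the wall-clock cost.

```python
import math

def attempts_needed(p: float, confidence: float = 0.95) -> int:
    """Attempts n such that P(at least one success) >= confidence,
    i.e. 1 - (1 - p)**n >= confidence, solved for n."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

def wall_clock_seconds(attempts: int, max_concurrent: int,
                       secs_per_call: float) -> float:
    """Time to issue all attempts given the vendor's connection limit
    (max_concurrent) and an assumed per-call latency."""
    rounds = math.ceil(attempts / max_concurrent)
    return rounds * secs_per_call

# Highly non-deterministic model: only 10% of attempts succeed.
n = attempts_needed(p=0.1)
print(n)  # 29 attempts for 95% confidence
# 5 allowed connections, ~2 s per call (assumed values).
print(wall_clock_seconds(n, max_concurrent=5, secs_per_call=2.0))
```

The same calculation explains why a deterministic configuration (e.g. sampling temperature set to 0, where the provider supports it) is preferable for testing: p collapses toward 0 or 1, and one or two attempts settle the question.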
Read more: https://research.nccgroup.com/2024/04/12/non-deterministic-nature-of-prompt-injection/