SentinelLABS evaluated OpenAI’s native Responses API compaction for long-running malware analysis workflows and found it cut input tokens by about 86% with no measurable change in overall evaluation score. The study concluded that compaction can reduce noise and cost while preserving task quality, though it may slightly weaken domain object modeling in some cases. #OpenAI #ResponsesAPI #SentinelLABS
Keypoints
- SentinelLABS tested OpenAI’s native compaction in the Responses API using an automated malware analysis evaluation harness.
- Compaction reduced input tokens by about 86% while overall evaluation performance stayed effectively unchanged.
- The research focused on long-running binary analysis tasks that require tracking hypotheses, evidence, and open questions over many tool-assisted steps.
- Compaction was used to preserve working memory, while exact artifacts and evidence were kept in durable storage.
- Output tokens, reasoning tokens, and model calls also decreased, showing operational efficiency gains.
- Domain object modeling declined somewhat, suggesting that compaction can flatten some structural reasoning if exact evidence is not preserved elsewhere.
- The article recommends treating compaction as lossy unless evaluations confirm that downstream behavior remains correct.
MITRE Techniques
- [T1005 ] Data from Local System – The workflow retrieves exact evidence such as logs, tool outputs, and decompiled functions from durable storage instead of relying on compacted context. (‘exact evidence…retrieves it from storage rather than relying on the compacted context’)
- [T1217 ] Path Discovery – The evaluation asks the model to identify important functions and follow code paths through the binary. (‘Identify important functions and follow code paths’)
- [T1027 ] Obfuscated Files or Information – The article centers on automated binary analysis of malware where decompilation and interpretation of strings, APIs, and data structures are required. (‘a model access to a decompiler’ and ‘Interpret strings, APIs, call relationships, and data structures’)
- [T1082 ] System Information Discovery – The model is tasked with determining what the malware is doing by analyzing functions, APIs, and behavior. (‘explain what the malware is doing’)
- [T1057 ] Process Discovery – The analysis framework follows code paths and behavior to understand the malware’s internal structure and activity. (‘follow code paths’)
Indicators of Compromise
- [Software / API] OpenAI Responses API context management – native compaction threshold and standalone /responses/compact endpoint used in the evaluation. – gpt-5.5, /responses/compact
- [File / Artifact Names] Decompiler and analysis artifacts – binary analysis outputs and intermediate evidence retained outside the compacted context. – decompiled functions, tool outputs
- [Configuration / Code Snippets] Example compaction settings – sample threshold used to trigger server-side compaction. – compact_threshold: 200000, store=False