From Narrative to Knowledge Graph | LLM-Driven Information Extraction in Cyber Threat Intelligence

This post evaluates the use of large language models (LLMs) to extract and contextualize information from cyber threat intelligence (CTI) reports, turning narrative text into structured JSON entities and a unified knowledge graph for downstream defense workflows. It describes a three‑phase workflow (sanitization, LLM extraction, knowledge‑graph assembly), reports experimental results across multiple LLMs, and discusses operational trade‑offs in accuracy, abstention, ensembling, and data‑model design. #GPT4_1 #ClaudeSonnet4_5
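As a rough illustration of the knowledge-graph assembly phase, the sketch below links a threat actor, a playbook, its steps, and the IOCs each step references. The JSON shape, the node types, and the edge labels (`uses`, `has_step`, `references`) are hypothetical and chosen only for this example; the post does not publish its actual schema.

```python
# Minimal sketch of knowledge-graph assembly from extractor output.
# The input shape and all edge labels are assumptions for illustration,
# not the schema used in the original post.

# Hypothetical structured output of the extraction phase for one report.
extracted = {
    "threat_actor": "ExampleActor",
    "playbook": "example-playbook",
    "steps": [
        {"id": "step-1", "iocs": ["malicious-example[.]com", "192.0.2.1"]},
        {"id": "step-2", "iocs": ["192.0.2.1"]},
    ],
}


def build_graph(report):
    """Return a set of (source, relation, target) edges for one report.

    Nodes are (type, value) tuples, so the same IOC seen in several steps
    or several reports collapses into a single node.
    """
    edges = set()
    actor = ("threat_actor", report["threat_actor"])
    playbook = ("playbook", report["playbook"])
    edges.add((actor, "uses", playbook))
    for step in report["steps"]:
        step_node = ("step", step["id"])
        edges.add((playbook, "has_step", step_node))
        for ioc in step["iocs"]:
            edges.add((step_node, "references", ("ioc", ioc)))
    return edges


def merge_graphs(graphs):
    """Unify per-report graphs; set union deduplicates shared nodes and edges."""
    unified = set()
    for graph in graphs:
        unified |= graph
    return unified


edges = merge_graphs([build_graph(extracted)])
nodes = {node for src, _, dst in edges for node in (src, dst)}
```

Representing nodes as typed tuples keeps the example dependency-free; a graph database or a graph library would be the more likely choice in practice.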

Keypoints

  • Proposes a three‑phase CTI extraction pipeline: report ingestion and sanitization, LLM‑based extractors (Infrastructure, Executables, Playbook), and knowledge‑graph assembly linking IOCs, steps, playbooks, and threat actors.
  • Uses custom extractor data models (including 12 IOC contextual attributes) and structured prompts with an evidence‑grading policy (High/Medium/Low) and an explicit abstention class (None) to constrain inference.
  • Evaluates off‑the‑shelf LLMs (GPT‑4.1, GPT‑5, GPT‑5.2, Claude Sonnet 4.5, Claude Opus 4.5) on a ground truth of 343 atomic IOCs and 1,859 labeled IOC attribute instances, emphasizing feasibility rather than definitive model ranking.
  • LLM extractors delivered large speedups versus human analysts (human baseline ~41 minutes per report; LLMs ~3.3 minutes on average, >18× speed‑up) while trading off coverage and correctness depending on settings.
  • Extraction quality depends strongly on report formatting, label/context cues, OCR coverage for embedded images, prompt‑model fit, and deliberate data‑model wording that guides LLM attention.
  • Discusses ensemble strategies, abstention error metrics (FDR, FNR), and the need for representative, flexible ground truth that can capture genuine ambiguity via multiple valid labels.
  • Recommends deliberate operational planning, continuous evaluation, and tailored data‑model and prompt design to balance accuracy, coverage, latency, and downstream reliability.
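The evidence-grading policy and the abstention metrics mentioned in the key points can be sketched as follows. This is a minimal example under stated assumptions: the attribute name `role` is hypothetical (the post's 12 IOC contextual attributes are not listed here), and the mapping of FDR and FNR onto abstention, with "abstain" treated as the positive class, is one plausible reading rather than the post's confirmed definition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    NONE = "None"  # explicit abstention class


@dataclass(frozen=True)
class AttributeLabel:
    ioc: str
    attribute: str            # hypothetical attribute name, e.g. "role"
    value: Optional[str]      # None when the extractor abstains
    evidence: Evidence

    @property
    def abstained(self) -> bool:
        return self.evidence is Evidence.NONE


def abstention_errors(predicted, truth):
    """Return (FDR, FNR), treating 'abstain' as the positive class (assumption).

    FDR: share of the model's abstentions where ground truth had a label.
    FNR: share of ground-truth abstentions where the model gave a label.
    """
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p.abstained and t.abstained:
            tp += 1
        elif p.abstained:
            fp += 1
        elif t.abstained:
            fn += 1
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    fnr = fn / (tp + fn) if (tp + fn) else 0.0
    return fdr, fnr


def label(ioc, value, evidence):
    return AttributeLabel(ioc, "role", value, evidence)


# Illustrative labels only; these are not results from the post.
truth = [
    label("192.0.2.1", None, Evidence.NONE),
    label("198.51.100.23", None, Evidence.NONE),
    label("malicious-example[.]com", "c2", Evidence.HIGH),
    label("attacker-domain[.]net", "staging", Evidence.MEDIUM),
]
predicted = [
    label("192.0.2.1", None, Evidence.NONE),            # correct abstention
    label("198.51.100.23", "c2", Evidence.LOW),         # should have abstained
    label("malicious-example[.]com", None, Evidence.NONE),  # wrong abstention
    label("attacker-domain[.]net", "staging", Evidence.MEDIUM),
]

fdr, fnr = abstention_errors(predicted, truth)  # (0.5, 0.5) for this toy data
```

Tracking both rates separately makes the trade-off visible: a model that abstains too readily raises FDR and loses coverage, while one that rarely abstains raises FNR and emits unsupported labels.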

Indicators of Compromise

  • [Domain] selective extraction of attacker‑owned versus benign domains – malicious-example[.]com, attacker-domain[.]net
  • [IP address] infrastructure indicators described in reports and IOC tables – 192.0.2.1, 198.51.100.23
  • [File hash] executable identifiers (MD5, SHA‑1, SHA‑256) for malicious or attacker‑used binaries – d41d8cd98f00b204e9800998ecf8427e (MD5), e3b0c44298fc1c149afbf4c8996fb92427ae41e4… (SHA‑256), and 2 more hashes
  • [File path] extracted open‑text attributes for reported artifacts on host systems – C:\Windows\Temp\mal.exe, /usr/bin/evil
  • [Command line] command‑line arguments associated with executables – mal.exe -s -c, python exploit.py --target 10.0.0.1
  • [Injected process] names of processes reported as injection targets – explorer.exe, svchost.exe
  • [Certificate fingerprint] pivotable artifacts used for correlation and blocking decisions – SHA1:AB:CD:EF:12:34:56:78:90:AB:CD:EF:12:34:56:78:90, cert-fingerprint:12:34:56:78:90:AB
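
When file hashes like those listed above are ingested, a simple length check can tag the likely algorithm. The sketch below is a heuristic only and is not taken from the post: the function name `hash_type` is hypothetical, and other algorithms share these digest lengths (MD4, for instance, is also 32 hex characters).

```python
import re

# Hex digest lengths for the three hash types named in the list above.
HEX_LENGTHS = {32: "MD5", 40: "SHA-1", 64: "SHA-256"}


def hash_type(value: str):
    """Guess the hash algorithm from hex length; return None if not a full hex digest."""
    candidate = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", candidate):
        return None  # contains non-hex characters, e.g. a truncation ellipsis
    return HEX_LENGTHS.get(len(candidate))


md5_kind = hash_type("d41d8cd98f00b204e9800998ecf8427e")
# A truncated value is rejected rather than guessed at.
truncated_kind = hash_type("e3b0c44298fc1c149afbf4c8996fb92427ae41e4…")
```

Rejecting the truncated value matters here: the visible prefix of the SHA‑256 example happens to be 40 hex characters long, so stripping the ellipsis before classification would mislabel it as SHA‑1.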


Read more: https://www.sentinelone.com/labs/from-narrative-to-knowledge-graph-llm-driven-information-extraction-in-cyber-threat-intelligence/