Introducing YARA Rules: Search and Monitor the Internet’s Infrastructure with YARA

Validin has added YARA retro hunting across its large archive of virtual host responses. Enterprise users can write and run custom YARA rules and view their matches to discover and track indicators in historical web artifacts. In one demonstration, a hunt for the OpenAI key substring returned over 5,000 matches containing exposed OpenAI API keys in one week.

Keypoints

  • Validin now supports custom YARA rules tied to projects to retroactively scan virtual host responses across its dataset.
  • Rules must be syntactically valid YARA, must not use the private or global modifiers, and must define exactly one rule.
  • Each run is configured with a lookback window (with a 4-hour buffer), and virtual host responses are currently the only source scanned.
  • Each match shows the body's SHA1 and the hour in which the virtual host response was observed, and the full HTML artifact can be opened for context.
  • A replicated retro hunt for exposed LLM API keys (OpenAI substring T3BlbkFJ) returned over 5,000 matches in one week, revealing exposed keys and sensitive comments in HTML.
  • Validin plans to expand the supported sources (e.g., favicons, certificates, JavaScript) and invites customers to suggest others via email or Slack.
  • Enterprise Edition customers can use the feature immediately; others are encouraged to contact Validin to gain access.
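The retro hunt in the key points can be sketched in Python as a minimal illustration, not Validin's implementation. The `Response` record, the `retro_hunt` function, and the in-memory archive are hypothetical stand-ins for Validin's dataset, and plain substring search stands in for the YARA engine. The 4-hour buffer is left out because the summary does not say how it is applied. The sketch keeps only responses observed inside the lookback window, tests each body for the two key substrings, and reports the body SHA1 and the observation hour, mirroring the fields in Validin's match view.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Key substrings from the article's hunt (OpenAI and Anthropic key markers).
KEY_SUBSTRINGS = (b"T3BlbkFJ", b"sk-ant-api03")


@dataclass
class Response:
    """Hypothetical archived virtual host response (not Validin's schema)."""
    host: str
    observed_at: datetime  # timezone-aware UTC timestamp
    body: bytes


def retro_hunt(archive, now, lookback):
    """Return a match record for every response observed inside the lookback
    window whose body contains at least one key substring.

    Plain substring search stands in for the YARA engine here.
    """
    cutoff = now - lookback
    matches = []
    for resp in archive:
        if not (cutoff <= resp.observed_at <= now):
            continue  # outside the lookback window
        hits = [s.decode() for s in KEY_SUBSTRINGS if s in resp.body]
        if hits:
            matches.append({
                "host": resp.host,
                # The body is identified by its SHA1 ...
                "body_sha1": hashlib.sha1(resp.body).hexdigest(),
                # ... and the observation is bucketed to the hour.
                "hour": resp.observed_at.replace(minute=0, second=0, microsecond=0),
                "substrings": hits,
            })
    return matches


# Usage with made-up data; the "keys" are placeholders, not real credentials.
now = datetime(2024, 6, 8, 12, 0, tzinfo=timezone.utc)
archive = [
    Response("a.example", now - timedelta(days=2, minutes=30),
             b'<script>const key = "xxxxT3BlbkFJxxxx";</script>'),
    Response("b.example", now - timedelta(days=10), b"xxxxT3BlbkFJxxxx"),  # too old
    Response("c.example", now - timedelta(hours=1), b"<p>no secrets here</p>"),
]
for match in retro_hunt(archive, now, lookback=timedelta(days=7)):
    print(match["host"], match["body_sha1"], match["hour"].isoformat(), match["substrings"])
```

Widening `lookback` is the only change needed to reach further back in the archive; the matching logic stays the same.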

MITRE Techniques

  • [T1083] File and Directory Discovery – Using YARA rules to scan archived virtual host response artifacts to discover exposed keys and sensitive strings: “we wrote a YARA rule that simply searched for the substrings T3BlbkFJ and sk-ant-api03”.
  • [T1552.001] Unsecured Credentials: Credentials In Files – Identifying exposed API keys embedded in HTML artifacts by searching for known key substrings (“This is a rule to find exposed OpenAI API Keys”), which surfaced matches containing the OpenAI substring.
  • [T1593] Search Open Websites/Domains – Scanning archived virtual host responses (web artifacts) to find URLs and pages that contain sensitive information; each match includes the body SHA1 and the hour in which the virtual host response was observed (“you’ll see the body’s SHA1, the hour of data in which it was matched”).
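The rule quoted in the first technique, which searches for the substrings T3BlbkFJ and sk-ant-api03, might look like the hypothetical rule below, embedded in a short Python pre-check for Validin's submission constraints (exactly one rule, no `private` or `global` modifiers). The rule name, meta text, string identifiers, and the `check_rule_constraints` helper are illustrative assumptions, and the check is lexical only: it does not prove the rule compiles. The opening assertion shows why T3BlbkFJ works as an OpenAI marker: it is the Base64 encoding of "OpenAI".

```python
import base64
import re

# T3BlbkFJ is the Base64 encoding of "OpenAI", which is why it shows up in
# OpenAI API keys and serves as a search marker.
assert base64.b64encode(b"OpenAI") == b"T3BlbkFJ"

# Hypothetical rule; the name, meta text, and string identifiers are illustrative.
HUNT_RULE = r'''
rule Exposed_LLM_API_Keys
{
    meta:
        description = "Find exposed OpenAI and Anthropic API keys"
    strings:
        $openai = "T3BlbkFJ"
        $anthropic = "sk-ant-api03"
    condition:
        any of them
}
'''

# A rule header at the start of a line, optionally preceded by modifiers.
RULE_HEADER = re.compile(r"^[ \t]*((?:(?:private|global)\s+)*)rule\s+(\w+)", re.MULTILINE)


def check_rule_constraints(source: str) -> list[str]:
    """Return a list of constraint violations; an empty list means none found.

    Lexical check only: it does not parse YARA, skip comments, or prove that
    the rule compiles.
    """
    problems = []
    headers = RULE_HEADER.findall(source)
    if len(headers) != 1:
        problems.append(f"expected exactly one rule definition, found {len(headers)}")
    for modifiers, name in headers:
        if modifiers.strip():
            problems.append(f"rule {name} uses disallowed modifier(s): {modifiers.strip()}")
    return problems


print(check_rule_constraints(HUNT_RULE))
```

If yara-python is installed, `yara.compile(source=HUNT_RULE)` is one way to confirm the rule is syntactically valid before submitting it.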

Indicators of Compromise

  • [File Hash] Matched artifact bodies – each match reports the SHA1 of the response body (one is shown in the match-view example, and many more appear across the matches).
  • [Strings / Secrets] Exposed API key substrings in HTML – “T3BlbkFJ” (OpenAI key substring) and “sk-ant-api03” (Anthropic key substring) found in artifacts.
  • [File Names / Artifacts] HTML artifacts containing comments and embedded keys – exposed HTML pages with embedded keys and developer comments, with many similar artifacts among the matches.
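The IOC entries note that keys turn up both in developer comments and elsewhere in the page. A minimal sketch of triaging one matched artifact with Python's standard-library `html.parser`: it collects every HTML comment and separates key-substring hits inside comments from hits in the rest of the document. The `flag_artifact` helper, its output fields, and the sample HTML are hypothetical, and the placeholder strings are not real keys.

```python
from html.parser import HTMLParser

# Key substrings from the article's hunt.
KEY_SUBSTRINGS = ("T3BlbkFJ", "sk-ant-api03")


class CommentCollector(HTMLParser):
    """Collects the text of every <!-- ... --> comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def flag_artifact(html: str) -> dict:
    """Count key-substring hits inside HTML comments versus elsewhere
    (inline scripts, attributes, visible text)."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    total_hits = sum(html.count(k) for k in KEY_SUBSTRINGS)
    comment_hits = sum(c.count(k) for c in parser.comments for k in KEY_SUBSTRINGS)
    return {
        "comments": parser.comments,
        "comments_with_keys": [c for c in parser.comments
                               if any(k in c for k in KEY_SUBSTRINGS)],
        "hits_in_comments": comment_hits,
        "hits_elsewhere": total_hits - comment_hits,
    }


# Made-up artifact; the "keys" are placeholders, not real credentials.
sample = """<html><head>
<!-- TODO: move this key server-side before launch -->
<script>const apiKey = "xxxxT3BlbkFJxxxx";</script>
</head><body><!-- old key: yyyyT3BlbkFJyyyy --></body></html>"""

report = flag_artifact(sample)
print(report["comments_with_keys"], report["hits_in_comments"], report["hits_elsewhere"])
```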


Read more: https://www.validin.com/blog/yara_hunting/