Proofpoint released an open-source tool called PDF Object Hashing to create detection rules that fingerprint PDFs by their internal object structure, enabling detection and clustering even when PDFs are obfuscated, encrypted, or have changing lure content. The technique has been used internally to track multiple threat actors including UAC-0050 and UNK_ArmyDrive and is available on GitHub. #PDF_Object_Hashing #UAC-0050 #UNK_ArmyDrive
Keypoints
- Proofpoint developed PDF Object Hashing to fingerprint PDFs by hashing the sequence of object types rather than specific object parameters or content.
- The method is robust against common PDF variations (whitespace, cross-reference table formats, object parameter placement) and can handle compressed stream objects.
- PDF Object Hashing enables detection of encrypted PDFs because document structure remains visible even when object contents are obscured.
- The tool supports clustering of related PDFs to identify builder or process similarities despite changes to lure images, URIs, or other surface-level artifacts.
- Proofpoint has used the technique to attribute and track threat actors internally, including UAC-0050 (NetSupport RAT distribution) and UNK_ArmyDrive.
- PDF Object Hashing complements other detection methods (e.g., image dhash) by focusing on document skeletons and is intended to produce more robust detection signals.
- The project is open source and available on Proofpoint Emerging Threats’ GitHub for use by defenders and researchers.
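The core idea in the key points above (hash the ordered sequence of object types, not object contents) can be illustrated with a minimal Python sketch. This is not Proofpoint's implementation: the regular expressions, the "Stream" and "Untyped" labels, the "|" separator, the choice of SHA-256, and the sample documents are all assumptions made for illustration. The sketch also does not unpack compressed object streams or handle encrypted files, both of which the summary says the real tool can cope with.

```python
import hashlib
import re
from collections import defaultdict

# Assumed, simplified patterns: "N G obj ... endobj" bodies and "/Type /Name" entries.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9_]+)")


def object_types(pdf_bytes):
    """Return the object type labels in the order the objects appear in the file."""
    types = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Look only at the dictionary portion, not at any stream data that follows.
        head = body.split(b"stream", 1)[0]
        found = TYPE_RE.search(head)
        if found:
            types.append(found.group(1).decode("ascii"))
        elif b"stream" in body:
            types.append("Stream")   # hypothetical label for untyped stream objects
        else:
            types.append("Untyped")  # hypothetical label for everything else
    return types


def object_hash(pdf_bytes):
    """Hash the type sequence only; URIs, images, and whitespace are ignored."""
    sequence = "|".join(object_types(pdf_bytes))
    return hashlib.sha256(sequence.encode("ascii")).hexdigest()


def cluster(samples):
    """Group sample names by their structural hash."""
    groups = defaultdict(list)
    for name, data in samples.items():
        groups[object_hash(data)].append(name)
    return dict(groups)


# Two toy documents with the same skeleton but different whitespace and lure URI.
PDF_A = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>\nendobj\n"
    b"4 0 obj\n<< /Type /Annot /Subtype /Link "
    b"/A << /S /URI /URI (http://a.example/x) >> >>\nendobj\n"
)
PDF_B = (
    b"%PDF-1.4\r\n"
    b"1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj\r\n"
    b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj\r\n"
    b"3 0 obj<</Type/Page/Parent 2 0 R/Annots[4 0 R]>>endobj\r\n"
    b"4 0 obj<</Type/Annot/Subtype/Link/A<</S/URI/URI(http://b.example/y)>>>>endobj\r\n"
)
# A third toy document with one extra stream object, so its skeleton differs.
PDF_C = PDF_A + b"5 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"

if __name__ == "__main__":
    for names in cluster({"a.pdf": PDF_A, "b.pdf": PDF_B, "c.pdf": PDF_C}).values():
        print(names)
```

In this sketch a.pdf and b.pdf land in the same cluster even though their URIs and whitespace differ, while c.pdf falls into its own cluster because its object sequence has an extra entry. A byte-level file hash would treat all three as unrelated.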
MITRE Techniques
- [T1105] Ingress Tool Transfer – URLs embedded in the PDFs download additional payloads (e.g., “the URL typically downloads a compressed JavaScript file which, if executed, installs the NetSupport RAT payload”).
- [T1204] User Execution – PDFs are used as lures to entice users to follow links or open attachments leading to credential phishing or malware delivery (“PDFs distributed in many ways…contain URLs leading to malware or credential phishing”).
- [T1566] Phishing – PDFs with fake banking details or invoices support credential harvesting and business email compromise (“PDFs with fake banking details or invoices to enable business email compromise (BEC) activity”).
- [T1027] Obfuscated Files or Information – Attackers use PDF obfuscation and encrypted PDFs to hide URIs and payload parameters (“because these malicious PDF files are encrypted, many cybersecurity tools…are unable to extract the embedded content”).
- [T1059.007] Command and Scripting Interpreter: JavaScript – After user interaction, URLs in the PDFs lead to a JavaScript file whose execution installs a RAT (“messages contain PDF files with URLs leading to NetSupport RAT” and subsequent JavaScript execution).
Indicators of Compromise
- [File Hash] example malicious PDF SHA256 – ee03ad7c8f1e25ad157ab3cd9b0d6109b30867572e7e13298a3ce2072ae13e5 (OneDrive impersonation sample).
- [File Hash] example malicious PDF SHA256 – 08367ec03ede1d69aa51de1e55caf3a75e6568aa76790c39b39a00d1b71c9084 (UNK_ArmyDrive Bangladesh Ministry of Defense lure).
- [GitHub Repository] tool source – https://github.com/EmergingThreats/pdf_object_hashing (location of PDF Object Hashing project).
- [Malicious Payload] RAT/Downloader context – NetSupport RAT distribution via URLs embedded in PDFs that download a compressed JavaScript file leading to payload execution.