GuardDog 3.0 is a major update to Datadog’s open source package scanner. It replaces Semgrep with YARA, adds a new risk engine, and sandboxes package extraction and analysis with Nono. The release also introduces stronger evaluation metrics, supports scanning npm, PyPI, and Go modules, accepts custom rules, and covers malicious package datasets, including the litellm sample and Shai-Hulud-related detections. #GuardDog #Datadog #Nono #Semgrep #YARA #PyPI #npm #ShaiHulud
Keypoints
- GuardDog 3.0 is the latest release of Datadog’s open source tool for detecting malicious software packages.
- The project moved from Semgrep-based source heuristics to YARA rules run through yara-python for better scale and performance.
- The new risk engine correlates capability rules and threat-indicator rules to assess whether a package is likely malicious.
- GuardDog 3.0 scores packages using factors such as maximal severity, attack-chain completeness, specificity, and sophistication.
- Package extraction and scanning are isolated in a Nono sandbox to reduce the impact of potential exploitation of GuardDog vulnerabilities.
- Evaluation is based on precision, recall, F1 score, and the Matthews correlation coefficient (MCC), using clustered malicious samples from Datadog’s dataset of more than 27,000 packages.
- Users can scan npm and PyPI packages as well as dependency manifests such as package.json or requirements.txt; the tool reports risk levels from low to high.
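As a rough illustration of the evaluation metrics named above, the sketch below computes precision, recall, F1 score, and MCC from confusion-matrix counts. The function name and the sample counts are hypothetical and are not taken from GuardDog's own evaluation.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, F1, and MCC from confusion-matrix counts.

    tp: malicious packages flagged as malicious
    fp: benign packages flagged as malicious
    tn: benign packages not flagged
    fn: malicious packages not flagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC uses all four cells, so it stays informative on imbalanced datasets.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only, not GuardDog results.
metrics = classification_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is a useful complement to F1 here because benign packages usually far outnumber malicious ones, and F1 ignores true negatives entirely.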
MITRE Techniques
- [T1190] Exploit Public-Facing Application – The article notes that attackers could exploit vulnerabilities in GuardDog to read or write arbitrary files or achieve remote code execution on the system running it (‘could let an attacker craft a specific npm or PyPI package to read or write arbitrary files, or even gain remote code execution on the machine running GuardDog’).
- [T1036] Masquerading – Malicious packages may use names resembling legitimate popular packages or exploit dependency confusion to appear trustworthy (‘packages may have names that resemble a legitimate popular package’).
- [T1195] Supply Chain Compromise – The article describes attackers adding malicious dependencies or backdooring packages on npm/PyPI to compromise downstream users (‘an attacker adds a malicious dependency to the package.json file on PyPI, without reflecting the change on GitHub’).
- [T1059] Command and Scripting Interpreter – GuardDog looks for pre/post-install scripts and injected setup.py code, which are common script execution vectors (‘Pre/post-install scripts, or injected code in the setup.py file’).
- [T1005] Data from Local System – Malicious packages may read sensitive credentials from the filesystem or environment variables (‘Read sensitive credentials from the filesystem or environment variables’).
- [T1041] Exfiltration Over C2 Channel – The article describes uploading stolen credentials to attacker-controlled locations via HTTP requests (‘performing an HTTP request with stolen credentials as a payload’).
- [T1105] Ingress Tool Transfer – A second-stage payload can be downloaded from an attacker-controlled location and executed (‘Pull a second-stage payload from an attacker-controlled location and execute it’).
- [T1547] Boot or Logon Autostart Execution – Persistence may be achieved by modifying startup locations such as .bashrc or Windows registry keys (‘Inject code in standard startup locations such as .bashrc or Windows registry keys’).
- [T1115] Clipboard Data – GuardDog includes capability rules such as clipboard access, which can indicate clipboard data collection (‘capability.runtime.clipboard (access the clipboard’s content)’).
- [T1071] Application Layer Protocol – GuardDog flags outbound network capability, indicating the ability to perform network-based communications (‘capability.network.outbound (perform outbound network calls)’).
- [T1547.001] Registry Run Keys / Startup Folder (formerly T1060) – The article explicitly references Windows registry keys as a persistence location (‘Windows registry keys’).
- [T1485] Data Destruction – The article does not describe destruction directly; this mapping rests on the sandboxing discussion, which covers a malicious package abusing a GuardDog vulnerability to perform harmful file operations (‘read or write arbitrary files’).
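The behaviors listed above line up with the capability rules that the risk engine correlates. The following minimal sketch shows one way attack-chain completeness and maximum severity could be combined into a risk level. Only capability.runtime.clipboard and capability.network.outbound are quoted from the article; the other rule names, the stage mapping, the severity scale, and the thresholds (including the intermediate "medium" level) are assumptions for illustration and do not describe GuardDog's actual engine.

```python
from dataclasses import dataclass

# Stages of a simple "trigger, collect, exfiltrate" attack chain.
# Assumed for illustration; not GuardDog's real model.
CHAIN_STAGES = ("trigger", "collect", "exfiltrate")

# Maps a matched rule to the chain stage it provides evidence for.
RULE_STAGE = {
    "capability.install.script": "trigger",                # hypothetical name
    "capability.filesystem.read_credentials": "collect",   # hypothetical name
    "capability.runtime.clipboard": "collect",             # quoted in the article
    "capability.network.outbound": "exfiltrate",           # quoted in the article
}


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: int  # assumed scale: 1 (low) to 3 (high)


def assess(findings: list[Finding]) -> str:
    """Return 'low', 'medium', or 'high' for a set of matched rules."""
    if not findings:
        return "low"
    # Attack-chain completeness: how many distinct stages have evidence.
    covered = {RULE_STAGE[f.rule] for f in findings if f.rule in RULE_STAGE}
    max_severity = max(f.severity for f in findings)
    if len(covered) == len(CHAIN_STAGES) and max_severity >= 2:
        return "high"
    if len(covered) >= 2 or max_severity == 3:
        return "medium"
    return "low"


# A package that runs at install time, reads credentials, and calls out.
full_chain = [
    Finding("capability.install.script", 2),
    Finding("capability.filesystem.read_credentials", 2),
    Finding("capability.network.outbound", 1),
]
print(assess(full_chain))
```

The point of correlating rules this way is that a single capability, such as outbound network access, is common in benign packages, while several capabilities that together form a complete chain are much more suspicious.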
Indicators of Compromise
- [Package names] Scanning examples and malicious sample references – react, litellm
- [URLs] Malicious package sample location and dataset reference – https://github.com/DataDog/malicious-software-packages-dataset/blob/main/samples/pypi/compromised_lib/litellm/1.82.7/2026-03-24-litellm-v1.82.7.zip, github.com/DataDog/malicious-software-packages-dataset
- [File names] Dependency manifests and startup/persistence targets mentioned in detection logic – package.json, requirements.txt, .bashrc
- [Environment variable / file path indicators] Sensitive targets that may be accessed by malware – environment variables, sensitive file names
- [Hashing / clustering artifact] Dataset de-duplication and evaluation sample selection – TLSH, commit 092675f
- [Project / tool names] Components referenced in the scanning stack – GuardDog, Nono, nono-py, yara-python
Read more: https://securitylabs.datadoghq.com/articles/guarddog-3-0-release/