In-depth Analysis of the PyTorch Dependency Confusion – Aqua

An attacker compromised PyTorch-nightly through dependency confusion, uploading a malicious torchtriton package to PyPI so that users installing the nightly build pulled a counterfeit package carrying a malicious binary. The malware exfiltrates data via DNS to an attacker-controlled domain. The post explains the attack, its indicators, and how to mitigate such supply chain issues. #Torchtriton #PyPI #PyTorchNightly #DNSExfiltration #SupplyChainAttack

Keypoints

  • PyTorch-nightly’s dependency chain was compromised via a dependency confusion attack on PyPI between December 25 and 30, 2022.
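  • A malicious torchtriton package with the same name as the legitimate dependency but a higher version number was uploaded to PyPI. Because pip gives no priority to the PyTorch nightly index over PyPI when both are configured, installs resolved to the higher-versioned malicious package.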
  • The malicious package was nearly identical to the legitimate one, differing in only two respects: an added malicious binary named “triton” and a modified __init__.py that runs it.
  • The malware collects sensitive data from the host (the passwd and hosts files, current user information, SSH data, and environment variables) and exfiltrates it via DNS.
  • Exfiltrated data is sent to the domain h4ck.cfd using the DNS server wheezy.io, as shown by traffic analysis.
  • Immediate mitigations: uninstall PyTorch-nightly and torchtriton, purge package caches, and reinstall nightly builds published after December 30, 2022. General guidance: maintain SBOMs, scan dependencies, use sandboxing, and deploy runtime protection.
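The resolution behavior behind the attack can be modeled in a few lines. This is a simplified sketch, not pip's actual resolver: it assumes each index simply lists version strings, uses dotted-integer versions instead of full PEP 440 parsing, and the index names and version numbers are hypothetical placeholders. It shows why a higher version on PyPI wins when no index has priority, and how restricting resolution to a trusted index avoids it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple[int, ...]
    index: str


def parse_version(text: str) -> tuple[int, ...]:
    # Simplified: dotted integers only. pip itself follows PEP 440.
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, indexes: dict[str, dict[str, list[str]]],
            allowed: set[str] | None = None) -> Candidate:
    """Pick the highest version of `name` across the given indexes.

    With allowed=None every index is consulted and none has priority, which
    models combining a private index with public PyPI. Passing `allowed`
    restricts the search to trusted indexes.
    """
    candidates = [
        Candidate(name, parse_version(v), idx)
        for idx, packages in indexes.items()
        if allowed is None or idx in allowed
        for v in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates for {name!r}")
    # The highest version wins, regardless of which index served it.
    return max(candidates, key=lambda c: c.version)


# Hypothetical index contents: a legitimate nightly version and a higher public one.
indexes = {
    "pytorch-nightly": {"torchtriton": ["2.0.0"]},
    "pypi": {"torchtriton": ["3.0.0"]},
}
print(resolve("torchtriton", indexes).index)  # pypi
print(resolve("torchtriton", indexes, allowed={"pytorch-nightly"}).index)  # pytorch-nightly
```

The `allowed` parameter stands in for pinning a package to a single trusted source; in practice that corresponds to installing from one index (for example, `--index-url` alone) rather than mixing indexes.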

MITRE Techniques

  • [T1195] Supply Chain Compromise – A package with the same name as a legitimate dependency, but a higher version number, was uploaded to PyPI, so legitimate installs fetched the malicious one: “a malicious package with the same name (and a higher version) to the Python Package Index (PyPI) code repository, resulting in a dependency confusion.”
  • [T1119] Automated Collection – The malware gathers sensitive information such as passwd and hosts files, current user info, SSH data, and environment variables. “The main function of the malware is to gather sensitive information, such as the passwd and hosts files, and to collect various other data including information about the current user, SSH data, and environment variables.”
  • [T1048.003] Exfiltration Over Unencrypted Non-C2 Protocol (DNS) – Collected data leaves the host through DNS queries: “The data is sent to the domain h4ck[.]cfd, using the DNS server wheezy[.]io.”
  • [T1036] Masquerading – The attacker made the malicious package resemble the legitimate one: “the two packages are almost 100% identical. We only saw two differences,” and “the attacker inserted to the __init__.py file the lines 4-13 which were designed to run the binary.”
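As a rough illustration of how a defender might hunt for the DNS exfiltration behavior, the sketch below screens queried names. It assumes you already have query names from resolver or network logs. The 40-character label threshold is an arbitrary assumption, and the long-label check is a generic DNS-exfiltration heuristic, not something the report attributes to this malware's encoding. The helper names are hypothetical.

```python
from __future__ import annotations

KNOWN_EXFIL_DOMAINS = {"h4ck.cfd"}  # exfiltration domain from the report's indicators
MAX_LABEL_LEN = 40  # assumed threshold; DNS labels may be up to 63 bytes


def is_under(qname: str, domain: str) -> bool:
    """True if qname equals domain or is a subdomain of it."""
    return qname == domain or qname.endswith("." + domain)


def flag_query(qname: str) -> list[str]:
    """Return the reasons a queried name looks suspicious (empty if none)."""
    name = qname.rstrip(".").lower()  # normalize trailing dot and case
    reasons = []
    if any(is_under(name, d) for d in KNOWN_EXFIL_DOMAINS):
        reasons.append("known exfiltration domain")
    # Data smuggled over DNS is often packed into long subdomain labels.
    if max(len(label) for label in name.split(".")) > MAX_LABEL_LEN:
        reasons.append("unusually long label")
    return reasons


# Hypothetical log sample.
for q in ["www.example.com", "abc123.h4ck.cfd."]:
    print(q, flag_query(q))
```

Matching on the `"." + domain` suffix avoids false hits on look-alike names such as `notah4ck.cfd`.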

Indicators of Compromise

  • [Hash] MD5 – 908596ffe11c30d1669431f3f4cb54f2 – MD5 hash of the malicious binary “triton” discovered in the runtime folder.
  • [Filename] triton – Malicious binary name inserted into the runtime directory.
  • [Domain] h4ck.cfd – Exfiltration domain used to receive data.
  • [Domain] wheezy.io – DNS server used to transmit exfiltrated data.
  • [Package] torchtriton – Malicious PyPI package name used in the supply chain attack.
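The package and hash indicators above can be checked locally. This is a minimal sketch assuming a Python 3 environment: it reports whether a distribution named torchtriton is installed (via `importlib.metadata`) and walks a directory you choose, such as site-packages, for files named `triton` inside a `runtime` directory whose MD5 matches the reported hash. The helper names are hypothetical, and a clean result does not by itself rule out compromise.

```python
from __future__ import annotations

import hashlib
from importlib import metadata
from pathlib import Path

IOC_MD5 = "908596ffe11c30d1669431f3f4cb54f2"  # MD5 of the malicious "triton" binary


def md5_of(path: Path) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def torchtriton_version() -> str | None:
    """Installed torchtriton version, or None if it is not installed."""
    try:
        return metadata.version("torchtriton")
    except metadata.PackageNotFoundError:
        return None


def find_ioc_binaries(root: Path, ioc_md5: str = IOC_MD5) -> list[Path]:
    """Files named 'triton' under a 'runtime' directory whose MD5 matches the IoC."""
    hits = []
    for candidate in root.rglob("triton"):
        if candidate.is_file() and candidate.parent.name == "runtime":
            if md5_of(candidate) == ioc_md5:
                hits.append(candidate)
    return hits
```

A match on either check points back to the mitigations listed earlier: uninstall the affected packages and purge the cache (pip provides `pip cache purge`).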

Read more: https://blog.aquasec.com/pytorch-dependency-confusion-administered-malware