Navigating the Realm of Malicious Python Packages

Have you ever encountered the term ‘double agent’? Recently, we’ve had the opportunity to revisit this concept in Austria. Setting aside real-world affairs for prosecutors and journalists, let’s explore what this term means in the digital world as I continue my journey tracking malicious Python packages.

Open source is key!

Suppose you were a script kiddie, sorry, a threat actor researcher, looking to snag some cookies, or rather to analyse new tools used to steal information from victims. Where would you head? You might choose from several options, but let’s assume you love open source and decide to visit GitHub, one of the largest platforms for open source projects. It’s an excellent resource for almost everything IT-related, including educational materials on malware builders.

I tricked you a bit there — did you catch it? Not everything on GitHub is open source, even if the source code is visible. Moreover, “open source” doesn’t mean the same thing in every context. It’s crucial to always check the licence. Always check the licence.

For instance, on GitHub, you can find the Oak Token Grabber V2. It offers a builder to customize a grabber (information stealer malware). Check out these features:

A screenshot of the README from the repository dreamyoak/Oak-Grabber-V2

This repository isn’t new; it existed already in the middle of last year [1]. When I visited, there was a link to a website offering paid versions of educational RAT grabbers and other services. This isn’t unusual. Reviewing the repository’s history showed no activity for a year between March 2023 and March 2024, then suddenly, an author with a slightly different name (dreamyoak instead of the original dynastyoak) began updating the code. This suggests that the repository had been moved or perhaps taken down in the past year.

A brief note on analysing the history of git repositories: like anything, you cannot blindly trust the data provided by git. For instance, dates can be easily tampered with by the commit author. However, in this case, all changes were made through the GitHub web interface, which means such commits are automatically signed by GitHub, and we can verify them using their public key [2].
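
As a rough sketch of how such a check could be automated: git can print the signature status of each commit with the `%G?` and `%GK` placeholders of `git log --format`. The snippet below only parses that output. It assumes GitHub's web-flow key [2] has already been imported into the local GnuPG keyring and that the log was produced with `git log --format="%H %G? %GK"`; the commit hashes and the key ID in the sample are hypothetical placeholders, not values from the Oak-Grabber-V2 repository.

```python
# Status letters printed by the %G? placeholder, per the git-log documentation.
SIGNATURE_STATUS = {
    "G": "good signature",
    "B": "bad signature",
    "U": "good signature, unknown validity",
    "X": "good signature, expired",
    "Y": "good signature, expired key",
    "R": "good signature, revoked key",
    "E": "signature cannot be checked",
    "N": "no signature",
}


def parse_signature_log(log_output):
    """Parse lines of the form '<hash> <status letter> [<key id>]'."""
    commits = []
    for line in log_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip empty or malformed lines
        commits.append({
            "commit": fields[0],
            "code": fields[1],
            "status": SIGNATURE_STATUS.get(fields[1], "unknown status"),
            "key_id": fields[2] if len(fields) > 2 else None,
        })
    return commits


# Hypothetical sample output; real hashes and key IDs will differ.
sample_log = (
    "1111111 G 0123456789ABCDEF\n"
    "2222222 N\n"
    "3333333 E 0123456789ABCDEF\n"
)

commits = parse_signature_log(sample_log)
# Only a fully verified signature ("G") is accepted here; "U" is left for
# manual review because the key's trust level is not established.
needs_review = [c["commit"] for c in commits if c["code"] != "G"]
print(needs_review)
```

A commit that shows up in `needs_review` is not necessarily forged; it only means its date and author should not be taken at face value.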

What did the new author do? The most significant change was the introduction of a new dependency that did quite a bit: collecting Wi-Fi passwords, PowerShell history, installed applications, desktop screenshots, and more. Yet, there was one tiny detail.

These weren’t features of the builder.

This data was collected from people trying to build malware, effectively acting as a double agent spying on both sides.

In short, a grabber builder was transformed into a grabber itself. The dependency — a Python package uploaded to PyPI — was imported by the builder when used, then it automatically downloaded an actual grabber that collected and exfiltrated data. This was, in fact, the grabber advertised on the website — the Nagogy Grabber — first observed at least a year ago [3]. It can be easily detected with a YARA rule from Any.run [4].

While the malicious dependency straightforwardly downloaded and ran the actual grabber, it used an old but clever technique to evade static analysers. Python is one of the languages that do not require source code to be written in pure ASCII. PEP 3131 introduced support for non-ASCII identifiers and defined Python’s behaviour as follows: “All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.” [5]

What does this mean? Consider the example below. In the first two lines, I used the plain letters ‘u’ and ‘a’. The characters in the third line are different: they are “Mathematical Sans-Serif Bold Italic Small U” [6] and “Mathematical Sans-Serif Bold Small A” [7]. Both are part of the Unicode specification, and as you can see, even though they don’t graphically match the variable identifiers from the previous lines, Python processed the statement successfully. This works because NFKC normalization maps both characters to their ASCII counterparts before the identifiers are compared.

An example of mixing ASCII and Unicode characters in identifiers that could confuse people but not Python
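
The same effect can be reproduced in a few lines. This is a minimal sketch rather than the code from the screenshot: the values 1 and 2 and the name `result` are illustrative assumptions, and the two Unicode characters are written as escapes so that the listing itself stays pure ASCII.

```python
import unicodedata

# The two look-alike characters named in the text.
fancy_u = "\U0001D66A"  # MATHEMATICAL SANS-SERIF BOLD ITALIC SMALL U
fancy_a = "\U0001D5EE"  # MATHEMATICAL SANS-SERIF BOLD SMALL A

# NFKC maps both characters to their plain ASCII counterparts.
normalized_u = unicodedata.normalize("NFKC", fancy_u)
normalized_a = unicodedata.normalize("NFKC", fancy_a)

# Python applies the same normalization to identifiers while parsing, so the
# third statement reads the variables defined in the first two.
source = "u = 1\na = 2\nresult = " + fancy_u + " + " + fancy_a + "\n"
namespace = {}
exec(source, namespace)

print(normalized_u, normalized_a, namespace["result"])  # prints: u a 3
```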

This feature isn’t often used, but authors of malicious code learned years ago that many static analysers do not follow PEP 3131 and won’t recognize what the code in the picture below, sampled from the malicious package imported by the mentioned grabber builder, really does.

This is entirely valid Python code that downloads and runs a malicious executable
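
An analyser can account for this by normalizing before it matches anything. The sketch below is a rough heuristic and not how any particular tool works: it flags every word in a source file that contains non-ASCII characters yet becomes pure ASCII under NFKC. The function name and the sample source are hypothetical. Because it works on raw text, it also flags such words inside strings and comments.

```python
import re
import unicodedata


def find_disguised_identifiers(source):
    """Return (line number, original word, NFKC form) for suspicious words."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for word in re.findall(r"\w+", line):
            if word.isascii():
                continue  # ordinary identifier, nothing to hide
            normalized = unicodedata.normalize("NFKC", word)
            # Legitimate non-ASCII names (for example accented letters) are
            # left unchanged or stay non-ASCII and are therefore not flagged.
            if normalized != word and normalized.isascii():
                findings.append((line_no, word, normalized))
    return findings


# Hypothetical sample: the second line uses a sans-serif bold 'a' in "abc".
sample_source = "import os\n" + "\U0001D5EE" + "bc = 1\n"
findings = find_disguised_identifiers(sample_source)
print(findings)
```

Any hit deserves a closer look, since there is rarely a good reason to spell `exec` or `subprocess` with mathematical letters.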

The author of Oak-Grabber-V2 seemed quite determined to maintain the double agent feature. After the malicious packages were removed from PyPI, they quickly introduced new ones and updated the repository. Judging by the star and fork counts captured by archive.org, the repository doubled in popularity in just a few days between February 27 and March 7 [8] [9]. It appears the author promoted the tool aggressively and later tried to extract data from its users: the ‘double agent’ feature was introduced on April 11. This cat-and-mouse game ended when GitHub removed the repository on April 16.

Comparison of the repository statistics on February 27 [8] and March 7 [9]

User agent control

It wasn’t just one ‘agent’ found recently. I came across another Python package that offered a unique functionality — controlling your server via the User-Agent header!

A sample from the user-agents-parser package

This innovative feature was embedded within a clone of a popular package designed to parse user-agent strings [10], which are self-descriptions browsers send to servers with every request [11]. Web applications often use these strings for different purposes, like directing users to a mobile site or gathering statistics. In this instance, the author replicated an existing package but added a twist: the strings could execute as shell commands before being parsed. Despite the modification, the packages maintained their original functionality, meaning you wouldn’t know you were using a compromised package unless a specific request triggered the command execution.
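
A modification like this is hard to notice while using the package but comparatively easy to notice while reading it, because a string parser has no reason to start processes. As a minimal sketch of that idea, the snippet below walks the syntax tree of a module and reports calls that run commands or evaluate code. The list of call names is an illustrative assumption and far from complete, and the sample module is hypothetical, not the code from user-agents-parser.

```python
import ast

# Illustrative, incomplete list of calls worth a second look in a parser library.
SUSPICIOUS_CALLS = {
    "os.system", "os.popen",
    "subprocess.run", "subprocess.call", "subprocess.Popen",
    "subprocess.check_output",
    "eval", "exec",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_suspicious_calls(source):
    """Return sorted (line number, call name) pairs found in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical module: a "parser" that runs its input as a shell command first.
sample_module = (
    "import os\n"
    "\n"
    "def parse(user_agent):\n"
    "    os.system(user_agent)\n"
    "    return user_agent.split('/')\n"
)
suspicious = find_suspicious_calls(sample_module)
print(suspicious)  # prints: [(4, 'os.system')]
```

This only catches direct calls by name; aliased imports or dynamically built names need more work.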

Moreover, the package creator employed another common tactic worth noting: they preserved the original project’s website and author information, which package indexes like PyPI typically display on the project page. These details can mislead users into trusting a seemingly popular and secure package.

PyPI recently took steps to prevent such deception by clearly indicating which data are verified and which are not — a significant improvement.

Left: a PyPI project page as captured by archive.org [12]. Right: another project page, as of today [10].

After I reported the package, Mike Fiedler from the PyPI security team found that an earlier version was also trying to establish a persistent reverse shell by registering a cron job.

An earlier version of user-agents-parser was trying to use crontab for persistence

Final thoughts

All associated packages were removed from PyPI, and the Oak-Grabber-V2 repository was shut down by GitHub. However, this isn’t the first or last time we’ll encounter such threats. If you’re looking for advice, I’ve noted some tips in my last post. But most importantly, avoid downloading random software, even if it’s for educational purposes.

IoCs

  • Malicious packages used by Oak-Grabber-V2: argsreq, colarg, colargs, reqarg, reqargs
  • URLs with the actual grabber:
    • hxxps://api.dreamyoak[.]xyz/cdn/file
    • hxxps://api2.dreamyoak[.]xyz/cdn/file
  • Malicious packages pretending to be user agent parser: user-agents-parser, user-agents-parsers
  • IP used in an attempt for reverse shell: 95.179[.]177[.]74
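
The package names above can be checked against a project's dependencies. A minimal sketch, assuming a plain requirements file with one requirement per line: names are normalized the way PEP 503 describes (lowercase, with runs of `-`, `_` and `.` collapsed to a single `-`) before comparison. The function names and the sample requirements are hypothetical.

```python
import re

# Package names taken from the IoC list above.
MALICIOUS_PACKAGES = {
    "argsreq", "colarg", "colargs", "reqarg", "reqargs",
    "user-agents-parser", "user-agents-parsers",
}


def normalize_name(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_requirements(lines):
    """Return normalized names of listed packages that match the IoC set."""
    flagged = []
    for line in lines:
        requirement = line.split("#", 1)[0].strip()  # drop trailing comments
        # The project name is the leading run of name characters; option
        # lines such as "-r other.txt" do not match and are skipped.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement)
        if match and normalize_name(match.group(0)) in MALICIOUS_PACKAGES:
            flagged.append(normalize_name(match.group(0)))
    return flagged


# Hypothetical requirements file content.
sample_requirements = [
    "requests==2.31.0",
    "User_Agents.Parser>=1.0  # looks like the real thing",
    "reqargs",
    "user-agents",
]
flagged = flag_requirements(sample_requirements)
print(flagged)  # prints: ['user-agents-parser', 'reqargs']
```

Note that the legitimate `user-agents` package [10] is not flagged; only the look-alike names are.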

References

[1] https://web.archive.org/web/20230731214919/https://github.com/dreamyoak/
[2] https://github.com/web-flow.gpg
[3] https://twitter.com/MalGamy12/status/1698367753919357255
[4] https://github.com/anyrun/YARA/blob/73fba11a040629e147281aa0528439d72fb5402a/NagogyGrabber.yar
[5] https://peps.python.org/pep-3131/
[6] https://unicodeplus.com/U+1D66A
[7] https://unicodeplus.com/U+1D5EE
[8] https://web.archive.org/web/20240227221457/https://github.com/c/Oak-Grabber-V2?tab=readme-ov-file
[9] https://web.archive.org/web/20240307140321/https://github.com/dreamyoak/Oak-Grabber-V2
[10] The original, safe package is here: https://pypi.org/project/user-agents/
[11] https://developer.mozilla.org/en-US/docs/Glossary/User_agent
[12] https://web.archive.org/web/20240117161520/https://pypi.org/project/adafruit-circuitpython-htu31d/

Source: https://cert.at/en/blog/2024/4/double-agents-and-user-agents-navigating-the-realm-of-malicious-python-packages