Key Points
- EMERALDWHALE scanned the internet for exposed Git configuration files and Laravel .env files to harvest secrets.
- Researchers found over 15,000 stolen cloud and email credentials, including keys tied to more than 10,000 private repositories.
- The operation used private toolsets—MZR V2 (MIZARU) and Seyzo-v2—plus web scraping and git-dumper to locate and extract credentials.
- Stolen data and tooling were stored in a publicly accessible S3 bucket (s3simplisitter) belonging to a prior victim and later taken down after reporting.
- Attackers validated and abused credentials to create users, check SMTP/SNS capabilities, and prepare accounts for spam and phishing campaigns.
- Target lists and exposed Git config files were actively traded in underground markets, enabling large-scale automated abuse.
MITRE Techniques
- [T1552.001] Unsecured Credentials: Credentials In Files – Extracted secrets from exposed Git configuration files and Laravel environment files (‘extracted credentials from exposed Git configurations and Laravel .env files’)
- [T1210] Exploitation of Remote Services – Abused misconfigured web services to retrieve repository and configuration data (‘abused multiple misconfigured web services, allowing attackers to steal credentials, clone private repositories, and extract cloud credentials from their source code’)
- [T1213] Data from Information Repositories – Accessed and harvested secrets stored in code repositories (‘access and extracted data from Git repositories’)
- [T1190] Exploit Public-Facing Application – Leveraged web server misconfigurations to read the .git directory and download repositories (‘exploited web server misconfigurations to access the .git directory’)
Indicators of Compromise
- [S3 bucket] public storage used to collect stolen data – s3simplisitter (contained malicious tools, logging, and >1 TB of harvested data)
- [Exposed path / URL] locations targeted to retrieve Git configs – https://<IP>/.git/config (over 67,000 discovered URLs with this path)
- [Usernames] attacker/automation account names observed – mailer-sns-smtp, mizaruveryhq
- [File names] output and result files created by tools – healthy_aws_smtp.txt, ses_valid.txt
- [Target lists / IPs] scanning targets and scope evidence – lists containing 500M+ IP addresses and 12k IP ranges (and large domain lists)
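Defenders can look for the exposed-path indicator in their own web server logs. The following is a minimal sketch, assuming access logs in Common or Combined Log Format; the function name `find_probes` and the regular expressions are illustrative and do not come from the report.

```python
import re

# Matches the request and status portion of a Common/Combined Log Format line,
# e.g. "GET /.git/config HTTP/1.1" 200. The log format is an assumption.
REQUEST_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)

# Paths named in the report: Git metadata directories and .env files.
SENSITIVE_RE = re.compile(r"(^|/)\.git/|(^|/)\.env$")


def find_probes(log_lines):
    """Yield (client_ip, path, status) for requests to .git or .env paths."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # Drop any query string before testing the path.
        path = match.group("path").split("?", 1)[0]
        if SENSITIVE_RE.search(path):
            client_ip = line.split(" ", 1)[0]
            yield client_ip, path, int(match.group("status"))
```

A probe answered with status 200 deserves the most attention, since it suggests the file was actually served rather than merely requested.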
The Sysdig Threat Research Team discovered a coordinated campaign named EMERALDWHALE that scanned large swaths of the internet for misconfigured web services and exposed repository or environment files, then harvested and validated thousands of credentials. The investigation began when analysts monitoring a cloud honeypot saw a suspicious ListBuckets call using a compromised account that referenced an external, publicly exposed S3 bucket named s3simplisitter. That bucket contained malicious tools, extensive logging, and over a terabyte of data, including harvested credentials and evidence of a multi-stage operation; after Sysdig reported it, AWS removed the bucket.
Analysis of the bucket’s contents revealed a large-scale scanner operation running between August and September that searched for exposed Git configuration files and Laravel .env files. The attackers used target lists—ranging from massive IP lists to domain collections—to feed tools that probed hosts in parallel and requested /.git/config and similar paths. When an exposed configuration was reachable, scripts parsed URLs and tokens from those files (for example, URLs of the form https://user:token@github/… .git), validated tokens via service APIs, and then used valid credentials to clone repositories and search repository files for additional secrets. In many cases, valid GitHub tokens were recovered; a limited check of roughly 6,000 GitHub tokens showed about 2,000 remained valid.
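The credential-bearing URL form described above is something teams can audit for in their own repositories before a config file is ever exposed. The sketch below is one way to do that, assuming the config text has already been read from a local `.git/config`; the function name `remotes_with_userinfo` is hypothetical, and the secret itself is deliberately never returned.

```python
import re
from urllib.parse import urlsplit

# Matches "url = <value>" lines inside a Git config file.
URL_LINE_RE = re.compile(r"^[ \t]*url[ \t]*=[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def remotes_with_userinfo(config_text):
    """Return (hostname, has_password) for HTTP(S) remotes that embed userinfo.

    A URL such as https://user:token@host/org/repo.git carries a credential
    that anyone who can read the config file can reuse.
    """
    findings = []
    for raw_url in URL_LINE_RE.findall(config_text):
        parts = urlsplit(raw_url)
        if parts.scheme in ("http", "https") and parts.username is not None:
            findings.append((parts.hostname, parts.password is not None))
    return findings
```

Any hit is a reason to rotate the token and move the remote to a credential helper or SSH key, since the token otherwise travels with every copy of the working directory.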
Two privately developed toolsets were found in the stolen data. MZR V2 (also referred to as MIZARU) is a collection of Python and shell scripts that begins with gitfinder.sh using httpx to locate /.git/config files, then runs ghpurl.py (which fetches config content and extracts repository URLs), checkuser.sh (which validates credentials via the GitHub API), and dumpsph.sh (which clones repositories and greps for secrets such as AWS keys). MZR V2 also contains parser scripts to extract AWS keys and regions, and it uses AWS CLI commands to verify key capabilities. Depending on options, the tool can automatically create IAM users, convert IAM secrets to SMTP passwords for sending mail through SES, check SNS SMS quotas, and run Node-based verifications for email delivery; successful accounts are written to files such as healthy_aws_smtp.txt and ses_valid.txt.
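The account behavior described above should leave traces in AWS CloudTrail, which gives defenders something to search for. The following is a rough sketch, assuming a standard CloudTrail log file (a JSON object with a `Records` array). The two usernames come from the indicators listed in this report; the set of watched API calls is an assumption about which calls such tooling would generate, not something taken from the recovered scripts.

```python
import json

# Usernames listed in the report's indicators of compromise.
IOC_USERNAMES = {"mailer-sns-smtp", "mizaruveryhq"}

# API calls that plausibly correspond to user creation and SES/SNS capability
# checks. This mapping is an assumption made for illustration.
WATCHED_EVENTS = {
    ("iam.amazonaws.com", "CreateUser"),
    ("ses.amazonaws.com", "GetSendQuota"),
    ("sns.amazonaws.com", "GetSMSAttributes"),
}


def flag_records(cloudtrail_json):
    """Return (eventName, reason) pairs for records worth a closer look."""
    flagged = []
    for record in json.loads(cloudtrail_json).get("Records", []):
        source = record.get("eventSource")
        name = record.get("eventName")
        params = record.get("requestParameters") or {}
        if name == "CreateUser" and params.get("userName") in IOC_USERNAMES:
            flagged.append((name, "IoC username " + params["userName"]))
        elif (source, name) in WATCHED_EVENTS:
            flagged.append((name, "watched API call"))
    return flagged
```

A watched call is not malicious on its own. It becomes interesting when it comes from a key that has never made such calls before, which is why the results are best fed into whatever behavioral baseline the team already keeps.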
Seyzo-v2 follows a similar workflow but leverages git-dumper for a more comprehensive repository harvest. Its dumperz.sh script runs git-dumper to collect repository histories and then searches the dumped repositories for a broad set of indicators related to SMTP providers, SMS and API keys (for example, Twilio SK keys and Nexmo credentials), and cloud provider patterns. Seyzo-v2’s searches produce outputs like smtp.txt and api_sms.txt, and its operators use the resulting credentials to create accounts and prepare large-scale spam or phishing operations.
Beyond exposed Git configs, the operation used large-scale web scraping to capture static assets and client-side code where credentials can be accidentally embedded. Dozens of folders recovered from the S3 bucket contained downloaded site assets; a central extraction script, ex.sh, ran grep-based regexes across those assets to find patterns like AKIA[A-Z0-9]{16} (AWS access key IDs) and long base64-like strings that could represent secrets. These scraping artifacts show that attackers combined focused scans for repository artifacts with broad scraping to increase yield.
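The same pattern can be run defensively against a team's own build output before it is published. The sketch below is not a reconstruction of ex.sh; it only reuses the AKIA[A-Z0-9]{16} pattern quoted above, and the function names `scan_text` and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# Pattern quoted in the report for AWS access key IDs.
ACCESS_KEY_RE = re.compile(r"AKIA[A-Z0-9]{16}")


def scan_text(text):
    """Return the distinct access-key-ID-shaped strings found in text."""
    return sorted(set(ACCESS_KEY_RE.findall(text)))


def scan_tree(root):
    """Map each file under root to the key-ID-shaped strings it contains."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Undecodable bytes are skipped so binary assets do not abort the scan.
            hits = scan_text(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Running `scan_tree` over a directory of static assets as a pre-deploy step flags files that need review. A match is only a candidate: the pattern identifies the shape of a key ID, not whether the key is live.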
The scope of EMERALDWHALE was significant. The logging data included target lists with more than 500 million individual IPs, 12,000 IP ranges, roughly 500,000 domains, and about one million EC2 hostnames; the group even maintained a full IPv4 list enumerating billions of addresses. Using one of these lists and MZR V2, researchers observed discovery of over 67,000 URLs exposing /.git/config. Repositories came from major hosting services including GitHub, Bitbucket, and GitLab, while smaller providers and personal repositories were also present; the dataset included approximately 3,500 smaller or less common repositories and over 700 instances of AWS CodeCommit repositories. The attackers collected credentials from more than 10,000 private repositories and ultimately amassed over 15,000 credentials tied to cloud services, email providers, and other platforms.
Multigrabber, a commercial secret-harvesting tool commonly advertised in underground forums, was also observed in the ecosystem and is focused on locating exposed Laravel .env files. Laravel environment files often contain database credentials, API keys, and cloud credentials, making them valuable targets. The research uncovered version 8.5 of Multigrabber and evidence that the tool—and related courses or resale of the code—circulates in Telegram groups and other underground markets. EmperorsTool is cited as an original developer of Multigrabber, though variants and resellers now appear active.
The recovered tooling shows how automated and low-effort these attacks have become: attackers run scalable scans on ephemeral infrastructure, validate credentials automatically, and then either directly monetize accounts (for example by sending spam or selling valid credentials) or sell curated target lists and credential packs on marketplaces. Sysdig observed that target lists themselves are traded—for example, a list of exposed Git configuration URLs sold for about $100—underscoring the commercial incentives driving this abuse. The value of individual credentials can be hundreds of dollars, and compiled target lists or verified account packs have additional resale value.
EMERALDWHALE demonstrates that secret management alone is insufficient if web service misconfigurations expose repository data or environment files. Misconfigured servers that allow access to /.git directories or publicly readable .env files create alternate leakage paths that attackers can automate at scale. Defenders should therefore combine secret management with continuous exposure management, external and internal scanning for publicly accessible repository artifacts, and behavioral monitoring of identities and keys to detect anomalous use. The researchers emphasize that detecting exposed Git configuration files and environment files from an external perspective is critical because that is what attackers can see and exploit.
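An external check of this kind can be small. The following is a minimal sketch using only the Python standard library, intended for hosts the reader owns or is authorized to test. The function names and the heuristic for recognizing a Git config are assumptions made for illustration.

```python
import http.client
import urllib.request


def looks_like_git_config(body):
    """Heuristic: a Git config normally has a [core] section with this key."""
    return "[core]" in body and "repositoryformatversion" in body


def check_host(base_url, timeout=5):
    """Return True if base_url appears to serve a Git config at /.git/config."""
    url = base_url.rstrip("/") + "/.git/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Read only the first few kilobytes; a config file is small.
            body = response.read(4096).decode("utf-8", errors="ignore")
            status = response.status
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, timeouts, DNS failures, and malformed URLs.
        return False
    return status == 200 and looks_like_git_config(body)
```

Checking the body rather than the status code alone avoids false positives from servers that answer every path with a 200 and a generic landing page. The same structure extends to .env files by swapping the path and the content heuristic.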
Read more: https://sysdig.com/blog/emeraldwhale/