Inside the Great Firewall Part 1: The Dump

A breach of China’s censorship infrastructure leaked roughly 500–600 GB of internal data, including source code, PCAPs, configuration files, Visio diagrams, and metadata tying engineers and organizations to the Great Firewall’s operation. The dump reveals detection heuristics and deployment records for circumvention tools such as Psiphon, V2Ray, and Shadowsocks, and it exposes organizational fingerprints linked to ISPs and vendors such as China Telecom and CETC. #Psiphon #V2Ray #Shadowsocks #ChinaTelecom #CETC

Keypoints

  • More than 500 GB (an estimated ~600 GB) of internal GFW-related data was leaked, comprising more than 7,000 files, including source code, logs, PCAPs, Visio diagrams, and operational runbooks.
  • Artifacts include RPM packaging files, Jira/Confluence project data, emails, configuration files, and OCR’d screenshots of internal control dashboards, revealing operational workflows and tooling.
  • Technical materials show DPI, SSL/TLS fingerprinting (SNI), sketch-based detection, and heuristic classifiers used to identify and throttle VPNs, proxies, and circumvention tools like Psiphon, V2Ray, and Shadowsocks.
  • Leakage of metadata (usernames, hostnames, authorship, internal IPs) enables attribution linking personnel and organizations to roles across China Telecom, China Unicom, China Mobile, academic labs, and MSS-linked vendors (e.g., CETC, Topsec).
  • Evidence of failures: misconfigured mirrors exposing blacklist data, cross-border traffic escaping inspection, and honeypot logs showing foreign reconnaissance and delayed rule propagation.
  • Two plausible breach vectors: an insider with broad privileged access or a coordinated external exfiltration exploiting misconfigurations and insecure admin interfaces.
  • The disclosure undermines many detection heuristics and operational secrecy, empowering circumvention developers, red teams, and policy actors while increasing exposure risks for identified personnel and vendors.
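
The "sketch-based detection" mentioned in the keypoints can be illustrated with a count-min sketch, a probabilistic structure commonly used to count per-flow events in fixed memory. The summary does not say which sketch structure the leaked code uses, so this is only a minimal sketch under that assumption; the class name, the table dimensions, and the flow keys (documentation-range addresses) are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-memory approximate counter; estimates never undercount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One independent-looking hash per row, derived by prefixing the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions can only inflate a cell, so the minimum is the best estimate.
        return min(self.table[row][col] for row, col in self._cells(key))


# Hypothetical flow keys: "source->destination:port" using documentation IP ranges.
sketch = CountMinSketch()
for _ in range(50):
    sketch.add("192.0.2.10->198.51.100.7:443")
for _ in range(3):
    sketch.add("192.0.2.11->198.51.100.8:443")
```

A classifier built on such a structure would flag keys whose estimated count crosses a threshold; the trade-off is that hash collisions can overestimate a flow's count but never underestimate it.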

MITRE Techniques

  • [T1078] Valid Accounts – Leaked metadata showing dozens of usernames and system account names (e.g., “admin-jw”, “yunwei-wang”) tied to operations implies use and compromise of legitimate accounts: “dozens of unique usernames… system-level account names… enabling correlation to individual operators.”
  • [T1005] Data from Local System – Archive contains internal documents, Visio files, Excel spreadsheets, and code from packaging servers, indicating collection of locally stored artifacts: “RPM packaging server files… Word, Excel, and PowerPoint files exposes the usernames… and edit trails.”
  • [T1041] Exfiltration Over C2 Channel – Dataset organization and evidence of persistent access (PCAPs, CPU logs, routing tables) suggest methodical siphoning of data over time likely via persistent channels: “PCAP captures, CPU load logs, and Visio diagram exports suggest persistent access and automated tooling were in play.”
  • [T1592] Gather Victim Host Information – Packet captures, routing tables, and monitoring exports show collection of host and network telemetry for mapping and analysis: “raw IP access logs… packet captures (PCAPs), routing tables, and blackhole sinkhole exports” were included.
  • [T1071] Application Layer Protocol – Detection and analysis focused on TLS/SNI, DoH, and app-layer signatures to identify circumvention tools and fingerprint traffic: “references to TLS fingerprinting rules… SNI patterns… Anonymous DNS Resolution System via Tor Network with DOH (DNS-over-HTTPS) Encryption.”
  • [T1113] Screen Capture – OCR-processed screenshots and screen captures of management consoles provided visual evidence of control panels and logging dashboards: “OCR-processed screenshots illustrate the UI panels of traffic control dashboards, logging mechanisms, and internal tooling.”
  • [T1566] Phishing (possible enabling vector) – While not explicitly confirmed, the report notes potential exploitation of insecure admin panels and misconfigurations as breach vectors consistent with credential harvesting or access via compromised interfaces: “misconfigurations in firewalls, insecure admin panels, and segmented network seams may have been exploited.”
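
The TLS/SNI matching referenced under T1071 works because, unless Encrypted Client Hello is in use, the requested server name travels in cleartext inside the TLS ClientHello. The following is a minimal sketch of how any passive observer could read that field from a single TLS record. It is not taken from the leaked code; the function name `extract_sni` is hypothetical, and the parser assumes the whole ClientHello fits in one record.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None."""
    try:
        if record[0] != 0x16:          # record type 22 = handshake
            return None
        pos = 5                        # skip record header (type, version, length)
        if record[pos] != 0x01:        # handshake type 1 = ClientHello
            return None
        pos += 4                       # handshake type + 3-byte length
        pos += 2 + 32                  # client version + random
        pos += 1 + record[pos]         # session ID (length-prefixed)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len          # cipher suites
        pos += 1 + record[pos]         # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # extension 0 = server_name
                # Layout: list length (2), name type (1), name length (2), name.
                if record[pos + 2] != 0:   # name type 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                return name.decode("ascii", errors="replace")
            pos += ext_len             # skip extensions we do not care about
        return None
    except (IndexError, struct.error):
        return None                    # truncated or malformed input
```

A rule engine would then compare the returned name against a pattern list; the same cleartext visibility is what makes SNI a cheap first-pass signal compared with full payload inspection.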

Indicators of Compromise

  • [IP addresses] state-run provider logs and testbed references – raw IP access logs from China Telecom/China Unicom/China Mobile and GFW staging zones (examples referenced as provider IP ranges and staging IPs; specific addresses not quoted in article).
  • [File names / types] leaked artifacts and tooling – examples include Visio files (.vsd/.vsdx), Excel spreadsheets, RPM packaging server files, and OCR’d screenshots of management consoles (e.g., “Visio (.vsd/.vsdx) files”, “RPM packaging server files”).
  • [Application identifiers] targeted circumvention tools – examples: Psiphon, V2Ray, Shadowsocks found in application-layer analyses and testbed logs.
  • [Metadata / user accounts] usernames and hostnames – examples: “admin-jw”, “it_ops_lh”, “yunwei-wang” tied to document authorship and system accounts (and dozens more usernames).
  • [Network artifacts] capture and routing data – examples: PCAPs, routing tables, sinkhole/blackhole exports demonstrating interception and redirection of traffic (plus numerous packet captures and routing entries).
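
The link between usernames and document authorship noted above comes from metadata that Office formats store by default: OOXML packages (.docx, .xlsx, .pptx, .vsdx) are ZIP archives whose `docProps/core.xml` part records the creator and the last modifier. The following is a minimal sketch of how an analyst might index files by those fields. The function names, sample file names, and usernames are hypothetical, and legacy binary formats such as .vsd would need a different parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespaces used by the OOXML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_authorship(package: bytes) -> dict:
    """Extract creator and last-modified-by from an OOXML package held in memory."""
    try:
        with zipfile.ZipFile(io.BytesIO(package)) as zf:
            root = ET.fromstring(zf.read("docProps/core.xml"))
    except KeyError:  # package has no core-properties part
        return {"creator": "", "last_modified_by": ""}
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }


def group_by_user(files: dict) -> dict:
    """Map each username to the sorted list of file names it appears in."""
    index = defaultdict(set)
    for name, blob in files.items():
        meta = read_authorship(blob)
        for user in (meta["creator"], meta["last_modified_by"]):
            if user:
                index[user].add(name)
    return {user: sorted(names) for user, names in index.items()}


def make_sample(creator: str, modifier: str) -> bytes:
    """Build a tiny stand-in package for demonstration (not a complete document)."""
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        f"<dc:creator>{creator}</dc:creator>"
        f"<cp:lastModifiedBy>{modifier}</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", xml)
    return buf.getvalue()


# Hypothetical usage with placeholder names.
index = group_by_user({
    "runbook.docx": make_sample("user-a", "user-b"),
    "inventory.xlsx": make_sample("user-a", "user-a"),
})
```

Grouping by username rather than by file is the design choice that matters here: it turns scattered per-document fields into a per-person view, which is the kind of correlation the attribution findings depend on.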


Read more: https://dti.domaintools.com/inside-the-great-firewall-part-1-the-dump/