Inside the Great Firewall Part 2: Technical Infrastructure

The leaked 500GB Great Firewall dataset details a modular, nationwide DPI and orchestration ecosystem used to detect, predict, and disrupt circumvention tools such as Psiphon, V2Ray, and Shadowsocks, and to feed telemetry into broader surveillance and social-credit systems. The ecosystem centers on a Traffic Secure Gateway (TSG) platform, MAAT, Gohangout, Redis telemetry, JA3 and SNI fingerprinting, and vendor-supplied DPI hardware. The files reveal centralized command queues, regional enforcement nodes, sinkholing and BGP hijacks, quarantine of protocol deviations, and active probing, along with evidence of misclassification and false positives affecting cloud providers and legitimate services. #Psiphon #V2Ray #Shadowsocks

Keypoints

  • The core GFW stack revealed includes a Traffic Secure Gateway (TSG) DPI platform, centralized hubs like the YGN Center, dashboards such as Cyber Narrator, and telemetry/aggregation tools MAAT and Gohangout.
  • Fingerprinting and classification use JA3 TLS fingerprints, SNI filtering, DNS manipulation, TLS/HTTP header inspection, and behavioral baselining to detect encrypted circumvention tools (Psiphon, V2Ray, Shadowsocks).
  • Command and control is tiered, with centralized policy and distributed enforcement: central policy hubs push updates to regional enforcement nodes via queued synchronization, allowing targeted, time-sensitive rule deployment and rollback.
  • Active countermeasures include TCP RST injections, sinkholing, BGP prefix injection/hijacks, active probing, traffic replay, and quarantine routes for protocol deviations.
  • Extensive vendor integration (e.g., A Hamson, Venustech, Topsec, Huaxin) supplies DPI blades, firmware, cryptographic modules, and orchestration interfaces under state supervision.
  • Behavioral prediction engines and real-time session profiling (CPU/memory/port/TLS metrics) assign risk scores to preemptively flag or terminate suspected circumvention sessions.
  • Telemetry and identifier fusion (UUIDs, IMEI/IMSI hashes, partial SSO tokens) link network-level detections into broader surveillance and social-credit systems, enabling escalation from technical flags to administrative consequences.
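
To make the JA3 fingerprinting mentioned above concrete, the following is a minimal sketch of how a JA3 string and hash are derived from ClientHello fields, following the publicly documented JA3 method. It is not taken from the leaked dataset. The function names and the sample field values are hypothetical and chosen only for illustration.

```python
import hashlib

# GREASE values (RFC 8701) are randomized placeholders; JA3 ignores them
# so that they do not change the fingerprint between connections.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string from decimal ClientHello field values.

    Field order: TLS version, cipher suites, extensions, elliptic curves,
    elliptic curve point formats. Values inside a field are joined with
    "-", and fields are joined with ",".
    """
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical ClientHello values, including one GREASE cipher (0x0A0A).
    fields = (771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0])
    print(ja3_string(*fields))  # 771,4865-4866,0-10-11,29-23,0
    print(ja3_hash(*fields))
```

Because the hash depends only on the values a client offers and their order, two clients built on the same TLS library configuration tend to share a fingerprint. This is why JA3 can group encrypted sessions by client software without decrypting them, and also why it can misclassify unrelated applications that share a TLS stack.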

MITRE Techniques

  • [T1040] Network Sniffing – DPI modules perform deep packet inspection to extract HTTP headers and TLS handshakes for classification: ‘DPI modules process TCP streams in real-time to extract HTTP headers, inspect TLS handshakes, and apply keyword filtering.’
  • [T1070] Indicator Removal (log management) – MAAT and Gohangout aggregate and process logs with telemetry and snapshot exports for classification and retention: ‘MAAT acts as a central log aggregator and decision engine, ingesting stream data to feed classification engines.’
  • [T1020] Automated Exfiltration (data aggregation) – Telemetry and identifier fusion feed centralized repositories linking UUIDs/IMEI hashes to session logs: ‘firewall telemetry… feeds into centralized repositories where it is correlated with endpoint identity, system behavior, application telemetry, and even social profiling signals.’
  • [T1499] Endpoint Denial of Service – The system disrupts sessions through TCP RST injection, sinkholing, and BGP hijacks: ‘sessions may be hijacked or redirected via sinkholes and TCP reset injections.’
  • [T1584] Compromise Infrastructure (vendor-assisted) – Vendors provide custom firmware/DPI modules enabling centralized censorship capabilities: ‘vendors supply the routers, DPI cards, cryptographic modules, firmware updates, and orchestration platforms that allow the GFW to adapt…’
  • [T1204] User Execution (phishing/flagging via rules) – Rules and automated blacklist updates target application endpoints and VPNs based on telemetry and heuristics: ‘Blacklists identify VPN exit nodes, encrypted tunnel endpoints, and known circumvention platforms like Psiphon or V2Ray… updates to these lists are driven by anomaly detection from the logs.’
  • [T1595] Active Scanning – Active probing and replay tests are used to confirm protocol deviation and probe suspected VPN exit nodes: ‘the system captures…supporting active countermeasure deployment through automated probe and reset mechanisms’ and ‘timed replay payloads and outbound test probes using crafted TLS or DNS packets.’
  • [T1071] Application Layer Protocol (use of TLS/HTTP) – Classification leverages TLS handshake (JA3) and SNI strings to identify applications: ‘presence of advanced JA3 and SNI fingerprinting… demonstrates the GFW’s ability to identify encrypted channels.’
  • [T1486] Data Encrypted for Impact (TLS interception) – SSL/TLS interception and application-layer proxying are used to inspect encrypted traffic: ‘a modular, exportable DPI platform capable of application-layer proxying, SSL/TLS interception, and centralized policy enforcement.’
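
The SNI-based classification listed under T1071 works because the server name travels in cleartext inside the TLS ClientHello. The sketch below shows, under simplifying assumptions, how a passive inspector could read that name from a single unfragmented ClientHello record. It is an illustration of the general technique, not code from the dataset. `extract_sni` and `build_client_hello` are hypothetical helper names, and the builder produces only a minimal synthetic message for testing.

```python
import struct


def extract_sni(record: bytes):
    """Return the SNI host name from one TLS ClientHello record, or None.

    Assumes the whole ClientHello fits in a single TLS record.
    """
    try:
        if record[0] != 0x16:            # record type 22 = handshake
            return None
        pos = 5                           # skip 5-byte record header
        if record[pos] != 0x01:          # handshake type 1 = ClientHello
            return None
        pos += 4                          # handshake type + 3-byte length
        pos += 2 + 32                     # client_version + random
        pos += 1 + record[pos]            # session_id
        (cs_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cs_len                 # cipher_suites
        pos += 1 + record[pos]            # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:             # extension 0 = server_name
                # Layout: 2-byte list length, 1-byte name type, 2-byte name length.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(host: str) -> bytes:
    """Build a minimal synthetic ClientHello carrying only an SNI extension."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    ext_data = struct.pack("!H", len(entry)) + entry
    extensions = struct.pack("!HH", 0, len(ext_data)) + ext_data
    body = (
        b"\x03\x03" + bytes(32)                 # version + zeroed random
        + b"\x00"                               # empty session_id
        + struct.pack("!H", 2) + b"\x13\x01"    # one cipher suite
        + b"\x01\x00"                           # null compression
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


if __name__ == "__main__":
    print(extract_sni(build_client_hello("example.org")))  # example.org
```

A production DPI engine would also need TCP stream reassembly and handling for fragmented handshakes, which this sketch deliberately omits.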

Indicators of Compromise

  • [File names] dataset context – examples from dump: Network Research Report.docx, MAAT Regularization Test.docx (many operational docs and configuration files)
  • [Tool names] detection context – Psiphon, V2Ray, Shadowsocks (flagged as circumvention platforms in logs and fingerprints)
  • [IPv6 subnets] inspection context – listed IPv6 blocks like ‘境内谷歌IPv6地址段’ (roughly, ‘domestic Google IPv6 address ranges’) used for targeted isolation of Google services (and other IPv6 segments referenced)
  • [Telemetry fields/identifiers] identity correlation context – UUIDs, IMEI/IMSI hashes, partial SSO tokens used to tie sessions to endpoints
  • [Logs/Artifacts] operational context – firewall.sd.maat.status.txt, sd-redis-cli-info.txt, gohangout session logs (showing status messages, Redis stats, and regex extraction outputs)
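
For analysts working with artifacts like the Redis statistics file listed above, the following sketch parses text in the standard Redis `INFO` format (`# Section` headers followed by `key:value` lines) into a nested dictionary. It assumes the file follows that standard layout, which the source does not confirm. The field names are standard Redis ones, but the sample values are invented and `parse_redis_info` is a hypothetical helper.

```python
def parse_redis_info(text: str) -> dict:
    """Parse Redis INFO-style text into {section: {key: value}}."""
    sections = {}
    current = "default"                 # bucket for keys before any header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Section header such as "# Stats"
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, {})
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue                    # skip lines that are not key:value
        sections.setdefault(current, {})[key] = value
    return sections


# Invented sample values, for illustration only.
SAMPLE = """\
# Server
redis_version:6.2.6
uptime_in_seconds:86400

# Clients
connected_clients:12

# Stats
total_commands_processed:1048576
keyspace_hits:900
keyspace_misses:100
"""

info = parse_redis_info(SAMPLE)
hits = int(info["stats"]["keyspace_hits"])
misses = int(info["stats"]["keyspace_misses"])
hit_ratio = hits / (hits + misses)

if __name__ == "__main__":
    print(info["server"]["redis_version"])   # 6.2.6
    print(f"cache hit ratio: {hit_ratio:.2f}")  # cache hit ratio: 0.90
```

Counters such as connected clients and commands processed give a rough picture of how heavily a Redis instance was used at the moment the snapshot was taken.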


Read more: https://dti.domaintools.com/inside-the-great-firewall-part-2-technical-infrastructure/