WhoisXML API reduced the false positive rate of its First Watch Malicious Domains Data Feed from 3% to 1.66% by refining machine learning models, expanding training data, improving reputation signals, and addressing name-based biases. This improvement yields more accurate predictive threat intelligence and fewer interruptions for security teams. #FirstWatch #WhoisXMLAPI
Keypoints
- WhoisXML API lowered the First Watch Malicious Domains Data Feed false positive rate from 3% to 1.66%.
- Expanded training dataset by adding billions of data points, including legitimate sites and malicious examples from takedown records, abuse reports, and threat feeds.
- Refined domain reputation signals to better weigh registrar and TLD credibility when classifying domains.
- Reviewed and corrected name-based biases to avoid penalizing legitimate naming conventions.
- Reduced false positives by roughly 45% in relative terms (from 3% to 1.66%), improving operational efficiency for security teams.
- Positioned First Watch as a more precise predictive threat intelligence solution that provides earlier and cleaner signals.
- The article encourages readers to download a sample file or contact WhoisXML API to learn more about the data feed.
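The 45% figure in the keypoints is a relative reduction, not a drop in percentage points. A minimal sketch of the arithmetic, using only the two rates stated above (the function and variable names are illustrative, not part of any WhoisXML API tooling):

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fractional drop from old_rate to new_rate."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Rates taken from the summary: 3% before, 1.66% after.
old_fp_rate = 0.03
new_fp_rate = 0.0166

reduction = relative_reduction(old_fp_rate, new_fp_rate)
print(f"Relative reduction: {reduction:.1%}")  # prints "Relative reduction: 44.7%"
```

The result, about 44.7%, rounds to the 45% quoted, while the absolute change is 1.34 percentage points.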
MITRE Techniques
- [T1583] Acquire Infrastructure – Use of domain registration and registrar/TLD reputation signals to inform detection decisions: ‘refined domain reputation signals’ that weigh registrars and TLDs to differentiate credible entities from sources associated with abuse.
- [T1598] Phishing for Information (domain-based detection) – Detection improvements focused on identifying malicious domains used in attacks by expanding training data with malicious examples from takedown records and abuse reports: ‘includes … an extensive set of malicious examples sourced from domain takedown records, abuse reports, threat intelligence feeds’.
- [T1565] Data Manipulation – Addressing name-based biases to prevent legitimate domains from being misclassified due to naming conventions: ‘major review to avoid name-based biases that could penalize legitimate domain naming conventions’.
Indicators of Compromise
- [Data Feed / Domain Examples] context – malicious and legitimate domain examples used for training (not individually listed in article).
- [Source Types] context – datasets referenced as IOC sources include domain takedown records and abuse reports (no specific domains or hashes provided).