False Positive Rate Reduced to 1.66% on WhoisXML API’s First Watch Malicious Domains Data Feed

WhoisXML API reduced the false positive rate of its First Watch Malicious Domains Data Feed from 3% to 1.66% by refining machine learning models, expanding training data, improving reputation signals, and addressing name-based biases. This improvement yields more accurate predictive threat intelligence and fewer interruptions for security teams. #FirstWatch #WhoisXMLAPI

Keypoints

  • WhoisXML API lowered the First Watch Malicious Domains Data Feed false positive rate from 3% to 1.66%.
  • Expanded the training dataset by adding billions of data points, including legitimate sites and malicious examples from takedown records, abuse reports, and threat feeds.
  • Refined domain reputation signals to better weigh registrar and TLD credibility when classifying domains.
  • Reviewed and corrected name-based biases to avoid penalizing legitimate naming conventions.
  • Cut false positives by roughly 45% in relative terms (from 3% to 1.66%), improving operational efficiency for security teams.
  • Positioned First Watch as a more precise predictive threat intelligence solution that provides earlier and cleaner signals.
  • Invited users to download a sample file or contact WhoisXML API to learn more about the data feed.
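
The relative reduction quoted above follows directly from the two published rates. The short Python sketch below reproduces that arithmetic. The 10,000-entry sample size and the reading of the rate as a share of flagged feed entries are illustrative assumptions, not figures or definitions from the article.

```python
def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Return the fractional drop from old_rate to new_rate (rates as fractions)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


# Rates reported in the article, expressed as fractions.
OLD_RATE = 0.03
NEW_RATE = 0.0166

reduction = relative_reduction(OLD_RATE, NEW_RATE)
print(f"Relative reduction: {reduction:.1%}")

# Hypothetical sample size, chosen only to make the rates concrete.
# Assumes the rate applies to entries flagged in the feed.
SAMPLE_SIZE = 10_000
old_fp = round(OLD_RATE * SAMPLE_SIZE)
new_fp = round(NEW_RATE * SAMPLE_SIZE)
print(f"Expected false positives per {SAMPLE_SIZE:,} entries: {old_fp} -> {new_fp}")
```

Under these assumptions, a batch of 10,000 flagged entries would contain about 134 fewer false positives for analysts to triage.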

MITRE Techniques

  • [T1583] Acquire Infrastructure – Use of domain registration and registrar/TLD reputation signals to inform detection decisions: ‘refined domain reputation signals’ that weigh registrars and TLDs to differentiate credible entities from sources associated with abuse.
  • [T1598] Phishing for Information (domain-based detection) – Detection improvements focused on identifying malicious domains used in attacks by expanding training data with malicious examples from takedown records and abuse reports: ‘includes … an extensive set of malicious examples sourced from domain takedown records, abuse reports, threat intelligence feeds’.
  • [T1565] Data Manipulation – Addressing name-based biases to prevent legitimate domains from being misclassified due to naming conventions: ‘major review to avoid name-based biases that could penalize legitimate domain naming conventions’.

Indicators of Compromise

  • [Data Feed / Domain Examples] context – malicious and legitimate domain examples used for training (not individually listed in article).
  • [Source Types] context – datasets referenced as IOC sources include domain takedown records and abuse reports (no specific domains or hashes provided).


Read more: https://circleid.com/posts/false-positive-rate-reduced-on-whoisxml-apis-first-watch-malicious-domains-data-feed