This article discusses the challenges and methods of building machine-learning models for cybersecurity from datasets such as CSE-CIC-IDS2018, highlighting issues found in the data and how they were corrected. It emphasizes the use of Azure, Databricks, and PySpark to process the data and build models that detect network-based anomalies.
Affected: CSE-CIC-IDS2018 dataset, cybersecurity sector
Key points:
- The Canadian Institute for Cybersecurity (CIC) provides datasets for machine learning in cybersecurity.
- The CSE-CIC-IDS2018 dataset is used for detecting network anomalies and contains multiple attacks on victim machines.
- The CICFlowMeter tool is used for feature extraction from raw network traffic data.
- Issues were found in older datasets regarding data capture and labeling, affecting machine learning evaluations.
- A revised version of the 2018 dataset, correcting earlier inaccuracies, has been made publicly available.
- Databricks on Azure is used to process and analyze the large dataset.
- FastAPI is suggested for deploying machine learning models as APIs for predictions.
Full Story: https://infosecwriteups.com/network-intrusion-analysis-at-scale-733169fc29ff