Streamlining HTTP Flooding Attack Detection through Incremental Feature Selection

This paper proposes INFS-MICC, an incremental feature selection method designed to detect HTTP flooding attacks on web applications by identifying the most relevant and independent features in near real time. The method efficiently updates the feature set as new data arrives, improving detection accuracy while saving computational resources. #HTTPFlooding #INFSMICC #ExtremeGradientBoosting

Keypoints

  • HTTP flooding attacks exploit legitimate-looking HTTP requests to overwhelm web servers, making detection difficult due to their similarity to normal traffic.
  • The paper introduces INFS-MICC, a model that incrementally selects highly relevant and independent features using mutual information and correlation measures.
  • INFS-MICC processes new data batches without retraining from scratch, enhancing efficiency and scalability in dynamic environments.
  • Three well-known HTTP flooding attack datasets (HTTP Flood, UNSW, CICIDS) were used to test the method’s effectiveness.
  • Extreme Gradient Boosting and Gradient Boosting classifiers paired with INFS-MICC achieved high accuracy (up to 99.9%) with very few features.
  • Using recursive feature elimination, the method identifies the smallest optimal feature subset that maintains strong detection performance.
  • Compared to other popular feature selection techniques, INFS-MICC showed comparable or superior F1-scores in detecting HTTP flooding attacks.
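The elimination step mentioned in the key points can be pictured with a small sketch. This is a simplified backward elimination over a fixed ranking, not the paper's procedure: full recursive feature elimination refits the classifier and re-ranks features on every round. The feature names, the `toy_score` function, and the `tolerance` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def smallest_subset(
    ranked: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tolerance: float = 0.005,
) -> List[str]:
    """Drop the lowest-ranked feature while the score stays close to the full-set score."""
    kept = list(ranked)              # assumed to be ordered best first
    baseline = score(kept)           # score with every feature present
    while len(kept) > 1:
        candidate = kept[:-1]        # try removing the weakest remaining feature
        if score(candidate) < baseline - tolerance:
            break                    # removing it costs too much, so stop here
        kept = candidate
    return kept


def toy_score(features: Sequence[str]) -> float:
    """Hypothetical stand-in for a cross-validated classifier score."""
    gains = {"req_rate": 0.90, "bytes_per_req": 0.09, "ua_entropy": 0.002, "ttl": 0.001}
    return sum(gains[name] for name in features)


selected = smallest_subset(["req_rate", "bytes_per_req", "ua_entropy", "ttl"], toy_score)
print(selected)
```

In practice `score` would wrap a real classifier evaluation (for example a boosted-tree model scored on held-out data), which is where most of the cost of this step comes from.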

What is this about?
This research focuses on detecting HTTP flooding attacks, a type of cyberattack that sends large volumes of legitimate-looking web requests to overwhelm servers. It proposes a smart way to pick the best features (the most informative attributes of the traffic data) as that data keeps changing over time, making near-real-time detection faster and more accurate.

What problem does it solve?
Detecting HTTP flooding attacks is hard because attack requests look very similar to normal web requests, and constantly retraining detection systems with new data is slow and resource-heavy. This paper solves the problem of efficiently updating the detection system’s important features without starting over every time new network data arrives.
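One generic way to avoid starting over is to keep running sums instead of raw data. The sketch below is not the paper's update rule, which this summary does not spell out. It only shows that a Pearson correlation between two features can be maintained batch by batch from six running totals. The class name and the sample values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningCorrelation:
    """Sufficient statistics for the Pearson correlation between two features."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one batch into the running sums; earlier batches are not revisited."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.syy += y * y
            self.sxy += x * y

    def value(self):
        """Pearson correlation over every sample seen so far (0.0 if undefined)."""
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return (self.n * self.sxy - self.sx * self.sy) / math.sqrt(var_x * var_y)


# Two hypothetical batches of a pair of traffic features.
incremental = RunningCorrelation()
incremental.update([1, 2, 3], [2, 1, 4])
incremental.update([4, 5], [3, 6])

# Reference: the same samples processed in a single pass.
one_shot = RunningCorrelation()
one_shot.update([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])

print(incremental.value(), one_shot.value())
```

Both objects report the same correlation, so old batches never need to be stored or reprocessed. Raw sums of squares can lose precision on large values; a Welford-style update is a more numerically stable alternative.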

What’s the idea?
Imagine you have a huge list of puzzle pieces (features), but only some of them fit the picture perfectly without repeating what others show. INFS-MICC works like a smart sorter that keeps the best-fitting pieces and remembers them even when new pieces arrive, so you never need to re-sort the whole box from scratch.

How does it work?
The method combines two main ideas: mutual information, which measures how strongly a feature relates to detecting attacks, and correlation, which helps avoid picking features that tell the same story. It first cleans the data, then ranks features by relevance and removes redundant ones. When new data comes in, it updates the feature ranks incrementally, combining old and new information, and uses a process called recursive feature elimination to find the smallest efficient set of features to detect attacks.
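The ranking and redundancy steps described above can be sketched as follows. This is a minimal illustration assuming discrete feature values, not the INFS-MICC algorithm itself. The `max_corr` threshold, the function names, and the toy columns are assumptions. Continuous features would need binning or a dedicated mutual information estimator first.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_features(columns, labels, max_corr=0.9):
    """Rank by relevance to the label, then skip features redundant with kept ones."""
    ranked = sorted(
        columns,
        key=lambda name: mutual_information(columns[name], labels),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Toy data: 1 marks an attack sample, 0 marks normal traffic.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "req_rate":     [0, 0, 0, 0, 1, 1, 1, 1],  # fully informative
    "req_rate_x2":  [0, 0, 0, 0, 2, 2, 2, 2],  # same signal, rescaled
    "payload_flag": [0, 0, 0, 1, 0, 1, 1, 1],  # weaker, partly independent
}
kept = select_features(columns, labels)
print(kept)
```

Here `req_rate_x2` carries the same information as `req_rate`, so the correlation check drops it, while `payload_flag` survives because it is only moderately correlated with the feature already kept.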

What did they find?
Testing on three large datasets, the method was able to pick a small number of features that yielded very high detection accuracy, up to 99.9% with just 2 or 3 features in some cases. The Extreme Gradient Boosting classifier often gave the best results. The method performed as well as or better than several existing feature selection methods while saving time by updating incrementally.

Why is this important?
This research shows how to handle large, continuously arriving cybersecurity data without wasting resources on full retraining. Security teams can detect HTTP flooding attacks faster and more accurately using fewer features, which means better protection for websites with less cost and complexity.

In short (summary)
The paper introduces an advanced feature selection method, INFS-MICC, that improves detection of tricky HTTP flooding attacks by smartly and efficiently updating key data features as new network information arrives. This approach helps cybersecurity defenders quickly identify attacks with high accuracy while reducing computational effort, making it a practical solution for keeping web services safe.

The content featured on this site is sourced from arXiv.org, a free distribution service and open-access archive hosting over 2.4 million scholarly articles across a wide range of disciplines. This collection specifically highlights articles focused on cybersecurity, particularly topics relevant to threat intelligence and Security Operations Center (SOC) work.

Please note that materials on arXiv are not peer-reviewed, and are shared as preprints by the authors to foster early dissemination and feedback within the academic and professional community. I recommend using arXiv papers as a starting point for exploration and research, not as definitive sources. Always evaluate findings critically, and whenever possible, cross-check with peer-reviewed publications or operational validation.


Read more: https://arxiv.org/html/2505.17077v1