🦠 AI-Based Malware Detector
Machine learning-powered malware detection system that analyses PCAP network captures to identify suspicious traffic patterns and flag potential malware communication. Built with Python and ML classification models.
Project Overview
Traditional signature-based antivirus solutions often fail to detect zero-day malware and sophisticated polymorphic threats. This project addresses that gap by shifting detection from static file signatures to dynamic network behavior.
By feeding raw `.pcap` files into the system, the AI Malware Detector extracts and normalizes network flow telemetry (such as packet sizes, inter-arrival times, protocol ratios, and TLS handshake metrics) and uses machine learning classification models to distinguish between benign user traffic and malicious Command & Control (C2) beaconing.
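To make the flow-telemetry idea concrete, here is a minimal sketch of per-flow feature extraction. It assumes packets have already been parsed into plain tuples; the record layout, the `flow_features` name, and the output column names are hypothetical and not taken from the project's code. It covers only packet sizes and inter-arrival times, not protocol ratios or TLS handshake metrics.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical packet record (an assumption, not the project's schema):
# (timestamp_s, src_ip, dst_ip, src_port, dst_port, protocol, size_bytes)


def flow_features(packets):
    """Group packets into unidirectional 5-tuple flows and summarise each flow."""
    flows = defaultdict(list)
    for ts, src, dst, sport, dport, proto, size in packets:
        flows[(src, dst, sport, dport, proto)].append((ts, size))

    rows = []
    for key, pkts in flows.items():
        pkts.sort()  # order by timestamp
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        # Inter-arrival times: gaps between consecutive packets in the flow.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        rows.append({
            "flow": key,
            "packet_count": len(pkts),
            "mean_size": mean(sizes),
            "mean_iat": mean(gaps) if gaps else 0.0,
            "std_iat": pstdev(gaps) if gaps else 0.0,
        })
    return rows


# Illustrative input: five same-sized packets sent exactly 60 s apart.
beacon = [
    (60.0 * i, "10.0.0.5", "203.0.113.9", 49152, 443, "TCP", 120)
    for i in range(5)
]
features = flow_features(beacon)
```

A flow with near-constant inter-arrival times (a very small `std_iat`) is the kind of regularity that beaconing tends to produce, which is why timing statistics are useful inputs alongside packet sizes.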
Technical Implementation
The core data pipeline is written in Python and uses `scapy` and `tshark` bindings to break PCAP files down into structured flow datasets. Pandas is used to curate the feature matrix, and several classifiers are trained and compared: Random Forest and Support Vector Machines (SVM) with Scikit-learn, and gradient-boosted trees with XGBoost.
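A classifier comparison of this kind might look roughly like the sketch below. It is not the project's training code: the data comes from `make_classification` as a stand-in for the real flow feature matrix, the hyperparameters are arbitrary, and Scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example needs only one library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the flow feature matrix (assumption: 1 = malicious).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVMs are sensitive to feature scale, so standardise inside a pipeline.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    # Stand-in for XGBoost in this sketch.
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair. The scores printed here describe the synthetic data only and say nothing about the project's actual results.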
The model was trained on a composite dataset of millions of packets, combining real-world botnet traffic from the Stratosphere IPS dataset with regular enterprise network captures. The resulting ensemble model achieves a high F1 score with an exceptionally low false positive rate, making it viable as a supplementary SOC detection layer.
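For reference, the two metrics named above can be computed from a confusion matrix as in the following sketch. The `detection_metrics` helper and the labels are illustrative assumptions. They are not the project's evaluation code or its measured results.

```python
def detection_metrics(y_true, y_pred):
    """Return F1 and false positive rate for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    # False positive rate: share of benign flows wrongly flagged as malicious.
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"f1": f1, "false_positive_rate": fpr}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = detection_metrics(y_true, y_pred)
```

In a SOC setting the false positive rate matters as much as F1, because benign flows usually far outnumber malicious ones and even a small rate can translate into a large alert volume.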
Key Features / Findings
- Parses complex PCAP captures into standardized CSV datasets automatically.
- Ensemble machine learning approach that compares Random Forest (RF), SVM, and XGBoost performance.
- Focuses on behavioral network anomalies rather than static file hashes.
- Provides a modular Python CLI for easy integration into existing SOC pipelines.
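As an illustration of how such a CLI could be shaped, here is a minimal `argparse` sketch. The program name, the arguments, and their defaults (`--model`, `--output`, `--threshold`) are hypothetical and do not document the project's actual interface.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface for the detector."""
    parser = argparse.ArgumentParser(
        prog="malware-detector",
        description="Classify network flows in a PCAP capture.",
    )
    parser.add_argument("pcap", help="path to the .pcap capture to analyse")
    parser.add_argument(
        "--model",
        default="model.joblib",
        help="path to a trained model file (hypothetical default)",
    )
    parser.add_argument(
        "--output",
        default="flows.csv",
        help="where to write the extracted flow features as CSV",
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=0.5,
        help="score above which a flow is flagged as malicious",
    )
    return parser


# Parse an example argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["capture.pcap", "--threshold", "0.8"])
```

Keeping parser construction in its own function lets a SOC pipeline import and reuse it, or call the tool as a subprocess with the same arguments.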