Public Datasets for Cybersecurity Research from Universities and Institutions
- KDD Cup 1999 Data – This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
- The UNSW-NB15 dataset – UNSW, 2015: Free use of the UNSW-NB15 dataset for academic research purposes is hereby granted in perpetuity. Use for commercial purposes is strictly prohibited. Nour Moustafa and Jill Slay have asserted their rights under Copyright. The dataset was created in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to generate a hybrid of real modern normal activities and synthetic contemporary attack behaviours. The tcpdump tool was used to capture 100 GB of raw traffic (e.g., pcap files). The data set has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were used, and twelve algorithms were developed, to generate a total of 49 features with the class label. The dataset is intended for evaluating network anomaly detection systems. https://researchdata.edu.au/unsw-nb15-dataset/1425943
- Intrusion Detection Evaluation Dataset (CIC-IDS2017) – The CICIDS2017 dataset contains benign traffic and up-to-date common attacks, resembling true real-world data (PCAPs). It also includes the results of network traffic analysis using CICFlowMeter, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocols and attack (CSV files). https://www.unb.ca/cic/datasets/ids-2017.html
- DDoS Evaluation Dataset (CIC-DDoS2019) – Reflection-based DDoS attacks are those in which the identity of the attacker remains hidden by using a legitimate third-party component. Attackers send packets to reflector servers with the source IP address set to the target victim's IP address, so that the victim is overwhelmed with response packets. These attacks can be carried out through application-layer protocols over the transport-layer protocols Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or a combination of both. As Figure 1 shows, in this category TCP-based attacks include MSSQL and SSDP, while UDP-based attacks include CharGen, NTP and TFTP. Certain attacks, such as DNS, LDAP, NETBIOS, and SNMP, can be carried out using either TCP or UDP. https://www.unb.ca/cic/datasets/ddos-2019.html
- List of Other Datasets for Devices – Canadian Institute for Cybersecurity datasets are used around the world by universities, private industry, and independent researchers. https://www.unb.ca/cic/datasets/index.html
- 2000 DARPA Intrusion Detection Scenario Specific Datasets – An attack scenario dataset created for DARPA as part of this effort. It includes a distributed denial-of-service attack run by a novice attacker. Future versions of this and other example scenarios will contain more stealthy attack versions. https://www.ll.mit.edu/r-d/datasets/2000-darpa-intrusion-detection-scenario-specific-datasets
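
Several of the datasets above ship labeled flows as CSV files, and a common first step is to collapse the attack classes into the "bad" versus "good" distinction described in the KDD Cup entry. The following is a minimal sketch of that step, not any dataset's official tooling. It uses the nine attack categories named in the UNSW-NB15 entry. The column name `attack_cat`, the helper names `to_binary` and `count_classes`, and the sample rows are assumptions made for illustration; check the header of the actual files before reusing it.

```python
import csv
import io
from collections import Counter

# The nine UNSW-NB15 attack categories, as listed in the entry above.
UNSW_NB15_ATTACKS = {
    "Fuzzers", "Analysis", "Backdoors", "DoS", "Exploits",
    "Generic", "Reconnaissance", "Shellcode", "Worms",
}


def to_binary(category: str) -> str:
    """Collapse a category name into 'attack' or 'normal'.

    Simplifying assumption: anything not in the known attack set
    is treated as normal traffic.
    """
    return "attack" if category.strip() in UNSW_NB15_ATTACKS else "normal"


def count_classes(csv_text: str, column: str = "attack_cat") -> Counter:
    """Count binary classes in CSV text.

    `column` is a hypothetical header name; adjust it to the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(to_binary(row[column]) for row in reader)


# Made-up sample rows for illustration only; not taken from the real dataset.
SAMPLE = """proto,attack_cat
tcp,Normal
udp,DoS
tcp,Exploits
tcp,Normal
udp,Worms
"""

counts = count_classes(SAMPLE)
print(counts)
```

For real files, the same `count_classes` logic can be applied to text read from disk. Checking the class balance this way before training is useful because intrusion datasets are often heavily skewed toward one class.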