About the Program
Real-world attack campaigns have become more sophisticated, coordinated, and destructive over time. Campaigns such as the WannaCry ransomware campaign have affected hundreds of thousands of computers. Advanced Persistent Threat (APT) attacks proceed stealthily: they have gained footholds in victim organizations and remained undetected for months, and sometimes years.
Today, inter-organizational cooperation and global coordination are used primarily for sharing threat intelligence about attacks after they have occurred. For example, when an attack is detected by one organization, Indicators of Compromise (IoCs) are disseminated to other organizations so that the latter can add corresponding entries to their firewalls and intrusion detection systems. However, this solution does not leverage global coordination to detect attacks, when in fact, such coordination could make it easier to detect new attack variants and zero-day vulnerabilities.
The objective of this project is to develop distributed algorithms to detect live zero-day attacks, as early as possible, through global analysis that leverages the power of big data, collected at multiple organizations. Our hypothesis is that such an inter-organizational globally coordinated effort will expose attacks within a short time frame when the attacks are still largely invisible to any single organization.
The fundamental research problem lies in detecting zero-day cyber attacks from anomalies in network traffic data and host logs, collected by multiple enterprises, in the face of two constraints: (i) privacy considerations that prevent a complete sharing of enterprise data with the global-analysis provider, and (ii) challenges in handling the large volume of data collected by multiple enterprises.
The novel features of our proposed P-CORE solution are: (i) online (stream-mode) machine-learning models for early detection of fast attacks; (ii) generalized deep-learning models that can detect new (previously unseen) attacks when provided a broad set of features; (iii) application of privacy-preserving federated deep neural network learning methods for global attack detection without requiring enterprises to send their data to the global repository; and (iv) application of emerging High-Order Network (HON) representations used in network science to the cybersecurity domain. To the best of our knowledge, we are the first to propose these four approaches for global attack detection.
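To illustrate item (iii), the following is a minimal sketch of federated averaging, a widely used aggregation step in federated learning. It is not taken from the P-CORE design: the function name, the flat weight vectors, and the sample counts are assumptions made for illustration. Each enterprise is assumed to train a model locally and to send only its weights and its training-sample count to the global-analysis provider.

```python
from typing import List, Sequence


def federated_average(
    local_weights: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Combine per-enterprise weight vectors into one global weight vector.

    Hypothetical helper: only weights and sample counts are shared, so raw
    traffic and host logs stay at the enterprise that collected them.
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per weight vector")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all weight vectors must have the same length")
    # Weighted mean: a site with more training data contributes more.
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(dim)
    ]


# Illustrative weights from two hypothetical sites.
site_a = [0.2, 0.4]
site_b = [0.6, 0.0]
global_weights = federated_average([site_a, site_b], [100, 300])
```

Averaging alone does not guarantee privacy, because shared weights can still leak information about the training data. A privacy-preserving deployment would typically add a further mechanism, such as secure aggregation or differential privacy, on top of this step.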
The impact of our work will be two-fold: First, it will lead to a significant reduction in the costs of large-scale cyberattacks through early detection. For example, if a Distributed Denial-of-Service (DDoS) attack is detected early at distributed sources through global coordination, the attack packets can be dropped before reaching the intended victim at some distant enterprise. Second, we project a significant reduction in the enterprise cost of security services through a reduction in the rate of false positives. Each false positive identified by a tool needs manual processing by a security analyst, which adds to costs.
The key components of our technical approach are as follows:
Data Collection: Network traffic (pcap and NetFlow records) and host logs will be collected at two large enterprises, University of Virginia (UVA) and Virginia Tech (VT), each of which coincidentally has 38K students, staff and faculty. Network traffic will also be collected by one regional Internet Service Provider (ISP), the Mid-Atlantic Research Infrastructure Alliance (MARIA).
Algorithm & Code Development: (i) Offline Extract-Transform-Load (ETL) code will be developed to extract input features for the machine-learning models from collected data. (ii) Online (live) packet-processing code will be developed to extract features from packets over short time intervals (e.g., 1 sec) as input to the online machine-learning models. (iii) Temporal and topological, offline and online, federated machine-learning algorithms with local and global components will be designed and implemented. (iv) Genetic programming will be used to develop attack-simulation code that can “evolve” new variants of existing attacks and create new attacks for recently identified vulnerabilities.
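The per-interval feature extraction in item (ii) can be sketched as follows. This is an illustrative example only: the `Packet` record, the `interval_features` function, and the four features chosen are assumptions, not the project's actual feature set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed packet record."""
    timestamp: float  # seconds since the epoch
    src_ip: str
    dst_ip: str
    size: int  # bytes


def interval_features(
    packets: Iterable[Packet], interval: float = 1.0
) -> Dict[int, Dict[str, int]]:
    """Group packets into fixed-length intervals and summarize each one."""
    buckets: Dict[int, List[Packet]] = defaultdict(list)
    for p in packets:
        # Interval index, e.g. all packets in [3.0, 4.0) fall in bucket 3.
        buckets[int(p.timestamp // interval)].append(p)

    features: Dict[int, Dict[str, int]] = {}
    for idx in sorted(buckets):
        group = buckets[idx]
        features[idx] = {
            "packet_count": len(group),
            "byte_count": sum(p.size for p in group),
            "unique_sources": len({p.src_ip for p in group}),
            "unique_destinations": len({p.dst_ip for p in group}),
        }
    return features


# Illustrative input: two packets in the first second, one in the next.
sample = [
    Packet(0.1, "10.0.0.1", "10.0.0.9", 100),
    Packet(0.7, "10.0.0.2", "10.0.0.9", 50),
    Packet(1.2, "10.0.0.1", "10.0.0.9", 10),
]
per_second = interval_features(sample)
```

This version collects all packets before summarizing them. A live implementation would instead emit the feature vector for an interval as soon as that interval closes, so that the online models receive input with minimal delay.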
Code Execution: The offline ETL code, online packet-processing code, and local components of the machine-learning models will be executed on a continuous basis at UVA and VT to process the collected data. The global components of the machine-learning models will be executed at CCRi, emulating a centralized solution. The attack programs will be executed on national testbeds and on isolated hosts within the UVA and VT enterprises. These simulations will generate the attack traffic required for semi-supervised machine learning.
Evaluation: A red-team/blue-team approach will be used in which the red attack team will run attacks unbeknownst to the blue attack-detection team. This approach will allow us to evaluate the accuracy, false-positive rate, and false-negative rate of our P-CORE solution. In addition, two Tier-3 security analysts will conduct manual analysis to evaluate the performance of our P-CORE solution on real attacks that may occur during our tests. Finally, we will validate and measure the performance of our approach in detecting zero-day attacks by using a cross-attack validation methodology. This methodology adapts the cross-validation methodology, commonly used to measure the generalization performance of statistical models, to measuring generalization across attack types.
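One plausible reading of cross-attack validation is a leave-one-attack-type-out split: the model is trained without any samples of one attack type and then tested on that type, which stands in for a zero-day attack. The sketch below assumes this reading. The `cross_attack_splits` function, the label strings, and the choice to keep benign samples in the training set are illustrative assumptions.

```python
from typing import Iterator, List, Sequence, Tuple

BENIGN = "benign"  # assumed label for non-attack samples


def cross_attack_splits(
    labels: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_attack, train_indices, test_indices) per attack type.

    Training indices exclude every sample of the held-out attack type;
    test indices contain only samples of that type.
    """
    attack_types = sorted({lab for lab in labels if lab != BENIGN})
    for held_out in attack_types:
        train_idx = [i for i, lab in enumerate(labels) if lab != held_out]
        test_idx = [i for i, lab in enumerate(labels) if lab == held_out]
        yield held_out, train_idx, test_idx


# Illustrative labels for six samples.
labels = ["benign", "ddos", "ransomware", "ddos", "apt", "benign"]
for held_out, train_idx, test_idx in cross_attack_splits(labels):
    print(held_out, train_idx, test_idx)
```

Because the test set here contains only attack samples, it measures the detection rate on the unseen attack type. Estimating the false-positive rate would additionally require holding out a share of benign samples.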
Documentation: We will write and publish research papers in leading conferences and journals, and also write software guides for all our TA3 components.
Our project schedule is as follows. In the two-year Phase 1, we will develop the TA3 components required to detect instances of three attack classes: DDoS, ransomware, and APT attacks. These three attack types were selected because, at their early stages, these attacks are not readily detectable locally but they do have a clear pattern at a global level, which makes them well suited for our TA3 project. We have planned an 8-month timeline for the tasks listed above. This timeline will be repeated three times, once for each attack type.
In Phases 2 and 3, the DARPA CHASE program specifies that TA3 components will be evaluated, and that feedback loops between TA3 and other TAs will occur. In Phase 1, we will closely follow the TA1, TA2 and TA4 teams and, if allowed, offer them input, so that the attacks selected and their components lend themselves well to extension for global detection. Then, in Phases 2 and 3, we plan to design algorithms for global analysis corresponding to the selected TA1 components. If needed, the local TA1 component developed in our Phase 1 will also be used in Phases 2 and 3. We have planned a 6-month timeline for each of four attack types for which TA3 components will be developed, tested and evaluated with UVA and VT data, as well as government-provided TA5 data. We will participate in the quarterly integration and evaluation workshops.
We have assembled an excellent team of senior personnel with expertise in cybersecurity, machine learning, networks, High-Performance Computing (HPC), and big-data analysis frameworks such as Spark and Hadoop, together with Tier-3 security analysts. In addition, the team will include 3.5 software engineers, one postdoctoral fellow, and four graduate research assistants. The team spans four organizations: UVA, VT, Northeastern University (NE) and Commonwealth Computer Research, Inc (CCRi), which is located in the same city as UVA. We have a tightly integrated collaborative plan with detailed task dependencies worked out. We will meet every week via Google Hangouts, as we have already been doing to develop this proposal. Several groups of team members have collaborated in the past and have excellent working relationships.
The total cost of our project is $7,752,925, and the duration is 4 years.
♦ Data Acquisition and Anonymization
♦ Machine Learning
♦ Attack Simulation
♦ Host Event Monitoring
Virginia Tech (VT)
Northeastern University (NE)
Commonwealth Computer Research, Inc (CCRi)