By Karen Walker

When you are driving down the highway and the check-engine light appears, you appreciate having time to find a service station before you are stranded on the roadside. Those riding along the information superhighway need their own version of the check-engine light, especially when the security of the network is at stake, so they can stop an attack rather than pay for repairs after the damage is done.
Malathi Veeraraghavan, University of Virginia professor of electrical and computer engineering, leads a research group of cyber hunters dedicated to providing early warning tools that protect networks while preserving data privacy.
First, the group collects metadata from multiple organizations' network traffic and from host logs, the files that servers automatically create and maintain. They then remove all personally identifiable information from the data. Finally, they analyze the data with machine learning techniques capable of detecting anomalies, the unusual activity that is the first sign of an intrusion. A long, rapid succession of failed log-in attempts is one type of anomaly that would trigger a warning light.
Anomalies are ranked by relevance. Those that indicate a high-impact malicious attack rank highest; those correlated with false positives are placed further down the list. Network managers and information security officers can then use these findings to take actions such as configuring firewall settings to drop network traffic from malicious hosts.
The scale of the solution distinguishes the team's research approach. Think of an algorithm as a sequence of instructions. Veeraraghavan's team develops distributed algorithms, meaning that different parts of the algorithm can run at the same time on separate processors, each with limited information about what the other parts are doing.
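As a rough illustration of the steps described above (each organization summarizes its own logs, then the summaries are merged and ranked), here is a minimal Python sketch. It is not the team's method: it swaps their machine learning models for a simple count threshold, and the organization names, source tokens, log layout and function names are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, already-anonymized records: (source_token, login_succeeded).
ORG_LOGS = {
    "org_a": [("src1", False)] * 4 + [("src2", True), ("src3", False)],
    "org_b": [("src1", False)] * 5 + [("src4", False), ("src4", True)],
    "org_c": [("src1", False)] * 3 + [("src5", False)] * 2,
}


def count_failed_logins(records):
    """Local step: summarize one organization's log without seeing any other."""
    return Counter(src for src, ok in records if not ok)


def rank_anomalies(local_counts, threshold):
    """Global step: merge the summaries and rank sources by total failures."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    flagged = [(src, n) for src, n in total.items() if n >= threshold]
    # Highest failure count first; ties broken by token for a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


def detect(org_logs, threshold=10):
    # Each organization's summary is computed independently and concurrently.
    with ThreadPoolExecutor() as pool:
        local_counts = list(pool.map(count_failed_logins, org_logs.values()))
    return rank_anomalies(local_counts, threshold)
```

With these made-up logs, `detect(ORG_LOGS)` flags only `src1`: it never exceeds five failed log-ins at any single organization, yet it totals 12 across the three.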
The distributed approach allows the team to detect zero-day attacks in real time through global analysis that uses big data collected at multiple organizations. “Our hypothesis is that such an inter-organizational, globally coordinated effort will expose attacks within a short time frame, even as the threat remains largely invisible to any single organization,” Veeraraghavan said.
Over the first year of the project, supported by a $7.6 million grant awarded in fall 2018 by the Defense Advanced Research Projects Agency's Cyber-Hunting at Scale Program, the team has overcome a number of challenges involved in handling the large volume of collected data and the relative sparsity of labeled data required for supervised machine learning methods.
In addition to Veeraraghavan, the cyber hunting team includes UVA computer science professor Jack Davidson; professor Donald Brown of the Department of Engineering Systems and Environment and the School of Data Science; UVA information security engineer Jeffrey Collyer; and collaborators from Virginia Tech, Northeastern University and CCRi, a Charlottesville, Va., provider of analytical development services.
In July 2019, DARPA augmented the grant by $1.48 million, which allowed Veeraraghavan's team to purchase and operate a high-performance cluster to handle the computing and storage needs of this large quantity of data.
“Preserving data integrity and privacy is paramount,” Veeraraghavan said. The team applies privacy-preserving deep neural network learning methods for global attack detection. The methods employ a set of algorithms, modeled loosely after the human brain, that are designed to recognize hidden patterns in big data. Because the models perform well on data where it resides, this approach avoids asking enterprises to expose and transfer their data to a global repository.
The team recently celebrated several achievements.
First, Alastair Nottingham, a UVA research scientist in electrical and computer engineering, streamlined the collection of multiple types of network and host logs (on the order of hundreds of gigabytes per day) and developed a resource-efficient, cryptographically based process to remove all traces of private and personally identifiable information from collected data. The process preserves patterns and features in the data that machine learning algorithms can use to hunt for threats.
Second, UVA Computer Science Department research engineer Molly Buchanan led a team of graduate and undergraduate students over the summer to extract meaningful data properties, or features, and run machine learning algorithms to find attacks common to data collected by UVA and Virginia Tech.
The team's next challenges are to quantify the ability of their attack detection methods to reduce false-positive rates and to extend their algorithms to consider different types of data, such as network traffic and host logs, simultaneously.
This project demonstrates the high value of network traffic data for cyber defense. Communal defense against ever-changing cyberattacks will grow stronger if increasing numbers of universities and other organizations work together on data collection and analysis.
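For readers curious how a cryptographic step can hide identities while keeping patterns intact, one common general technique is keyed hashing. The Python sketch below is an assumption for illustration only, not the process Nottingham developed: the field names, the secret key and the helper functions are hypothetical, and keyed hashing by itself does not guarantee that every trace of personal information is removed.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a deterministic keyed token (HMAC-SHA256)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in this sketch


def scrub_record(record: dict, key: bytes, sensitive=("user", "src_ip")) -> dict:
    """Tokenize the (hypothetical) sensitive fields and leave the rest as is."""
    return {
        field: pseudonymize(value, key) if field in sensitive else value
        for field, value in record.items()
    }
```

Because the same input always maps to the same token under a given key, repeated failed log-ins from one address still appear as one repeated source, so the pattern survives even though the address itself does not.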