New statistical approach classifies and generates more accurate predictions from big data

Data analytics sits at the center of modern data-driven technology. The World Economic Forum estimates that by 2020, the accumulated universe of data will reach 44 zettabytes (a zettabyte is equivalent to a trillion gigabytes). Business analytics software provider DOMO puts it another way: By 2020, there will be 40 times more bytes of data than stars in the sky.

This volume of data is good news for predictive analytics, which uses new and historical data to forecast activity, behavior and trends. However, the sheer volume and the inherent variability of data complicate forecasting and statistical analysis.

Magda Amiridi, a Ph.D. student in the University of Virginia Charles L. Brown Department of Electrical and Computer Engineering, is developing a new statistical approach for classifying big data and generating more accurate predictions from it, with applications ranging from skin cancer detection to drug efficacy prediction.

Health screenings and genetic testing generate thousands of different diagnostic variables that provide information about a person’s current and future health. “Our work focuses on building probability models to better predict a target response,” Amiridi said. A closely related goal is to automatically choose the most informative predictors from a diagnostic point of view, so that doctors can focus on the most important indicators—those that best predict a health risk or benefit.

Amiridi developed a hierarchical model that “zooms in” on those parts of the data distribution that are most relevant for prediction. Her approach has the potential to help doctors and other practitioners make statistically better decisions. For example, this approach can differentiate between benign and cancerous moles in skin imaging with higher accuracy than other methods. 

Amiridi’s research earned her the best student paper award at the June 2019 Institute of Electrical and Electronics Engineers (IEEE) Data Science Workshop. Two others also contributed to the paper, “Statistical Learning Using Hierarchical Modeling of Probability Tensors”: Amiridi’s academic advisor, Nikolaos Sidiropoulos, who is the Louis T. Rader Professor, chair of the department and an IEEE Fellow; and Nikolaos Kargas, who is finishing his Ph.D. at the University of Minnesota.

Amiridi is working on information-theoretic feature selection methods that leverage her hierarchical probability models. “I appreciate simple and elegant ideas that have practical applications,” Amiridi said. She is aiming to apply her algorithms to a variety of real-world health datasets, and ultimately to help improve patient outcomes through her data analytics work.