Computer Science
Location: Thornton Hall A120
Date: 2022-10-14, 11:00 (America/New_York)
CS Distinguished Speaker: Dr. Tianxi Li, UVA Department of Statistics

Statistical Tools for Analyzing Noisy Network Data


Network data are increasingly common in the modern analysis of complex datasets. While statistical and machine learning models and methods are intensively studied for network data, the noisy and unstructured nature of networks significantly limits the applicability of many of them in practice. We will focus on addressing these challenges in a few specific scenarios.

First, we introduce a family of prediction models with nonparametric network effects. The method does not assume that the network structure is exactly observed and can be provably robust to network perturbations. An asymptotic inference framework is established, and the robustness of the method is studied in the specific setting where the errors come from random network models. The analysis of a large-scale middle school educational program demonstrates the importance of the claimed robustness.

In the second part, we consider how to extract informative structures from large networks. The noise and bias introduced by non-informative components in complex networks can obscure the salient structure and limit the effectiveness of many network modeling procedures. We introduce a novel core-periphery model for the non-informative periphery structure of networks without imposing a specific form for the informative core structure. Based on the model, a spectral algorithm for core extraction is proposed for general downstream network analysis tasks. The algorithm enjoys a strong theoretical guarantee and scales to large networks. We evaluate the proposed method through extensive simulation studies, which demonstrate various advantages over many traditional core-periphery methods. The method is applied to extract the informative core structure from a citation network and gives more informative results in the downstream hierarchical community detection.

Lastly, we will briefly introduce an ensemble learning strategy to adaptively model random networks as an enhancement for both of the previous two problems.

Several real-world examples will be discussed throughout the talk, including analyzing educational workshop impacts based on a large-scale middle school student survey, establishing interpretable hierarchical community relations from a citation network between statisticians, and predicting links in hundreds of networks across multiple domains.

About the Speaker: 

Dr. Tianxi Li is an assistant professor of statistics at the University of Virginia. Before joining UVA, he obtained his Ph.D. from the University of Michigan. His main research interests include complex network analysis and high-dimensional statistical learning.

For future talks, please visit the CS Distinguished Speaker Series webpage.