Computer Science Location: Rice 504

#### Fast and Scalable Joint Estimators for Learning Sparse Gaussian Graphical Models from Heterogeneous Data

Abstract: Estimating multiple sparse Gaussian Graphical Models (multi-sGGMs) jointly from heterogeneous data is an important problem that often arises in bioinformatics and neuroimaging. The resulting tools can help researchers translate aggregated data from multiple contexts into knowledge of multiple connectivity graphs. Most current studies of multi-sGGMs, however, involve expensive and difficult non-smooth optimizations, making them difficult to scale up to many dimensions (large $p$) and/or many contexts (large $K$).

The proposed research aims to design a novel category of estimators that can achieve fast and scalable joint structure estimation of multiple sGGMs in large-scale settings. There are three possible formulations of multi-sGGMs. Targeting each, this research introduces methods that are both computationally efficient and theoretically guaranteed. In detail: (1) The first formulation estimates one sGGM per context and pushes all learned graphs towards a common pattern. For this formulation, we propose the estimator FASJEM and solve it in an entry-wise manner that is parallelizable. The entry-wise solution improves the computational efficiency and reduces the memory requirement from $O(Kp^2)$ to $O(K)$. (2) The second formulation estimates only the changes in the dependency graphs between two sGGMs. We propose the estimator DIFFEE and obtain a closed-form solution. DIFFEE reduces its entire computational cost to $O(p^3)$, enabling the estimator to scale to a much larger $p$ than state-of-the-art estimators. (3) The third formulation learns both the shared and the context-specific sub-graphs explicitly. We propose a novel weighted-$\ell_1$ formulation, WSIMULE, and a faster variant that incorporate a flexible prior, along with a parallelizable formulation. Lastly, we propose to conduct rigorous statistical analysis to verify that the proposed estimators can achieve the same statistical convergence rates as state-of-the-art methods that are much more difficult to compute.
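To make the idea behind formulation (2) concrete, the following is a minimal sketch of a closed-form, soft-thresholded estimate of the change between two precision matrices. It is an illustration under stated assumptions, not the DIFFEE estimator as defined in the talk: it assumes both sample covariance matrices are invertible, and the names `invert`, `soft_threshold`, `sparse_precision_difference`, and the threshold `lam` are hypothetical. The two matrix inversions dominate the cost at $O(p^3)$, and the final thresholding step is entry-wise.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.

    Runs in O(p^3) time for a p x p input.
    """
    p = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
        for i, row in enumerate(matrix)
    ]
    for col in range(p):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[p:] for row in aug]


def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def sparse_precision_difference(cov_a, cov_b, lam):
    """Entry-wise soft-thresholded difference of two inverse covariances.

    Assumption: cov_a and cov_b are invertible sample covariance matrices.
    """
    prec_a = invert(cov_a)
    prec_b = invert(cov_b)
    return [
        [soft_threshold(a - b, lam) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prec_a, prec_b)
    ]


# Toy example with p = 2: the two contexts differ in their off-diagonal
# dependency, and small diagonal changes are thresholded away.
cov_a = [[2.0, 1.0], [1.0, 2.0]]
cov_b = [[2.0, 0.0], [0.0, 2.0]]
delta = sparse_precision_difference(cov_a, cov_b, lam=0.2)
```

In this toy case the diagonal differences (about 0.167) fall below the threshold and are set to zero, while the off-diagonal difference (about -0.333) survives as a shrunken nonzero entry, which is the kind of sparse change graph the second formulation targets.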