Differentially Private Graph Analysis

Differentially Private Graph Analysis

  

Abstract: 

Today, data has become the next primary resource of humankind, especially in healthcare and public health.

Graphs or networks are commonly used data abstractions in many diverse applications, including social networks and public health.

There is a significant risk of sensitive information (e.g., disease, condition, and individual locations) being leaked through such applications.

My research focuses on developing privacy-preserving algorithms to solve fundamental problems arising in network science.

This proposal aims to explore two research threads: (1) theoretical foundations for community detection in graphs with privacy and (2) development of fast and scalable private algorithms for massive networks.

Different notions of community detection have been studied in graph mining.

One of the most popular methods involves identifying the most densely connected subgraph, and very efficient non-private algorithms are known for it.

My work led to the first sequential and parallel densest subgraph algorithms designed under edge differential privacy.

The main technical contribution is an iterative application of the Exponential mechanism to pick out low-degree nodes, with a sophisticated privacy analysis to overcome the privacy composition cost stemming from multiple applications of the mechanism.

Another popular approach is the community detection problem in Stochastic Block Models (SBM), which is one of the few settings where rigorous information-theoretic results are known for the recoverability of the ground-truth communities.

Though this topic has been extensively studied, solving it with differential privacy (DP) constraints remains an open question.

I design a privacy framework based on the Stability mechanism that has become the first successful attempt to solve the exact recovery problem for symmetric variants of SBM under DP.

I aim to utilize this method for extended and generalized variants of SBM, in which the recoverability for these models under DP has yet to be discovered.

In the second thread, my focus is on improving the efficiency and scalability of graph statistics, such as counts of small subgraphs on massive networks.

Methods based on global sensitivity have poor performance, and alternatives to global sensitivity, such as smooth sensitivity have been designed.

While these improve the accuracy of DP queries, computing them is computationally challenging for most statistics.

As a result, using these methods for massive networks becomes impractical.

To address this challenge, I introduce a notion of approximate smooth sensitivity for efficient DP queries and demonstrate its ability by speeding up the private triangle count calculations by orders of magnitude while maintaining comparable accuracy to other methods.

I propose using this framework to design efficient private algorithms to output other important subgraph statistics, such as counts of $k$-clique.

 

Committee:  

  • Ferdinando Fioretto, Committee Chair (CS/SEAS/UVA)
  • Anil Vullikanti, Advisor (CS, Biocomplexity/SEAS/UVA)
  • Madhav Marathe(CS, Biocomplexity /SEAS/UVA)
  • Dave Evans (CS/SEAS/UVA)
  • Sarit Kraus (CS, Bar-Ilan University)
  • Henning Mortveit (ESE, BII/SEAS/UVA)