Computer Science
Ph.D. Qualifying Exam Presentation by Galen Harrison
November 21, 2023, 12:00 PM (America/New_York)
Location: Zoom (email presenter for link)

Evaluating Privacy through Synthetic Data



Abstract:

Data synthesis has long been held out as a mechanism for enabling data sharing and independent analysis without privacy restrictions. Both empirical and theoretical work has found that one cannot simultaneously generate accurate data while also maintaining reasonable differential privacy guarantees. This prior work has focused on the synthetic data's accuracy, that is, how close the synthetic data is to the real data. In this paper, we consider the problem of analyzing privacy-accuracy tradeoffs using synthetic data. In particular, we are interested in whether, for an $\epsilon$-differentially private mechanism $\mathcal{M}$, the reduction in accuracy when we apply $\mathcal{M}$ to real data is similar to the reduction in accuracy when we apply $\mathcal{M}$ to synthetic data. Unlike prior analyses of synthetic data, using synthetic data as a privacy benchmark does not necessarily require closeness to the real data. Through an empirical analysis of several differentially private data synthesis mechanisms across diverse population data sets, we find that these mechanisms do not produce data that can serve as a good benchmark for privacy-preserving mechanisms. Specifically, these mechanisms seem to produce data that contains too much randomness and fails to capture enough dependencies to present a rich attack surface.
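To make the benchmark question concrete, here is a minimal sketch (not from the talk; the mechanism, data, and function names are illustrative assumptions). It measures the accuracy loss of a simple $\epsilon$-DP mechanism, a Laplace-noised mean, on a stand-in "real" data set versus a stand-in "synthetic" one; the benchmark question is whether the two loss curves track each other as $\epsilon$ varies.

```python
import numpy as np

def laplace_mean(data, epsilon, lo=0.0, hi=1.0, rng=None):
    """Epsilon-DP mean via the Laplace mechanism.

    For n values clipped to [lo, hi], the sensitivity of the mean
    is (hi - lo) / n, so we add Laplace noise with scale sensitivity / epsilon.
    """
    rng = rng or np.random.default_rng(0)
    data = np.clip(data, lo, hi)
    sensitivity = (hi - lo) / len(data)
    return data.mean() + rng.laplace(scale=sensitivity / epsilon)

def accuracy_loss(data, epsilon, trials=200, rng=None):
    """Mean absolute error of the private mean vs. the non-private mean."""
    rng = rng or np.random.default_rng(1)
    true_mean = np.clip(data, 0.0, 1.0).mean()
    errors = [abs(laplace_mean(data, epsilon, rng=rng) - true_mean)
              for _ in range(trials)]
    return float(np.mean(errors))

rng = np.random.default_rng(42)
real = rng.beta(2, 5, size=1000)           # stand-in for real population data
synthetic = rng.uniform(0, 1, size=1000)   # stand-in for a synthesized release

# Compare the privacy-accuracy tradeoff on both data sets.
for eps in (0.1, 1.0):
    print(f"eps={eps}: real loss={accuracy_loss(real, eps):.5f}, "
          f"synthetic loss={accuracy_loss(synthetic, eps):.5f}")
```

If the synthetic data were a faithful benchmark, the loss measured on it would shrink with growing $\epsilon$ at roughly the same rate as on the real data; the talk's finding is that real DP synthesis mechanisms tend not to preserve this correspondence.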



Committee:
  • Anil Vullikanti, Committee Chair (CS, Biocomplexity/SEAS/UVA)
  • Madhav Marathe, Advisor (CS, Biocomplexity/SEAS/UVA)
  • Tianhao Wang (CS/SEAS/UVA)
  • David Evans (CS/SEAS/UVA)