Analyzing the Leaky Cauldron: Defending Machine Learning against Inference Attacks
While machine learning has seen wide-scale adoption in industrial applications, it raises serious privacy concerns when the applications deal with sensitive user data. Recent works have shown that machine learning models leak sensitive information about their training data. Differential privacy limits such unintentional information disclosure by adding randomized noise during model training, and this has encouraged a slew of works on privacy-preserving machine learning that achieve some trade-off between privacy and utility. In our preliminary work, we experimentally evaluated this trade-off via inference attacks and showed that there is significant privacy leakage in the settings where the models achieve high accuracy. In subsequent work, we further improved the state of the art in membership inference attacks and showed that such attacks pose a threat even in realistic scenarios with skewed priors that had not been explored before. More concretely, our attacks correctly identify the membership of the most vulnerable records with close to 100% confidence across multiple real-world data sets, in settings where previous attacks failed. In this proposal, we plan to further explore the cause of this vulnerability and to develop better defenses against such attacks. We also plan to study attribute inference attacks with the same vigour as membership inference attacks. The final outcome of this dissertation will be a broader understanding of the privacy risk of performing machine learning on sensitive data, along with practical ways to reduce that risk.
- Yanjun Qi, Committee Chair (Department of Computer Science, SEAS, UVA)
- David Evans, Advisor (Department of Computer Science, SEAS, UVA)
- Mohammad Mahmoody (Department of Computer Science, SEAS, UVA)
- Denis Nekipelov (Department of Economics, CGSAS, UVA)
- Quanquan Gu (Department of Computer Science, UCLA)