Combining Differential Privacy and Multi-Party Computation for Distributed Machine Learning by Bargav Jayaraman
Machine learning is extensively used for data analytics in applications ranging from text and image analysis to medical research. We consider the scenario where multiple parties wish to collaboratively learn a machine learning model over their sensitive datasets without revealing the data to the other parties. In our approach, individual models are learned locally by the parties and then privately aggregated using secure multi-party computation. When learning is performed on sensitive datasets, it is often important to add noise to the models before they are released in order to preserve privacy. This privacy notion is referred to as differential privacy and is well studied in the literature. The proposed work combines differential privacy with secure multi-party computation to achieve private distributed machine learning. To achieve differential privacy for a certain class of machine learning algorithms, we first provide a theorem that bounds their sensitivity. Next, we add differential privacy noise that is smaller than the noise required by existing approaches to ensure differential privacy in the multi-party setting. This allows us to obtain more accurate models without compromising privacy. We validate this via experimental evaluation on real-world datasets for regression and classification, comparing our method against existing approaches for multi-party machine learning. The results show that our method generates models that are closer to the non-private models in terms of accuracy.
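To make the pipeline concrete, below is a minimal sketch of the aggregation step in Python, under two loudly stated assumptions: the secure multi-party computation is simulated with clear-text arithmetic (a real deployment would compute the same average inside MPC so that no party sees another's model), and the noise is a generic Gaussian mechanism rather than the exact calibration developed in this work. The function private_aggregate and its parameters are hypothetical names used only for illustration.

import numpy as np

def private_aggregate(local_models, sensitivity, epsilon, delta, rng):
    """Average locally trained models and add one draw of Gaussian noise.

    In the real protocol the averaging and the noise addition happen
    inside secure multi-party computation; here everything is computed
    in the clear purely for illustration.
    """
    avg = np.stack(local_models).mean(axis=0)
    # Gaussian mechanism: `sensitivity` is assumed to be the L2
    # sensitivity of the averaged model, i.e. how much the average can
    # change when one underlying record changes.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    # A single noise vector is added to the aggregate, instead of each
    # party perturbing its own model before sharing it.
    return avg + rng.normal(0.0, sigma, size=avg.shape)

# Hypothetical usage: three parties each hold a locally trained
# 10-dimensional weight vector (e.g., from regularized regression).
rng = np.random.default_rng(0)
local_models = [rng.normal(size=10) for _ in range(3)]
released = private_aggregate(local_models, sensitivity=0.1,
                             epsilon=1.0, delta=1e-5, rng=rng)
print(released)

The intuition, as the abstract states, is that adding the noise once to the aggregate inside the secure computation requires less total noise than having every party perturb its own model, which is what yields models closer to the non-private baseline.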
Committee Members:
David Evans (Advisor), Yanjun Qi (Committee Chair), Quanquan Gu, Mohammad Mahmoody