Distributed and Secure Sparse Machine Learning
Abstract: With the growth of the volume of data used for machine learning and the availability of distributed computing resources, distributed machine learning has received increasing attention from researchers and practitioners. Meanwhile, the extensive use of private data in machine learning makes privacy a major concern for participants in collaborative machine learning. This thesis proposes a communication-efficient algorithm for distributed sparse linear discriminant analysis (LDA), in which each local machine computes a debiased estimator from its local data, and the central machine aggregates these debiased estimators and outputs a final sparsified estimator. At the core of the algorithm is a debiasing step that compensates for the bias introduced by the regularizer. We prove that, at a much lower communication cost, the aggregated estimator attains the same statistical rate as the centralized estimator, provided the number of machines is chosen appropriately. Building on the distributed sparse LDA algorithm, we propose a secure multi-party sparse learning method that employs a secure multi-party computation (MPC) protocol to aggregate the local models. The protocol ensures that the local model owned by each party is not revealed to the other parties, while the correct aggregated model can still be obtained. Experiments on both synthetic and real-world datasets corroborate the performance of the distributed sparse LDA algorithm and the efficiency of the secure multi-party sparse learning method.
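The aggregate-then-sparsify step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names, the toy estimator values, and the hard-threshold level are all assumptions made for the example.

```python
def aggregate_debiased(local_estimates, threshold):
    """Average per-machine debiased estimators coordinate-wise,
    then hard-threshold the average to recover a sparse estimator.
    (Illustrative sketch; the thesis's actual sparsification may differ.)"""
    n = len(local_estimates)
    d = len(local_estimates[0])
    avg = [sum(est[j] for est in local_estimates) / n for j in range(d)]
    return [v if abs(v) > threshold else 0.0 for v in avg]

# Toy usage: three machines, three coordinates; only the first
# coordinate of the underlying parameter is truly nonzero.
local_estimates = [
    [1.0, 0.02, -0.03],
    [0.9, -0.01, 0.04],
    [1.1, 0.00, -0.02],
]
beta = aggregate_debiased(local_estimates, threshold=0.5)
# beta keeps only the first coordinate; the noisy near-zero
# coordinates are thresholded out.
```

Averaging reduces the variance of the noisy coordinates, which is why a single thresholding step at the central machine can restore sparsity without another round of communication.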
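One standard building block for the kind of MPC aggregation the abstract refers to is additive secret sharing, sketched below. This is a simplified illustration under assumed names and a hypothetical modulus, not the specific protocol used in the thesis: each party splits its (integer-encoded) model value into random shares that sum to the value, so the central aggregate can be reconstructed while no individual value is revealed.

```python
import random

MOD = 2**31 - 1  # hypothetical modulus for this sketch

def share(value, n_parties):
    """Split value into n_parties additive shares summing to value mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def secure_aggregate(values):
    """Each party secret-shares its value; only the total is revealed.
    (Each share in isolation is uniformly random and leaks nothing.)"""
    n = len(values)
    all_shares = [share(v, n) for v in values]
    # Party i sums the i-th share received from every party...
    partials = [sum(all_shares[p][i] for p in range(n)) % MOD for i in range(n)]
    # ...and combining the partial sums reconstructs the aggregate.
    return sum(partials) % MOD

print(secure_aggregate([3, 5, 7]))  # prints 15
```

In practice each coordinate of a model vector would be fixed-point encoded and shared this way; the sketch handles a single integer per party to keep the mechanism visible.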
Committee Members: Quanquan Gu, David Evans, Farzad Farnoud