On security and privacy implications of modifying machine learning datasets
Abstract:
Machine learning has been widely applied to many security- and privacy-sensitive tasks. Datasets, as the raw material of the machine learning workflow, determine the properties of the learned models. In this proposal, we aim to understand the security and privacy implications of modifications made to the training datasets of machine learning algorithms. Specifically, we focus on the following:
1. Implications of malicious modifications to the dataset (also called poisoning attacks) for the security of machine learning models: How powerful are such poisoning attacks, and which hypothesis classes remain learnable under them?
2. Implications of benign modifications, such as machine unlearning, for the privacy of the data records: Does machine unlearning lead to (more) data leakage that can be exploited by an adversary?
On both fronts, we plan to include both theoretical analysis and empirical results.
Committee:
- David Evans, Committee Chair (CS/SEAS/UVA)
- Mohammad Mahmoody, Advisor (CS/SEAS/UVA)
- Haifeng Xu (CS/SEAS/UVA)
- Jundong Li (ECE/SEAS/UVA)
- Michael D. Porter (ESE/SEAS/UVA)
- Amin Karbasi (EECS/SEAS/Yale)