Poisoning Attacks on Online Learning to Rank
Online Learning to Rank (OL2R) methods have recently been widely studied in information retrieval. The key idea is to estimate a ranker directly from interactions with users. Dueling Bandit Gradient Descent (DBGD), which is deeply rooted in online convex optimization, is one of the most widely studied OL2R methods. However, little is known about the robustness of such algorithms in scenarios where a malicious attacker alters the user's clicks so that the final learned ranker deviates from what the user intended.
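As background, the core DBGD update can be sketched as follows. This is a minimal illustration, not the thesis's exact implementation: the function name `dbgd_step` and the `interleave_winner` callback (standing in for the interleaved comparison driven by user clicks) are assumptions for exposition.

```python
import numpy as np

def dbgd_step(w, delta, eta, interleave_winner, rng=None):
    """One Dueling Bandit Gradient Descent update (illustrative sketch).

    w: current ranker weight vector
    delta: exploration step size
    eta: learning rate
    interleave_winner: callable comparing the current ranker against a
        perturbed candidate via interleaved results and user clicks;
        returns True if the candidate wins the duel.
    """
    rng = rng or np.random.default_rng()
    # Sample a random unit vector as the exploration direction.
    u = rng.standard_normal(w.shape[0])
    u /= np.linalg.norm(u)
    # Propose a perturbed candidate ranker.
    w_candidate = w + delta * u
    # If the user's clicks favor the candidate in the interleaved
    # list, move the current ranker a small step toward it.
    if interleave_winner(w, w_candidate):
        w = w + eta * u
    return w
```

Because the update direction is decided entirely by the outcome of the duel, any manipulation of the clicks that decide that outcome directly steers the learned ranker.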
In this work, we describe realistic settings in which a malicious attacker can attack the algorithm, and the data available to the attacker. We then propose novel attacking strategies that the attacker can use against DBGD to divert the ranker toward the attacker's goal. We analyze the attacks and the resulting deviation to evaluate the robustness of DBGD when such attacks occur. The results show that DBGD is not robust against the proposed attacks. To the best of our knowledge, we are the first to propose attacks on an OL2R algorithm, present empirical results, and analyze them.
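To make the attack surface concrete, here is one hedged sketch of a click-flipping strategy. This is an assumed, illustrative attack, not necessarily the strategy proposed in the thesis: the attacker overrides the duel outcome so that the ranker drifts toward an attacker-chosen target `w_target` (a hypothetical name).

```python
import numpy as np

def poisoned_winner(w, w_candidate, w_target):
    """Illustrative click-poisoning duel outcome (assumed strategy).

    Instead of reporting the genuine user preference, the attacker
    reports that the candidate wins the duel exactly when the
    candidate is closer to the attacker's target ranker.
    """
    # Candidate "wins" iff accepting it moves the ranker toward w_target.
    return (np.linalg.norm(w_candidate - w_target)
            < np.linalg.norm(w - w_target))
```

Plugged into a DBGD-style loop in place of the genuine interleaved comparison, this outcome function makes each accepted step reduce the distance to `w_target`, which is one way the learned ranker can be diverted from the user's intent.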
- Dave Evans, Chair
- Hongning Wang, Advisor
- Haifeng Xu