Title: Interactive Online Learning with Incomplete Knowledge
Interactive online learning is vital in modern information service systems. It explores the unknowns by sequentially collecting individual users’ feedback to evaluate the quality of interactions while monitoring changes in their values. Despite recent progress in online learning, many challenges posed by the complex practical scenarios in which it is deployed remain unsolved. These challenges can be summarized from three perspectives. First, from the user-user interaction perspective: capturing user heterogeneity requires personalized online learning, while the existence of user dependency calls for collaborative online learning across users, so that the learning process can be accelerated through information propagation. Second, from the perspective of users’ temporal behavior: information service systems are highly dynamic, which means that users’ preferences may change over time due to various internal or external factors, and item popularity may vary due to fast-emerging events or content. Third, from the user-system interaction perspective: learning to interact with users and discover their preferences from repeated interactions is central to most information service systems and web applications. Instead of passively waiting for users’ feedback, the system should proactively acquire information. In addition, user feedback in online systems is implicit: user clicks, for example, can be biased and incomplete due to position bias. The proposed research aims to develop online learning algorithms, and more specifically multi-armed bandit algorithms, to address these challenges. Firstly, I propose to study online learning solutions that sequentially estimate users’ information needs in a collaborative manner, which enables information sharing across users.
Secondly, I propose to study bandit learning in a more realistic non-stationary environment, such that the learning algorithm can automatically detect potential changes and adapt its decision-making strategy accordingly. Finally, I propose to improve user-system interaction by proactively choosing the most representative users or information to initiate or incentivize interactions that yield the most beneficial feedback, which further improves the system’s utility in the long run. Taken together, the proposed research enables an information service system to provide the right information to the right user at the right time and to improve users’ satisfaction in the long run. More importantly, the proposed solutions can be applied to a wide spectrum of applications, including not only the aforementioned information service systems but also crowdsourcing and human-machine interaction in cyber-physical systems.
Hongning Wang (Advisor), David Evans (Chair), Lihong Li (Google Inc.), Quanquan Gu, Denis Nekipelov (Minor Representative)