Group Fairness in Reinforcement Learning and Large Language Models
Abstract
This thesis addresses an important societal consideration in the application of Reinforcement Learning (RL): the equitable distribution of its benefits across different demographic groups. Specifically, we investigate how to incorporate group fairness into reinforcement learning algorithms to ensure that their societal impact is just and fair.
The thesis is organized around two key contributions to group fairness in RL. The first contribution focuses on multi-task group fairness in reinforcement learning. In many practical applications, such as recommender systems or fine-tuning large language models, a single policy is required to perform multiple tasks in real-world environments. In this thesis, we introduce a multi-task fairness constraint and propose a novel algorithm, based on constrained optimization, for solving this problem. Through experiments in MuJoCo, we demonstrate that our method better ensures group fairness than a previous approach that lacks this multi-task fairness constraint.
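The abstract does not spell out the formulation. As a rough sketch only, assuming the constraint bounds the per-task return gap between demographic groups by a tolerance ε (the symbols J_m^g, ε, and the task/group indices below are illustrative and not taken from the thesis), a multi-task fairness-constrained objective could look like:

\max_{\pi} \; \sum_{m=1}^{M} J_m(\pi)
\quad \text{s.t.} \quad \bigl| J_m^{a}(\pi) - J_m^{b}(\pi) \bigr| \le \epsilon
\quad \text{for every task } m \text{ and every pair of groups } a, b,

where J_m^{g}(\pi) denotes the expected return the shared policy \pi obtains for group g on task m. Constrained-optimization techniques such as Lagrangian relaxation are a common way to handle constraints of this form, though the algorithm developed in the thesis may differ.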
The second contribution studies group fairness in the context of fine-tuning large language models (LLMs) through Reinforcement Learning with Human Feedback (RLHF). Current approaches to addressing bias in LLMs largely concentrate on mitigating harmful language and often overlook group fairness considerations. In this work, we emphasize demographic parity, a key group fairness definition that aligns with the broader fair machine learning literature. We identify reward models as a potential source of bias in the RLHF process and propose a novel evaluation method, based on arXiv metadata, for group fairness in reward models. Our experiment on fine-tuning the Phi-1.5 model further demonstrates that biases in reward models can propagate into the fine-tuned LLM during RLHF training.
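For reference, the standard demographic parity criterion from the fair machine learning literature (stated here for a binary decision \hat{Y} and a protected attribute A; the thesis adapts the idea to reward models, and the exact adaptation is not given in this abstract) requires:

P(\hat{Y} = 1 \mid A = a) = P(\hat{Y} = 1 \mid A = b) \quad \text{for all groups } a, b,

i.e., the rate of favorable outcomes is the same across demographic groups.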
Committee
- Chen-Yu Wei, Committee Chair (CS/SEAS/UVA)
- Shangtong Zhang, Advisor (CS/SEAS/UVA)
- Yu Meng (CS/SEAS/UVA)