Neural Model Interpretability for Natural Language Processing
Abstract: Neural network models have achieved remarkable performance in natural language processing (NLP) due to their capacity for representation learning. Despite this impressive performance, the lack of interpretability of neural networks has raised concerns about the trustworthiness and reliability of their predictions. Without a good understanding of model prediction behavior, it is challenging to debug a model, find its weaknesses, mitigate biases and unfairness, and avoid unexpected failures. As a result, the adoption of black-box models has been hindered, especially in high-stakes scenarios such as health care and criminal justice.
In this thesis, I focus on the interpretability of neural language models for NLP tasks.
Specifically, I propose to improve model interpretability for building trustworthy and reliable NLP systems. My research is composed of three threads: 1) developing interpretation methods to explain model prediction behavior in NLP; 2) improving the inherent interpretability of neural language models; and 3) leveraging interpretations to debug models and build better ones in terms of robustness and fairness. The proposed research is expected to help NLP and AI developers better understand and interpret neural networks, and hence build trustworthy, reliable, and robust models.
Committee:
- Aidong Zhang, Committee Chair (CS/SEAS/UVA)
- Yangfeng Ji, Advisor (CS/SEAS/UVA)
- Haifeng Xu (CS/SEAS/UVA)
- Cong Shen (ECE/SEAS/UVA)
- Alexander Rush (CS/Cornell University)