Computer Science
M.S. Thesis Defense Presentation by Sanchit Sinha
Date: April 16, 2021, 11:00 AM (America/New_York)
Location: Zoom (email presenter for link)

Perturbing Inputs for Fragile Interpretations in Deep Natural Language Processing

 

Abstract:

Interpretability methods like Integrated Gradients and LIME are popular choices for explaining NLP model predictions with word-importance scores. These interpretations need to be robust for trustworthy applications of NLP in high-stakes areas like medicine or finance. Our work demonstrates that an interpretation itself can be fragile even when the predictions are robust. By performing simple, minor word perturbations on an input text, we cause Integrated Gradients and LIME to generate substantially different explanations, yet the perturbed input receives the same prediction label as the seed input. Because only a few word-level swaps are made, the perturbed input is semantically and spatially similar to its seed input, so the interpretations should have been similar too. Empirically, we observe that the average rank-order correlation between a seed input's interpretation and those of its perturbed inputs drops by over 20% when fewer than 10% of the words are perturbed, and the correlation keeps decreasing as more words are perturbed. We demonstrate this on four text classification datasets, namely SST-2, AG-News, IMDb, and Yelp, across two models, DistilBERT and RoBERTa.
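The fragility measure described above, rank-order correlation between word-importance rankings before and after perturbation, can be sketched in a few lines of Python. This is a minimal illustration only: the attribution scores below are hypothetical placeholders, not results from the thesis, and ties in scores are assumed away.

```python
def rank(scores):
    """Rank positions of scores; highest score gets rank 1 (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(a, b):
    """Spearman rank-order correlation for two tie-free score lists."""
    n = len(a)
    ra, rb = rank(a), rank(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical word-importance scores for the same six tokens of an input,
# before and after a minor word-level perturbation (placeholder numbers).
seed      = [0.90, 0.55, 0.30, 0.20, 0.10, 0.05]
perturbed = [0.15, 0.60, 0.85, 0.25, 0.05, 0.40]

print(spearman(seed, seed))       # identical rankings -> 1.0
print(spearman(seed, perturbed))  # reshuffled rankings -> ~0.14
```

A correlation near 1.0 means the two explanations rank the words almost identically; a sharp drop, as in the second call, signals that the interpretation has changed even though the model's predicted label has not.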

 

Committee:

  • Yangfeng Ji, Chair
  • Yanjun Qi, Advisor
  • Vicente Ordóñez Román