Computer Science
Ph.D. Qualifying Exam Presentation by Hannah Chen
Friday, January 28, 2022, 10:00 a.m. (Eastern Time)
Location: Zoom (email presenter for link)

Balanced Adversarial Training: Balancing Tradeoffs Between Oversensitivity and Undersensitivity in NLP Models

 

Abstract:  

Traditional (oversensitive) adversarial examples involve finding a small perturbation that does not change an input's true label but confuses the classifier into outputting a different prediction. Undersensitive adversarial examples are the opposite: the adversary's goal is to find a small perturbation that changes the true label of an input while preserving the classifier's prediction. Adversarial training and certified robust training have shown some effectiveness in improving the robustness of machine learning models to oversensitive adversarial examples. However, recent work has shown that using these techniques to improve robustness for image classifiers may make a model more vulnerable to undersensitive adversarial examples. We demonstrate that the same phenomenon applies to NLP models, showing that training methods that improve robustness to synonym-based attacks (oversensitive adversarial examples) tend to increase a model's vulnerability to antonym-based attacks (undersensitive adversarial examples) on both natural language inference and paraphrase identification tasks. To counter this phenomenon, we introduce Balanced Adversarial Training, which incorporates contrastive learning to increase robustness against both over- and undersensitive adversarial examples.
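
The abstract does not spell out the training objective, so the sketch below shows one plausible reading of "incorporates contrastive learning": a triplet-style loss that pulls a synonym-perturbed input (same true label, e.g. replacing "good" with "great") toward the original in embedding space while pushing an antonym-perturbed input (flipped true label, e.g. "good" to "bad") at least a margin away. Everything here (the function name, the encoder, the margin value) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of a balanced, triplet-style contrastive objective;
# a minimal illustration, not the method as implemented in the talk.
import torch
import torch.nn.functional as F

def balanced_contrastive_loss(encode, anchor, syn_variant, ant_variant, margin=1.0):
    """Pull the synonym-perturbed variant (label-preserving) toward the
    anchor in embedding space, and push the antonym-perturbed variant
    (label-changing) at least `margin` farther away."""
    z_a   = encode(anchor)       # embedding of the original input
    z_syn = encode(syn_variant)  # oversensitivity direction: label unchanged
    z_ant = encode(ant_variant)  # undersensitivity direction: label flipped
    d_syn = F.pairwise_distance(z_a, z_syn)  # want this distance small
    d_ant = F.pairwise_distance(z_a, z_ant)  # want this distance large
    return F.relu(d_syn - d_ant + margin).mean()

# Toy usage with a linear encoder over stand-in feature vectors.
encoder = torch.nn.Linear(300, 64)
x     = torch.randn(8, 300)                # original inputs
x_syn = x + 0.01 * torch.randn_like(x)     # small, label-preserving edit
x_ant = torch.randn(8, 300)                # label-changing edit
loss = balanced_contrastive_loss(encoder, x, x_syn, x_ant)
loss.backward()
```

Minimizing this loss penalizes both failure modes at once: oversensitivity shrinks because label-preserving edits must map close to the original, and undersensitivity shrinks because label-changing edits must map far from it.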

 

Committee: 

  • Matthew Dwyer (Chair), CS/SEAS/UVA
  • David Evans (Co-Advisor), CS/SEAS/UVA
  • Yangfeng Ji (Co-Advisor), CS/SEAS/UVA
  • Tom Fletcher, CS/SEAS/UVA