Sensitivity Auditing for Trustworthy Large Language Models

Abstract: 

Large language models (LLMs) have demonstrated remarkable capabilities on various benchmarks, yet they are sensitive to small changes in prompts and frequently exhibit unexpected failures. This poses risks in high-stakes applications and calls for better evaluation schemes. We seek to improve current LLM evaluation through sensitivity auditing, which analyzes how changes in inputs affect model outputs and whether the observed input-output changes are consistent with system requirements. Our initial studies demonstrate that discrepancies between evaluation metrics and their underlying goals can yield misleading conclusions about models, and that over-optimizing a single metric may in fact hurt model performance. We conduct a preliminary analysis of model bias sensitivity to explicit and implicit gender features and show the benefits of applying sensitivity analysis to bias evaluation. Moving forward, we plan to examine the limitations of existing benchmarks and improve their efficacy in reflecting model capabilities and performance in downstream applications. We then plan to design systematic tests that assess model reliability and sensitivity to features at varied granularity in both black-box and white-box settings. The outcomes of the proposed audits would lay the groundwork for developing mitigation measures that support reliable deployment.
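To make the core idea concrete, below is a minimal illustrative sketch of one kind of sensitivity audit mentioned above: perturbing a prompt by swapping explicit gender terms and checking whether the model's output changes. This is not the proposal's actual methodology; the `query_model` interface, the gender-swap word list, and the toy model are hypothetical placeholders.

```python
# Minimal sensitivity-audit sketch (illustrative only): perturb a prompt by
# swapping explicit gender terms and check whether the output stays consistent.
import re

# Simplified swap list; real audits would handle ambiguity (e.g., possessive
# vs. object "her") and a much richer set of explicit and implicit features.
GENDER_SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
                "his": "her", "man": "woman", "woman": "man"}

def swap_gender_terms(text: str) -> str:
    """Replace explicit gender words with their counterparts, preserving case."""
    def repl(match):
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def audit_sensitivity(prompt: str, query_model) -> dict:
    """Compare model outputs on the original and gender-swapped prompts."""
    original_output = query_model(prompt)
    perturbed_output = query_model(swap_gender_terms(prompt))
    return {
        "original": original_output,
        "perturbed": perturbed_output,
        # For a bias audit we typically require invariance under this
        # perturbation; a difference flags an unwanted sensitivity.
        "consistent": original_output == perturbed_output,
    }

if __name__ == "__main__":
    # Toy stand-in model so the sketch runs without any API access.
    toy_model = lambda p: "nurse" if "she" in p.lower() else "doctor"
    print(audit_sensitivity("She works at the hospital as a", toy_model))
```

In practice, `query_model` would wrap the LLM under audit, and the consistency check would be replaced by a task-appropriate comparison (e.g., label agreement or a distance over output distributions) rather than exact string equality.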

Committee:  

  • Matthew Dwyer, Committee Chair (CS/SEAS/UVA)
  • David Evans, Advisor (CS/SEAS/UVA)
  • Yangfeng Ji, Co-Advisor (CS/SEAS/UVA)
  • Jundong Li (CS, ECE/SEAS, SDS/UVA)
  • Ferdinando Fioretto (CS/SEAS/UVA)
  • Mona Sloane (SDS/UVA)
  • Thomas Nachbar (Law/UVA)