Publications
- 2024
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
By: Himabindu Lakkaraju, Sree Harsha Tanneru, and Chirag Agarwal
Abstract
Large Language Models (LLMs) are increasingly used as powerful tools for several
high-stakes natural language processing (NLP) applications. Recent prompting
approaches claim to elicit intermediate reasoning steps and key tokens that serve as
proxy explanations for LLM predictions. However, it remains unclear whether
these explanations are reliable and reflect the LLM’s behavior. In this work, we
make one of the first attempts at quantifying the uncertainty in explanations of
LLMs. To this end, we propose two novel metrics — Verbalized Uncertainty and
Probing Uncertainty — to quantify the uncertainty of generated explanations.
While verbalized uncertainty involves prompting the LLM to express its confidence
in its explanations, probing uncertainty leverages sample and model perturbations
to quantify the uncertainty. Our empirical analysis of benchmark
datasets reveals that verbalized uncertainty is not a reliable estimate of explanation
confidence. Further, we show that the probing uncertainty estimates are correlated
with the faithfulness of an explanation, with lower uncertainty corresponding
to explanations with higher faithfulness. Our study provides insights into the
challenges and opportunities of quantifying uncertainty in LLM explanations,
contributing to the broader discussion of the trustworthiness of foundation models.
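To make the two metrics concrete, below is a minimal, hypothetical Python sketch of how they could be operationalized: `verbalized_uncertainty` prompts a model to rate its own confidence in an explanation, and `probing_uncertainty` measures how much explanations disagree across perturbed versions of the same input. The `llm` callable, the prompt wording, and the token-overlap similarity are illustrative assumptions, not the authors' implementation.

```python
import statistics
from typing import Callable, List


def verbalized_uncertainty(llm: Callable[[str], str], question: str, explanation: str) -> float:
    """Ask the model to rate its confidence (0-100) in an explanation and
    return 1 - confidence as an uncertainty score. Prompt wording is hypothetical."""
    prompt = (
        f"Question: {question}\n"
        f"Explanation: {explanation}\n"
        "On a scale of 0 to 100, how confident are you that this explanation "
        "reflects your actual reasoning? Reply with a single number."
    )
    reply = llm(prompt)
    digits = "".join(ch for ch in reply if ch.isdigit()) or "0"
    confidence = min(int(digits), 100) / 100.0
    return 1.0 - confidence


def probing_uncertainty(llm: Callable[[str], str], question_variants: List[str],
                        similarity: Callable[[str, str], float]) -> float:
    """Estimate uncertainty from the spread of explanations across perturbed
    versions of the same input (sample perturbations); model perturbations,
    e.g. different sampling temperatures, fit the same pattern."""
    explanations = [llm(f"{q}\nExplain your answer step by step.") for q in question_variants]
    reference = explanations[0]
    agreements = [similarity(reference, e) for e in explanations[1:]]
    if not agreements:
        return 0.0
    return 1.0 - statistics.mean(agreements)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a real model.
    def toy_llm(prompt: str) -> str:
        return "85" if "confident" in prompt else "The premise directly implies the answer."

    def token_overlap(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1)

    q = "Does the premise entail the hypothesis?"
    print(verbalized_uncertainty(toy_llm, q, "The premise restates the hypothesis."))
    print(probing_uncertainty(toy_llm, [q, q + " Answer yes or no."], token_overlap))
```

In this sketch, high self-reported confidence yields low verbalized uncertainty, while low agreement between explanations under perturbation yields high probing uncertainty, mirroring the correspondence between uncertainty and faithfulness discussed in the abstract.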
Citation
Lakkaraju, Himabindu, Sree Harsha Tanneru, and Chirag Agarwal. "Quantifying Uncertainty in Natural Language Explanations of Large Language Models." Paper presented at the Society for Artificial Intelligence and Statistics, 2024.