Publications
- 2023
In-Context Unlearning: Language Models as Few Shot Unlearners
By: Martin Pawelczyk, Seth Neel, and Himabindu Lakkaraju
Abstract
Machine unlearning, the study of efficiently removing the impact of specific training points on the
trained model, has garnered increased attention of late, driven by the need to comply with privacy
regulations like the Right to be Forgotten. Although unlearning is particularly relevant for LLMs in
light of the copyright issues they raise, achieving precise unlearning is computationally infeasible
for very large models. To this end, recent work has proposed several algorithms which approximate
the removal of training data without retraining the model. These algorithms crucially rely on access
to the model parameters in order to update them, an assumption that may not hold in practice due
to computational constraints or when the LLM is accessed via API. In this work, we propose a new
class of unlearning methods for LLMs we call “In-Context Unlearning”, which provides inputs in context
without having to update model parameters. To unlearn a particular training instance, we provide
the instance alongside a flipped label and additional correctly labelled instances which are prepended
as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts
effectively remove specific information from the training set while maintaining performance levels
that are competitive with (or in some cases exceed) state-of-the-art unlearning methods that require
access to the LLM parameters.
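As a rough illustration of the recipe described in the abstract, the sketch below constructs such a context for a binary sentiment-classification task: the instance to be forgotten appears first with its label flipped, followed by correctly labelled examples and finally the query to classify. The function names (flip_label, build_icul_prompt), the prompt template, and the sentiment task are illustrative assumptions rather than the paper's exact construction, and the actual LLM inference call is omitted.

```python
# Illustrative sketch of in-context unlearning prompt construction.
# Assumes a binary sentiment task; names and template are hypothetical,
# not the authors' exact protocol.

def flip_label(label: str, label_space: tuple[str, str] = ("positive", "negative")) -> str:
    """Return the opposite label for a binary classification task."""
    return label_space[1] if label == label_space[0] else label_space[0]


def build_icul_prompt(forget_example, context_examples, query_text):
    """Build a prompt that places the forget instance first with a flipped
    label, then correctly labelled examples, then the query to classify."""
    forget_text, forget_label = forget_example
    lines = [f"Review: {forget_text}\nSentiment: {flip_label(forget_label)}"]
    for text, label in context_examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query_text}\nSentiment:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    forget = ("The plot was thin but the acting saved it.", "positive")
    context = [
        ("A tedious, overlong mess.", "negative"),
        ("Sharp writing and a terrific cast.", "positive"),
    ]
    prompt = build_icul_prompt(forget, context, "I would happily watch it again.")
    print(prompt)  # this prompt would be sent to the LLM at inference time
```

The key design point, per the abstract, is that unlearning happens entirely through the context supplied at inference time, so no access to or updates of the model parameters are required.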
Citation
Pawelczyk, Martin, Seth Neel, and Himabindu Lakkaraju. "In-Context Unlearning: Language Models as Few Shot Unlearners." Working Paper, October 2023.