AI can unlearn what it has learned
A potential solution for the right to be forgotten regarding language models
TL;DR
This newsletter is about machine unlearning. It looks at a potential implementation that could be used to comply with data protection law.
Here are the key takeaways:
Training data extraction attacks reveal the personal data contained in the web-scraped datasets that LLMs like ChatGPT are trained on.
Using personal data for training requires a suitable legal basis under data protection law. Developers often rely on legitimate interest.
Relying on this legal basis still entitles data subjects to the deletion of their data if they object to its use for model training. Developers could override this if they can argue compelling legitimate grounds for using the data, though EU law makes this difficult by prioritising privacy rights over economic or public interests.
Fulfilling deletion requests for LLMs by deleting the data from the training dataset and retraining the model is expensive and time-consuming, making this method of implementing deletion requests highly cumbersome.
In October 2023, computer scientists from the Georgia Institute of Technology and Stanford University published a paper with a potential solution. They call it Efficient Unlearning method for LLMs (EUL).
EUL involves training an unlearning layer, a subset of the LLM's parameters, to produce the same outputs as the original model without relying on the deleted data. Trained unlearning layers are then fused into the LLM architecture, ensuring that the model does not draw on what it learned from the deleted data when producing its outputs.
The researchers found EUL to be a more efficient way to implement multiple data deletion requests: it takes significantly less training time than retraining the whole model and does not degrade performance.
The problem
LLMs like ChatGPT that are trained on web-scraped datasets can memorise personal data during training and reproduce it verbatim when subjected to training data extraction attacks (TDEAs).
This risk of privacy leakage in language models has been explored in research carried out by experts from Google DeepMind and other academics, some of which I have written about previously. This research has found that:
TDEAs against GPT-2 can reveal names, phone numbers, and social media accounts present in its training data.
Such attacks even work against fine-tuned models like ChatGPT, which has been shown to reveal several megabytes' worth of the data it was trained on.
The vulnerability to such attacks increases with the size of the model, the number of duplicates in the training dataset, and the size of the context window.
From a data protection perspective, many developers justify the use of personal data for model training on the basis of legitimate interest. Such interests can be economic in nature, as recognised by the EU Charter of Fundamental Rights.1
But where developers rely on this basis, data subjects have the right to object to the use of their data.2 Once they exercise this objection, their personal data can no longer be used and must be deleted.3
The exception to this is where the developer can argue "compelling legitimate grounds" that override the data subject's rights, freedoms and interests. However, as I have written previously, unless the data subjects are public figures, EU law will likely prioritise their privacy rights over the economic and/or public interest in using their data for training LLMs.
Accordingly, in many cases, developers need to create ways to fulfil data deletion requests. But this is not limited to deleting personal data from the model's training dataset.
This is because the model has already learned from that data during training, and what it has learned is recorded in its weights. So to fulfil the deletion request fully, developers must get the model to essentially 'unlearn' the deleted data.
Doing this by retraining the model would be very cumbersome. Given the size of these models and the amount of compute needed to train them on large datasets, such retraining would be highly expensive and time-consuming, especially if multiple deletion requests are fulfilled this way.
Herein lies the problem with the right to be forgotten and LLMs. How can you get a model to unlearn data that has been deleted from its training dataset in a practicable manner?
A possible solution
In October 2023, computer scientists from the Georgia Institute of Technology and Stanford University published a paper with a potential solution. They call it Efficient Unlearning method for LLMs (EUL).
As described in the paper, EUL provides a way to "efficiently unlearn what needs to be forgotten without completely retraining the whole model while retaining the performances of the models."4 There are two main parts to EUL:
Training an unlearning layer. This is about adding a layer to the model's transformer architecture that is trained to unlearn the data that needs to be removed. This layer therefore enacts the data deletion requests without needing to retrain the whole model.
Fusing multiple unlearning layers together. This is about combining several unlearning layers within an LLM. Each unlearning layer can therefore represent a separate data deletion request.
Together, these two elements enable EUL to update LLMs so that they unlearn the forgotten data without retraining the whole model and without sacrificing performance.
The unlearning layer
The process for creating the unlearning layer for EUL is as follows:
An unlearning layer is created by taking a copy of a subset of the parameters from the original LLM. The unlearning layer becomes the student and the original model becomes the teacher.
The aim is to use the teacher model to 'teach' the student model to forget the deleted data. This is done by training the student to mimic the teacher's predictions on the retained data while deviating from them on the deleted data.
The difference between the student's and the teacher's output distributions is measured using Kullback-Leibler (KL) divergence. In essence, the researchers trained the student model to minimise this divergence on the retained data while maximising it on the forgotten data (a simplified sketch of this objective appears below).
By training the unlearning layer in this way, it learns to preserve what was learned from the data remaining in the training dataset whilst discarding what was learned from the data deleted from it. Crucially, only the unlearning layer needs to be trained, while the bulk of the large pre-trained LLM stays fixed.
Once trained, the unlearning layer can be added to the transformer architecture of the original model. When the model then processes new inputs, the unlearning layer will ensure an output that does not rely on the data it has learned to forget.
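Purely as an illustration of this teacher–student setup, and not the authors' actual code, here is a minimal PyTorch-style sketch of such an objective. The function name, the weighting term `alpha` and the assumption that forgotten and retained examples arrive in separate batches are my own simplifications.

```python
import torch.nn.functional as F

def unlearning_loss(student_logits, teacher_logits, is_forgotten, alpha=1.0):
    """Distillation-style objective for training an unlearning layer.

    student_logits: outputs of the frozen LLM plus the trainable unlearning layer
    teacher_logits: outputs of the original, unmodified LLM
    is_forgotten:   True if this batch comes from the data to be deleted
    """
    # KL divergence between the student's and the teacher's predicted distributions
    kl = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # Pull the student towards the teacher on retained data,
    # push it away from the teacher on forgotten data.
    return -alpha * kl if is_forgotten else kl

# Only the unlearning layer's parameters would be optimised; the LLM stays frozen, e.g.:
# optimiser = torch.optim.AdamW(unlearning_layer.parameters(), lr=1e-4)
```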
Fusing the unlearning layers
The goal of the fusing process is to take the trained unlearning layers and merge them into a single layer. Each data deletion request can then be fulfilled by training a new unlearning layer and folding it into the fused unlearning layer in the original LLM.
The fused unlearning layer therefore carries the parameters of each of its constituent layers, and can be added to the original model to enact multiple data deletion requests at once.
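The paper derives its own method for merging the layers; as a rough illustration of the idea only, the sketch below approximates fusion by averaging the parameters of the trained unlearning layers. The function and variable names are my own, and the real fusion step is more involved.

```python
import copy
import torch
import torch.nn as nn

def fuse_unlearning_layers(layers: list[nn.Module]) -> nn.Module:
    """Merge several trained unlearning layers into one by averaging their
    parameters. This stands in for the paper's fusion step, which is more
    sophisticated, but conveys the idea of one layer per deletion request
    being combined into a single fused layer."""
    fused = copy.deepcopy(layers[0])
    with torch.no_grad():
        for name, param in fused.named_parameters():
            stacked = torch.stack(
                [dict(layer.named_parameters())[name] for layer in layers]
            )
            param.copy_(stacked.mean(dim=0))
    return fused

# Each deletion request yields one trained unlearning layer; the fused layer
# is then inserted back into the original model's transformer blocks.
```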
Testing EUL
The researchers tested EUL on both classification and generation tasks. The generation tasks used the SAMSum dataset, which consists of conversations between different speakers.
EUL was tested using T5 models, a family of pre-trained, open-source language models. Both T5-base and T5-3B were fine-tuned on the SAMSum dataset for testing.
The data to be forgotten by the model was randomly selected. For SAMSum, this consisted of the name of a speaker and their corresponding conversations in the dataset.
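For illustration, here is one way such a forget set could be carved out of SAMSum using the Hugging Face `datasets` library. The speaker name 'Amanda' is an arbitrary placeholder, not a speaker selected in the paper.

```python
from datasets import load_dataset

# SAMSum pairs chat-style dialogues with human-written summaries
samsum = load_dataset("samsum")

SPEAKER = "Amanda"  # arbitrary placeholder; the researchers selected speakers at random

def mentions_speaker(example):
    # Speaker turns in SAMSum dialogues look like "Amanda: hey, are you coming?"
    return f"{SPEAKER}:" in example["dialogue"]

forget_set = samsum["train"].filter(mentions_speaker)                     # data to unlearn
retain_set = samsum["train"].filter(lambda ex: not mentions_speaker(ex))  # data to keep
```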
Unlearning layers were then trained to forget the selected data, and the researchers tested the model's performance on the generation task for both the retained data and the forgotten data:
Higher performance on the retained data would indicate that the model is good at maintaining its learning for the data retained in its training dataset.
Lower performance on the forgotten data would indicate that the model is good at forgetting the data deleted from its training dataset.
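As an illustration of this retained-versus-forgotten comparison, the sketch below scores generated summaries with ROUGE-L via the `rouge_score` package; the variable names and the choice of metric are my own assumptions about how such an evaluation might look.

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def average_rouge_l(predictions, references):
    """Mean ROUGE-L F1 over paired generated and reference summaries."""
    scores = [
        scorer.score(ref, pred)["rougeL"].fmeasure
        for pred, ref in zip(predictions, references)
    ]
    return sum(scores) / len(scores)

# Hypothetical usage, given summaries generated by the unlearned model:
# retained_score  = average_rouge_l(preds_on_retained,  refs_on_retained)   # should stay high
# forgotten_score = average_rouge_l(preds_on_forgotten, refs_on_forgotten)  # should drop
```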
Using this testing, the researchers were able to demonstrate the effectiveness of EUL:
[It] consistently achieves the best overall performances by effectively forgetting the requested data while remembering the retained data...with significantly less amount of training time. This indicates that our objectives could...be generalized to generation tasks.
[...]
The results demonstrate that our proposed fusion method that combines different unlearning layers could effectively handle the sequence of deletion (achieving higher accuracy on the test set and lower accuracy on the forgot set.) especially when the sequence length gets longer compared to baseline models.5
EUL demonstrates a possible method for fulfilling data deletion requests under the GDPR regarding LLMs, better ensuring data protection by design in their development.
Charter of Fundamental Rights of the European Union, Article 16.
Chen et al, ‘Unlearn What You Want to Forget: Efficient Unlearning for LLMs’ (2023), 2.
Chen et al, ‘Unlearn What You Want to Forget: Efficient Unlearning for LLMs’ (2023), 8.