The GDPR and AI exceptionalism
Some initial reflections on the (leaked) proposed changes to the EU GDPR

So the European Commission is exploring ways to simplify several different EU laws as part of its "Omnibus" reform package, and one of the drafts for this was leaked last week.
There are some interesting things to note from this leak. The part that stood out most to me was that pertaining to AI.

It should be clear by now that AI development and deployment will very often trigger data protection implications. And this fact has been addressed a few times by the European Data Protection Board (EDPB), which comprises all the EU data protection regulators:
In May 2024, it published a report containing its preliminary views on the data protection issues related to OpenAI's ChatGPT. You can read my commentary on this here.
In December 2024, the Board adopted its opinion on "certain data protection aspects related to the processing of personal data in the context of AI models". You can also find my commentary on this here.
At the same time, the EU has been contemplating the potential stagnating effect that its web of regulations is having on innovation and growth in Europe.
In the area of tech law and policy, the EU has passed/brought into force plenty of legislation in recent years. The GDPR came into force in 2018, the Digital Services Act and the Digital Markets Act were passed in 2022, and the AI Act was passed in 2023. That is a lot of hefty and impactful regulation coming into existence over a 5-year period. The infamous Draghi report suggests that such legislation creates regulatory barriers that are particularly onerous on young companies in the tech sector.1
The European Commission in particular appears to be giving increasing credence to the idea that the EU is too much of a "lawyerly society" obsessed with the process of building things rather than actually building them. And this is the political backdrop to its Omnibus package.
In the leaked draft, the Commission has proposed several different amendments to the GDPR as well as other pieces of legislation. Those amendments include changes concerning the legal basis for the processing of personal data in the context of AI development and deployment.
There are two main thoughts that I want to address after reading this part of the draft:
The can of worms that the proposal potentially opens up
Whether the proposal is realistic about the legal basis that can be used for processing personal data for AI development and deployment
The can of worms
Let's start with the AI-related draft recitals, which supplement the draft provisions.
Draft recital (27) acknowledges that the models underlying AI systems, including LLMs, "rely on data, including personal data, in various phases in the AI lifecycle, such as the training, testing and validation phase." This certainly makes sense.
But then that sentence carries on to say this:
...and may in some instances be retained in the AI system or the AI model.2
This is the can of worms. The idea that AI models like LLMs "contain" or "retain" personal data after training is, let's say, debatable. I wrote an extensive piece on this last year.
To sum up that piece:
Based on what they learn from their training data, LLMs store in their weights correlations between fragments of words (i.e., tokens) in numerical form
Ordinarily, this information cannot be linked back to individuals, and therefore such information may not be regarded as personal data
However, if the model comes across certain data in its training data frequently enough, it can create strong correlations between the tokens that make up that data
Those strong correlations may be learned and retained by the model such that the correct prompt can be used to extract this data verbatim
LLMs may therefore be regarded as storing personal data that they have "memorised" in this way
Such personal data could constitute pseudonymised data when stored in the model, becoming readable when the correct prompt is used to generate it from the model (a rough illustration of this extraction idea is sketched below)
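To make the memorisation point more concrete, here is a minimal, hypothetical sketch of what a verbatim-extraction check might look like in practice: prompt a model with the prefix of a string suspected to have appeared frequently in the training data, and see whether greedy decoding reproduces the rest of it. The model name ("gpt2"), the example string and the prefix split are all assumptions chosen for illustration; actual extraction research uses far more sophisticated methods.

```python
# Illustrative sketch only: does a model reproduce a suspected memorised string
# verbatim when prompted with its prefix? Model, string and split are hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A string we suspect appeared often in the training data (made-up example).
suspected = "Jane Doe, born 1 January 1980, lives at 12 Example Street, Dublin."
prefix, expected_continuation = suspected[:30], suspected[30:]

inputs = tokenizer(prefix, return_tensors="pt")
# Greedy decoding: if the learned correlations are strong enough, the model
# completes the prefix with the original continuation token-for-token.
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)

memorised = expected_continuation.strip() in completion
print("Verbatim continuation reproduced:", memorised)
```

For a small model and a made-up string the check will almost certainly return False; the point is simply that, where the answer is True, the "retained" data becomes readable again once the right prompt is supplied.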
The draft recital suggests that AI systems or models may retain personal data. If so, the consequences could get messy, in particular for downstream modifiers of AI models. If such models are considered to be retaining personal data after training, and organisations take those models and incorporate them into their own systems, then this leads to several crucial data protection questions, including:
Who are the controllers? Is the downstream modifier a controller of the retained personal data, or does it become a joint controller with the model provider? Or perhaps there is a controller-processor relationship?
What is the legal basis under Article 6 GDPR that can be relied on for the processing of that personal data? Can it be legitimate interest?
What if the retained personal data constitutes special category data? Which exception under Article 9(2) GDPR applies here?3
What are the expectations regarding transparency and data subject rights requests? Perhaps this depends on whether there is joint controllership or at least a controller-processor relationship?
These are questions that are left unanswered in the Commission's draft, and it seems that there are still diverging views among data protection regulators on whether AI models or systems "contain" personal data after training. So on this, there is still no real clarity.
Legitimate interest for AI
This is where the accusations of "AI exceptionalism" come into play.
Draft recital (27), in acknowledging that AI development and deployment may involve the processing of personal data, suggests the appropriate legal basis for this processing: legitimate interest. But it also states:
...this does not affect the obligation of the controller to ensure that the development or use (deployment) of AI in a specific context or for specific purposes complies with other Union or national law, or to ensure compliance where its use is explicitly prohibited by law. It also does not affect its obligation to ensure that all other conditions of Article 6(1)(f) of Regulation (EU) 2016/679 as well as all other requirements and principles of that Regulation are met.
Accordingly, the Commission's proposal includes new provision Article 88c, which provides that legitimate interest can be used for the processing of data that is "necessary for the interests of the controller in the context of the development and operation of an AI system...or an AI model." This is so long as that interest is not overridden by the "interests, or fundamental rights and freedom of the data subject." Additionally, controllers need to implement appropriate safeguards to ensure, among other things:
Adherence to data minimisation in relation to data sources and model/system training and testing
Protection against disclosure of residually retained data in the model/system
Enhanced transparency to data subjects
The ability of data subjects to exercise their right to object to the collection of their personal data
Furthermore, draft recital (28) suggests some legitimate interests that could apply in this context, which may be "beneficial for the data subject and society at large." These include:
Detecting and removing bias, thereby protecting data subjects from discrimination
Ensuring accurate and safe outputs for a beneficial use, such as to improve accessibility to certain services
Interestingly, neither Article 88c nor draft recital (28) addresses what I think is the elephant in the room: whether the large-scale scraping of data from the internet, which is common practice for the development of general-purpose foundation models, can be justified by a legitimate interest.
The EDPB seems open to the possibility of legitimate interest being an appropriate basis here. As suggested in both its ChatGPT report and its opinion on data processing for AI models, so long as the conditions for legitimate interest can be met, model developers can rely on it as a basis for processing.
But recall what those conditions for legitimate interest are:
It needs to be shown that the controller or a third party is pursuing a legitimate interest. This means that the controller must inform data subjects of the legitimate interests being pursued at the time that their data are collected.4
The processing of personal data must be necessary to pursue that legitimate interest. This requires proof that "the legitimate data processing interests pursued cannot reasonably be achieved just as effectively by other means less restrictive of the fundamental rights and freedoms of data subjects."5
The legitimate interest being pursued, and the data processing it entails, must not take precedence over the interests or fundamental freedoms and rights of the data subjects. This means that the rights of the data subjects and the interests of the controller must be balanced, taking into account the relevant context of the processing.6
In practice, I think model developers will struggle to meet all three of these conditions. As I wrote previously on the EDPB's opinion on data processing for AI models:
the issues that will have the most impact on the legality of legitimate interest as a basis for web scraping for AI development will be on the necessity and balancing tests.
On the necessity test, web scraped datasets are quite likely to contain personal data. Given the broad definition of personal data and the fact that using datasets like CommonCrawl has become fairly standard in the development of foundation models, the idea of frontier model developers training models with datasets that do not contain any personal data seems highly unlikely. But obtaining such data via web scraping may not be considered necessary if such datasets could be acquired via other means. Data licensing agreements for example could provide developers with higher quality, though lower quantity, datasets that may be preferable to indiscriminate scraped datasets. This could weaken the case for web-scraped datasets being necessary for developing AI models.
Additionally, those building on top of foundation models may have an easier time reducing the amount of personal data contained in their datasets for AI engineering or at least implementing appropriate mitigation measures. Those in the application layer of the AI ecosystem will be prioritising quality over quantity; foundation models will already possess very general capabilities after being trained on masses of text data from the internet, and those building on top of these models need a much smaller amount of data to tailor the model for a particular task or domain.
These dynamics for frontier and app developers will also hold true for the balancing test requirement. The biggest issue with web scraping is that (a) it is usually done at a large scale and therefore consumes a lot of data and (b) the data subjects whose data are collected will not know about this before, during or after it has been carried out. So from a data protection perspective, this method of data acquisition is highly controversial.
However, if the GDPR is amended to include a provision that explicitly acknowledges that legitimate interest may be an appropriate legal basis, it gives model developers a much clearer path towards justifying their data practices for model development.
Even so, one thing that new Article 88c as written would not change is that the onus of demonstrating compliance with the relevant conditions for legitimate interest remains on the model developers. Simply declaring legitimate interest as a legal basis for data processing in a privacy policy would be insufficient. Developers will need to conduct an assessment to show that the interests, rights and freedoms of data subjects have not been overridden, owing to the implementation of appropriate safeguards.
On the other hand, what new Article 88c does change is the "tech neutral" approach of Article 6 GDPR. That provision, as it currently stands, applies regardless of the technology used to process personal data or exactly how data are processed; so long as personal data is being used, it needs to be justified by one of the bases under Article 6.
But Article 88c would alter this approach by indicating how one of those bases, namely legitimate interest, could be used for data processing in the context of AI development and deployment.
This is a criticism that has been made by NOYB, a European non-profit organisation focused on information privacy (and co-founded by Max Schrems):
So far Article 6(1) is "tech neutral". Here a specific technology (AI) is for the first time (somehow) legitimized, this may also mean that processing is only legal, because AI is used – while it would otherwise not fall under Article 6(1) GDPR. The provision could impair the tech neutrality of the GDPR and imply that new technologies also raise debate and need specific provision.
This "AI exceptionalism" flows from the premise of the Commission's Omnibus, which is that EU legislation is too cumbersome and therefore needs to change. But the ramifications could be interesting indeed.
This point is also made in new provision Article 88c.
Perhaps new Article 9(2)(k) in the Commission's draft proposal somewhat saves the day here.
Case C-252/21, Meta Platforms Inc and Others v Bundeskartellamt (4 July 2023), para. 107.
Case C-252/21, Meta Platforms Inc and Others v Bundeskartellamt (4 July 2023), para. 108.
Case C-252/21, Meta Platforms Inc and Others v Bundeskartellamt (4 July 2023), para. 110.





