The EU AI Act and sensitive data
One of the key ways that AI governance and data protection intersect
TL;DR
This newsletter is about how the AI Act and the GDPR intersect regarding the use of sensitive data for AI development. It looks at what is required under both pieces of legislation, the potential gaps in the legal framework, and what steps developers could take towards compliance.
Here are the key takeaways:
Under the GDPR, personal data cannot be processed without an appropriate legal basis; these bases are provided under Article 6 of the Regulation. Additionally, sensitive personal data cannot be processed unless one of the exceptions listed under Article 9.2 applies.
The AI Act permits, under Article 10.5, the use of sensitive data for bias mitigation in the development of AI systems. It states that providers may exceptionally process sensitive data “to the extent strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems.”
Article 10.5 itself cannot be a legal basis for processing sensitive data for AI development. The provision entertains the possibility of processing sensitive data for AI development, but only for the purpose of bias mitigation and only if that use meets certain other conditions under both the AI Act and the GDPR.
Steps that developers can take to help comply with the requirements under the AI Act and GDPR regarding the use of sensitive data for AI development include:
Determining whether sensitive data are needed for the development of the AI system
Documenting the justifications/explanations for using sensitive data in a record of processing operation
Applying data minimisation to the sensitive data
What is sensitive data under the GDPR?
Article 4.1 of the GDPR defines “personal data” as “any information relating to an identified or identifiable natural person.” This includes a range of different types of information, including names, ID numbers, location data or online identifiers.
Under the umbrella of personal data is what the GDPR calls “special categories data”. These are certain types of information that possess a higher degree of sensitivity. The GDPR, under Article 9.1, provides a specific list of personal data that is considered special categories. It includes:
Data relating to racial or ethnic origin
Data relating to political opinions, religious or philosophical beliefs, or trade union membership
Genetic data
Biometric data (for the purpose of uniquely identifying a natural person)
Data concerning health or data concerning a natural person’s sex life or sexual orientation
The list provided above is exhaustive, so only these types of data are considered sensitive under the GDPR. The legislation does not provide general criteria that can be used to determine whether other types of information could also be classed as sensitive or special categories data.
What are the GDPR requirements regarding sensitive data?
Under the GDPR, the use of sensitive data is generally prohibited. The beginning of the first paragraph of Article 9 states that the “processing of [special categories data] shall be prohibited.”
However, this prohibition is not unconditional. Recital (51) states that there should be specific cases in which the processing of special categories data is permitted. Those cases, or exceptions to the prohibition, are listed under Article 9.2. Accordingly, the use of special categories data may take place if one of the following applies:
Explicit consent of the data subject
Legal obligation of the data controller
Vital interests of the data subject
Data processing by non-profit bodies
Personal data manifestly made public by the data subject
Legal claims
Substantial public interest
Medical purposes
Public health
Archiving in the public interest, scientific or historical research, or statistical purposes
Additionally, Recital (51) states that “the general principles and other rules of [the GDPR] should apply, in particular as regards the conditions for lawful processing.” This therefore means that, as well as one of the Article 9.2 exceptions applying to the use of sensitive data, that use must also be underpinned by one of the relevant legal bases for the processing of all types of personal data under Article 6.1.
The Court of Justice of the European Union (CJEU) in Meta Platforms Inc and Others v Bundeskartellamt also made some further relevant stipulations regarding the use of sensitive data.
Firstly, if a dataset comprises both sensitive and non-sensitive data (even if there is just one sensitive data point), the whole dataset must be subject to the requirements for special categories data under the GDPR.1 This is unless, at the time of data collection, the sensitive data points are separated from the non-sensitive data points. If, however, all the data are collected en bloc (i.e., altogether with no means of separation), then Article 9 will apply to all the data.
Secondly, also from the Bundeskartellamt case, the requirements of Article 9 apply even if it was not the intention of the data controller to process special categories data. Article 9 therefore applies regardless of how the controller intended to process personal data, including where the purpose was to surface information that might be considered sensitive under the GDPR.2
Furthermore, regarding AI systems specifically, the GDPR applies to the outputs of the system as well as its inputs, both pre- and post-deployment (i.e., training data and inference data). If any of these include information falling within at least one of the categories listed under Article 9.1, then they will constitute special categories data.
So overall, to use sensitive data lawfully under the GDPR:
One of the legal bases under Article 6 must apply and one of the exceptions under Article 9.2 must also apply
This is the case even if just one data point in the dataset is sensitive data
It does not matter if the controller intended to process sensitive data or not
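The Bundeskartellamt separation point can be illustrated with a small sketch. This is a hypothetical example, not a compliance tool: the field names and the mapping to Article 9.1 categories are purely illustrative, and whether a field is actually special-category data is a legal question, not a string match.

```python
# Hypothetical sketch: separating special-category fields at the point of
# collection, per the Bundeskartellamt reasoning. Field names are illustrative.

SPECIAL_CATEGORY_FIELDS = {  # loosely mirrors the Article 9.1 list
    "ethnicity", "political_opinion", "religion", "trade_union_membership",
    "genetic_data", "biometric_id", "health", "sex_life", "sexual_orientation",
}

def split_record(record: dict) -> tuple[dict, dict]:
    """Split one collected record into non-sensitive and special-category parts."""
    ordinary = {k: v for k, v in record.items() if k not in SPECIAL_CATEGORY_FIELDS}
    sensitive = {k: v for k, v in record.items() if k in SPECIAL_CATEGORY_FIELDS}
    return ordinary, sensitive

record = {"name": "A. Example", "city": "Berlin", "health": "asthma"}
ordinary, sensitive = split_record(record)
# If separation like this is NOT possible at collection time (data collected
# en bloc), Article 9 applies to the whole dataset, not just the sensitive fields.
```

The design point is that the split happens at collection: once sensitive and non-sensitive values are stored together with no means of separation, the whole dataset falls under Article 9.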
What does the AI Act say about using sensitive data?
Article 10 of the EU AI Act sets out the data governance measures required for the development and deployment of high-risk AI systems. The provisions in this Article reflect an important reality of AI development: good data combined with good data quality processes are essential for building well-functioning and high-quality AI systems. This helps avoid “garbage in, garbage out”.
Paragraph 5 of Article 10 relates to bias mitigation. This is important to “protect the right of others from the discrimination that might result from the bias in AI systems.”3 It states that providers may exceptionally process sensitive data “to the extent strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems.” In doing so, the AI Act imposes the following conditions:
It must not be possible to conduct the bias detection and correction using other data, including synthetic or anonymised data
The use of sensitive data must be subject to technical limitations on the re-use of the personal data, and state-of-the-art security and privacy-preserving measures, including pseudonymisation
The sensitive data must be subject to measures to ensure that it is secured, protected, subject to suitable safeguards, including strict controls and documentation of the access, to avoid misuse and ensure that only authorised persons have access to the data with appropriate confidentiality obligations
The sensitive data cannot be transmitted, transferred or otherwise accessed by other parties
The sensitive data must be deleted once the bias has been corrected or the personal data has reached the end of its retention period, whichever comes first
A record of processing operation must be created that includes the reasons why the processing of special categories of personal data was strictly necessary to detect and correct biases, and why that objective could not be achieved by processing other data
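To make the activity Article 10.5 contemplates concrete, here is a minimal, hypothetical sketch of bias detection using a sensitive attribute: comparing positive-outcome rates across groups. The group labels and decisions are invented, and real bias audits use richer metrics; the point is only that the sensitive attribute is needed solely to compute the audit metric.

```python
# Hypothetical sketch of bias detection using a sensitive attribute
# (e.g. ethnicity). All data are illustrative.

from collections import defaultdict

def selection_rates(decisions):
    """Positive-outcome rate per sensitive group.

    decisions: iterable of (group, approved: bool) pairs.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        positives[group] += int(approved)
    return {g: positives[g] / totals[g] for g in totals}

decisions = [("group_a", True), ("group_a", True), ("group_a", False),
             ("group_b", True), ("group_b", False), ("group_b", False)]
rates = selection_rates(decisions)
# A large gap between groups' rates flags potential bias to correct.
```

Under Article 10.5 as described above, such an audit dataset would also need access controls, pseudonymisation, and deletion once the bias has been corrected or the retention period ends.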
Processing sensitive data for bias mitigation must also comply with the other relevant requirements under the GDPR.4 This includes the requirement for one of the exceptions under Article 9.2 to apply and for one of the legal bases under Article 6.1 to apply.
Regarding the exceptions under Article 9.2, Recital (70) of the AI Act states that, for the purpose of ensuring bias detection and correction in AI systems, providers of such systems should “be able to process also special categories of personal data, as a matter of substantial public interest within the meaning of [Article 9.2(g) of the GDPR].”
But there is a problem...
When developing an AI system, developers may use certain datasets to detect and mitigate bias. In particular, “collecting special categories of data (such as ethnicity data) is often useful, or even necessary, to audit AI systems for discrimination.”5 Sensitive data might be used for fine-tuning LLMs to minimise biases they may have picked up during pre-training. See my previous article on how Google tried to do this for Gemini (with interesting results).
But in this context, Article 10.5 of the AI Act is a strange provision. Read together with Recital (70), it seems to provide a basis for the processing of sensitive data for bias mitigation in AI systems. Article 10.5 itself states that providers of AI systems may use sensitive data for this purpose, and Recital (70) seems to indicate that such activity constitutes processing that is necessary for reasons of substantial public interest (one of the exceptions under Article 9.2 of the GDPR).
Article 10.5 itself, however, cannot be a legal basis for processing sensitive data for AI development. The provision entertains the possibility of processing sensitive data for AI development, but only for the purpose of bias mitigation and only if that use meets certain other conditions under both the AI Act and the GDPR.
The substantial public interest exception under Article 9.2(g) of the GDPR has three elements to it:
The processing must be for reasons of substantial public interest. Recital (46) of the GDPR mentions processing necessary for “humanitarian purposes, including for monitoring epidemics and their spread or in situations of humanitarian emergencies, in particular in situations of natural and man-made disasters.” Is bias mitigation in AI systems another example of a substantial public interest? According to Recital (70) of the AI Act, it could be.
It must be provided for by EU or national law. This straightforwardly means that the public interest pursued must exist under a legal framework. In this case, Article 10.5 of the AI Act could be cited as the relevant legislation.
The law must be proportionate, respect the right to data protection, and provide suitable and specific measures to safeguard the rights and interests of data subjects. Article 10.5 does specify the conditions under which the processing of sensitive data for bias mitigation can take place, which apply in addition to the requirements imposed by the GDPR.
Furthermore, even if these elements can be satisfied, the use of sensitive data still requires an applicable legal basis under Article 6.1 of the GDPR. In this regard, Recital (63) states that the AI Act itself does not provide a legal basis for the processing of personal data for the purposes of an AI system unless this is specifically stated in the provisions of the Regulation. Regarding bias mitigation specifically, the AI Act is silent on the legal basis for the processing of personal data.
So what should AI system developers do?
There are three things that AI system providers could do in light of what is provided in the AI Act and the GDPR regarding sensitive data:
Determine whether sensitive data are needed for the development of the AI system. If special categories of data can be avoided, then navigating the provisions of the AI Act and GDPR can also be avoided.
Document the justifications/explanations for using sensitive data in a record of processing operation (ROPA). As per Article 30, data controllers are required to maintain records containing details about the personal data they process. If providers are therefore using personal data to develop their AI systems, they will also be required to maintain such records, which includes the purpose of the processing. Further information regarding the legal bases relied on, as well as the relevant exception under Article 9.2 for processing sensitive data, should be included here too, as per Article 10.5(f) of the AI Act.
Apply data minimisation. If sensitive data are used for AI development, providers should limit the data to what is strictly necessary for the development process. This principle is already implicitly mentioned in Article 10.5(b), which refers to the use of pseudonymisation.
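The documentation and minimisation steps above can be sketched together. This is a hypothetical illustration only: the ROPA fields and values are placeholders, not a legal template, and the keyed-hash pseudonymisation shown is one possible technique, not the one the AI Act mandates.

```python
# Hypothetical sketch: a minimal ROPA entry for bias-mitigation processing,
# plus pseudonymisation of a direct identifier. All names are illustrative.

import hashlib

ropa_entry = {
    "purpose": "Bias detection and correction (Article 10.5 AI Act)",
    "legal_basis": "Article 6.1 GDPR basis relied on (to be identified)",
    "article_9_exception": "Article 9.2(g) - substantial public interest",
    "necessity_justification": (
        "Why the objective could not be achieved with synthetic or "
        "anonymised data, per Article 10.5(f)"
    ),
}

def pseudonymise(value: str, secret_salt: str) -> str:
    """Replace a direct identifier with a keyed hash (cf. Article 10.5(b))."""
    return hashlib.sha256((secret_salt + value).encode()).hexdigest()[:16]

record = {"name": "A. Example", "ethnicity": "group_a"}
minimised = {
    "subject_id": pseudonymise(record["name"], secret_salt="keep-this-secret"),
    "ethnicity": record["ethnicity"],  # retained only for the bias audit
}
# The direct identifier is dropped; only the pseudonym and the attribute
# strictly necessary for bias detection remain.
```

The salt here stands in for a separately stored secret; if it were published or weak, the hash could be reversed by brute force, so key management matters as much as the hashing itself.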
Case C-252/21, Meta Platforms Inc and Others v Bundeskartellamt (4 July 2023), para. 89.
Case C-252/21, Meta Platforms Inc and Others v Bundeskartellamt (4 July 2023), para. 69.
EU AI Act, Recital (70).
EU AI Act, Article 10.5.
van Bekkum et al., “Using sensitive data to prevent discrimination by artificial intelligence: Does the GDPR need a new exception?” (Computer Law and Security Review, Vol 48, April 2023).