The EU's code of practice for general purpose AI models
Some interesting highlights from the first draft
What is the code of practice?
In November 2024, the European AI Office published the first draft of the code of practice for providers of general-purpose AI models under the EU AI Act.
Under Article 56 of the AI Act, the AI Office must encourage and facilitate "the drawing up of codes of practice...in order to contribute to the proper application of this Regulation, taking into account international approaches." This includes codes of practice that providers of general-purpose models can rely on to comply with their obligations under the legislation.1
This first draft of the code, which will go through four drafting rounds in total, was based on contributions from model providers. It is intended to serve as "a foundation for further detailing and refinement."
The code focuses on "key considerations for providers of general-purpose AI models and for providers of general-purpose AI models with systemic risk." The Act distinguishes between two types of general-purpose models: those with systemic risk and those without.
So what is systemic risk? This is how it is defined under Article 3.65 of the Act:
...a risk that is specific to the high-impact capabilities of general-purpose AI models, having a significant impact on the Union market due to their reach, or due to actual or reasonably foreseeable negative effects on public health, safety, public security, fundamental rights, or the society as a whole, that can be propagated at scale across the value chain.
Accordingly, under Article 51.1, a general-purpose AI model is classified as a model with systemic risk if it meets at least one of the following conditions:
It has high-impact capabilities (Article 3.64 defines these as "capabilities that match or exceed the capabilities recorded in the most advanced general-purpose AI models").
A decision of the European Commission determines it to have capabilities or an impact equivalent to high-impact capabilities.
Simply put, a general-purpose AI model with systemic risk is a model with high-impact capabilities that could have a negative effect on public health, safety, public security, fundamental rights or society as a whole, with that effect capable of propagating at scale across the value chain.
The AI Office aims to have the final draft of the code completed in May 2025, with the provisions applicable to general-purpose AI models coming into force in August of that year.
This current draft does not have the level of granularity that the final version will possess. Nevertheless, the draft code is structured as a series of measures, sub-measures and KPIs related to the various obligations for model providers.
Some notable highlights from the first draft
Under Measure 2, the code provides a table covering all the information that model providers must include in their documentation as specified in Annexes XI and XII of the Act. Among these items is information about computational resources:
Signatories should detail the computational resources (e.g. the number and type of hardware units needed to train and do inference with the general-purpose AI model, the duration of the training process, the number of FLOPs) used to train and do inference with the model...2
Such a provision would cover so-called "reasoning" models like OpenAI's o1, which use more compute at inference time because they are trained to apply a series of reasoning steps when generating responses to prompts.
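To give a sense of what reporting training and inference compute might involve, here is a rough back-of-envelope sketch in Python. It uses the common approximations of roughly 6 × parameters × training tokens FLOPs for training and 2 × parameters × tokens FLOPs for inference; the parameter and token counts are illustrative assumptions of mine, not figures from the code or from any provider.

```python
# Rough FLOP estimates a provider might report, using common approximations:
# training ≈ 6 × parameters × training tokens; inference ≈ 2 × parameters × tokens.
# All figures below are illustrative assumptions, not real model data.

def training_flops(params: float, training_tokens: float) -> float:
    """Approximate total training compute (forward and backward passes)."""
    return 6 * params * training_tokens

def inference_flops(params: float, tokens: float) -> float:
    """Approximate compute to process or generate a given number of tokens."""
    return 2 * params * tokens

params = 70e9          # hypothetical 70B-parameter model
train_tokens = 15e12   # hypothetical 15 trillion training tokens

print(f"Training: ~{training_flops(params, train_tokens):.2e} FLOPs")

# A 'reasoning' model that emits a long chain of thought spends far more
# compute per answer than a model that replies directly.
direct = inference_flops(params, 500)       # ~500 output tokens
reasoned = inference_flops(params, 20_000)  # ~20k reasoning + output tokens
print(f"Direct answer:   ~{direct:.2e} FLOPs")
print(f"Reasoned answer: ~{reasoned:.2e} FLOPs ({reasoned / direct:.0f}x more)")
```

The point of the sketch is simply that the same model can consume orders of magnitude more compute per response depending on how long it is allowed to 'reason'.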
However, as I covered previously, this inference-time compute has implications for the applicability of the AI Act's provisions for models with systemic risk:
If the inference scaling law introduced by o1 holds true, then the presumption that higher training compute equates to higher risk is significantly weakened.
Instead, inference compute may be what correlates with risk, as OpenAI hints at in its System Card. The more compute used at inference, the more time the model spends 'reasoning' and therefore the better its performance, which in turn could increase the risk of the model exhibiting dangerous behaviour.
That is not to say that models like o1 would definitely be exempt from the more onerous obligations for general-purpose models with systemic risk under the AI Act. Even if the compute used for training falls below 10^25 FLOPs, other factors can be taken into account for the risk categorisation, including the benchmarks and evaluations of the model's capabilities.
Nevertheless, the advent of o1 highlights the problem with regulation that is too specific to the (former) state of the art, therefore limiting its ability to be future proof.
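To make that classification logic concrete, here is a minimal sketch of how I read Article 51: training compute above 10^25 FLOPs merely creates a presumption of high-impact capabilities, and a model trained below that threshold can still be classified as posing systemic risk on the basis of benchmarks and evaluations or a Commission decision. The function and its inputs are my own simplification, not anything prescribed by the Act or the draft code.

```python
# Illustrative simplification of Article 51 of the AI Act (not legal advice):
# training compute above 10^25 FLOPs creates a presumption of high-impact
# capabilities, but benchmarks/evaluations or a Commission decision can also
# lead to classification as a model with systemic risk.

TRAINING_COMPUTE_THRESHOLD = 1e25  # FLOPs (Article 51.2)

def is_systemic_risk_model(training_flops: float,
                           matches_frontier_capabilities: bool,
                           commission_decision: bool) -> bool:
    presumed_high_impact = training_flops > TRAINING_COMPUTE_THRESHOLD
    return presumed_high_impact or matches_frontier_capabilities or commission_decision

# A hypothetical o1-style model: trained below the compute threshold,
# yet matching frontier capabilities on evaluations.
print(is_systemic_risk_model(6e24,
                             matches_frontier_capabilities=True,
                             commission_decision=False))  # True
```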
Measure 6 of the code covers different types of systemic risks that developers should consider in their risk assessments. It includes the following risks:
The automated use of models for AI research and development (the fourth bullet point) is interesting. I suspect this refers to the use of AI models to create even better AI models, a phenomenon where advanced technology leads to even more advanced technology.
This is something that Azeem Azhar explains in his book Exponential. He references the argument made by Ray Kurzweil that there is a positive feedback loop in technological development: improved computer chips allow for computers that can process more data, which in turn enables us to discover ways of building even better chips, and so even better computers.
Accordingly:
...this process is constantly accelerating: the returns of each generation of technology layer on top of the previous generation's, and even feed into one another.3
This also connects to the idea of intelligence explosions, where sufficiently intelligent machines create other intelligent machines to achieve their goals. If this were to occur, and we did not have sufficient control over the process, researchers like Stuart Russell and Nick Bostrom reckon that this would constitute an existential risk for humanity.
Measure 6 also lists the various sources of systemic risks. This includes 'the potential to remove guardrails'.4 A good example of this is the inadvertent reversing of the safety fine-tuning carried out by developers when open LLMs are further fine-tuned by other users, which I wrote about previously.
The architecture of these models is so vast and complex that it is incredibly difficult, perhaps even impossible, to really understand how they are supposed to work. And if you cannot understand how these models work, how can you possibly ensure that they behave as you intended?
Finally, there is Measure 7, which details the requirements for 'safety and security frameworks' (SSFs). These are documents which:
...detail the risk management policies they adhere to in order to proactively assess and proportionately mitigate systemic risks from their general-purpose AI models with systemic risk.5
One interesting item that must be included in these SSFs is risk forecasts:
Signatories will include in their SSF best effort estimates of timelines for when they expect to develop a model that triggers the systemic risk indicators...6
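The draft does not prescribe any particular format for these forecasts, but to illustrate the kind of information involved, here is a purely hypothetical sketch of what a risk-forecast entry in an SSF might capture. Every field name and value below is my own invention, not something taken from the code.

```python
# Purely hypothetical sketch of an SSF risk-forecast entry; the draft code
# does not prescribe a format, and all fields and values here are invented.
from dataclasses import dataclass

@dataclass
class RiskForecast:
    systemic_risk_indicator: str   # the indicator the forecast relates to
    expected_trigger_date: str     # best-effort estimate of when it may be triggered
    confidence: str                # how confident the provider is in that estimate
    basis: str                     # evidence underlying the estimate

forecast = RiskForecast(
    systemic_risk_indicator="autonomous AI research and development capabilities",
    expected_trigger_date="2026-Q4",
    confidence="low",
    basis="internal capability evaluations and compute roadmap",
)
print(forecast)
```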
Both OpenAI and Anthropic have responsible scaling policies in place which assume that the scale of the model predicts risk. The bigger the model, the higher the capabilities, and therefore the greater the risk it presents.
But such an approach only works for as long as the scaling laws continue to hold. And recently there have been some indications that this 'bigger is better' approach is losing its effectiveness, as Alberto Romero of The Algorithmic Bridge notes:
The blind trust OpenAI and competitors like Google, Anthropic, or Meta put on the scaling laws—if you increase size, data, and compute you’ll get a better model—was unjustified. And how could it be otherwise! Scale was never a law of nature like gravity or evolution, but an observation of what was working at the time—just like Moore’s law, which today rests in peace, outmatched by the impenetrability of quantum mechanics and the geopolitical forces that menace Taiwan.
AI companies had no way of knowing when the scaling laws (or scaling hypothesis as it was once appropriately called) would break apart. It seems the time is now. Making GPT-like language models larger or training them with more powerful computers won’t suffice.
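For readers wondering what a 'scaling law' actually is: it is an empirical curve fitted to observed results, typically a power law relating model size (or data, or compute) to loss, rather than a law of nature. The sketch below fits such a curve to made-up data points to show the kind of extrapolation the 'bigger is better' bet rests on; the numbers are illustrative and not drawn from any published scaling study.

```python
# Illustrative only: fitting a power law loss(N) ≈ a * N^(-alpha) to made-up
# (parameter count, loss) points -- the kind of empirical curve 'scaling laws'
# refer to. Ignores the irreducible-loss term for simplicity.
import numpy as np

# Hypothetical observations: parameter counts and validation losses.
params = np.array([1e8, 1e9, 1e10, 1e11])
loss = np.array([3.9, 3.2, 2.7, 2.3])

# Fit log(loss) as a linear function of log(params), i.e. a pure power law.
slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
alpha, a = -slope, np.exp(intercept)
print(f"Fitted power law: loss ≈ {a:.1f} * N^(-{alpha:.3f})")

# Extrapolating the fit to a 10x larger model -- the step that only works
# for as long as the empirical trend continues to hold.
predicted = a * (1e12) ** (-alpha)
print(f"Predicted loss at 1e12 params: {predicted:.2f}")
```

The fit only tells you about the regime you have measured; whether the curve keeps bending the same way at the next order of magnitude is exactly the question now in dispute.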
EU AI Act, Articles 53.4 and 55.2.
EU AI Office, First Draft of the General-Purpose AI Code of Practice (November 2024), p.12.
Azeem Azhar, Exponential: How Accelerating Technology Is Leaving Us Behind and What to Do About It (Penguin Random House 2021), p.30.
EU AI Office, First Draft of the General-Purpose AI Code of Practice (November 2024), p.19.
EU AI Office, First Draft of the General-Purpose AI Code of Practice (November 2024), p.21.
EU AI Office, First Draft of the General-Purpose AI Code of Practice (November 2024), p.23.