On 7 August 2025, OpenAI released its long-awaited GPT-5. It was made available to all users of ChatGPT, free and paid, and was described by OpenAI as its "best AI system yet."
In this post I cover three initial thoughts on GPT-5:
Maybe AGI cannot be achieved through scaling after all
Maybe instead of AGI we get diffusion
Maybe this is the start of the AI bubble popping
Scaling may have hit a wall
Sam Altman said this at the start of 2025:
We are now confident we know how to build AGI as we have traditionally understood it.
How has OpenAI sought to build AGI? Scaling.
As I wrote previously, much of OpenAI's strategy has been oriented around increasing three main aspects of their AI models: their size, their training datasets and their compute consumption. Doing so seemed to lead to predictable improvements in model capabilities.
This bet was made soon after OpenAI's 'discovery' of scaling laws in 2020, eventually leading to the release of ChatGPT just a few years later.1 In fact, this product sparked a whole AI race in which several companies with access to the necessary resources engaged in the development of AI systems powered by this belief in scaling.
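For readers unfamiliar with those scaling laws, here is a simplified restatement of the power-law form reported in Kaplan et al. (the fitted constants and exponents are in the paper, and this glosses over the conditions under which each relation holds):

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

Here L is the test loss, N the number of model parameters, D the dataset size and C the training compute, with each relation holding when the other two factors are not the bottleneck. The practical reading is the one above: make the model, the data and the compute bigger and the loss falls in a predictable way.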
The development of OpenAI's GPT models has therefore been driven by scaling laws. Accordingly, GPT-5 was supposed to be its best yet.
And there had been plenty of anticipation and hype built up prior to its release, perpetuated even by Altman himself. He claimed that GPT-5 would be smarter than him and would be a "crazy high IQ tool." Its capabilities would be better "across the board" and show that we've hit "the intelligence threshold."
So is GPT-5 AGI? Well, not really.
On the day of GPT-5's release, Altman merely said that it is "a significant step along the path to AGI." So no, we are not at AGI, though apparently we should not expect this until 2027 anyway.
But was GPT-5 impressive nevertheless? Did it improve across the board? Did it hit the intelligence threshold (whatever this means)?
If you watched the hour-and-20-minute launch video, you might think yes, although you might have had questions if you paid close enough attention to some of the material shared during the presentation.
Indeed, GPT-5 does show great improvements across a number of metrics. As shown by LMArena:
However, GPT-5 is not the big leap that previous GPT models perhaps were. Across different benchmarks, the model makes incremental improvements on its predecessors. And on some measures it loses out to models from other developers. For instance, on ARC-AGI and ARC-AGI-2 (benchmarks attempting to measure how well a model can learn and solve new problems it has never seen before), it falls behind xAI's Grok 4.
What this might suggest is that the returns on scaling alone are diminishing. As Helen Toner suggested back in June:
It does seem actually pretty clear that the improvements we're getting from just scaling pre-training—that initial phase of learning to imitate text or other kinds of data—it does seem like the value of continuing to scale that is slowing a little bit. It's also at a point where continuing to scale it means spending hundreds and hundreds of millions of dollars. So if the value is not worth hundreds and hundreds of millions of dollars, it's going to maybe be difficult to keep moving up that curve.
If this is true, then OpenAI's strategy may need to shift away from heavy reliance on scaling laws. To maintain its leadership in AI development, maybe it needs to focus more on productising its models rather than on making them much smarter.
Diffusion continues
I have already written about how I see the future of AI being mainly about how to turn models into systems that can function in real-world use cases. We are already seeing evidence of models being deployed across an increasing number of domains, and this could continue if the consensus after the release of GPT-5 is that models will not get much smarter than they are now.
In fact, OpenAI is not marketing GPT-5 itself as a model. It is presenting it as a system capable of using the 'right model' for a given task:
GPT‑5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use based on conversation type, complexity, tool needs, and your explicit intent (for example, if you say “think hard about this” in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time. Once usage limits are reached, a mini version of each model handles remaining queries. In the near future, we plan to integrate these capabilities into a single model.
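To make the routing idea concrete, here is a toy sketch of how such a dispatcher might choose between a fast model and a deeper reasoning model. Everything below (names, thresholds, heuristics) is invented for illustration; it is not OpenAI's implementation or API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt: str
    needs_tools: bool = False
    usage_limit_reached: bool = False


def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned complexity signal (purely hypothetical)."""
    hard_markers = ("prove", "debug", "step by step")
    score = 0.3 + 0.2 * sum(marker in prompt.lower() for marker in hard_markers)
    return min(score, 1.0)


def route(request: Request) -> str:
    """Pick a model variant from intent, complexity, tool needs and usage limits."""
    # Explicit user intent overrides other signals, per the quoted description.
    if "think hard" in request.prompt.lower():
        chosen = "reasoning-model"
    elif request.needs_tools or estimate_complexity(request.prompt) > 0.6:
        chosen = "reasoning-model"
    else:
        chosen = "fast-model"
    # Once usage limits are reached, fall back to a smaller variant.
    if request.usage_limit_reached:
        chosen += "-mini"
    return chosen


if __name__ == "__main__":
    print(route(Request("What's the capital of France?")))                       # fast-model
    print(route(Request("Think hard about this contract clause.")))              # reasoning-model
    print(route(Request("Prove this step by step.", usage_limit_reached=True)))  # reasoning-model-mini
```

The real router is described as being trained on behavioural signals (model switches, preference rates, measured correctness) rather than hand-written rules like these; the sketch only illustrates the shape of the decision.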
Accordingly, OpenAI has intended for GPT-5 to be "more useful for real-world queries" by "reducing hallucinations, improving instruction following, and minimizing sycophancy." Such efforts may be much appreciated given how difficult LLMs have been to work with in practice.
In fact, improving their practical workability may be what determines the fate of LLMs. If they fail to be reliable enough in the real world, then diffusion probably does not happen.
And if LLMs do prove useful enough for downstream applications, then it will not just be the EU AI Act's provisions on general-purpose AI models, along with the code of practice applicable to developers like OpenAI, that will be important. How the Act applies to those building on top of these models will also be key to keep an eye on.
Does the hype finally slow down?
From A normie's guide to AI hype:
We will eventually get a better sense of what generative AI can and cannot be used for.
When this happens, the best use cases for these models will become clearer and other more speculative uses will be attempted less. In turn, the excitement around AI may lessen as we concentrate on more proven uses.
If scaling has hit a wall, and maturation and diffusion follow, then AI becomes less about 'artificial intelligence' and more about 'augmented intelligence'. In other words, it becomes a tool for improving things rather than something which completely replaces humans.
And as a result we focus less on 'AI kills us all' or 'AI saves us all'. It becomes just another technology.
If so, then perhaps the hype dies down. Significantly.
See Kaplan et al., 'Scaling Laws for Neural Language Models' (2020).