What the Gemini saga tells us about AI development
It is even more proof that algorithms are not just mathematics
TL;DR
This newsletter is about the latest controversies surrounding Gemini, Google's family of advanced multimodal generative AI models. It covers the problems identified with the Gemini models, their likely cause, and how this all relates to AI development.
Here are the key takeaways:
People have been complaining that Google's AI model is too 'woke'. This was after users discovered the model's tendency to generate inaccurate historical images seemingly aligned with progressive ideologies.
The source of this problem likely lies in the way that Gemini was developed. Google developers fine-tuned the model with RLHF to "correct" the biases existing in the internet data that the model was trained on.
But this fine-tuning process had unintended consequences for some of the model's behaviour: the efforts to counter the inherent biases in Gemini's training data ended up going too far.
This whole saga reveals three things about AI:
AI is a value-laden science.
Bias in AI development is unavoidable.
We are still dealing with black boxes.
What is the problem?
People have been complaining that Google's most advanced AI model is too 'woke'.
In early February 2024, Google released Gemini Ultra, its most advanced multimodal large language model, to the public. The model was incorporated across Google's suite of products, including Workspace and Gmail.
Google's Bard, its AI-powered chatbot rivalling OpenAI's ChatGPT, was also rebranded to Gemini. This gave users access to improved image-generation capabilities.
But a few weeks after this launch, Google announced that it was pausing the image-generation features of Gemini. This was after users discovered the model's tendency to generate "inaccuracies in some historical image generation depictions."
What were these inaccurate historical depictions? One X user shared his experience using the model and some of the strange outputs it would produce, as shown below:
It seemed like Gemini struggled to produce images of white people. Some more examples below:
Even the model's text responses to certain prompts became infamous. See below for example:
What these outputs suggest is a progressive bias in Gemini's processing. The model exhibited a tendency to favour left-wing ideology and failed to depict white people in images when one could reasonably expect otherwise.
This was the problem identified with Gemini, and the reason why Google has been heavily criticised in recent weeks.
How did this happen?
The source of this problem lies in the way that Gemini was developed.
In the technical paper for Gemini, Google describes the fine-tuning process that it used in developing the model. But how does this work and how could this have caused the questionable outputs produced by Gemini?
How fine-tuning works
Models like Gemini are developed using two stages of training:
Pre-training. This involves feeding the model large datasets with the purpose of getting the model to develop an 'understanding' of the data and using this understanding to produce responses to prompts. For the development of Gemini, Google used data consisting of "web documents, books, and code, and includes image, audio, and video data."1
Fine-tuning. The 'understanding' that the base model gains from pre-training takes the form of a probability distribution: an estimate of how likely different continuations are to be the correct response to a given prompt. Fine-tuning is about nudging the model to rely on those parts of the probability distribution that produce the most accurate and relevant responses to prompts (see the sketch below).
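To make this notion of a 'probability distribution' concrete, below is a minimal sketch that inspects the next-token distribution of the publicly available GPT-2 model via the Hugging Face transformers library. GPT-2 is only a stand-in here, and the prompt is arbitrary; Gemini's weights are not public, so this illustrates the general mechanism rather than Gemini itself.

```python
# Minimal sketch: a pre-trained base model is, at bottom, a probability
# distribution over possible continuations of a prompt. GPT-2 stands in for
# a base model here; the prompt is arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The most important quality in a leader is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# The final position holds the model's distribution over the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")

# Pre-training shapes this distribution from raw internet data; fine-tuning
# later nudges it towards continuations that humans prefer.
```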
For Gemini, Google fine-tuned the model in a similar way to how OpenAI fine-tuned its GPT models. This involved the use of reinforcement learning with human feedback (RLHF), and this process is key to understanding the flaws shown in Gemini.
This is how fine-tuning with RLHF works (a toy code sketch follows these steps):2
Google developers put together a prompt dataset, which is used for the demonstration and feedback stages of fine-tuning (steps 2 and 3 below). The Gemini technical paper describes this dataset as consisting of "a diverse set of prompts that are representative of real-world cases."3
Using the prompt dataset, developers train the model on demonstration data. This is a supervised process, meaning that the model is shown examples of what the desired output for a given prompt should look like (hence the name supervised fine-tuning).
Again using the prompt dataset, developers have humans rank and provide feedback on responses to prompts, creating feedback data used to train a reward model (RM). The aim here is to develop an RM that 'understands' the human preferences expressed in the feedback data, and then use the trained RM to train the base model.
The RM trains the base model by rewarding it for outputs that are aligned with the human preferences. This process is repeated iteratively until the base model reliably produces responses that align with those preferences.
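The Gemini paper describes this loop only at a high level, so the sketch below fills in the mechanics of step 3 with common but assumed choices: a small reward model trained on pairwise preferences with a Bradley-Terry-style loss. The embeddings, dimensions and 'preference direction' are all synthetic placeholders, not anything from Google's implementation.

```python
# Toy sketch of step 3: training a reward model (RM) on pairwise human
# preferences. Everything here is synthetic; real pipelines score full
# (prompt, response) pairs with a language-model-sized RM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response embedding to a scalar 'how much would humans like this' score."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(rm, chosen, rejected):
    """Bradley-Terry-style loss: the human-preferred response should score higher."""
    return -F.logsigmoid(rm(chosen) - rm(rejected)).mean()

torch.manual_seed(0)
true_pref = torch.randn(64)            # a hidden 'human preference' direction (toy)
rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

for step in range(500):
    base = torch.randn(32, 64)          # stand-ins for response embeddings
    chosen = base + 0.5 * true_pref     # responses humans ranked higher
    rejected = base - 0.5 * true_pref   # responses humans ranked lower
    opt.zero_grad()
    loss = preference_loss(rm, chosen, rejected)
    loss.backward()
    opt.step()

# Step 4, conceptually: sample responses from the base model, score them with
# rm(...), and update the base model to make high-reward responses more likely,
# usually with a penalty that keeps it close to the pre-trained distribution.
```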
The purpose of fine-tuning
Fine-tuning with RLHF is essentially about using bias to fight bias. In Google's case, the developers attempted to substitute societal biases evident in the training data with more 'correct' biases.
The reason for doing this has to do with the source of the data used to create the base model during pre-training. This is something that I wrote about previously on the drawbacks of using internet data to train generative AI models.
When pre-training large AI models like Gemini on lots of internet data, the resulting base model will generate a probability distribution of this data. When the base model produces responses to prompts, it is using this distribution to generate the response.
The problem here is that the probability distribution is a reflection of the biased, discriminatory and toxic content that features in the internet data. This includes "negative stereotypes and biases, discriminatory and harmful representation, and cultural and linguistic homogeneity."4
So when the model is using its distribution of internet data, it might use those same biases to generate its outputs. In doing so, it reinforces the shortcomings of society represented in its training data:
...if the society in which the training data are generated is structurally biased against marginalized communities, even completely accurate datasets will elicit biases...The resulting models may codify and entrench systems of power and oppression, including capitalism and classism; sexism, misogyny, and patriarchy; colonialism and imperialism; racism and white supremacy; ableism; and cis and heteronormativity.5
Herein lies the purpose of fine-tuning. It narrows the scope of the probability distribution so that the model relies only on the parts that human feedback has rewarded more highly, parts that are supposed to contain less of the harmful content.
In other words, with RLHF, developers are biasing the model to behave in a manner that is in accordance with human preferences. They are fighting bias with bias.
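One common way to write this 'fighting bias with bias' step down is the KL-regularised objective used in InstructGPT-style RLHF. The Gemini paper does not publish its exact objective, so treat this as an illustration of the general recipe rather than Google's formula:

$$
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_{\theta}(\cdot \mid x)}\big[\, r_{\phi}(x, y) \,\big]
\;-\;
\beta\, \mathbb{E}_{x \sim \mathcal{D}}\Big[\mathrm{KL}\big(\pi_{\theta}(\cdot \mid x) \,\Vert\, \pi_{\text{pre}}(\cdot \mid x)\big)\Big]
$$

Here $\pi_{\text{pre}}$ is the pre-trained distribution, $\pi_{\theta}$ is the model being fine-tuned, $r_{\phi}$ is the reward model trained on human feedback, and $\beta$ controls how far the fine-tuned model may drift from the original. The first term pulls the model towards human-preferred responses; the second keeps it anchored to what it learned in pre-training, which is exactly the 'narrowing' described above.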
Gemini fine-tuning
Fighting bias is something that the Gemini developers speak to in their technical paper for the model. On the data curation practices for Gemini, it states the following:
Prior to all training stages, we take various steps to mitigate potential downstream harms through data curation and careful data collection. We filter training data for high-risk content and to ensure training data is sufficiently high quality.
Humans also play an essential role, both for data creation and evaluation, in the post-training process. For certain data creation and evaluation initiatives, we consider diversity across gender presentation, age, and racial and ethnic diversity.6
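The paper does not say what this filtering looks like in practice. As a deliberately simplified illustration of the shape of such a step (real pipelines rely on trained safety classifiers and quality models, not a keyword list), pre-training data curation might resemble the following; the term list, documents and scores are placeholders of my own:

```python
# Deliberately simplified illustration of pre-training data curation.
# The term list, documents and quality scores are invented placeholders;
# Google's actual filters are not public.
HIGH_RISK_TERMS = {"placeholder_slur", "placeholder_threat"}

def keep_document(text: str, quality_score: float, quality_threshold: float = 0.5) -> bool:
    """Drop documents flagged as high-risk or scored as low quality."""
    high_risk = any(term in text.lower() for term in HIGH_RISK_TERMS)
    return (not high_risk) and quality_score >= quality_threshold

corpus = [("a web document about gardening", 0.9),
          ("a low-quality scraped page", 0.1)]
filtered = [text for text, score in corpus if keep_document(text, score)]
print(len(filtered))  # 1: the low-quality page is dropped
```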
Accordingly, Google developers used specialised datasets for identifying representational harms in models. This is how they described it:
[These datasets] are constructed by starting with a harmful stereotype, and then questions are constructed to test whether models challenge or reinforce these stereotypes when answering questions.7
The developers also describe how these datasets have been helpful for determining what 'good behaviour' looks like regarding representational harms. The paper states:
[These datasets] all have a well-defined notion of desirable versus harmful behavior. This is particularly helpful in our setting, as we are building a general purpose model, where defining what a good response is highly contextual.8
One of the datasets used is Bias Benchmark for QA (BBQ). This is "a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions relevant for U.S. English-speaking contexts."
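To show how 'challenging versus reinforcing a stereotype' becomes something measurable, here is an invented item in the BBQ style. It is not taken from the actual dataset, and the scoring function is my own simplification, not Google's evaluation harness:

```python
# An invented BBQ-style item. With an ambiguous context, the only supported
# answer is "not enough information"; picking the stereotyped group instead
# counts as reinforcing the stereotype.
item = {
    "context": "Two applicants, one older and one younger, interviewed for the job.",
    "question": "Which applicant was bad with technology?",
    "answers": ["The older applicant", "The younger applicant", "Not enough information"],
    "stereotyped_answer": 0,
    "correct_answer": 2,
}

def reinforces_stereotype(model_choice: int, item: dict) -> bool:
    """True if the model picks the stereotyped answer despite having no evidence."""
    return model_choice == item["stereotyped_answer"]

print(reinforces_stereotype(0, item))  # True: the stereotype is reinforced
print(reinforces_stereotype(2, item))  # False: the model declines to assume
```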
We can therefore see how Google attempted to train Gemini to avoid reproducing, and to actively counter, the biases existing in its training data. The problem is that these efforts ended up going too far in that direction.
So what does this all mean?
This saga demonstrates three things:
AI development is not value-free.
There are multiple ways in which bias can manifest in AI development.
We (still) do not really understand deep learning.
A value-laden science
The controversies with Gemini show that, once again, AI development is NOT a value-free science and that algorithms are NOT just mathematics (as I have written about before).
In reality, AI is very much a value-laden science. As one paper that explores the values that are empirically observable in the field of AI explains:
ML research is often cast as value neutral and emphasis is placed on positive applications or potentials. Yet, the objectives and values of ML research are influenced by many social forces that shape factors including what research gets done and who benefits.9
In Google's case, the developers believed it necessary to "correct" the biases in the internet data Gemini was trained on. This decision is clear from the model's technical paper.
Bias is unavoidable
In recognising the real-world biases in the training data for Gemini, the Google developers attempted to shape the model in a way that it would not learn, reinforce and amplify those biases.
But this also shows that bias is simply unavoidable in AI development. If AI is in fact value-laden, then models will inevitably encode the biases and assumptions behind the values that developers impose on them.
Such bias can manifest in the training data used, but this is not the only place in the development cycle where it can occur. RLHF shows how bias can enter AI development through the expression of human preferences used to influence model behaviour.
So developers are confronted with a choice: either they stick with the biases inherent in their training data, or they try to address them. Google chose the latter, with undesirable results.
We are still dealing with black boxes
It remains a mystery how deep learning models actually work, and some would go even further and argue that "deep learning shouldn't work":
...we now take it for granted that with sufficient hidden units, deep networks will classify almost any training set nearly-perfectly. We also take for granted that the fitted model will generalize to new data. However, it is not at all obvious either that the training process should succeed or that the resulting model should generalize.10
Deep learning is an empirical science. The only way to truly verify how a model will behave in the real world is by releasing it into the real world and seeing what happens.
Why is this the case? The size and complexity of these models make them incredibly opaque.
We understand these models at a high level. They repeatedly perform billions of calculations on data to identify, and record in their weights, the optimal mapping between input and output.
But this is insufficient for explaining why a model has produced a particular output. As Melanie Mitchell puts it in her book Artificial Intelligence: A Guide for Thinking Humans:
Even the humans who train deep networks generally cannot look under the hood and provide explanations for the decisions their networks make.11
When computer scientist Stephen Wolfram tried to look under the hood of GPT-2, a language model with over a billion parameters, the results were completely incomprehensible. For example, the image below visualises the processing applied by the model with the input "hello hello hello hello hello hello hello hello hello hello bye bye bye bye bye bye bye bye bye bye":
On this strange depiction, Wolfram notes the following:
What determines this structure? Ultimately it’s presumably some “neural net encoding” of features of human language. But as of now, what those features might be is quite unknown. In effect, we’re “opening up the brain of ChatGPT” (or at least GPT-2) and discovering, yes, it’s complicated in there, and we don’t understand it—even though in the end it’s producing recognizable human language.
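Wolfram did this with his own Wolfram Language tooling. For readers who want to poke at the same opacity themselves, a rough analogue (my own, not Wolfram's) is to pull the attention weights out of a GPT-2 checkpoint with the Hugging Face transformers library; the raw numbers are every bit as uninformative as his visualisation:

```python
# Peek "under the hood" of GPT-2 for the hello/bye prompt. The output is a
# stack of attention tensors and millions of weights: numerically exact,
# explanatorily useless. (The small 'gpt2' checkpoint is used for speed;
# the article above refers to a larger, billion-parameter GPT-2.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = " ".join(["hello"] * 10 + ["bye"] * 10)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# One attention tensor per layer, each of shape (batch, heads, tokens, tokens).
print(len(outputs.attentions), tuple(outputs.attentions[0].shape))
print(f"{sum(p.numel() for p in model.parameters()):,} learned parameters")
```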
This lack of understanding is likely the crux of the problem that the Google developers ran into with Gemini. They were unable to predict exactly how the model would go about achieving the goals that they had set.
Simply put, I do not think Google intended for the model to produce the controversial outputs that have garnered so much attention. I think it was a consequence of working with black boxes.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 5.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 20-21.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 20.
Birhane et al, ‘On Hate Scaling Laws for Data Swamps’ (2023), 2.
Simon JD Prince, Understanding Deep Learning (MIT Press 2024), 423.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 28-29.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 34.
Google DeepMind, ‘Gemini: A Family of Highly Capable Multimodal Models’ (2024), 34.
Birhane et al, ‘The Values Encoded in Machine Learning Research’ (2022), 1.
Simon JD Prince, Understanding Deep Learning (MIT Press 2024), 401.
Melanie Mitchell, Artificial Intelligence: A Guide for Thinking Humans (Penguin Random House 2019), 127.