How homomorphic encryption works
A new solution shared by Apple for processing encrypted data without decrypting it
What is homomorphic encryption
Homomorphic encryption (HE) is a solution for performing functions on encrypted data without decrypting it.
To quote a previous post of mine explaining how encryption works:
...encryption is a method for transforming data from one format into another. The way encryption does this makes it a common security measure across different computer systems.
Encryption involves the following main elements:
The plaintext
A cipher
A key
An algorithm
The cipher text
Encryption takes plaintext (i.e., the data to be encrypted), and applies an algorithm that executes a cipher with a key to transform the plaintext, the output of which is cipher text.
[...]
A crucial element of encryption is the key, or the cryptographic key. This is a piece of information used to configure the cipher that transforms the plaintext to cipher text.
Without the key, it is not possible to 'reverse engineer' the cipher text back to the plaintext.
So when data are encrypted, the cipher text is unintelligible without the secret key. Therefore, ordinarily, it is not possible to run functions on this cipher text without decrypting it first.
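To see that limitation in concrete terms, here is a minimal sketch of ordinary symmetric encryption in Python. The library choice (Fernet from the cryptography package) and the values are mine, purely for illustration: the ciphertext is opaque, and any computation on the underlying data requires decrypting it first.

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()        # the secret key
cipher = Fernet(key)               # the cipher, configured with the key

plaintext = b"42"                  # the data to be encrypted
ciphertext = cipher.encrypt(plaintext)

# The ciphertext is unintelligible without the key, and there is no way to,
# say, add 1 to the number hidden inside it. To compute on it, we must
# decrypt it first -- which is exactly the step HE removes.
recovered = cipher.decrypt(ciphertext)
assert recovered == plaintext
```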
This is the problem that HE seeks to solve. It is a form of encryption that allows a computer to run calculations on cipher text without needing to read (i.e., see) the plaintext:
Imagine that you could encrypt the values a, b, and c separately, send the ciphertexts to a service, and ask that service to return the encryption of a x 3b + 2c + 3, which you could then decrypt. The important idea here is that the service never learns about your values and always deals with ciphertexts.1
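A toy way to see a homomorphic property in action is unpadded "textbook" RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This is only a sketch of the principle, not the scheme Apple uses, and the tiny key below is hopelessly insecure, chosen purely so the numbers stay readable.

```python
# Toy demonstration of a homomorphic property using unpadded "textbook" RSA.
# Multiplying ciphertexts corresponds to multiplying plaintexts:
#   Enc(a) * Enc(b) mod n == Enc(a * b)
n, e, d = 3233, 17, 2753   # n = 61 * 53; (e, d) are the public/private exponents

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 4, 5
c_a, c_b = encrypt(a), encrypt(b)

# The "server" multiplies the ciphertexts without ever seeing a or b.
c_product = (c_a * c_b) % n

# The client decrypts and recovers a * b.
assert decrypt(c_product) == a * b   # 20
```

The server in this sketch only ever handles ciphertexts, which is the essential idea; practical HE schemes extend it to richer computations such as the additions and multiplications in the quoted expression above.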
Some of the valuable use cases for HE are in the context of cloud-based services. If users store or transmit only encrypted data to a server, and that server can process the cipher text without decrypting it, then the information stays private whilst the service remains fully usable.
This is the very use case that Apple has been working on.
Apple's HE solution
In October 2024, Apple released an article on an HE solution it has deployed for the machine learning (ML) models running on its devices:
HE is designed so that a client device encrypts a query before sending it to a server, and the server operates on the encrypted query and generates an encrypted response, which the client then decrypts. The server does not decrypt the original request or even have access to the decryption key, so HE is designed to keep the client query private throughout the process.
Apple has implemented this solution for the Enhanced Visual Search feature in its Photos app. This feature uses ML to allow users to "search their photo library for specific locations, like landmarks and points of interest."
The ML model for this runs locally on the device, which is a common approach that the company takes for its ML-based software features. This model takes a user's photo as an input and determines whether it contains a 'region of interest' (ROI) that could contain a known landmark.
If an ROI is identified, then a vector embedding is generated for that part of the image.2 This embedding is quantized (i.e., reduced to a lower numerical precision) to 8 bits before being encrypted using HE and sent to the Apple server.
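As a rough sketch of what 8-bit quantization of an embedding can look like (the vector values, dimensionality, and the symmetric scaling scheme below are my assumptions; Apple does not describe its quantization in this detail):

```python
import numpy as np

# A made-up embedding for one region of interest (real embeddings are
# higher-dimensional; the values here are illustrative only).
embedding = np.array([0.12, -0.80, 0.45, 0.03, -0.27], dtype=np.float32)

# Simple symmetric quantization to signed 8-bit integers:
# map the largest absolute value onto 127 and scale everything else.
scale = np.abs(embedding).max() / 127.0
q_embedding = np.clip(np.round(embedding / scale), -127, 127).astype(np.int8)

# The int8 vector is what would be encrypted under HE and sent to the server.
# Dequantizing (q_embedding * scale) recovers an approximation of the original.
print(q_embedding)           # e.g. [  19 -127   71    5  -43]
print(q_embedding * scale)   # close to the original floats
```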
The database held on the Apple server contains vectors of global landmarks (e.g., the Eiffel Tower in Paris, France). This database is divided into shards (i.e., subdivisions) so that the query from the user's device only needs to be sent to the relevant shard.3
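As footnote 3 explains, the device holds a codebook of cluster centroids and runs a local similarity search to pick that shard. Below is a minimal sketch of the selection step; the use of cosine similarity, the centroid values, and the shard count are my assumptions for illustration.

```python
import numpy as np

def closest_shard(query, centroids):
    """Return the index of the centroid most similar to the query embedding.

    Cosine similarity is assumed here; Apple only states that the device
    runs a local similarity search against the codebook.
    """
    q = query / np.linalg.norm(query)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Invented codebook: one centroid per shard of the landmark database.
centroids = np.random.default_rng(0).normal(size=(8, 5)).astype(np.float32)
query = np.array([0.12, -0.80, 0.45, 0.03, -0.27], dtype=np.float32)

shard_id = closest_shard(query, centroids)   # attached to the encrypted query sent to the server
```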
Once the query reaches the server, the HE computation determines which landmark embeddings are most similar to the encrypted embedding in the query. The server then returns the candidate landmarks, still in encrypted form, to the user's device.
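In plaintext terms, this server-side step is a nearest-neighbour search over the shard. The sketch below shows the equivalent computation in the clear; in Apple's system the query arrives encrypted and this arithmetic is evaluated homomorphically, and the database, similarity metric, and candidate count here are invented for illustration.

```python
import numpy as np

def top_candidates(query, landmark_embeddings, k=3):
    """Plaintext stand-in for the server's similarity search.

    In Apple's system the query is a ciphertext and the dot products are
    evaluated homomorphically, so the server never sees the query or the
    scores; only the client can decrypt the result.
    """
    scores = landmark_embeddings @ query    # one similarity score per landmark
    return np.argsort(scores)[::-1][:k]     # indices of the best matches

rng = np.random.default_rng(1)
shard = rng.normal(size=(1000, 5)).astype(np.float32)   # invented landmark shard
query = np.array([0.12, -0.80, 0.45, 0.03, -0.27], dtype=np.float32)

candidate_ids = top_candidates(query, shard)   # returned (encrypted) to the device
```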
The device decrypts the server response to access the candidate landmarks. An on-device reranking model is then used to predict "the best candidate by using high-level multimodal feature descriptors, including visual similarity scores; locally stored geo-signals; popularity; and index coverage of landmarks (to debias candidate overweighting)."
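Apple does not publish the reranking model, so the sketch below is only a guess at the shape of the computation: a weighted combination of the feature descriptors the article names, with entirely invented weights and values.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    landmark: str
    visual_similarity: float   # decrypted similarity score from the server response
    geo_signal: float          # locally stored location relevance
    popularity: float          # prior on how often the landmark is photographed
    index_coverage: float      # used to debias over-represented landmarks

# Invented weights; Apple's actual reranking model is multimodal and not
# described in enough detail to reproduce.
WEIGHTS = dict(visual_similarity=0.6, geo_signal=0.2, popularity=0.15, index_coverage=0.05)

def rerank(candidates):
    def score(c):
        return (WEIGHTS["visual_similarity"] * c.visual_similarity
                + WEIGHTS["geo_signal"] * c.geo_signal
                + WEIGHTS["popularity"] * c.popularity
                - WEIGHTS["index_coverage"] * c.index_coverage)
    return sorted(candidates, key=score, reverse=True)

best = rerank([
    Candidate("Eiffel Tower", 0.91, 0.8, 0.9, 0.7),
    Candidate("Tokyo Tower", 0.88, 0.1, 0.6, 0.4),
])[0]
```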
Once a match is identified, the metadata for the photo is updated to include the identified landmark. Users can then easily find the photo in their library by using the landmark's name.
I think the most important part of this solution is this:
By implementing HE with a combination of privacy-preserving technologies [...], on-device and server-side ML models, and other privacy preserving techniques, we are able to deliver features like Enhanced Visual Search, without revealing to the server any information about a user’s on-device content and activity. (Emphasis added)
This could be highly relevant to the debates around Chat Control and end-to-end encryption (E2EE), which I have written about previously. If this solution proves feasible for cloud-based E2EE messaging services, then it will be interesting to see how this influences policies encouraging client-side scanning and other methods for identifying illegal content on such platforms.
David Wong, Real World Cryptography (Manning Publications 2021), p.458.
This means that the relevant part of the image is transformed into a numerical representation.
As Apple explains in its article: "A precomputed cluster codebook containing the centroids for the cluster shards is available on the user’s device. This enables the client to locally run a similarity search to identify the closest shard for the embedding, which is added to the encrypted query and sent to the server".
This was a great explainer and example to gently lead folks into better understanding the technology. It's nice to see HE escaping the theoretical realms into practical uses.
One concern that will continue to make Chat Control problematic (notwithstanding technical controls and privacy-preserving techniques) is that what is being searched for remains opaque.
This is understandable for CSAM, but turning the feature on without transparency about what is added to an ever-expanding list of badness makes abuse trivial.
Unlike landmark matching, most CC proposals rely on some version of hashing to detect images. But a hash can be anything. And it's not far-fetched to assume that once Apple, say, bakes this into the product, it will be exploited by authoritarian regimes eager to stamp out any criticism they face that can be rendered into a searchable hash.
Whether that's Winnie the Pooh images or Tiananmen Square in China, to common phrases against Viktor Orban, to #nevertrump or #resist (or just Democrat) in the US.
HE won't save those folks.