Apple Plans to Use a Custom Gemini Model to Power Siri in 2026
Mark Gurman, reporting for Bloomberg:
Apple Inc. is planning to pay about $1 billion a year for an ultrapowerful 1.2 trillion parameter artificial intelligence model developed by Alphabet Inc.’s Google that would help run its long-promised overhaul of the Siri voice assistant, according to people with knowledge of the matter.
Following an extensive evaluation period, the two companies are now finalizing an agreement that would give Apple access to Google’s technology, according to the people, who asked not to be identified because the deliberations are private…
Under the arrangement, Google’s Gemini model will handle Siri’s summarizer and planner functions — the components that help the voice assistant synthesize information and decide how to execute complex tasks. Some Siri features will continue to use Apple’s in-house models.
The model will run on Apple’s own Private Cloud Compute servers, ensuring that user data remains walled off from Google’s infrastructure. Apple has already allocated AI server hardware to help power the model.
This version of Gemini is almost certainly a custom model reserved for tasks that Apple’s “foundation models” cannot handle. I assume the “summarizer and planner functions” are the meat of the new Siri: parsing queries, choosing which App Intents to run, and summarizing web results. It wouldn’t operate like the current ChatGPT integration in iOS and macOS, though, because the model itself would be acting as Siri. The current integration merely passes queries from Siri to ChatGPT, accomplishing nothing more than if someone opened the ChatGPT app and typed the prompt there themselves. The next version of Siri is Gemini under the hood.
I’m really interested to see how this pans out. Apple will probably be heavily involved in the post-training stage of the model’s production, where the model is given a personality and its responses are fine-tuned through reinforcement learning, but Google’s famed Tensor Processing Units will be responsible for pre-training, the most computationally intensive part of making a large language model. (This is the P in GPT, or generative pre-trained transformer.) Apple presumably didn’t start developing the software and gathering the training data required to build such an enormous model, 1.2 trillion parameters, early enough, so it offloaded the hard part to Google for the low price of $1 billion a year. The model should act like an Apple-made one, except much more capable.
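To gesture at why pre-training is the expensive part, here’s some back-of-the-envelope arithmetic using the standard C ≈ 6ND approximation for training compute (N parameters, D training tokens). The token count and accelerator throughput below are my own round-number guesses, since nothing about the training run is public:

```swift
// Rough pre-training cost via the standard C ≈ 6 * N * D FLOPs rule of
// thumb (N = parameter count, D = training tokens). The token count and
// sustained throughput below are assumptions for illustration only.
let parameters = 1.2e12          // N: 1.2 trillion parameters
let trainingTokens = 15e12       // D: hypothetical 15 trillion tokens
let totalFLOPs = 6 * parameters * trainingTokens   // ≈ 1.1e26 FLOPs

// Suppose each accelerator sustains 1e15 FLOPs per second, a round,
// invented figure in the ballpark of modern TPUs and GPUs at scale.
let flopsPerSecond = 1e15
let seconds = totalFLOPs / flopsPerSecond
let acceleratorYears = seconds / (365.25 * 24 * 3600)
print("≈ \(totalFLOPs) FLOPs, or ≈ \(Int(acceleratorYears)) accelerator-years")
```

Even if every number here is off by a factor of a few, the point stands: this is the kind of run you only attempt on someone’s dedicated hyperscale fleet.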
This custom version of Gemini should accomplish its integration with Apple software not just through post-training but through tool calling, perhaps via the Model Context Protocol, for web search, multimodal functionality, and Apple’s own App Intents and personal context apparatus demonstrated at the 2024 Worldwide Developers Conference. I’m especially intrigued to see what the new interface will look like, since Gemini might take a bit longer than Siri does today to generate answers. There is no practical way to run a 1.2 trillion-parameter model on any device, so I also wonder how the router will decide which prompts to send to Private Cloud Compute versus the lower-quality on-device models.
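To make the tool-calling idea concrete, here’s a minimal sketch of the kind of intent a planner model could invoke. The AppIntents framework and its shapes are real; the specific intent, its parameters, and the notion that Gemini would call it directly are my assumptions:

```swift
import AppIntents

// Hypothetical intent: the name and parameters are illustrative, not
// Apple's actual Siri schema. The AppIntents framework itself is real;
// a model acting as Siri's planner could select and invoke intents
// like this one as "tools."
struct SendMessageIntent: AppIntent {
    static var title: LocalizedStringResource = "Send Message"
    static var description = IntentDescription("Sends a message to a contact.")

    @Parameter(title: "Recipient")
    var recipient: String

    @Parameter(title: "Message")
    var body: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real implementation would hand off to the Messages app here.
        // The planner's job is choosing this intent and filling in its
        // parameters from the user's natural-language request.
        return .result(dialog: "Sent \"\(body)\" to \(recipient).")
    }
}
```

In this framing, the planner model’s whole job is mapping “text my wife that I’m running late” onto an intent like this and filling in its parameters.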
I do want to touch on the model’s supposed size. At 1.2 trillion parameters, this model would be similar in scale to GPT-4, which was rumored to weigh in at 1.8 trillion parameters. GPT-5 might be a few hundred billion parameters larger, and among the largest models that can run on-device is GPT-OSS, at 120 billion parameters. A “parameter” in machine learning is a learnable weight: a value the model adjusts during training. LLMs predict the probability of the next token in a sequence by training on many other sequences, and the weights that produce those probabilities are the parameters. The more parameters, the more patterns (and thus “answers”) the model can encode. Most of those parameters would not be used during everyday inference, as Federico Viticci points out on Mastodon, presumably because the model is a mixture-of-experts design that activates only a fraction of its weights per token, but it’s still important to note how large this model is.
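For a sense of the physical scale, a quick sizing sketch (the active-parameter fraction is invented for illustration; Apple and Google haven’t disclosed the architecture):

```swift
// Back-of-the-envelope memory math for a 1.2T-parameter model. The
// mixture-of-experts active fraction is a pure assumption.
let totalParameters = 1.2e12           // 1.2 trillion weights
let bytesPerParameter = 2.0            // 16-bit (fp16/bf16) precision
let weightTerabytes = totalParameters * bytesPerParameter / 1e12
print("Dense weights alone: \(weightTerabytes) TB")        // 2.4 TB

// Hypothetical mixture-of-experts split: assume ~10% of parameters
// activate for any given token.
let activeFraction = 0.1
print("Active per token: \(totalParameters * activeFraction / 1e9) billion")

// GPT-OSS, at 120 billion parameters, is near the ceiling for local use.
let gptOssParameters = 120e9
print("Size ratio vs. GPT-OSS: \(totalParameters / gptOssParameters)x")
```

At roughly 2.4 TB of weights alone, before activations or the KV cache, a dense model this size simply doesn’t fit on any phone or Mac, which is exactly why it lives on Private Cloud Compute.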
We are so back.