Google Launches Gemini 3, the Smartest Model for the Next 10 Weeks
Simon Willison, writing on his delightful blog:
> Google released Gemini 3 Pro today. Here’s the announcement from Sundar Pichai, Demis Hassabis, and Koray Kavukcuoglu, their developer blog announcement from Logan Kilpatrick, the Gemini 3 Pro Model Card, and their collection of 11 more articles. It’s a big release!
>
> I had a few days of preview access to this model via AI Studio. The best way to describe it is that it’s Gemini 2.5 upgraded to match the leading rival models.
>
> Gemini 3 has the same underlying characteristics as Gemini 2.5. The knowledge cutoff is the same (January 2025). It accepts 1 million input tokens, can output up to 64,000 tokens, and has multimodal inputs across text, images, audio, and video.
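For reference, here is roughly what calling the model looks like through Google’s google-genai Python SDK. This is a minimal sketch, not something from the announcement: the model identifier and the output-token cap are my assumptions based on the specs Willison lists above.

```python
# Minimal sketch using the google-genai SDK (pip install google-genai).
# The model identifier and output-token cap below are assumptions, not
# values confirmed by Google's documentation.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed identifier for Gemini 3 Pro
    contents="Explain the difference between a context window and an output limit.",
    config=types.GenerateContentConfig(max_output_tokens=64_000),
)
print(response.text)
```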
I strongly agree with Willison: Gemini 3 isn’t a groundbreaking new model the way GPT-4 or Gemini 2 were. I think large language models have hit a point of maturity where major releases no longer bring such dramatic leaps in intelligence. The true test of these models will be equipping them with the right tools, integrations, and context to be useful beyond chatbots. Examples include OpenAI’s acquisition of Software Applications Inc., the makers of the Sky Mac app; Gemini’s features in Chrome, Android, and ChromeOS; and Apple’s “more personalized Siri,” which is apparently due for launch any time between now and the world’s ending. That’s why Silicon Valley companies are hell-bent on “agents” — they’re applications of LLMs that occasionally prove useful.
Back to Gemini 3, which is nevertheless an imposing model. It handily beats Claude Sonnet 4.5, GPT-5.1, and its predecessor in every benchmark, with the notable exception of SWE-bench, a software engineering benchmark at which Claude still excels. (SWE-bench tests a model’s ability to fix real bug reports from GitHub repositories, mostly in Python.) That’s unsurprising to me because Claude is beloved for its programming performance. Even OpenAI’s finest models cannot compete with Claude’s ingenuity, clever personality, and syntactically neat responses. Claude always matches the existing complexity of the program it’s working on. For instance, if a program doesn’t use recursion, Claude understands that it probably shouldn’t either, and reaches for a different solution. ChatGPT, on the other hand, just picks whatever is most efficient and uses as few lines of code as possible.
Gemini is quite competent at programming, but I don’t regularly use it for that, and Gemini 3 Pro does not change this. It has historically been poor at SwiftUI, unlike ChatGPT, and I find its coding style to be unlike mine: it takes a very verbose route to solving problems, whereas Claude treats its users like adults. That’s not to say Gemini 3 is bad at programming, but it certainly is not as performant as Claude Sonnet 4.5 or GPT-5.1 with medium reasoning. Interestingly, Google on Tuesday launched a new Visual Studio Code fork called Antigravity, with free support for Gemini 3 Pro and Claude Sonnet 4.5. I assume this will be Google engineers’ text editor of choice going forward, and it gives the newly acquired Windsurf team something to do at Google. Cursor should be worried — Antigravity’s Tab autocomplete model is just as performant as Cursor’s, and it has great models available for free with “generous” rate limits.
Outside of programming, I used Gemini 2.5 Pro most for analyzing and working with long text documents, like PDFs. This is not just because of its industry-leading one-million-token context window, but because it’s trained to read the entire document and cite its sources properly. I don’t know what sorcery Google did to make Gemini so good at this, but OpenAI could learn from it. ChatGPT still writes (ugly) Python code to read bits of the document at a time, and often fails to parse text that isn’t perfectly formatted. Claude’s tool calling, meanwhile, is nowhere near as good as Gemini’s or ChatGPT’s, and I seldom upload documents to it. In recent weeks, however, I’ve been uploading more documents to ChatGPT because I found that, despite its flaws, it was doing a slightly better job than Gemini, if only because GPT-5.1 is newer. Now that ChatGPT no longer has that advantage, I’m happy to go back to Gemini for my document reading needs.
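If you haven’t tried this workflow, it amounts to handing Gemini the entire file and letting it read everything in one pass. Here’s a rough sketch using the SDK’s file-upload flow, with an assumed model identifier and a placeholder file name (the exact upload call may differ by SDK version):

```python
# Rough sketch of the long-document workflow via the google-genai SDK.
# The model identifier and file name are placeholders/assumptions.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Upload the PDF, then reference it alongside the prompt.
uploaded = client.files.upload(file="quarterly-report.pdf")
response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed identifier
    contents=[uploaded, "Summarize this report and cite the page for each claim."],
)
print(response.text)
```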
Gemini 2.5 Pro was also the best at explaining technical subjects like physics, chemistry, and mathematics. ChatGPT got these questions right — and is much quicker than Gemini — but I appreciate Gemini’s problem-solving process more than GPT-5.1’s, even when the latter is set to the Candid personality. But again, in recent weeks, I’ve veered away from Gemini and switched to Claude for these explanations, despite Claude not rendering LaTeX math equations half the time, because I could feel Gemini 2.5 Pro getting old. (“Old” in the context of LLMs means untouched for three months.) Like ChatGPT, Claude Sonnet 4.5 offered more detail in its explanations and more robust proofs of certain math concepts, but with a more teacher-like personality. With Gemini 3, Gemini once again takes the crown for these kinds of explanations.
All of this is to say that Gemini 3 Pro is a great model, and I’m excited to use Gemini again for the first time since the blockbuster launch of Gemini 2.0 Pro. Its predecessor was simply getting a bit old, but Google is back in the race. Here are my current use cases for the three major artificial intelligence chatbots at the end of 2025:
- ChatGPT: Search, image analysis, and a great Mac app. Useful for general chatting and reliable answers.
- Claude: Claude Code, Cursor, and literary analysis. Useful for its explanations and nuance.
- Gemini: Document uploads and math explanations. Also, copyable LaTeX.