Google Launches the Terribly-Named Gemini 2 Flash LLM
Abner Li, reporting for 9to5Google:
> Just over a year after version 1.0, Google today announced Gemini 2.0 as its “new AI model for the agentic era.”
>
> CEO Sundar Pichai summarizes it as such: “If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful.” For Google, agents are systems that get something done on your behalf by being able to reason, plan, and have memory.
>
> The first model available is Gemini 2.0 Flash, which notably “outperforms 1.5 Pro on key benchmarks” — across code, factuality, math, reasoning, and more — at twice the speed.
>
> It supports multimodal output like “natively generated images mixed with text” for “conversational, multi-turn editing,” and multilingual audio that developers can customize (voices, languages, and accents). Finally, it can natively call tools like Google Search (for more factual answers) and code execution.
To even begin to understand this article, it’s important to recall the Gemini model hierarchy:
- The highest-end model is presumably, for now, still Gemini 1.0 Ultra. There isn’t a Gemini 1.5 version of this model — 1.5 was introduced in February — but it’s still the most powerful one according to Google’s blog post from then. The catch is that I can’t find a place to use it; it’s not available with a Gemini Advanced subscription or the application programming interface.
- Gemini 2.0 Flash is the latest experimental model, and it outperforms all other publicly available Gemini models, according to Google. It doesn’t require a subscription for now.
- Gemini 1.5 Pro was the second-best model, only to 1.0 Ultra, up until Wednesday morning. It’s available to Gemini Advanced users.
- Gemini 1.5 Flash is the free Gemini model used in Google’s artificial intelligence search overviews.
- Gemini 1.5 Nano is used on-device on Pixel devices.
I assume a Gemini 2.0 Pro model will come in January when 2.0 Flash comes out of beta, but Google could always call it something different. Either way, Gemini 2.0 is markedly better than the previous versions of Gemini, which underperformed the competition by a long shot. GPT-4o and Claude 3.5 Haiku continue to be the best models for most tasks, including writing both code and prose, but Gemini 2.0 is better at knowledge questions than Claude because it has access to the web. Truth be told, the large language model rankings I posted Tuesday night are pretty messy after the launch of Google’s latest model: I still think Claude is better than Gemini, but not by much and only in some cases. Neither is as good as ChatGPT, though, which is the most reliable and accurate.
No subscription is necessary to use 2.0 Flash, but whenever 2.0 Pro comes out, requiring a subscription, I feel like it’ll fare better than Claude’s 3.5 Sonnet, the higher-end model that sometimes does worse than the free version. I subscribed anyway, but I don’t know if I’ll keep paying because Gemini doesn’t have a Mac app — not even a bad web app like Claude’s.[^1] Still, I’m forcing myself to use it over Claude, which I’ve used for free as a backup to my paid ChatGPT subscription for whenever OpenAI inevitably fails me. Gemini does have an iOS app, though, and I think it’s better than Claude’s. (I admittedly don’t use any chatbot but ChatGPT on iOS.) The real reason I paid for Gemini Advanced is Deep Research:
> First previewed at the end of Made by Google 2024 in August, you ask Gemini a research question and it will create a multi-step plan. You will be able to revise that plan, like adding more aspects to look into.
>
> Once approved and “Start research” is clicked, Gemini will be “searching [the web], finding interesting pieces of information and then starting a new search based on what it’s learned. It repeats this process multiple times.” Throughout the process, Gemini “continuously refines its analysis.”
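The quoted description amounts to a simple iterative loop: draft a plan of queries, search, fold anything new back into the plan, and repeat. Here’s a minimal sketch of that loop — the function names and stub logic are my own illustration, not Google’s API; a real system would use an LLM where `make_plan` and `search` are faked below:

```python
def make_plan(question):
    """Stub: break a research question into sub-queries (Gemini uses a model for this)."""
    return [f"{question} overview", f"{question} recent developments"]

def search(query, corpus):
    """Stub: return corpus documents sharing any word with the query."""
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def deep_research(question, corpus, rounds=3):
    """Plan -> search -> refine loop, repeated a bounded number of times."""
    plan = make_plan(question)
    findings = []
    for _ in range(rounds):  # "repeats this process multiple times"
        next_queries = []
        for query in plan:
            for doc in search(query, corpus):
                if doc not in findings:
                    findings.append(doc)
                    next_queries.append(doc)  # new search based on what it learned
        if not next_queries:
            break  # nothing new surfaced; stop refining
        plan = next_queries  # refine the plan from the new information
    return findings
```

The interesting design choice the quote hints at is that each round’s queries are derived from the previous round’s findings, so the agent can chain from a seed question to material the original query would never have matched.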
I admittedly don’t do a lot of deep research in my life, but I think this will be a much better version of Perplexity, which I begrudgingly use ever since its chief executive discounted the work of journalists on the web. (Typical Silicon Valley grifters.) It’s interesting to see Google use Gemini 1.5 Pro for this agentic work after touting 2.0 Flash as a “new AI model for the agentic era.” Why not introduce the new feature with the new model? Typical Google. Qualms aside, I like it, and I’ll try to use it whenever I can over regular Google Search, which continues to decline significantly in quality. It really does feel like Google is internally snatching people from the Search department and moving them over to Gemini.
Project Mariner is the last main initiative Google announced on Wednesday, and it reminds me of Anthropic’s demonstration a few months ago:
> Meanwhile, Project Mariner is an agent that can browse and navigate (type, scroll, or click) the web to perform a broader task specified by the user. Specifically, it can “understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms.”
This is vaporware at its finest. A general rule of thumb when assessing Google products: whenever the company prepends “Project” to something, it’ll never ship. I don’t want it to ship, either, because the best way to interact with third-party tools is not by clicking around on a computer but by using APIs. Google uses a bunch of private APIs born from deals with the most important web-based companies, like Expedia, Amazon, and Uber — if there’s a company with the leverage to build an agentic version of Gemini, it’s Google, which basically owns the web and most of its traffic. Nobody needs fancy mouse cursors — that’s an idea for The Browser Company.
[^1]: I’ve created a Safari web app for it on my Mac, and even that is better than Anthropic’s garbage.