Google threw things at the wall — now, it hopes some will stick

Sundar Pichai, Google’s chief executive, onstage at Google I/O 2024. Image: Google.

At the opening keynote of its I/O developer conference on Tuesday, Google employed a strategy born of sheer desperation: Throw things at the wall and see what sticks. The company, famed for leading the artificial intelligence revolution within Silicon Valley for years, has been overtaken by none other than a scrappy neighbor with some help from Microsoft, one of its most notable archenemies. That neighbor, OpenAI, stunned the world just a day prior on Monday with the announcement of a new omni-modal large language model, GPT-4o, which features a remarkably capable and humanlike text-to-speech apparatus and state-of-the-art visual recognition technology. OpenAI first took the world by storm in November 2022 with the launch of its chatbot, ChatGPT, which instantly became one of the fastest-growing consumer technology products ever. From there, it has only been smooth sailing for the company, and everyone else has been trying to catch up — including Google.

Google went into overdrive, declaring a “code red” and putting all hands on deck after Microsoft announced a new partnership with OpenAI to bring the generative pre-trained transformer technology to Bing. Last year, Google announced Bard, its AI chatbot meant to rival ChatGPT, only for OpenAI’s latest model, GPT-4, to run laps around it. Bard would consistently flub answers through hallucinations — a quirk of large language models’ design that leads chatbots to confidently state incorrect answers — fail to provide references, and ignore commands, placing it dead last in the rankings against its rivals. At its I/O conference last year, Google hurriedly began adding the model to its existing Google Workspace products, like Google Docs and Gmail, but most users didn’t find it very useful due to its constant mistakes.

Later in the year, Google announced three new models to better compete with OpenAI: Gemini Nano, Gemini Pro, and Gemini Ultra¹. The three models — each with varying parameter counts and context window sizes — were each designed for different tasks, but Google quickly touted how Gemini Pro was comparable to GPT-3.5 and how Gemini Ultra even beat GPT-4 in some circumstances. It put out a demonstration showcasing the multimodal features of Gemini Ultra, showed off Gemini Pro’s deep integration with Google products like YouTube and Google Search, and pre-installed the smaller Gemini Nano model onto Pixel phones in the fall to perform quick on-device tasks. And most importantly of all, to change Bard’s brand reputation, Google renamed its AI product and chatbot to Gemini. Eventually, it attempted to put Gemini everywhere: in Google Assistant, in Google Search by way of the Search Generative Experience, and in its own app and website. It was a fragmented mess — while the models were average at best, there were too many of them in too many places. They cluttered Google’s already complex ecosystem of products.

So, with the stage set, expectations were high for Tuesday’s I/O event, where Google was poised to clean up the clutter and consolidate the AI mess it had entangled itself in so hastily over the last 16 months. And, in typical Google fashion, the company utterly flopped. Instead, Google leaned into the mess, throwing Gemini into every Google product imaginable. Google Search now has Gemini built in for content summaries, replacing the Search Generative Experience, or SGE, for all U.S. users beginning this fall; Gmail now has Gemini search and summaries to shorten threads, find old emails, and draft responses; Android now has a contextually aware version of Gemini that can be asked questions depending on user selections; and every nook and cranny of Google’s services has been dusted with the illustrious sparkles of AI in some capacity. I tried to make some sense out of the muddied features, and here is what I believe Google’s current master plan is:

  1. Let developers toy with Gemini however they would like, lowering prices for the Gemini application programming interface and making new open-source LLMs to lead the way in the development and production of AI-focused third-party applications.

  2. Bring Gemini to every consumer product for free to increase user engagement and deliver shareholder value to please Wall Street.

  3. Unveil new moonshot projects to excite people and sell them on the prospect of AI.

I came up with this thesis after closely observing Google’s announcements on Tuesday, and I think it makes sense from an organizational, business perspective. In practice, however, it just looks desperate. Tuesday was catch-up day for Google — the company did not announce anything genuinely revolutionary or never before seen, but rather focused its efforts on reclaiming its top spot in the AI space. Whether the strategy will yield a positive result remains to be seen. In the meantime, though, consumers are left with boring, uninteresting events that mainly function as shareholder advertisements instead of showcases for new technology. Google I/O was such an event, its steam stolen by OpenAI’s presentation just the day prior — and that is entirely the fault of Google, not OpenAI. Here are my takeaways from the keynote this year.

Gemini for the Web

Since the advent of ChatGPT, AI chatbots and their makers have been intent on upending the norms of the web. Publishers have reported frustration over decreased traffic, users are inundated with cheap AI-generated spam whenever they make a Google search, and it is harder than ever to verify answers’ accuracy. Google, without a doubt, bears some responsibility for this after its beta introduction of SGE last year, which automatically queries the web and then quickly writes a summary pinned to the top of the results page. And even before that, Gemini was engineered to search the web to generate its answers, providing inline citations for users to fact-check its responses.

In practice, though, the citations and links to other websites are minuscule and are rarely clicked because most of the time, they’re simply unneeded. Instead of taking steps to address this information conundrum that has plagued the web for over a year, Google leaned into it at I/O this year — both in Google Search and Gemini, the chatbot.

First, Gemini: Gemini had fallen behind OpenAI’s GPT-4 in sheer number of features, so Google announced some remedies to better compete in the saturated chatbot market. The company announced it would build a conversational, two-way voice mode into Gemini — both the web version and the mobile app — similar to OpenAI’s announcements from Monday, allowing users to speak to the bot directly and receive speedy answers. It said the feature, which will become available later this year, will be conversational, unlike Google Assistant, which currently only speaks answers to user queries aloud without asking follow-up questions.

However, it is unclear how this differs from the Gemini Google Assistant mode available to Pixel users now. Google Assistant on Pixel phones has two modes: the standard Google Assistant mode and Gemini, which uses the chatbot to generate answers. Moreover, there is already feature parity between the Gemini app and Google Assistant on Android, further muddling the feature sets of Google’s AI products. This is what I mean by Gemini coming to every nook and cranny of Google’s software. Google needs to clean up this product line.

The new version of Gemini will also allow users to create custom, task-specific mini chatbots called “Gems,” a clever play on “Gemini.” This feature is meant to rival OpenAI’s “GPTs,” customizable GPT-4-powered chatbots that can be individually given instructions to perform a specific task. For example, a GPT can be programmed to search for grammar mistakes whenever a user uploads a file — that way, there is no need to describe what to do with every file that is uploaded on the user’s end as someone would have to do with the normal version of ChatGPT. Gems are a one-to-one knockoff of GPTs — users can make their own Gems and program them to perform specific tasks beforehand. Gems will be able to access the web, potentially becoming useful research tools, and they will also have multimodal functionality for paying Gemini Advanced users, allowing image and video uploads. Google says Gems will be available sometime in the summer for all users in the Gemini app on Android, Google app on iOS, and on the web.

And then, there is Google Search: Since the winter, Google has been slowly rolling out its SGE summaries to all web users on Google. The summaries appear with an “Experimental” badge and big, bold answers, and typically generate a second or two after the search has been made. The company has now fully renamed the experimental feature “AI Overviews,” removing it from beta testing (it was previously only available through Google’s “Labs” portal) and vowing to expand it to all U.S. users by the end of the year. The change has the potential to entirely rewrite the internet, killing traffic to publishers that rely on Google Search to survive and sell advertisements on their pages, as well as disincentivizing high-quality handwritten answers on the web. The Gemini-powered search summaries do provide sources, but they are often buried below the summary and seldom clicked on by users, who are commonly content with the short AI-generated blurb.

The summaries are also prone to making mistakes and fabricating information, even though they’re placed front-and-center in the usually reliable Google Search interface. This is extremely dangerous: Google users are accustomed to reliable, correct answers appearing in Google Search and might not be able to distinguish between the new AI-generated summaries and the old content snippets, which remain below the Gemini blurb. No matter how many disclaimers Google adds, I think it is still too early to add this feature to a product used by billions. I am not entirely pessimistic about the concept of AI summaries in search — I actually think this is the best use case for generative artificial intelligence — but in its current state, it is best to leave this as a beta feature for savvy or curious users to enable for themselves. The expansion and improvement of the summaries were a marquee feature of Tuesday’s presentation, taking up a decent chunk of the address, and yet Google made an egregious error in its promotional video for the product, as spotted by Nilay Patel, the editor in chief of The Verge. That says a lot.

Google improved its summaries feature before beginning the mass rollout, though: it touted what it called “multi-step reasoning,” allowing Google Search to essentially function as the Gemini chatbot itself so users can enter multiple questions at once into the search bar. Most Google searches aren’t typically conversational; most people perform several searches in a row to fully learn something. This practice, as Casey Newton wrote for Platformer, once used to be enjoyable. Finding an answer, repeating the search with more information, and clicking another one of the 10 blue links is a ritual practiced by hundreds of millions of people daily, and Google seems intent on destroying it.

Why the company has decided to upend its core search product is obvious: Google Search is bad now. Nowadays, Google recommends AI-generated pages engineered for maximum clicks and advertising revenue rather than useful, human-written sites, leading users to append “Reddit” or “Twitter” to their queries to find real answers written by real people. Google has tacitly shown that it has no interest in fixing the core problem at hand — instead, it is just closing up shop and redirecting users to an inferior product.

Google’s objective at I/O was to circumvent the problem of the internet no longer being helpful by making AI perform searches automatically. Google showcased queries that notably included the word “and” in them — for example: “What is the best Pilates studio in Boston and how long would it take to walk there from Beacon Hill?” Before Tuesday, one would have to split that question into two: “What is the best Pilates studio in Boston?” and “Travel time between the studio and home.” (The latter would probably be a Google Maps search.)

It is a highly specific yet somehow absolutely relevant example of Google throwing in the towel on web search. When Google detects a multi-step query, it does not present 10 blue links that might have the answer to both questions, because that would be all but impossible. (Very few websites would have such specific information.) It instead generates an AI summary of information pulled from all over the web — including from Google Maps — effectively negating the need to do further research. While this might sound positive, it in reality kills the usefulness of the internet by relegating the task of searching for information to a robot.

People will learn less from this technology, they will enjoy using the internet less, and as a result, publishers will be less incentivized to add to the corpus of information Gemini uses to provide answers. The new AI features are good short-term solutions to improve the usefulness of the world’s information superhighway, but they cause a major chicken-and-egg problem that Google has continuously either ignored or purposely neglected. This pressing issue does not fit well into the quick pace of a presentation, but it will worsen an already noticeable decline in high-quality information on the web. It is a short-term bandage over the wound that is lazy, money-hungry analytics firms — once the bandage withers and expires, the wound will still be there.

That is not to say that Google should not invest in AI at all, because AI pessimism is a conservative, cowardly ideology not rooted in fact. Instead, Google should use AI to remedy the major problem at hand, which it caused itself. AI can be used to find good information, improve recommendation algorithms, and help users find answers to their questions in fewer words. Google is more than capable of taking a thoughtful approach to this glitch in the information ecosystem, and that is apparent because of its latest enhancement to its traditional search product: ask with video and Circle to Search.

Asking questions with video is exactly the type of enhancement AI can bring without uprooting the vast library of information on the web. The new search feature is built into Google Lens but utilizes Google’s multimodal generative AI to analyze video clips recorded through the Google mobile app along with a quick voice prompt. When a recording is initiated, the app asks users to describe a problem, such as why a pictured record player isn’t working. It then uses AI to understand the prompt and video and generate an answer with sources pulled from the web.

The reason this is more groundbreaking than worrisome is that it (a) enables people to learn more than they would otherwise, (b) adds a qualitative improvement to the user experience, and (c) encourages authors to contribute information to be featured as one of the sources for the explanation. It is just enough of a change to the habits of the internet that the result is a net positive. Google is doing more than simply performing Google searches by itself, then paraphrasing the answers — it is understanding a query using a neural network, gathering sources, and then explaining them while also providing credit. In other words, it isn’t a summary; it’s a new, remarkable piece of work.

It is safe to say that for now, I am pessimistic about Google’s rethinking of the web. Google’s chatbots consistently provide incorrect answers to prompts, the summaries’ placement alongside the 10 blue links — which aren’t even 10 blue links anymore — can be confusing to non-savvy users, and the new features feel more like ignorant, soulless bets on an illustrious “new internet” rather than true innovations that will improve people’s lives. But that isn’t to say there is no future for generative AI in search — there is in myriad ways. But the sheer unwillingness on Google’s end to truly embrace generative AI’s quirks is astonishing.

Gemini for Users

Google’s apparent attempt to reinvent the internet does not just stop at the web — it also extends to its personal services, like Google Photos and Gmail. This extension first took place last year at Google I/O, and many of Tuesday’s announcements seemed like déjà vu, but this year the company seemed more intent on utilizing the multimodal capabilities and larger context lengths of its latest LLMs to improve search capabilities and provide better summaries, an advantage it hadn’t developed last May.

First up was Google Photos, which, surprisingly, the company opened the event with. Google described a limitation of basic optical character recognition-based search: Say someone wanted to find their license plate number in a sea of images of various cars and other vehicles. Previously, they would have to sift through the photos until they found one of their car, but with multimodal AI, Gemini can locate the photos of one’s car automatically, then display the license plate number in a cropped format. This enhanced, contextual search functions like a chatbot within Google Photos to make searching and categorizing photos easier. The feature, which uses Gemini under the hood, draws on data from a user’s photo library, such as facial recognition data and geolocation, to find photos that might fit specific parameters or a theme. (One of the examples shown onstage was a user asking for photos of their daughter growing up.)

In Gmail, Google announced new email summarization features to “catch up” on threads via Gemini-written synopses. Additionally, the search bar in Gmail will allow users to sift through messages from a particular sender to find specific bits of information, such as a date for an event or a deadline for a task, without having to comb through each email individually. The new features — while not improving the traditional Gmail search experience used to find attachments and sort by categories like sender and send date — do fill the role of a personal assistant in many ways. They’re also present in the Gemini chatbot interface, so users can ask Gemini to fetch emails about a given subject in the middle of a pre-existing chat conversation. Google said the new features would roll out to all users beginning Tuesday.

The new additions are reminiscent of Microsoft’s Outlook / Microsoft 365 features first debuted last year, and I surmise that is the point. Google’s flagship Gmail service had next to zero AI features, whereas now it can summarize emails and write drafts for new ones, all inline. However, these new Gemini-powered AI features create an interesting paradox I outlined last year: Users will send emails using AI only for the receiver to summarize them using AI and draft responses synthetically, which the sender will receive and summarize using AI. It is an endless, unnecessary cycle that exists due to the quirks of human communication. I do not think this is the fault of Google — it’s just interesting to see why these tools were developed in the first place and to observe how they might be used in the real world.

My favorite addition, however, is what settles the AI hardware question that has become a hot topic of debate in recent weeks: Gemini in Circle to Search. Circle to Search — first announced earlier this year — allows users to capture a screenshot of sorts, then circle a subject for Google Lens to analyze. Now, Circle to Search adds the multimodal version of Gemini, Gemini Ultra, as well as Gemini Nano, which runs locally on Pixel phones for smaller, more lightweight queries. This one simple-on-paper addition to Circle to Search, an already unsophisticated feature, nearly kills both the Rabbit R1 and the Humane Ai Pin. With just a simple swipe gesture, any object — physical or virtual — can be analyzed and researched by an intelligent, capable LLM. It’s novel, inventive, and eliminates the often substantial barrier between trying to understand something in the spur of the moment and accessing information. It makes the process of searching simple, which is exactly Google’s mission statement.

Circle to Search does not summarize the web in the way other Gemini features do because it is mostly powered by a lightweight model with a smaller context window that runs on-device. Instead, it falls back to the web in most instances — but what it does do is perform the task of writing the Google search. Instead of having to type a query like “orange box with AI designed by Teenage Engineering,” a simple screenshot can automatically write that search and present links to the Rabbit R1. It is a perfect, elegant implementation of AI, now supercharged by an LLM. Google says this type of searching is context-aware, which is a crucial tenet of useful information gathering because information is of little use without context. On Google, that context must be manually entered or inferred, but with Circle to Search, the system knows precisely what is happening on a user’s screen.

This might sound like standard Google Lens, but it is much more advanced than that. It can summarize text, explain a topic, or use existing user data, such as calendar events or notes, to personalize its responses. And because it has the advantage of context awareness, it can be more personal, succinct, and knowledgeable — exactly what the AI devices from Rabbit and Humane lack. Circle to Search with Gemini is built into the world’s most important technological device, the smartphone, and it is exactly the best use for AI. Yes, it might reduce the number of typed Google searches, upsetting publishers, but it makes using computers more intuitive and personal. Google should run with Circle to Search — it is a winner.

Circle to Search is also powered by a new LLM Google announced during its presentation², called LearnLM, designed for educational settings and based on Gemini. LearnLM was demonstrated with a Circle to Search query in which some algebra homework was presented — the chatbot was able to explain the answer thoroughly, using the correct typography and notation, too. Presenters also described the LLM as available in Google Classroom, Google’s learning management software, and on YouTube, to explain “educational videos.” The YouTube chatbot interface, which was first beta tested among select YouTube Premium subscribers last year, will become available more broadly and will enable users to ask questions about certain videos and find comments more easily. It is unclear exactly how LearnLM differs from Gemini, but I assume LearnLM is trained on a smaller, more specific dataset to address hallucinations.

Here are some miscellaneous additions also announced Tuesday:

  • NotebookLM, Google’s LLM-powered research tool that users can upload their own source documents to, now uses Gemini to provide responses. The tool is mainly used to study for tests or better understand notes; it was first released to the general public last year. The most noteworthy addition, however, was the new conversation mode, which simulates two virtual characters having a faux conversation about a topic using the user-provided documents. Users can then interject with a question of their own by clicking a button, which pauses the “conversation” — when a question is asked, the computer-generated voices answer it within the context of the uploaded material.

  • On-device AI, powered by Gemini Nano, will now alert users when a phone call might be a scam. This feature will, without a doubt, be helpful for seniors and the less technically inclined. Gemini will listen to calls — even ones it doesn’t automatically flag as spam — and show an alert if it detects it might be malicious.

Google, for years, has excelled at making the smartest smartphones, and this year is no exception. While the company’s web AI features have left me frustrated and skeptical, the user-end features are much more Google-like, adding delight and usefulness while also putting to rest AI grifts with no value. Many of these features might be Android-exclusive, but that makes me even more excited for the Worldwide Developers Conference when Apple is rumored to announce similar enhancements and additions to iOS. The on-device AI feature announcements at Google I/O this year were the only times I felt somewhat excited about what Google had to announce Tuesday, though it might have also helped that those features were revealed toward the beginning of the keynote.

Gemini for Investors

Project Astra is Google’s name for Silicon Valley’s next AI grift. By itself, the technology is quite impressive in the same way that Monday’s OpenAI event was: a presenter showcased how Project Astra could, in real-time, identify objects it looked at via a smartphone camera, then answer questions about them. It was able to read text from a whiteboard, identify Schrödinger’s cat, and name a place just from looking outside a window. It’s a real-time, multimodal AI apparatus, just like OpenAI’s, but there is only one problem: we don’t know if it will ever exist.

Google has a history of announcing products that serve to do little more than hike its stock price, like Google Duplex, a conversational voice AI that was poised to make phone calls to secure reservations or perform other mundane tasks from a simple text prompt. Project Astra feels exactly like one of those products because of how vague the demonstration was: The company did not provide a release date, more details on what it may be able to do, or even what LLMs it might be powered by. (It doesn’t even have a proper name.) All the audience received on a sunny spring morning in Mountain View, California, was a video of a smartphone — and later, some smart glasses — identifying physical objects while answering questions in an eccentric voice.

The world had already received that video just a day prior, except that time, it received a release date too. And that is a perfect place to circle back to the original point I made at the very beginning of this article: OpenAI stole Google’s thunder, ate its lunch, took its money, and got all the fame. That was not OpenAI’s fault — it was Google’s fault for failing to predict the artificial intelligence revolution. For being so disorganized and unmotivated, for having such an incompetent leader, for being unfocused, and for not realizing the potential of its own employees. Google failed, and now the company is in overdrive mode, throwing everything at the wall and seeing what sticks. Tuesday’s event was the final show — it’s summit or bust.

More than to please users, Tuesday’s Google I/O served the purpose of pleasing investors. It was painfully evident in every scene how uninspired and apathetic the presenters were. None of them had any ambition or excitement to present their work — they were just there because they had to. And they were right: Google had to be there on Tuesday, lest its tenure as the leader of AI come to an end. I’d argue that has already happened — Microsoft and OpenAI have already won, and the only way for Google to make a comeback is by fixing itself first. Put on your oxygen mask before helping others; address your pitfalls before running the marathon.

Google desperately needs a new chief executive, new leadership, and some new life. Mountain View is aimless, and for now, hopeless. The mud is not sticking, Google.

  1. Gemini Nano, Gemini Pro, and Gemini Ultra are Google’s last-generation models. Gemini 1.5 Pro is the latest and performs comparably to Gemini Ultra, though without multimodal capability. Google also announced Gemini Flash on Tuesday, which is smaller than Gemini Nano. It is unclear if Gemini Flash is built on the 1.5 architecture or the 1.0 one. ↩︎

  2. Here is a handy list of Google’s current LLMs. ↩︎