Microsoft Redesigns Copilot and Adds Voice Features

Tom Warren, reporting for The Verge:

Microsoft is unveiling a big overhaul of its Copilot experience today, adding voice and vision capabilities to transform it into a more personalized AI assistant. As I exclusively revealed in my Notepad newsletter last week, Copilot’s new capabilities include a virtual news presenter mode to read you the headlines, the ability for Copilot to see what you’re looking at, and a voice feature that lets you talk to Copilot in a natural way, much like OpenAI’s Advanced Voice Mode.

Copilot is being redesigned across mobile, web, and the dedicated Windows app into a user experience that’s more card-based and looks very similar to the work Inflection AI has done with its Pi personalized AI assistant. Microsoft hired a bunch of folks from Inflection AI earlier this year, including Google DeepMind cofounder Mustafa Suleyman, who is now CEO of Microsoft AI. This is Suleyman’s first big change to Copilot since taking over the consumer side of the AI assistant…

Beyond the look and feel of this new Copilot, Microsoft is also ramping up its work on its vision of an AI companion for everyone by adding voice capabilities that are very similar to what OpenAI has introduced in ChatGPT. You can now chat with the AI assistant, ask it questions, and interrupt it like you would during a conversation with a friend or colleague. Copilot now has four voice options to pick from, and you’re encouraged to pick one when you first use this updated Copilot experience.

Copilot Vision is Microsoft’s second big bet with this redesign, allowing the AI assistant to see what you see on a webpage you’re viewing. You can ask it questions about the text, images, and content you’re viewing, and combined with the new Copilot Voice features, it will respond in a natural way. You could use this feature while you’re shopping on the web to find product recommendations, allowing Copilot to help you find different options.

Copilot has always been a GPT-4 wrapper since Microsoft is OpenAI’s largest investor, but it has always been an inferior product in my opinion due to its design. I’m glad Microsoft is reckoning with that reality and redesigning Copilot from the ground up, but the new version is still too cluttered for my liking. By contrast, ChatGPT’s iOS and macOS apps look as if Apple made them — minimalistic, native, and beautiful. And the animations that play in voice mode are stunning. That probably doesn’t matter for most, since Copilot offers GPT-4o with no rate limits for free, whereas OpenAI charges $20 a month for the same functionality, but I want my chatbots to be quick and simplistic, so I prefer ChatGPT’s interfaces.

The new interface’s design, however, doesn’t even look like a Microsoft product, and I find that endearing. I dislike Microsoft’s design inconsistencies and idiosyncrasies and have always found them more attuned to corporate customers' needs and culture — something that’s always separated Apple and Microsoft for me — but the new version of Copilot looks strictly made for home use, in Microsoft’s parlance. It’s a bit busy and noisy, but I think it’s leagues ahead of Google Gemini, Perplexity, or even the first generation of ChatGPT.

Design aside, the new version brings the rest of GPT-4o, OpenAI’s latest model, to Copilot, including the new voice mode. I was testing the new ChatGPT voice mode — which finally launched to all ChatGPT Plus subscribers last week — when I realized how quick it is. I initially thought it was reading the transcript in real-time as it was being written, but I was quickly reminded that GPT-4o is native by design: it generates the voice tokens first, then writes a transcript based on the oral answer. This new Copilot voice mode does the same because it’s presumably powered by GPT-4o, too. It can also analyze images, similar to ChatGPT, because, again, it is ChatGPT. (Not Sydney.)

I think Microsoft is getting close enough to where I can recommend Copilot as the best artificial intelligence chatbot over ChatGPT. It’s not there yet, and it seems to be rolling out new features slowly, but I like where it’s headed. I also think the voice modes of these chatbots are one of the best ways of interacting with them. While text generation is neat for a bit, the novelty quickly wore off past 2022, when ChatGPT first launched. By contrast, whenever I upload an image to ChatGPT or use its voice mode in a pinch, I’m always delighted by how smart it is. While the chatbot feels no more advanced than a souped-up version of Google, the multimodal functionality makes ChatGPT act like an assistant that can interact with the real world.

Here’s a silly example: A few days ago, I was fiddling with my camera — a real Sony mirrorless camera, not an iPhone — and wanted to disable the focus assist, a feature that zooms into the viewfinder while adjusting focus using the focus ring. I didn’t know what that feature was called, so I simply tapped the shortcut on my Home Screen to launch ChatGPT’s voice mode and asked it, “I’m using a Sony camera, and whenever I adjust focus, the viewfinder zooms in. How do I disable that?” It immediately guided me to where I needed to go in the settings to disable it, and when I asked a question about another related option, it answered that quickly, too. I didn’t have to look at my phone while I was using ChatGPT or push any buttons during the whole experience — it really was like having a more knowledgeable photographer peering over my shoulder. It was amazing, and Siri could never. That’s why I’m so excited voice mode is coming to Copilot.

In other Microsoft news, the company is making Recall — the feature where Windows automatically takes a screenshot every 30 seconds or so and lets a large language model index it for quick searching on Copilot+ PCs — optional and opt-in. It’s also now encrypting the screenshots rather than storing them in plain text, which, unbelievably, is what it was doing when the feature was first announced. Baby steps, I guess.