Apple’s New Transcription Tools ‘Outpace Whisper’
John Voorhees, writing at MacStories:
On the way, Finn filled me in on a new class in Apple’s Speech framework called SpeechAnalyzer and its SpeechTranscriber module. Both the class and module are part of Apple’s OS betas that were released to developers last week at WWDC. My ears perked up immediately when he told me that he’d tested SpeechAnalyzer and SpeechTranscriber and was impressed with how fast and accurate they were…
I asked Finn what it would take to build a command line tool to transcribe video and audio files with SpeechAnalyzer and SpeechTranscriber. He figured it would only take about 10 minutes, and he wasn’t far off. In the end, it took me longer to get around to installing macOS Tahoe after WWDC than it took Finn to build Yap, a simple command line utility that takes audio and video files as input and outputs SRT- and TXT-formatted transcripts.
Yesterday, I finally took the Tahoe plunge and immediately installed Yap. I grabbed the 7GB 4K video version of AppStories episode 441, which is about 34 minutes long, and ran it through Yap. It took just 45 seconds to generate an SRT file.
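For a sense of how little code a tool like Yap needs, here’s a rough sketch of a file-transcription pass using the two new classes. To be clear, this is my own reconstruction based on the API Apple showed at WWDC25, not Yap’s actual source, and the exact signatures may differ in the shipping betas:

```swift
import Foundation
import Speech
import AVFoundation

// A minimal sketch of a Yap-style transcription pass, reconstructed from the
// API Apple showed at WWDC25. Not Yap's actual source; exact signatures may
// differ in the shipping betas.
func transcribe(fileAt url: URL, locale: Locale) async throws -> String {
    // SpeechTranscriber is the module that turns audio into text;
    // SpeechAnalyzer drives one or more modules over an audio source.
    let transcriber = SpeechTranscriber(
        locale: locale,
        transcriptionOptions: [],
        reportingOptions: [],                // final results only, no volatile ones
        attributeOptions: [.audioTimeRange]  // keep timestamps for SRT output
    )
    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Download the on-device model for this locale if it isn't installed yet.
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()
    }

    // Accumulate finalized text as the analyzer works through the file.
    async let transcript = transcriber.results.reduce(into: "") { text, result in
        text += String(result.text.characters)
    }

    let audioFile = try AVAudioFile(forReading: url)
    if let lastSampleTime = try await analyzer.analyzeSequence(from: audioFile) {
        try await analyzer.finalizeAndFinish(through: lastSampleTime)
    }
    return try await transcript
}
```

An SRT writer would pull the timing for each cue from the `audioTimeRange` attribute on each result; I’ve left that plumbing out of the sketch.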
Speech transcription has historically been a lackluster part of Apple’s operating systems, especially compared to Google’s. A few years ago, Apple’s keyboard dictation feature — found by pressing the F5 key on Apple silicon Macs or the Dictation button on the iPhone’s keyboard — didn’t even support proper punctuation, making it unusable for anything other than quick texts. In recent years, it’s gotten better, with automatic period and comma insertion, but it still errs way more than I’d like. These days, I mostly use Whisper through MacWhisper on my Mac and Aiko on my iPhone — two excellent apps I reach for when I need dictation, which is rare because I’m a pretty good typist.
The new SpeechTranscriber API is built into Voice Memos and Notes, and I think the former is especially helpful: it brings Apple back up to speed with Google, whose Pixel Recorder app is one of the most phenomenal voice-to-text utilities short of OpenAI’s Whisper, which takes longer to generate a transcription. But I wish Apple would put it in more places, like the native iOS and macOS dictation tool, which I still think is the most common way people transcribe text on their devices. Apple’s implementation, according to Voorhees, is way faster than Whisper and even includes a “volatile transcription” mode that lets an app display near-real-time transcriptions, just like keyboard dictation. Apple says the new API is only meant for long-form audio, but given how keyboard dictation butchers my words, I think Apple should make it the system-wide standard. Until then, I’ll stick with Aiko and MacWhisper.
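Based on the WWDC materials, that volatile behavior appears to be opt-in via a reporting option: fast, provisional results stream out first and are later replaced by finalized text. Here’s a hedged sketch of what consuming that stream might look like; the `.volatileResults` option and `isFinal` flag come from Apple’s session code, but treat the details as assumptions until the API settles:

```swift
import Speech

// Sketch of consuming "volatile" transcription results, assuming the
// reporting option Apple showed at WWDC25. Volatile results arrive almost
// immediately and are later superseded by finalized text, which is what
// lets an app paint words on screen as they're spoken.
func printLiveTranscript(locale: Locale) async throws {
    let transcriber = SpeechTranscriber(
        locale: locale,
        transcriptionOptions: [],
        reportingOptions: [.volatileResults], // stream fast, provisional guesses
        attributeOptions: []
    )
    // (Feeding audio in via a SpeechAnalyzer is omitted; see the earlier sketch.)

    var committed = ""    // finalized text; safe to keep
    var provisional = ""  // volatile text; may still be rewritten

    for try await result in transcriber.results {
        let text = String(result.text.characters)
        if result.isFinal {
            committed += text   // the model has settled on this span
            provisional = ""
        } else {
            provisional = text  // show it now, expect corrections
        }
        print(committed + provisional)
    }
}
```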
For fun, I read aloud the introduction to my article from a week ago and had MacWhisper, Apple’s new SpeechTranscriber, and macOS 15 Sequoia’s dictation feature try to transcribe it. Here are the results:
macOS dictation:
Apple, on Monday and its worldwide developers conference, announce the cavalcade of updates to its latest operating systems in a clear attempt to deflect from the mire of the companies, apple intelligence failures throughout the year during the key address held at Apple, Park in Cupertino California Apple,’s choice to focus on what the company has historically been the best at user interface design over it’s halfhearted apple intelligence strategy became obvious it very clearly doesn’t want to discuss artificial intelligence because it knows it can’t compete with the likes of AI anthropic or it’s arch enemy google who is Google Io developer conference a few weeks ago was a downright embarrassment for Apple.
MacWhisper, using the on-device WhisperKit model:
Apple on Monday at its Worldwide Developers Conference announced a cavalcade of updates to its latest operating systems in a clear attempt to deflect from the mire of the company’s Apple Intelligence failures throughout the year. During the keynote address held at Apple Park in Cupertino, California, Apple’s choice to focus on what the company has historically been the best at, user interface design, over its half-hearted Apple Intelligence strategy became obvious. It very clearly doesn’t want to discuss artificial intelligence because it knows it can’t compete with the likes of OpenAI, Anthropic, or its arch-enemy, Google, whose Google I/O developer conference a few weeks ago was a downright embarrassment for Apple.
Apple’s new transcription feature, from Voice Memos in iOS 26:
Apple on Monday at its worldwide developers’ conference, announced a cavalcade of updates to its latest operating systems in a clear attempt to deflection the mire of the company’s Apple Intelligence failures throughout the year. During the Keynote address held at Apple Park in Cupertino, California, Apple’s choice to focus on what the company has historically been the best at, user interface design, over its half hearted Apple intelligence strategy became obvious. It very clearly doesn’t want to discuss artificial intelligence, because it knows it can’t compete with the likes of OpenAI, anthropic, or its arch enemy, Google, whose Google IO developer conference a few weeks ago was a downright embarrassment for Apple.
Apple’s new transcription model certainly isn’t as good as Whisper, especially with proper nouns, and I have a few grammar nitpicks, but it’s so much better than the standard keyboard dictation, which reads like it was written by someone with a tenuous grasp of the English language. Still, though, Whisper feels like a dream to me. How is it this good?