Alex Heath, reporting for The Verge:

OpenAI is releasing GPT-5, its new flagship model, to all of its ChatGPT users and developers.

CEO Sam Altman says GPT-5 is a dramatic leap from OpenAI’s previous models. He compares it to “something that I just don’t wanna ever have to go back from,” like the first iPhone with a Retina display.

OpenAI says that GPT-5 is smarter, faster, and less likely to give inaccurate responses. “GPT-3 sort of felt like talking to a high school student,” Altman said during a recent press briefing I attended. “You could ask it a question. Maybe you’d get a right answer, maybe you’d get something crazy. GPT-4 felt like you’re talking to a college student. GPT-5 is the first time that it really feels like talking to a PhD-level expert.”

The first thing you’ll notice about GPT-5 is that it’s presented inside ChatGPT as just one model, not a regular model and a separate reasoning model. Behind the scenes, GPT-5 uses a router that OpenAI developed, which automatically switches to a reasoning version for more complex queries, or if you tell it to “think hard.” (Altman called the previous model picker interface a “very confusing mess.”)
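
From that description, the router is essentially a dispatcher sitting in front of two variants of the same model. Here’s a minimal Swift sketch of the idea; OpenAI hasn’t said how the real router works, so every name and heuristic below is invented purely for illustration:

```swift
// A minimal sketch of a query router, written from the behavior
// described above. OpenAI hasn't published how its router actually
// works; every name and heuristic here is invented.
enum GPT5Variant {
    case standard   // the fast, non-reasoning path
    case reasoning  // the slower, chain-of-thought path
}

func route(_ query: String) -> GPT5Variant {
    // An explicit request for effort overrides everything else.
    if query.lowercased().contains("think hard") {
        return .reasoning
    }
    // Stand-in for whatever learned signal scores query complexity.
    let looksComplex = query.count > 400 || query.contains("step by step")
    return looksComplex ? .reasoning : .standard
}
```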

Just writing to GPT-5, I got the sense that it’s much better at structuring its responses than GPT-4o, OpenAI’s previous default model. GPT-4o relied heavily on bullet points and tended to follow a three-act “introduction, elaboration, and conclusion” blueprint whenever it tried to explain something, whereas GPT-5 is more varied in its response styles. For now, I don’t think the difference in everyday conversations is as drastic as the jump from GPT-3.5 to GPT-4, or even from GPT-4 to GPT-4o, but perhaps my opinion will change once I get to writing code and reasoning with it more extensively.

The most prominent design change comes to the model picker, which now has only three options: the standard GPT-5 model, GPT-5 Thinking, and GPT-5 Pro, which extends thinking even further. This differentiation is a bit confusing because GPT-5 already thinks, just at its own discretion. In older versions of ChatGPT, people had to explicitly choose a reasoning model; the new version chooses for them when a query would benefit from extended reasoning. Opting for the Thinking model forces GPT-5 to reason, regardless of how complex ChatGPT perceives the question to be. But bafflingly, there’s also an option in the Tools menu to “think longer” in the standard GPT-5 model.

When I tested it, the Think Longer tool in the standard model thought for 1 minute, 13 seconds, while GPT-5 Thinking thought for 1 minute, 25 seconds on the same query, a negligible difference. I did, however, prefer the bespoke thinking model’s answer over the standard GPT-5’s, so I think OpenAI should either clarify the ambiguity or consolidate the two options into one button in the Tools menu of the standard model. As far as I can tell, there is no concrete difference between the Thinking and standard models, other than that the former is forced to reason via custom instructions. Perhaps the instructions vary between the Think Longer tool and the Thinking model?

The new models seem enthusiastic about searching the web, especially when asked to reason, and haven’t hallucinated once while I’ve used them. I still think they’re bad at generating code, however, as they don’t write the efficient, sensible, and readable code an experienced programmer would. GPT-5 still acts like an amateur who just read Apple’s SwiftUI documentation for the first time. That by-the-book style is often what you want when you know you’re doing something wrong, but it isn’t ideal when writing new code. This is at the heart of why I think large language models are still bad at programming: they ignore the fact that code should often be as beautiful and logical as possible. While they do the job quickly, they’re hardly great at it. Good code is written to be concise, self-explanatory, and straightforward, and LLMs don’t write good code.
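
To illustrate what I mean, here’s a contrived Swift example, not actual GPT-5 output: the first view is the index-juggling, straight-from-the-docs style LLMs tend to produce, and the second is the tighter version an experienced SwiftUI developer would reach for.

```swift
import SwiftUI

// Two contrived views, not actual GPT-5 output. The first is the
// index-juggling, by-the-book style LLMs tend to produce; the
// second is what an experienced SwiftUI developer would write.
struct NameListVerbose: View {
    let names: [String]

    var body: some View {
        VStack {
            ForEach(0..<names.count, id: \.self) { index in
                Text(names[index])
                    .font(.headline)
            }
        }
    }
}

struct NameList: View {
    let names: [String]

    var body: some View {
        VStack {
            // Iterate the collection directly instead of its indices.
            ForEach(names, id: \.self) { name in
                Text(name).font(.headline)
            }
        }
    }
}
```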

GPT-5’s prose is still pretty rough, and anyone with two functioning eyes and a slice of a human soul should still be able to suss out AI-generated text pretty easily. This isn’t a watershed moment for LLMs, and it’s beginning to look like that day might never come. There’s an inherent messiness to the way humans write: our sentences vary in structure, some paragraphs are clearer than others, and most good writers try to establish a connection with their audience through some kind of rhetoric or literary device. Human-written prose is concise and matter-of-fact when it can be and long-winded when it matters. We use repetition, adverbs, and contractions without even thinking. Writing by humans isn’t perfect, and that’s what makes it inherently human.

AI-generated writing is too perfect. When it tries to establish a connection with the reader, perhaps by changing its tone to be more conversational and hip, it sounds too artificial. Here’s a small quote from a GPT-5 response that I think illustrates this well:

If you want, I can give you a condensed “master chart” that shows all the major tenses for regular verbs side-by-side so you can see the relationships and re-use the patterns instead of memorizing each one from scratch. That way, you’re memorizing shapes and connections, not 100+ isolated forms.

Maybe some less-experienced readers can’t tell this is AI-generated, but I could, even if I hadn’t known beforehand. The “If you want…” at the beginning of the sentence comes off as artificial because ChatGPT overuses that phrase; it ends almost every one of its responses with a similar call to action or request for further information. A human, by contrast, might structure that sentence like this: “I could make a ‘master chart’ to show a bunch of major tenses for regular verbs to memorize the connections between the words rather than the isolated forms.” Some people, perhaps in more informal or casual contexts, might omit the request and just give a recommendation: “I should give you a master chart of major tenses.” ChatGPT, or any LLM, does not vary its style like this, instead aiming for a stoic, robotic, “I am trained to assist you” demeanor.

ChatGPT writes like a highly enthusiastic, drunk-on-coffee personal assistant. I don’t think that’s a personality or something coded into its post-training, but rather a consequence of ChatGPT’s existence as an amalgamation of all the internet’s text. LLMs write by predicting the statistically likely next word in a sentence, whereas humans convert their thoughts into words based on their existing knowledge of the language. Math is always right, whereas human knowledge and thoughts aren’t, leading to the natural imperfections expected in human prose. While ChatGPT’s sentence structure is the statistically most correct way to word a passage after studying every text published on the internet, humans don’t worry about what is correct; they simply translate their (usually rough) thoughts into words.
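
To make the “statistically likely next word” point concrete, here’s a toy Swift sketch. The vocabulary and probabilities are entirely made up, and real models sample over tens of thousands of tokens rather than three phrases, but the sampling step works the same way:

```swift
import Foundation

// A toy model of next-word prediction. The "model" is just a table of
// made-up probabilities; real LLMs score tens of thousands of tokens,
// but the weighted draw at the end works the same way.
func nextWord(from candidates: [String: Double]) -> String {
    let total = candidates.values.reduce(0, +)
    var draw = Double.random(in: 0..<total)
    for (word, probability) in candidates {
        draw -= probability
        if draw < 0 { return word }
    }
    // Fallback for floating-point edge cases.
    return candidates.keys.first ?? ""
}

// "If you want" dominates because it's the highest-probability
// continuation, which is exactly the tell described above.
let continuations = ["If you want": 0.62, "Happy to": 0.21, "Alternatively": 0.17]
print(nextWord(from: continuations))
```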

All of this is to say that GPT-5 doesn’t meaningfully change the calculus of when to use an LLM. It’s still not perfect at coding, it may make up numbers sometimes, and its prose reads unnaturally. But I think it’s even better at reasoning, especially when researching on the web, which has always been the primary reason I use ChatGPT. No other chatbot came close to ChatGPT before GPT-5, and they’re certainly all way behind now. While it may pale in comparison to Google Search in some rare cases (which I’m happy to point out), ChatGPT is the best web research tool on the market, and I find that GPT-5 is reliable, fast, and thorough when I use it to search. In that regard, I tend to agree with Altman: GPT-5 is the best model for doing what ChatGPT has historically been best at.

What OpenAI didn’t invent on Thursday is a digital god or anything similar. This is not artificial general intelligence or a computer that will replace all people. It’s yet another iteration of the LLMs that have captivated the world for nearly three years. I bet that in a few weeks, Google or Anthropic will put out another “World’s Best Language Model” and we’ll be having this conversation yet again. Until then, OpenAI should be proud of its work.