The New “Strawberry” AI from OpenAI Is Still Committing Irrational Errors
Thursday saw the public release of OpenAI’s eagerly anticipated AI model, known by the code name “Strawberry.”
In its introduction, the Sam Altman-led startup promised huge things, saying that their “o1-preview” AI model “performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology.”
However, as early adopters have already seen for themselves, it’s a long way from replacing a human scientist or programmer.
In fact, if recently circulated social media posts are any indication, o1-preview still frequently struggles with even the most fundamental concepts.
For example, researcher Mathieu Acher of INSA Rennes found that it still regularly proposes illegal chess moves in response to certain puzzles.
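As a rough illustration of how such failures can be caught automatically, move legality is mechanically checkable; the sketch below is a minimal example assuming the third-party python-chess library and a model-proposed move in UCI notation (the function name and example position are illustrative, not taken from Acher’s actual setup).

```python
# Minimal sketch: check whether a model-proposed move is legal in a given
# position, using the third-party python-chess library (pip install chess).
import chess

def is_legal_move(fen: str, uci_move: str) -> bool:
    """Return True if uci_move (e.g. 'e2e4') is legal in the position fen."""
    board = chess.Board(fen)
    try:
        move = chess.Move.from_uci(uci_move)
    except ValueError:
        return False  # not even syntactically valid UCI
    return move in board.legal_moves

# From the standard starting position, 'e2e4' is legal but 'e2e5' is not.
print(is_legal_move(chess.STARTING_FEN, "e2e4"))  # True
print(is_legal_move(chess.STARTING_FEN, "e2e5"))  # False
```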
Users reported inconsistent results even when they entered the exact prompt OpenAI used in its demo, a strawberry-themed logic puzzle.
“o1-preview gives the wrong answer to this prompt 75 percent of the time,” a user discovered.
Some users report that the model still occasionally trips over one of the trickiest word puzzles for AI language models: counting how many times the letter “R” appears in the word “strawberry.”
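For reference, the correct count is easy to verify deterministically; the snippet below is a trivial Python check (purely illustrative, and not how the model itself works) confirming that “strawberry” contains three R’s.

```python
# Count how many times the letter "r" appears in "strawberry".
word = "strawberry"
r_count = word.lower().count("r")
print(r_count)  # 3
```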
To be fair, OpenAI made it quite apparent from the beginning that its most recent AI is still a work in progress.
“As an early model, it doesn’t yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images,” the company stated in its press release. “For many common cases GPT-4o will be more capable in the near term.”
o1-preview differs sharply from predecessors such as GPT-4o, the model behind the company’s well-known ChatGPT chatbot, because of a novel “chain of thought” process. Rather than spitting out the first response it can manage, it works toward an answer through gradual, iterative reasoning steps until it reaches a conclusion.
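To make the general idea concrete, here is a hypothetical sketch of that kind of step-by-step prompting loop in Python. It is not OpenAI’s actual o1 implementation; `ask_model` is a placeholder for whatever chat-completion call is available, and the prompts are invented for illustration.

```python
# Hypothetical sketch of a "chain of thought" style loop: rather than returning
# the first answer it can produce, the model is asked for intermediate
# reasoning steps and only then commits to a final conclusion.

def ask_model(prompt: str) -> str:
    """Placeholder for a real chat-completion call."""
    raise NotImplementedError("plug in your own model API here")

def solve_step_by_step(question: str, max_steps: int = 5) -> str:
    steps = []
    for _ in range(max_steps):
        # Ask for the next reasoning step, given everything derived so far.
        step = ask_model(
            f"Question: {question}\n"
            "Reasoning so far:\n" + "\n".join(steps) + "\n"
            "Give the next reasoning step, or reply DONE if you can answer."
        )
        if step.strip() == "DONE":
            break
        steps.append(step)
    # Only after the iterative reasoning does the model give a final answer.
    return ask_model(
        f"Question: {question}\nReasoning:\n" + "\n".join(steps) +
        "\nGive the final answer only."
    )
```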
That step-by-step approach can considerably increase response time. One user found that the new model took 92 seconds to work through a word problem, only to arrive at the wrong answer.
Noam Brown, an OpenAI research scientist who worked on the new model, suggested that giving it more time to reason could eventually produce some ground-breaking results, going so far as to ask whether future versions might run for weeks or months “in order to demonstrate the Riemann Hypothesis.”
Renowned AI critic Gary Marcus was not impressed by those grandiose claims.
In response, he wrote, “I really like a lot of your work, but the tweet below rubs me the wrong way because it invites the inference that running versions of o1 for weeks or months might create a new cancer drug (in reality, you aren’t going to create breakthrough batteries; you just get new candidates, but still need to do the clinical work) or prove the Riemann Hypothesis.”
“This is not realistic,” he added. “As you acknowledge o1 is still unreliable even at tic-tac-toe, and in some cases no better than earlier models. Longer processing times are unlikely to reach transcendent reasoning.”
(To be fair, Brown also conceded that the new model is still flubbing certain answers, including ones as fundamental as tic-tac-toe.)
Marcus is tapping into a heated debate surrounding the tremendous hype gripping the AI industry.
In short, it isn’t exactly confidence-inspiring that the company’s latest AI is still falling for the same old traps.
OpenAI has promised that this is only the beginning, though, symbolically naming the model to reset the “counter back to 1.” Given that it’s stumbling right out of the gate, the name might end up being appropriate after all.