With OpenAI's o3 and o4-mini, hallucinations will be improved (as in occur more often)

Free

*censored*
VVO Supporter 🍦🎈👾❤
Joined
Sep 22, 2018
Messages
42,274
Location
Moonbase Caligula
SL Rez
2008
Joined SLU
2009
SLU Posts
55565
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.
Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.
It's truly lovely that o3, what OpenAI calls their "most powerful reasoning model," not only can't stop making shit up, but is getting better at it. Perhaps some of the problem is that OpenAI itself has issues with hallucinations about their own work.
 

Soen Eber

Vatican mole
VVO Supporter 🍦🎈👾❤
Joined
Sep 20, 2018
Messages
3,958




Free said:
It's truly lovely that o3, what OpenAI calls their "most powerful reasoning model," not only can't stop making shit up, but is getting better at it. Perhaps some of the problem is that OpenAI itself has issues with hallucinations about their own work.
Don't worry. Everything is under control.
 

Free

Soen Eber said:
Don't worry. Everything is under control.
That one? Hell, it couldn't even keep everyone alive. I'll ask for its help when I need to read lips.
 