Did OpenAI Cheat on Its Big Math Test?

Source: Decrypt

How intelligent is a model that memorizes the answers before an exam? That’s the question facing OpenAI after it unveiled o3 in December and touted the model's impressive benchmark results. At the time, some pundits hailed it as being close to AGI, the level at which artificial intelligence can match human performance on any task a user requires.

But money changes everything—even math tests, apparently.

OpenAI's victory lap over its o3 model's stunning 25.2% score on FrontierMath, a challenging mathematical benchmark developed by Epoch AI, hit a snag when it turned out the company wasn't just acing the test—OpenAI helped write it, too.

“We gratefully acknowledge OpenAI for their support in creating the benchmark,” Epoch AI wrote in an updated footnote on the FrontierMath whitepaper—and this was enough to raise some red flags among enthusiasts.

Screenshot from Epoch AI's research paper acknowledging OpenAI's support during the development of its FrontierMath benchmark dataset.


BitRss shares this Content always with Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
