AI benchmark cheating has been theorized as an inevitable consequence of training capable optimizers against fixed metrics. With OpenAI's GPT-5.6 Sol, the theory arrived in full view. The nonprofit ...
Transformer on MSN
GPT-5.6 cheats so much METR couldn't measure it
OpenAI’s new model broke rules and exploited loopholes more than any model METR has tested to date ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results