OpenAI just announced o3 and o3-mini, its next-generation reasoning models.
In the livestream, SVP of Research Mark Chen showed how o3 compares with o1 on several benchmarks, with o3 scoring 96.7 percent on competition math and 87.7 percent on PhD-level science. OpenAI and the ARC Prize competition also shared that o3 scored 76 percent on the ARC-AGI benchmark, which includes novel, unpublished datasets. The ARC-AGI benchmark is designed to test a model's ability to learn new and distinct skills on the fly with every new task.