Hill-Climbing ARC-AGI-3: Human Performance with Opus 4.6

(blog.alexisfox.dev)

2 points | by famouswaffles 5 hours ago

1 comment

famouswaffles 5 hours ago

An Open Code instance with Read, Grep, and Bash tools achieved human-level performance on the preview games.
For the full benchmark, the ARC-AGI-3 paper confirms Opus 4.6 scored 97.1%.
https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf
I was wondering why the scoring for ARC-AGI-3 was so convoluted, and I'm starting to see why: this is a solved benchmark in any way that matters.