2 points | by famouswaffles 5 hours ago
1 comment
An Open Code instance with Read, Grep, and Bash tools achieved human-level performance on the preview games.
For the full benchmark, the ARC-AGI 3 paper confirms Opus 4.6 scored 97.1%.
https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf
I was wondering why the scoring for ARC-AGI 3 was so convoluted, and I'm starting to see why. This is a solved benchmark in any way that matters.