Measuring AI Ability to Complete Long Tasks: Opus 4.5 has 50% horizon of 4h49m

(metr.org)

95 points | by spicypete 4 hours ago

67 comments