Why SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

75 points | by kmdupree 3 hours ago

60 comments