Top model scores may be skewed by Git history leaks in SWE-bench

(github.com)

457 points | by mustaphah 2 days ago

164 comments