Why averaging LLM benchmark scores is fundamentally broken
(arxiv.org)
1 point | by testofschool | 6 hours ago
1 comment
testofschool
6 hours ago
[flagged]