Why averaging LLM benchmark scores is fundamentally broken

(arxiv.org)

1 point | by testofschool 6 hours ago

1 comment