Absolute Zero: Reinforced Self-Play Reasoning with Zero Data

(arxiv.org)

86 points | by leodriesch 4 days ago

18 comments