Reinforcement Learning from Human Feedback

(arxiv.org)

30 points | by onurkanbkrc 2 hours ago

2 comments