Astro - Hacker News

1 comment

mnkv 2 hours ago

Reasonable post with a decent analogy explaining on-policy learning. The only major thing I take issue with is
> Reinforcement learning is a technical subject—there are whole textbooks written about it.
and then linking to the still-WIP RLHF book instead of the book on RL: Sutton & Barto.