Topic

RLHF

Reinforcement learning from human feedback.

3 checkpoints