trl

0.27.1
28.79M

Train transformer language models with reinforcement learning.