trl

0.18.1
13.79M

Train transformer language models with reinforcement learning.