trl

0.29.0
32.21M

Train transformer language models with reinforcement learning.