trl

0.16.1
11.66M

Train transformer language models with reinforcement learning.