Async RL Training
Async RL training emerges as a dominant paradigm
What happened
Several open-source libraries have converged on disaggregating inference from training onto separate GPU pools, connecting them with a rollout buffer, and letting both sides run concurrently. A survey of 16 libraries compared them across seven axes, including orchestration primitives and buffer design. TRL is developing a new async trainer, guided by this survey.
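The disaggregated pattern described above can be sketched in a few lines: a rollout "actor" (standing in for the inference pool) and a "learner" (standing in for the training pool) run concurrently, connected only by a bounded rollout buffer. All names below are illustrative assumptions, not the API of any of the surveyed libraries.

```python
# Minimal sketch of disaggregated async RL: generation and training run
# concurrently, coupled only through a bounded rollout buffer.
import queue
import random
import threading

BUFFER_SIZE = 8      # a bounded buffer applies backpressure to generation
NUM_ROLLOUTS = 32

rollout_buffer: "queue.Queue" = queue.Queue(maxsize=BUFFER_SIZE)

def actor() -> None:
    """Inference side: generate rollouts and push them into the buffer."""
    for step in range(NUM_ROLLOUTS):
        rollout = {
            "step": step,
            "tokens": [random.randint(0, 100) for _ in range(4)],
            "reward": random.random(),
        }
        rollout_buffer.put(rollout)   # blocks when the buffer is full
    rollout_buffer.put(None)          # sentinel: generation finished

def learner(rewards: list) -> None:
    """Training side: consume rollouts as they arrive."""
    while True:
        rollout = rollout_buffer.get()
        if rollout is None:
            break
        rewards.append(rollout["reward"])  # placeholder for a gradient step

rewards: list = []
threads = [
    threading.Thread(target=actor),
    threading.Thread(target=learner, args=(rewards,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"trained on {len(rewards)} rollouts")
```

In a real system the two sides would live on separate GPU pools and the buffer would be a networked service, but the control flow is the same: generation never waits for a training step to finish, and the bounded buffer keeps the policy used for rollouts from drifting too far behind the trained one.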
Why it matters to you
Adopting async RL training requires significant changes to an existing codebase: inference must be disaggregated from training onto separate GPU pools, and a rollout buffer must connect the two. The surveyed open-source libraries and the survey's design principles offer concrete starting points for this migration.
What to do about it
Try async training on a small-scale RL project using TRL's new async trainer, focusing on overlapping rollout generation with training steps to improve GPU utilization.