Description:
As part of upgrading the Transformers library to the latest version, we should clean up and deprecate the existing PPO (Proximal Policy Optimization) implementation and related code paths.
Goals
- Remove or deprecate PPO-specific training code that is no longer actively used.
- Eliminate obsolete dependencies introduced solely for PPO support.
- Ensure compatibility with the latest Transformers release.
- Update documentation to reflect the removal/deprecation.
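If the PPO code is deprecated before it is removed, one common pattern is to keep the public entry point working while it emits a `DeprecationWarning`. The minimal sketch below uses Python's standard `warnings` module. `PPOTrainer` is a hypothetical placeholder name and may not match the actual class in this codebase.

```python
import warnings


class PPOTrainer:
    """Hypothetical stand-in for the PPO trainer being deprecated."""

    def __init__(self, *args, **kwargs):
        # Warn on construction; stacklevel=2 points the warning at the caller's line.
        warnings.warn(
            "PPOTrainer is deprecated and will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Keep the arguments so existing call sites do not break during the deprecation window.
        self.args = args
        self.kwargs = kwargs
```

Python hides `DeprecationWarning` by default outside `__main__` and test runners. Users can surface it with `python -W default::DeprecationWarning`.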