This project demonstrates how to combine multiple reinforcement learning policies (walking and running) into a single hybrid policy that can smoothly transition between different behaviors. It uses the Humanoid environment from Gymnasium and implements policy fusion using PyTorch and Stable-Baselines3.
```
.
├── hybrid_policy_training.py   # Main training script
├── requirements.txt            # Project dependencies
├── walk_policy_final.zip       # Trained walking policy
├── run_policy_final.zip        # Trained running policy
└── hybrid_policy.pth           # Trained hybrid policy
```
- Individual Policy Training: Separate policies for walking and running behaviors
- Hybrid Architecture:
  - Shared feature extractor
  - Task-specific policy heads
  - Gating mechanism for smooth transitions
- Modern Tools:
  - PyTorch for neural networks
  - Stable-Baselines3 for RL algorithms
  - Weights & Biases for experiment tracking
  - Gymnasium (MuJoCo) for environment simulation
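If you want to reproduce the experiment tracking mentioned above, a minimal Weights & Biases setup looks roughly like the sketch below. The project name, config keys, and logged values are placeholders for illustration, not the ones used in hybrid_policy_training.py:

```python
import wandb

# Hypothetical tracking setup; project name and config are placeholders.
run = wandb.init(
    project="fuse-separate-rl",
    config={"algo": "PPO", "env": "Humanoid-v4", "total_timesteps": 1_000_000},
)

# Log scalar metrics as training progresses (values here are illustrative only).
wandb.log({"walk/episode_reward": 350.0, "gate/mean": 0.42})

run.finish()
```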
- Clone the repository:

  ```bash
  git clone https://github.com/deepak-lenka/Fuse-separate-RL.git
  cd Fuse-separate-RL
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Make sure you have MuJoCo installed for the Humanoid environment.
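For reference, the dependencies implied by the feature list above are roughly the following; treat this as a sketch and rely on the repository's requirements.txt for the authoritative, pinned versions:

```
gymnasium[mujoco]
stable-baselines3
torch
wandb
```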
Run the main training script:

```bash
python hybrid_policy_training.py
```
This will:
- Train a walking policy
- Train a running policy
- Combine them into a hybrid policy
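Once training finishes, the saved Stable-Baselines3 policies can be reloaded and rolled out for a quick sanity check. The sketch below assumes the Humanoid-v4 environment id; the file name comes from the project structure above:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Load one of the saved Stable-Baselines3 policies.
walk_model = PPO.load("walk_policy_final.zip")

env = gym.make("Humanoid-v4")
obs, info = env.reset(seed=0)

episode_reward = 0.0
for _ in range(1000):
    action, _ = walk_model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_reward += reward
    if terminated or truncated:
        break

print(f"Episode reward: {episode_reward:.1f}")
env.close()
```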
- Shared Network: Extracts common features (256 → 128 units)
- Policy Heads: Separate networks for walking and running (128 → 64 → action_dim)
- Gating Network: Learns when to blend policies (state_dim → 64 → 1)
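A minimal PyTorch sketch of this layout is shown below. The class and attribute names are hypothetical; the layer sizes follow the description above, and the gate output is squashed to [0, 1] so the final action is a convex blend of the two heads:

```python
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    """Hypothetical sketch of the shared trunk, two policy heads, and gate."""

    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        # Shared feature extractor: state_dim -> 256 -> 128
        self.shared = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Task-specific heads: 128 -> 64 -> action_dim
        self.walk_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )
        self.run_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )
        # Gating network: state_dim -> 64 -> 1, squashed to [0, 1]
        self.gate = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid()
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        features = self.shared(state)
        walk_action = self.walk_head(features)
        run_action = self.run_head(features)
        g = self.gate(state)  # near 0 favors walking, near 1 favors running
        return g * run_action + (1 - g) * walk_action
```

Because the gate produces a value between 0 and 1, intermediate gate values interpolate between the two heads, which is what enables smooth gait transitions.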
- Individual policies are trained using PPO
- Hybrid policy combines learned behaviors
- Gating mechanism enables smooth transitions
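As a rough illustration of the first step, training one of the individual policies with Stable-Baselines3 PPO looks like the following; the timestep budget and default hyperparameters are placeholders, not the values used in hybrid_policy_training.py:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train the walking policy with PPO on the Humanoid environment.
env = gym.make("Humanoid-v4")
walk_model = PPO("MlpPolicy", env, verbose=1)
walk_model.learn(total_timesteps=1_000_000)  # placeholder budget
walk_model.save("walk_policy_final")         # produces walk_policy_final.zip

# The running policy is trained analogously (e.g., rewarding a higher target
# velocity), and the two are then combined into hybrid_policy.pth.
```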
The training achieves:
- Walking Policy: Stable locomotion at 2.0 m/s
- Running Policy: Efficient movement at 5.0 m/s
- Hybrid Policy: Smooth transitions between gaits
Feel free to open issues or submit pull requests for improvements.
Licensed under the MIT License.

Created by Deepak Lenka.