Fusing RL Policies: Walking and Running

This project demonstrates how to combine multiple reinforcement learning policies (walking and running) into a single hybrid policy that can smoothly transition between different behaviors. It uses the Humanoid environment from Gymnasium and implements policy fusion using PyTorch and Stable-Baselines3.

Project Structure

.
├── hybrid_policy_training.py  # Main training script
├── requirements.txt           # Project dependencies
├── walk_policy_final.zip      # Trained walking policy
├── run_policy_final.zip       # Trained running policy
└── hybrid_policy.pth          # Trained hybrid policy

Features

  • Individual Policy Training: Separate policies for walking and running behaviors
  • Hybrid Architecture:
    • Shared feature extractor
    • Task-specific policy heads
    • Gating mechanism for smooth transitions
  • Modern Tools:
    • PyTorch for neural networks
    • Stable-Baselines3 for RL algorithms
    • Weights & Biases for experiment tracking
    • Gymnasium (MuJoCo) for environment simulation

Installation

  1. Clone the repository:

     git clone https://github.com/deepak-lenka/Fuse-separate-RL.git
     cd Fuse-separate-RL

  2. Install the dependencies:

     pip install -r requirements.txt

  3. Make sure you have MuJoCo installed for the Humanoid environment (e.g. via the `gymnasium[mujoco]` extra).

Usage

Run the main training script:

python hybrid_policy_training.py

This will:

  1. Train a walking policy
  2. Train a running policy
  3. Combine them into a hybrid policy

Architecture

Network Structure

  • Shared Network: Extracts common features (256 → 128 units)
  • Policy Heads: Separate networks for walking and running (128 → 64 → action_dim)
  • Gating Network: Learns when to blend policies (state_dim → 64 → 1)
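With the layer sizes listed above, the fused network can be sketched in PyTorch roughly as follows. Class and attribute names are illustrative, not taken from the actual script, and the real `hybrid_policy.pth` may differ in detail.

```python
# Sketch of the hybrid architecture: shared trunk, two task heads, and a
# state-conditioned gate that blends the heads' actions.
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        # Shared feature extractor: 256 -> 128 units
        self.shared = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Task-specific heads: 128 -> 64 -> action_dim
        self.walk_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )
        self.run_head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, action_dim)
        )
        # Gating network: state_dim -> 64 -> 1, squashed to (0, 1)
        self.gate = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        features = self.shared(state)
        g = self.gate(state)  # blending weight: 1 = pure walk, 0 = pure run
        return g * self.walk_head(features) + (1 - g) * self.run_head(features)
```

Because the gate outputs a continuous value in (0, 1), the action interpolates smoothly between the two gaits rather than switching discretely.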

Training Process

  1. Individual policies are trained using PPO
  2. Hybrid policy combines learned behaviors
  3. Gating mechanism enables smooth transitions
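The README does not spell out how step 2 is optimized; one plausible reading is distillation, where the frozen PPO teachers label states with actions and the hybrid network regresses onto them. The sketch below shows a single such supervised update under that assumption (the helper name is hypothetical).

```python
# Hypothetical distillation step: pull the hybrid's output toward a frozen
# teacher's actions with an MSE (behaviour-cloning) loss.
import torch
import torch.nn as nn

def distill_step(hybrid: nn.Module, optimizer: torch.optim.Optimizer,
                 states: torch.Tensor, teacher_actions: torch.Tensor) -> float:
    """One supervised update; returns the scalar loss before the step."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(hybrid(states), teacher_actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this over batches drawn from both teachers lets the gate learn which head to trust in which part of the state space.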

Results

The training achieves:

  • Walking Policy: Stable locomotion at 2.0 m/s
  • Running Policy: Efficient movement at 5.0 m/s
  • Hybrid Policy: Smooth transitions between gaits

Contributing

Feel free to open issues or submit pull requests for improvements.

License

MIT License

Author

Deepak Lenka