Humanoid Locomotion#
Train and deploy locomotion policies for humanoid robots across three stages:
- Training in IsaacGym, IsaacSim, or MuJoCo
- Sim2Sim evaluation across multiple simulators
- Real-world deployment (networked controller)
Supported robots: g1_dof29 (full-body without hands) and g1_dof12 (lower-body).
Environment Setup#
Core Dependencies#
For Python > 3.8 (IsaacSim/MuJoCo)#
You can install rsl-rl-lib directly via pip:
pip install rsl-rl-lib
For Python 3.8 (IsaacGym)#
Due to compatibility requirements with IsaacGym’s Python 3.8 environment, you’ll need to install from source with modified dependencies:
# Clone the repository and checkout v3.1.0
git clone https://github.com/leggedrobotics/rsl_rl && \
cd rsl_rl && \
git checkout v3.1.0
# Apply compatibility patches
sed -i 's/"torch>=2\.6\.0"/"torch>=2.4.1"/' pyproject.toml && \
sed -i 's/"torchvision>=0\.5\.0"/"torchvision>=0.19.1"/' pyproject.toml && \
sed -i 's/"tensordict>=0\.7\.0"/"tensordict>=0.5.0"/' pyproject.toml && \
sed -i '/^# SPDX-License-Identifier: BSD-3-Clause$/a from __future__ import annotations' rsl_rl/algorithms/distillation.py
# Install in editable mode
pip install -e .
Optional Dependencies#
LiDAR Sensor Support (OmniPerception)#
The LiDAR implementation uses the OmniPerception package for GPU-accelerated ray tracing.
For IsaacGym and MuJoCo:
Install the LidarSensor package:
cd /path/to/OmniPerception/LidarSensor
pip install -e .
For complete IsaacGym/MuJoCo integration details, see the OmniPerception IsaacGym example.
For IsaacSim (IsaacLab):
First, install IsaacLab from source by following the official IsaacLab installation guide.
Then install the LiDAR sensor extension:
cd /path/to/OmniPerception/LidarSensor/LidarSensor/example/isaaclab
./install_lidar_sensor.sh /path/to/your/IsaacLab
For complete IsaacSim integration details, see the OmniPerception IsaacLab example.
Real-World Deployment (unitree_sdk2_python)#
For real-world deployment, install the unitree_sdk2_python package:
cd third_party
git clone https://github.com/unitreerobotics/unitree_sdk2_python.git
cd unitree_sdk2_python
pip install -e .
Training#
RSL-RL PPO Training#
General form:
python roboverse_learn/rl/rsl_rl/ppo.py \
--task <your_task> \
--sim <simulator> \
--num-envs <num_envs> \
--robot <your_robot>
Examples:
G1 humanoid walking (IsaacGym):
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof29 \
--sim isaacgym \
--num-envs 8192 \
--robot g1_dof29
G1 DOF12 walking (IsaacGym):
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof12 \
--sim isaacgym \
--num-envs 8192 \
--robot g1_dof12
G1 humanoid walking (IsaacSim):
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof29 \
--sim isaacsim \
--num-envs 4096 \
--robot g1_dof29
Resuming Training#
To resume training from a checkpoint:
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof29 \
--sim isaacgym \
--num-envs 8192 \
--robot g1_dof29 \
--resume <timestamp_directory> \
--checkpoint -1
The --checkpoint -1 flag loads the latest checkpoint; pass a specific iteration number to load that checkpoint instead.
Other RL Algorithms#
The framework supports other RL algorithms in roboverse_learn/rl/:
CleanRL implementations: PPO, SAC, TD3
Training:
python roboverse_learn/rl/clean_rl/{ppo,sac,td3}.py
Stable-Baselines3: Available in roboverse_learn/rl/sb3/
Refer to the respective training scripts for algorithm-specific parameters.
Output Directory Structure#
Outputs and checkpoints are saved to:
outputs/<robot>/<task>/<timestamp>/
For example:
outputs/g1_dof29/walk_g1_dof29/2025_1203_150000/
Each training run contains:
- `model_<iter>.pt` - Checkpoint files saved at specified intervals
- `policy.pt` - Final exported JIT policy (for deployment)
- Training logs (TensorBoard/WandB, depending on configuration)
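When post-processing runs outside the training scripts, the `--checkpoint -1` behavior can be mirrored with a small helper. This is a sketch, not part of the framework: it assumes only the `model_<iter>.pt` naming shown above, and the function name is hypothetical.

```python
import re
from pathlib import Path


def latest_checkpoint(run_dir):
    """Return (iteration, path) for the highest-numbered model_<iter>.pt.

    Hypothetical helper mirroring --checkpoint -1: scans a run directory
    such as outputs/<robot>/<task>/<timestamp>/ for checkpoint files.
    """
    best_iter, best_path = -1, None
    for path in Path(run_dir).glob("model_*.pt"):
        match = re.fullmatch(r"model_(\d+)\.pt", path.name)
        if match and int(match.group(1)) > best_iter:
            best_iter, best_path = int(match.group(1)), path
    return best_iter, best_path
```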
Evaluation#
Evaluate trained policies across different simulators using the evaluation script.
General form:
python roboverse_learn/rl/rsl_rl/eval.py \
--task <your_task> \
--sim <simulator> \
--num-envs <num_envs> \
--robot <your_robot> \
--resume <timestamp_directory> \
--checkpoint <iter>
Examples:
IsaacGym evaluation:
python roboverse_learn/rl/rsl_rl/eval.py \
--task walk_g1_dof29 \
--sim isaacgym \
--num-envs 1 \
--robot g1_dof29 \
--resume 2025_1203_150000 \
--checkpoint -1
MuJoCo evaluation:
python roboverse_learn/rl/rsl_rl/eval.py \
--task walk_g1_dof12 \
--sim mujoco \
--num-envs 1 \
--robot g1_dof12 \
--resume 2025_1203_150000 \
--checkpoint -1
IsaacSim evaluation:
python roboverse_learn/rl/rsl_rl/eval.py \
--task walk_g1_dof29 \
--sim isaacsim \
--num-envs 1 \
--robot g1_dof29 \
--resume 2025_1203_150000 \
--checkpoint -1
The evaluation script runs for 1,000,000 steps with fixed velocity commands for thorough policy assessment.
Real-World Deployment#
Real-world deployment entry point:
python scripts/unitree_deploy/deploy_real.py <network_interface> <robot_yaml> [--domain-id <id>]
Example:
python scripts/unitree_deploy/deploy_real.py eno1 g1_dof29_dex3.yaml
Configuration files are located in scripts/unitree_deploy/configs/:
- `g1_dof12.yaml` - 12 DOF lower-body control
- `g1_dof29.yaml` - 29 DOF full-body control
- `g1_dof29_dex3.yaml` - 43 DOF with Dex3 hands
In the YAML file, set the policy_path to your exported JIT policy (the policy.pt file from training output).
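For reference, the relevant part of the config might look like the fragment below. Only the `policy_path` key is documented above; the path shown is just the example training output from earlier, and any other keys in the real file are omitted here.

```yaml
# scripts/unitree_deploy/configs/g1_dof29.yaml (excerpt, illustrative)
policy_path: outputs/g1_dof29/walk_g1_dof29/2025_1203_150000/policy.pt
```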
This will initialize the real controller and stream commands to the robot. Ensure your networking and safety interlocks are correctly configured.
Sim-Side Validation (Unitree SDK2 Protocol)#
You can validate the same controller against a simulator by running a Unitree SDK2 protocol server backed by RoboVerse. Use a non-zero DDS domain id (e.g. 1) for simulators to avoid colliding with real robots. For protocol architecture and API details, see Protocol Simulation.
Terminal A (sim server):
python scripts/protocol_sim/unitree_sdk2_sim_server.py \
--sim mujoco \
--robot g1_dof29 \
--domain-id 1 \
--iface lo \
--auto-remote \
--elastic-band \
--elastic-band-height 2.5 \
--elastic-band-length 0.1
Terminal B (controller, unchanged):
python scripts/unitree_deploy/deploy_real.py lo g1_dof29.yaml --domain-id 1
Standby/takeover behavior (important):
- The sim server enables standby control by default (`--no-standby` disables it).
- It switches from standby to protocol control only after 100 consecutive "active" `LowCmd` frames.
- A frame is considered active when any `kp`, `kd`, or `tau` exceeds a small threshold.
- With `control_dt=0.02` (the default deploy config), this is about 2.0 s of active commands before takeover.
- If you need faster takeover for debugging, run the server with `--no-standby`.
Elastic band assist (optional safety harness):
- Enable with `--elastic-band`.
- Set the anchor height from the terminal with `--elastic-band-height <meters>` (default 2.0; lower values reduce the initial lift force).
- Tune the rest length with `--elastic-band-length <meters>` (default 0.0, clamped to >= 0; larger values reduce pull strength). The band is tension-only: if the rest length exceeds the current anchor distance, it goes slack instead of pushing.
- While the server runs in an interactive terminal, you can tune the band live with keys (no restart): press `7` (stronger assist / shorter rest length), `8` (weaker assist / longer rest length), `]` (anchor up), `[` (anchor down), `r` (release band), `s` (show), `h` (help).
- In line mode, use `release` to remove the band manually.
- The live key adjustment step is configurable via `--elastic-band-key-step <meters>` (default 0.1).
- The band applies an external spring-damper force from a fixed world anchor point to the robot torso/base, helping prevent early falls during bring-up. The force scales with distance/stretch and with relative velocity along the band direction (spring + damping behavior).
- The band is removed immediately when you trigger manual release (`r` or `release`).
- It is intended as a debugging/safety aid for simulator validation, not as part of the final controller behavior.
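The tension-only spring-damper behavior described above can be sketched as follows. This is an illustrative model, not the server's actual implementation; the gains `k` and `c` and the function signature are assumptions.

```python
import numpy as np


def elastic_band_force(anchor, pos, vel, rest_length, k=100.0, c=10.0):
    """Tension-only spring-damper force pulling `pos` toward a world `anchor`.

    Illustrative sketch of the assist described above; the real server's
    gains and state layout may differ. Returns a 3D force on the torso/base.
    """
    offset = anchor - pos                       # vector from torso to anchor
    dist = float(np.linalg.norm(offset))
    if dist < 1e-9:
        return np.zeros(3)
    direction = offset / dist                   # unit vector along the band
    stretch = dist - rest_length
    if stretch <= 0.0:                          # band is slack: never pushes
        return np.zeros(3)
    rel_speed = float(np.dot(vel, direction))   # speed toward the anchor
    # Spring pulls along the band; damping opposes motion along the band.
    magnitude = k * stretch - c * rel_speed
    return max(magnitude, 0.0) * direction      # tension-only: clamp at zero
```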
Current implementation supports MuJoCo, IsaacGym, IsaacSim, and Newton backends.
Advanced Features#
Terrain Configuration#
The framework supports customizable terrain generation for training locomotion policies on varied ground conditions. Predefined terrain configurations are available in roboverse_pack/grounds/.
Supported Terrain Types#
- Slope: Planar inclined surfaces (`slope_cfg.py`)
- Stair: Staircase features (`stair_cfg.py`)
- Obstacle: Random rectangular obstacle fields (`obstacle_cfg.py`)
- Stone: Stone-like protrusions (`stone_cfg.py`)
- Gap: Gaps that robots must traverse (`gap_cfg.py`)
- Pit: Rectangular pits (`pit_cfg.py`)
Example Usage#
Terrain configurations are imported and used in task definitions. See task files in roboverse_pack/tasks/humanoid/locomotion/ for examples.
from metasim.scenario.grounds import GroundCfg, SlopeCfg, StairCfg

ground_cfg = GroundCfg(
    width=20.0,               # terrain extent in meters
    length=20.0,
    horizontal_scale=0.1,     # heightfield grid cell size (meters)
    vertical_scale=0.005,     # heightfield height resolution (meters)
    static_friction=1.0,      # ground friction coefficients
    dynamic_friction=1.0,
    elements={
        # A 2 m x 2 m inclined patch (slope ratio 0.3) at the origin
        "slope": [SlopeCfg(origin=[0, 0], size=[2.0, 2.0], slope=0.3)],
        # A 2 m x 2 m staircase with 0.1 m steps, offset 5 m along x
        "stair": [StairCfg(origin=[5, 0], size=[2.0, 2.0], step_height=0.1)],
    },
)
Terrain configuration is supported across all simulators (IsaacGym, IsaacSim, MuJoCo).
LiDAR Sensor Support#
The framework includes LiDAR point cloud sensing capabilities for enhanced perception-based locomotion policies.
Overview#
The LidarPointCloud query provides 3D point cloud data from a simulated LiDAR sensor:
Supports IsaacGym, IsaacSim, and MuJoCo simulators
Returns point clouds in both local (sensor frame) and world frames
Configurable sensor mounting location and type
Raycasts against terrain, ground, and scenario objects
Configuration#
LiDAR configuration is defined in task files. See roboverse_pack/tasks/humanoid/locomotion/walk_g1_dof29.py for examples.
Sensor Parameters#
- `link_name` (str): Name of the robot link where the LiDAR is mounted (default: "mid360_link")
- `sensor_type` (str): Type of LiDAR sensor pattern (default: "mid360")
- `apply_optical_center_offset` (bool): Apply optical center offset correction
- `optical_center_offset_z` (float): Z-axis offset for the optical center
- `enabled` (bool): Enable/disable the LiDAR query
Output Format#
The query returns a dictionary with:
- `points_local`: Point cloud in the sensor frame, shape (E, N, 3)
- `points_world`: Point cloud in the world frame, shape (E, N, 3)
- `dist`: Distance measurements (when available)
- `link`: Name of the link the sensor is attached to
where E is the number of environments and N is the number of points per scan.
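As a sketch of consuming this output, the snippet below reduces a `points_local` array to a per-environment minimum range. The (E, N, 3) shape follows the format above; the function name and the idea of using it as a proximity signal are illustrative, not part of the framework.

```python
import numpy as np


def min_range_per_env(points_local):
    """Minimum sensor-frame range for each environment.

    `points_local` has shape (E, N, 3) as described above. A reduction
    like this could serve, e.g., as a crude proximity feature.
    """
    ranges = np.linalg.norm(points_local, axis=-1)  # (E, N) point distances
    return ranges.min(axis=-1)                      # (E,) nearest return

# Illustrative usage with a fake two-env, three-point scan:
points = np.array([
    [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]],
    [[0.5, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 6.0, 0.0]],
])
```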
Command-line Arguments#
Common Training/Evaluation Arguments#
- `--task` (str): Task name. Examples: `walk_g1_dof29`, `walk_g1_dof12`
- `--robot` (str): Robot identifier. Examples: `g1_dof29`, `g1_dof12`
- `--num-envs` (int): Number of parallel environments
- `--sim` (str): Simulator. Supported: `isaacgym`, `isaacsim`, `mujoco`
- `--headless` (bool): Headless rendering (default: False)
- `--device` (str): Device for training (default: "cuda:0")
- `--seed` (int): Random seed (default: 1)
Training-Specific Arguments (RSL-RL PPO)#
- `--max-iterations` (int): Number of training iterations (default: 50000)
- `--save-interval` (int): Checkpoint save interval (default: 100)
- `--logger` (str): Logger type. Options: `tensorboard`, `wandb`, `neptune` (default: "tensorboard")
- `--wandb-project` (str): WandB project name (default: "rsl_rl_ppo")
- `--use-wandb` (bool): Enable WandB logging (default: False)
Evaluation/Checkpoint Arguments#
- `--resume` (str): Timestamp directory containing checkpoints
- `--checkpoint` (int): Checkpoint iteration to load; `-1` loads the latest (default: -1)
Examples#
Training with custom parameters:
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof29 \
--robot g1_dof29 \
--sim isaacgym \
--num-envs 8192 \
--max-iterations 30000 \
--save-interval 200 \
--logger wandb \
--use-wandb
Resume from checkpoint:
python roboverse_learn/rl/rsl_rl/ppo.py \
--task walk_g1_dof29 \
--robot g1_dof29 \
--sim isaacgym \
--num-envs 8192 \
--resume 2025_1203_150000 \
--checkpoint 5000
Evaluation with specific checkpoint:
python roboverse_learn/rl/rsl_rl/eval.py \
--task walk_g1_dof29 \
--robot g1_dof29 \
--sim mujoco \
--num-envs 1 \
--resume 2025_1203_150000 \
--checkpoint 10000