Why Humanoid Robots Need to Hike

Hiking challenges humans to master diverse motor skills and adapt to complex, unpredictable terrain, such as steep slopes, wide ditches, tangled roots, and sudden elevation changes. It demands continuous balance, agility, and real-time decision-making, making it an ideal testbed for advancing humanoid autonomy and the integration of vision, planning, and motor control. Hiking-capable robots could explore remote areas, assist in rescue missions, and guide individuals along rugged paths.

The Challenges for Humanoid Robot Hiking

Hiking poses challenges beyond traditional navigation, blind locomotion, or single motor pattern learning:

(1) Locomotion versatility. Hiking trails span a vast range of terrains, such as dirt, rocks, stairs, and streams, often coexisting on a single trail. Such complex and hazardous environments demand that robots go beyond blind locomotion and basic skills like walking and running. They must autonomously adapt to environmental variations with dynamic skills like jumping and leaping while maintaining balance on mixed surfaces.

(2) Body awareness. Hiking introduces a local planning problem beyond traditional goal navigation: the robot must make real-time adjustments to handle local obstacles, terrain changes, and changes in its own body state. This requires seamless coordination between visual perception and motor control, enabling the robot to plan feasible foot placements and movements in response to immediate environmental or body-state changes as it progresses along the trail.

(3) Perceptual awareness. Navigating complex 3D trails requires robots to sense and react to diverse obstacles, for example by stepping over logs or navigating around trees. Robots must leverage perceptual awareness from onboard sensors to dynamically select context-appropriate, agile actions that ensure safe trail traversal.

Project Abstract

Despite significant progress in humanoid robotics, research remains fragmented: low-level motor skill learning often disregards the influence of long-horizon goals on current movement and lacks situational awareness. Meanwhile, high-level navigation struggles to accommodate real-world constraints and adapt to the irregularity of local terrains, falling short in last-step feasibility. To bridge these gaps, we propose training humanoid robots to hike. We also present LEGO-H, a universal learning framework that trains humanoid robots to become veteran hikers on complex trails by developing and integrating skills across all levels, embracing physical embodiment through both visual perceptual awareness and body dynamics. At the heart of LEGO-H's design is the harmonization of the robot's visual perception, decision-making, and motor skill execution. Our key innovations include: (1) TC-ViT, a Temporal Vision Transformer variant tailored to the Hierarchical Reinforcement Learning (HRL) framework, which frames local navigation as a sequential hallucination task that softly guides locomotion policy learning. This design seamlessly grafts locomotion and goal navigation into a unified, end-to-end policy learning framework. (2) A Hierarchical Latent Matching (HLM) loss metric for policy distillation. LEGO-H leverages privileged learning to ensure motor skill versatility while tackling the domain shift between teacher and student stages. To achieve this, we use Variational Autoencoder (VAE) latent representations and masked reconstructions to capture the kinematic dependencies inherent in humanoid joint structures. This design improves policy training and supports robust skill transfer through task-agnostic hierarchical loss functions that reflect the structural relationships among the humanoid's joint actions.
Based on these two techniques, LEGO-H can address challenges arising from both the physical constraints of humanoid robots and dynamic environments across various time scales, without relying on biased motion priors. Extensive experiments on diverse simulated hiking trails demonstrate LEGO-H’s robustness and versatility. We hope LEGO-H can serve as a baseline prototype for humanoid robots in this underexplored hiking domain.

Our Approach - LEGO-H

LEGO-H Framework Overview. LEGO-H equips humanoid robots with adaptive hiking skills by integrating navigation and locomotion in a unified, end-to-end learning framework (b). To foster versatile motor skills, we train the unified policy via privileged learning from an oracle policy (a).

What's TC-ViT?

TC-ViT Details. TC-ViT has three key components: (a) a goal-oriented temporal transformer encoder that lets the robot perceive its surroundings in relation to the final goal; (b) a dual process on the current depth frame that integrates spatially precise information reflecting the current state; (c) a recurrent goal adaptation mechanism that integrates visual awareness, goal information, and proprioception.
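
To make the data flow between components (a) and (c) more concrete, here is a toy sketch in plain Python. It is not the TC-ViT architecture: the hand-written single-head attention, the leaky averaging update, the shared feature dimension, and every function and parameter name (`goal_conditioned_attention`, `recurrent_goal_update`, `rate`) are assumptions made for illustration, standing in for learned modules that this page does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def goal_conditioned_attention(goal, frame_feats):
    """Stand-in for (a): attend over past depth-frame features,
    using the goal vector as the query (assumed, single head)."""
    scale = math.sqrt(len(goal))
    weights = softmax([dot(goal, f) / scale for f in frame_feats])
    dim = len(frame_feats[0])
    context = [
        sum(w * f[i] for w, f in zip(weights, frame_feats)) for i in range(dim)
    ]
    return context, weights


def recurrent_goal_update(prev_goal, context, current_feat, proprio, rate=0.5):
    """Stand-in for (c): blend the previous latent goal with new evidence
    from vision (context, current frame) and proprioception.
    A leaky average replaces whatever learned recurrence the paper uses."""
    evidence = [
        (c + x + p) / 3.0 for c, x, p in zip(context, current_feat, proprio)
    ]
    return [(1.0 - rate) * g + rate * e for g, e in zip(prev_goal, evidence)]


# Hypothetical 4-dimensional features for three past depth frames.
history = [
    [0.1, 0.0, 0.2, 0.3],
    [0.2, 0.1, 0.1, 0.4],
    [0.4, 0.3, 0.0, 0.5],
]
goal = [1.0, 0.0, 0.0, 0.5]
current_frame = history[-1]          # (b) would add finer spatial detail here
proprioception = [0.0, 0.1, 0.0, 0.2]

context, weights = goal_conditioned_attention(goal, history)
new_goal = recurrent_goal_update(goal, context, current_frame, proprioception)
print(weights, new_goal)
```

In this sketch the updated latent goal would be fed back in at the next time step, which is the only sense in which it is "recurrent"; how the actual model conditions the locomotion policy on it is not shown here.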

What's Hierarchical Latent Matching Loss Metric?

HLM Details. HLM encourages the student robot to learn inter-joint dependencies and structural consistencies that align closely with the robot’s physical mechanism, rather than motion priors from human data.
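
As a rough illustration of the idea, the sketch below combines three terms when comparing a teacher action with a student action: a direct action error, a latent-space error, and a masked-reconstruction error. It is a toy under stated assumptions, not the HLM loss itself: a per-group mean replaces the VAE encoder, masked joints are filled from the other joints in the same group instead of by a learned decoder, and the joint grouping, the weights, and all function names (`encode`, `masked_reconstruction`, `hlm_style_loss`) are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def encode(action, groups):
    """Toy 'encoder' (assumption): one latent per joint group,
    computed as the mean of that group's joint actions."""
    return [sum(action[j] for j in g) / len(g) for g in groups]


def masked_reconstruction(action, groups, masked):
    """Fill each masked joint with the mean of the visible joints
    in its group, a crude proxy for learned inter-joint structure."""
    recon = list(action)
    for g in groups:
        visible = [action[j] for j in g if j not in masked]
        fill = sum(visible) / len(visible) if visible else 0.0
        for j in g:
            if j in masked:
                recon[j] = fill
    return recon


def hlm_style_loss(teacher, student, groups, masked,
                   w_action=1.0, w_latent=1.0, w_recon=1.0):
    """Weighted sum of action-level, latent-level, and
    masked-reconstruction errors (weights are assumed)."""
    action_term = mse(teacher, student)
    latent_term = mse(encode(teacher, groups), encode(student, groups))
    recon_term = mse(masked_reconstruction(student, groups, masked), teacher)
    return w_action * action_term + w_latent * latent_term + w_recon * recon_term


# Hypothetical 4-joint robot: joints 0-1 form one limb, joints 2-3 another.
joint_groups = [[0, 1], [2, 3]]
teacher_action = [0.1, 0.2, 0.3, 0.4]
student_action = [0.1, 0.0, 0.5, 0.4]

loss = hlm_style_loss(teacher_action, student_action, joint_groups, masked={1})
print(loss)
```

The point of the extra terms in this sketch is that a student can be penalized for breaking the relationship between joints in the same limb, even when its per-joint error alone looks small.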

Autonomous Hiking on Uneven Terrain

Embodied Path Exploration Over Extensive Obstacles

Autonomous Hiking on Out-of-Domain Trail (Zero-Shot)

Note

All demo videos on this page showcase the results of LEGO-H's unified policy: robots autonomously navigating and executing motor skills using depth inputs and proprioception.

BibTeX

Website template modified from incredible UMI-On-Legs, NeRFies, Scaling Up Distilling Down, and AnyCar. This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.