
Abstract

Despite significant progress in humanoid robotics, research remains fragmented: low-level motor skill learning often disregards the influence of long-horizon goals on current movement and lacks situational awareness, while high-level navigation struggles to accommodate real-world constraints and adapt to the irregularity of local terrains, falling short in last-step feasibility. To bridge these gaps, we present LEGO-H, a universal learning framework that trains humanoid robots to become veteran hikers on complex trails by developing and integrating skills across all levels, embracing physical embodiment through both visual perceptual awareness and body dynamics. At the heart of LEGO-H's design is the harmonization of the robot's visual perception, decision-making, and motor skill execution. Our key innovations are: (1) TC-ViTs, a Temporal Vision Transformer variant tailored to the Hierarchical Reinforcement Learning (HRL) framework, which frames local navigation as a sequential hallucination task that softly guides locomotion policy learning. This design seamlessly grafts locomotion and goal navigation into a unified, end-to-end policy learning framework. (2) A Hierarchical Loss Metric Set for policy distillation. LEGO-H leverages privileged learning to ensure motor skill versatility while tackling the domain shift between the teacher and student stages. To achieve this, we utilize Variational Autoencoder (VAE) latent representations and masked reconstructions to capture the kinematic dependencies inherent in humanoid joint structures. These task-agnostic hierarchical loss functions, which reflect the structural relationships among the humanoid's joint actions, optimize policy training and enable robust skill transfer. With these two techniques, LEGO-H addresses challenges arising from both the physical constraints of humanoid robots and dynamic environments across various time scales, without relying on biased motion priors. Extensive experiments on diverse simulated hiking trails demonstrate LEGO-H's robustness and versatility. We hope LEGO-H can serve as a baseline prototype for humanoid robots in this underexplored hiking domain.

Our Approach - LEGO-H

LEGO-H Framework Overview. LEGO-H equips humanoid robots with adaptive hiking skills by integrating navigation and locomotion in a unified, end-to-end learning framework (b). To foster the versatility of motor skills, we train the unified policy via privileged learning from an oracle policy (a), as sketched below.
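The snippet below is a minimal sketch of the privileged-learning (teacher-to-student) distillation stage described in the overview, assuming a DAgger-style setup: an oracle policy trained with privileged observations labels states visited by the deployable student policy, which only sees depth features and proprioception. Module names, observation sizes, and the plain MSE imitation objective are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

PRIV_DIM, DEPTH_DIM, PROPRIO_DIM, ACT_DIM = 187, 256, 48, 19   # assumed sizes

teacher = nn.Sequential(nn.Linear(PRIV_DIM + PROPRIO_DIM, 256), nn.ELU(),
                        nn.Linear(256, ACT_DIM))   # oracle policy (pretrained, frozen)
student = nn.Sequential(nn.Linear(DEPTH_DIM + PROPRIO_DIM, 256), nn.ELU(),
                        nn.Linear(256, ACT_DIM))   # unified deployable policy
teacher.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for step in range(1000):
    # Roll out the *student* so it visits its own state distribution (DAgger-style);
    # the environment is mocked with random tensors to keep the sketch self-contained.
    priv    = torch.randn(64, PRIV_DIM)      # privileged terrain/contact info (teacher only)
    depth   = torch.randn(64, DEPTH_DIM)     # encoded depth features (student only)
    proprio = torch.randn(64, PROPRIO_DIM)   # joint positions/velocities, base state

    with torch.no_grad():
        target = teacher(torch.cat([priv, proprio], dim=-1))   # oracle action labels
    pred = student(torch.cat([depth, proprio], dim=-1))

    loss = nn.functional.mse_loss(pred, target)   # LEGO-H replaces this plain MSE
    opt.zero_grad(); loss.backward(); opt.step()  # with its Hierarchical Loss Metric Set
```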

What's TC-ViTs?

TC-ViTs' Details. Three key components: (a) a goal-oriented temporal transformer encoder that lets the robot perceive its surroundings in light of the final goal; (b) a dual-processing branch on the current depth frame that integrates spatially precise information about the current state; (c) a recurrent goal-adaptation mechanism that fuses visual awareness, goal information, and proprioception (see the sketch below).
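Below is a minimal structural sketch of the three TC-ViTs components listed in the caption. The patch/embedding sizes, the GRU-based recurrence, and all layer names are assumptions for illustration; they are not taken from the released code.

```python
import torch
import torch.nn as nn

class TCViTSketch(nn.Module):
    def __init__(self, emb=128, goal_dim=3, proprio_dim=48):
        super().__init__()
        # (a) goal-conditioned temporal transformer over a short history of depth frames
        self.frame_embed = nn.Sequential(nn.Flatten(1), nn.Linear(64 * 64, emb))
        self.goal_embed = nn.Linear(goal_dim, emb)
        layer = nn.TransformerEncoderLayer(d_model=emb, nhead=4, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # (b) dual (convolutional) branch on the current depth frame for precise spatial cues
        self.current_branch = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(1), nn.Linear(32, emb))
        # (c) recurrent goal-adaptation cell fusing vision, goal, and proprioception
        self.gru = nn.GRUCell(emb * 2 + goal_dim + proprio_dim, emb)
        self.local_goal_head = nn.Linear(emb, 3)  # "hallucinated" local navigation target

    def forward(self, depth_seq, final_goal, proprio, hidden):
        # depth_seq: (B, T, 64, 64) history; final_goal: (B, 3); proprio: (B, 48)
        B, T = depth_seq.shape[:2]
        tokens = self.frame_embed(depth_seq.reshape(B * T, 1, 64, 64)).reshape(B, T, -1)
        tokens = torch.cat([self.goal_embed(final_goal).unsqueeze(1), tokens], dim=1)
        temporal = self.temporal_encoder(tokens)[:, 0]          # goal-aware summary token
        spatial = self.current_branch(depth_seq[:, -1:])        # most recent depth frame
        hidden = self.gru(torch.cat([temporal, spatial, final_goal, proprio], -1), hidden)
        return self.local_goal_head(hidden), hidden
```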

What's Hierarchical Loss Metric Set?

HLM's Details. HLM encourages the student robot to learn inter-joint dependencies and structural consistencies that align closely with the robot's physical mechanism, rather than motion priors from human data.
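The sketch below illustrates a hierarchical loss in the spirit of HLM: it compares teacher and student actions (i) directly per joint, (ii) in the latent space of a VAE trained on actions, and (iii) through masked-joint reconstruction so inter-joint dependencies are penalized. The architecture sizes, loss weights, and the simple random-masking scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

ACT_DIM, LATENT = 19, 16   # assumed number of actuated joints / latent size

class ActionVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(ACT_DIM, 64), nn.ELU(), nn.Linear(64, 2 * LATENT))
        self.dec = nn.Sequential(nn.Linear(LATENT, 64), nn.ELU(), nn.Linear(64, ACT_DIM))

    def encode(self, a):
        mu, logvar = self.enc(a).chunk(2, dim=-1)
        return mu, logvar

    def reconstruct_masked(self, a, mask):
        mu, _ = self.encode(a * mask)   # hide a subset of joints
        return self.dec(mu)             # infer them from the remaining ones

vae = ActionVAE()   # assumed pretrained on teacher (oracle) actions, then frozen

def hierarchical_loss(student_a, teacher_a, w=(1.0, 0.5, 0.5)):
    # Level 1: per-joint action matching.
    l_joint = nn.functional.mse_loss(student_a, teacher_a)
    # Level 2: agreement in the VAE latent space (coordinated whole-body structure).
    mu_s, _ = vae.encode(student_a)
    mu_t, _ = vae.encode(teacher_a)
    l_latent = nn.functional.mse_loss(mu_s, mu_t)
    # Level 3: masked-joint reconstruction, enforcing consistent inter-joint dependencies.
    mask = (torch.rand_like(student_a) > 0.3).float()
    l_masked = nn.functional.mse_loss(vae.reconstruct_masked(student_a, mask),
                                      vae.reconstruct_masked(teacher_a, mask))
    return w[0] * l_joint + w[1] * l_latent + w[2] * l_masked
```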

Autonomous Hiking In Uneven Terrains

Embodied Path Exploration Over Extensive Obstacles

Autonomous Hiking on Out-of-Domain Trail (Zero-Shot)

Note

All demo videos on this page showcase the results of LEGO-H's unified policy: robots autonomously navigating and executing motor skills using depth inputs and proprioception.

BibTeX

Website template modified from the incredible UMI-On-Legs, NeRFies, Scaling Up Distilling Down, and AnyCar. This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.