Data Collection
Humanoid data collection is fundamentally different from arm-only workflows. The K1 has 22+ degrees of freedom, must maintain balance during teleoperation, and requires synchronized multi-modal capture. This page covers the challenges, methods, dataset format, and safety protocol.
Humanoid Data Collection Challenges
Collecting high-quality demonstrations on a full-size humanoid requires addressing challenges that don't exist on desktop arms.
Balance During Teleoperation
The K1 must maintain whole-body balance while the operator controls the arms. Arm movements shift the center of mass, requiring the locomotion controller to compensate continuously. Rapid arm commands can destabilize the robot.
High-Dimensional State
Full-body joint state includes 22 DOF plus IMU, head pose, and optional hand state — 30+ dimensions per timestep. Dataset files are significantly larger than arm-only datasets. Storage planning is essential.
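To make "storage planning" concrete, a back-of-envelope estimate for one 60-second episode can be sketched as follows. The telemetry rate, float width, and video bitrate below are illustrative assumptions, not K1 specifications:

```python
# Rough storage estimate for one 60-second episode.
# Rates and bitrates are illustrative assumptions, not K1 specs.
T = 60 * 40                    # 60 s of joint telemetry at 40 Hz
joint_bytes = T * 44 * 8       # [T, 44] float64 joint_states.npy
imu_bytes = T * 6 * 8          # [T, 6] float64 IMU samples
video_mbps = 4                 # assumed H.264 bitrate per 720p stream
video_bytes = 4 * (video_mbps * 1e6 / 8) * 60  # four streams, 60 s
total_mb = (joint_bytes + imu_bytes + video_bytes) / 1e6
print(round(total_mb))         # ~121 MB, dominated by video
```

Even with these conservative numbers, a few hundred episodes run into tens of gigabytes, which is why storage planning matters before a collection campaign.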
Multi-Camera Synchronization
Humanoid tasks typically require egocentric (head-mounted) and exocentric (external) cameras. Synchronizing multiple video streams with joint telemetry at 50 Hz+ requires careful pipeline design.
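One common way to align streams recorded at different rates is nearest-timestamp matching after the fact. The sketch below uses synthetic timestamps and is not the actual capture pipeline; it shows why the worst-case alignment error is half the joint-telemetry period:

```python
import numpy as np

# Align each camera frame (30 fps) to the nearest joint-state sample (50 Hz).
# Timestamps here are synthetic; a real pipeline uses recorded unix timestamps.
joint_ts = np.arange(0.0, 2.0, 1 / 50)   # 50 Hz joint telemetry
frame_ts = np.arange(0.0, 2.0, 1 / 30)   # 30 fps camera frames

idx = np.searchsorted(joint_ts, frame_ts)        # first joint sample >= frame
idx = np.clip(idx, 1, len(joint_ts) - 1)
left_closer = (frame_ts - joint_ts[idx - 1]) < (joint_ts[idx] - frame_ts)
nearest = np.where(left_closer, idx - 1, idx)

# Worst-case error is half the joint period: 10 ms at 50 Hz.
err_ms = np.abs(joint_ts[nearest] - frame_ts) * 1000
print(err_ms.max() <= 10.0 + 1e-6)
```

Hardware-triggered capture avoids this error entirely, but nearest-timestamp matching is a reasonable fallback when cameras free-run.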
Operator Fatigue
VR-based whole-body teleoperation is physically demanding. Sessions longer than 30 minutes per operator significantly degrade demonstration quality. Plan for operator rotation in extended collection campaigns.
Teleoperation Methods for Humanoids
Two primary methods are supported for upper-body teleoperation. Locomotion is controlled separately, either via velocity commands from a gamepad or autonomously.
VR Whole-Body Teleoperation (Recommended)
Uses Meta Quest 3 or similar VR headset to track operator head and hand pose. The K1's head and arm joints mirror the operator's movements in real time. Provides the most natural and expressive demonstrations.
Setup: Quest 3 + SteamVR, k1_vr_teleop ROS2 node, operator wears gloves for hand tracking.
Latency: ~20 ms head, ~40 ms arm, end-to-end.
Best for: Manipulation tasks, pick-and-place, whole-body loco-manipulation.
Leader-Follower Upper Body (Advanced)
A second human-scale exoskeleton or leader arm system mirrors the follower K1's upper body. Joint angles are mapped directly from leader to follower. Does not require VR hardware.
Setup: Requires a compatible leader arm system (e.g., OpenArm bimanual kit or custom exoskeleton). Contact SVRC for partner configurations.
Best for: Precise bimanual manipulation where tracking accuracy is critical.
Locomotion during teleoperation
Upper-body teleoperation is typically combined with gamepad-controlled locomotion. The operator uses a wireless gamepad to command walking velocity while the VR system controls the arms and head:
# Launch combined teleop: VR for upper body + gamepad for locomotion
ros2 launch k1_teleop k1_combined_teleop.launch.py \
vr_device:=quest3 \
gamepad:=xbox \
robot_ip:=192.168.10.102
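The launch file above wires the gamepad to the locomotion controller. Because rapid velocity changes can destabilize the robot (see "Balance During Teleoperation"), gamepad input is typically clamped and slew-rate limited. The following is an illustrative sketch of that idea, not the `k1_teleop` implementation; all constants and the `step_velocity` helper are assumptions:

```python
# Illustrative velocity shaping for gamepad locomotion commands:
# clamp the stick, then slew-rate limit so the commanded velocity
# cannot jump fast enough to destabilize the balance controller.
MAX_VX = 0.5      # assumed max forward velocity, m/s
MAX_ACCEL = 0.5   # assumed max velocity change, m/s^2
DT = 0.02         # 50 Hz command loop

def step_velocity(prev_vx: float, stick: float) -> float:
    """Advance the commanded velocity one tick toward the stick target."""
    target = max(-1.0, min(1.0, stick)) * MAX_VX
    max_delta = MAX_ACCEL * DT
    delta = max(-max_delta, min(max_delta, target - prev_vx))
    return prev_vx + delta

vx = 0.0
for _ in range(10):               # stick held fully forward for 10 ticks
    vx = step_velocity(vx, 1.0)
print(round(vx, 3))               # ramps at 0.01 m/s per tick -> 0.1
```

With these assumed limits, a full-stick command takes half a second to reach maximum velocity, giving the whole-body controller time to shift the center of mass.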
Whole-Body Dataset Format (30+ DoF)
Each episode records synchronized joint state, camera frames, and metadata. The format is compatible with LeRobot and HuggingFace datasets.
Episode structure
episode_000001/
joint_states.npy # [T, 44] — interleaved positions and velocities for 22 joints
imu.npy # [T, 6] — accel (3) + gyro (3) from torso IMU
head_pose.npy # [T, 2] — yaw and pitch in radians
head_cam.mp4 # 1280x720 @ 30 fps, head-mounted egocentric
left_cam.mp4 # 1280x720 @ 30 fps, left wrist
right_cam.mp4 # 1280x720 @ 30 fps, right wrist
external_cam.mp4 # 1920x1080 @ 30 fps, fixed external view
timestamps.npy # [T] unix timestamps for joint_states
metadata.json # task name, operator, duration, success label
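A loader that validates an episode directory against this layout might look like the sketch below. `validate_episode` is a hypothetical helper, and the demo writes a synthetic episode to a temp directory since real recordings come from `k1_agent.py`:

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def validate_episode(ep_dir: Path) -> dict:
    """Check one episode directory against the documented base-K1 schema."""
    js = np.load(ep_dir / "joint_states.npy")
    imu = np.load(ep_dir / "imu.npy")
    head = np.load(ep_dir / "head_pose.npy")
    ts = np.load(ep_dir / "timestamps.npy")
    meta = json.loads((ep_dir / "metadata.json").read_text())
    T = js.shape[0]
    assert js.shape == (T, 44), "expected 22 joints x (pos, vel)"
    assert imu.shape == (T, 6) and head.shape == (T, 2) and ts.shape == (T,)
    assert np.all(np.diff(ts) > 0), "timestamps must be strictly increasing"
    return {"timesteps": T, "task": meta.get("task")}

# Synthetic episode for demonstration only.
tmp = Path(tempfile.mkdtemp())
T = 100
np.save(tmp / "joint_states.npy", np.zeros((T, 44)))
np.save(tmp / "imu.npy", np.zeros((T, 6)))
np.save(tmp / "head_pose.npy", np.zeros((T, 2)))
np.save(tmp / "timestamps.npy", np.arange(T) / 40.0)
(tmp / "metadata.json").write_text(json.dumps({"task": "pick up red block"}))
info = validate_episode(tmp)
print(info["timesteps"])
```

Running a check like this at upload time catches truncated or mismatched files before they reach the training set.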
Joint state schema (22 joints × 2 values each)
# joint_states.npy shape: [timesteps, 44]
# Columns: [q0_pos, q0_vel, q1_pos, q1_vel, ..., q21_pos, q21_vel]
# Joint index mapping:
# 0-5: Left leg (hip_pitch, hip_roll, hip_yaw, knee, ankle_pitch, ankle_roll)
# 6-11: Right leg (same order)
# 12: Waist (yaw)
# 13: Head yaw
# 14: Head pitch
# 15-21: Left arm (shoulder_pitch, shoulder_roll, shoulder_yaw,
# elbow_pitch, wrist_pitch, wrist_roll, wrist_yaw)
# 22-28: Right arm (same order)
# Note: the mapping above covers the extended 29-joint K1 config;
# the base K1 exposes 22 joints, which gives the [T, 44] shape above
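Given the interleaved layout (column 2·i is position and 2·i+1 is velocity for joint index i), a single joint's trajectory can be sliced out as in this sketch. `JOINT_INDEX` lists only a few names from the mapping above, and the demo uses synthetic data:

```python
import numpy as np

# Column 2*i holds position, column 2*i + 1 holds velocity for joint i.
JOINT_INDEX = {                 # subset of the documented index mapping
    "left_hip_pitch": 0,
    "waist_yaw": 12,
    "head_yaw": 13,
    "left_shoulder_pitch": 15,
}

def joint_trajectory(joint_states: np.ndarray, name: str):
    """Return (position, velocity) columns for the named joint."""
    i = JOINT_INDEX[name]
    return joint_states[:, 2 * i], joint_states[:, 2 * i + 1]

# Demo on synthetic data: put a ramp in the head-yaw position column.
T = 50
js = np.zeros((T, 44))
js[:, 2 * 13] = np.linspace(0.0, 0.5, T)
pos, vel = joint_trajectory(js, "head_yaw")
print(round(float(pos[-1]), 2))   # 0.5
```

Keeping the name-to-index mapping in one place avoids off-by-one errors when the interleaved array is consumed by training code.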
Recording a session with k1_agent.py
# Start the platform agent (streams telemetry to RoboticsCenter)
python k1_agent.py \
--robot-ip 192.168.10.102 \
--platform-url https://fearless-backend-533466225971.us-central1.run.app \
--record \
--task "pick up red block" \
--cameras head_cam,left_wrist,right_wrist,external
# Episodes auto-numbered and saved to ./recordings/
Convert to LeRobot format
python convert_k1_to_lerobot.py \
--input-dir ./recordings/ \
--output-dir ./dataset/ \
--repo-id your-username/k1-pick-place
Safety Protocol During Data Collection
- ✓ Spotter required at all times — one dedicated person monitors the robot and holds the e-stop. The teleoperator cannot simultaneously monitor safety.
- ✓ 3 m × 3 m clear perimeter — no bystanders, no cables, no equipment in the operational area during any live session.
- ✓ Episode duration limit: 60 seconds — keep episodes short. Shorter episodes are easier to quality-filter and reduce risk from prolonged operation.
- ✓ 30-minute operator rotation — rotate teleoperators every 30 minutes in VR sessions. Fatigue degrades demonstration quality and increases error rates.
- ✓ Immediately abort and enter DAMP on any instability — if the K1 shows any unexpected oscillation or drift, hit the e-stop and restart from DAMP. Do not try to stabilize manually.
- ✓ Log all incidents — document any falls, near-falls, or aborted episodes. This data is useful for dataset quality filtering and for improving safety procedures.
Episode Quality Checklist
Review each episode before adding it to your training dataset. Poor-quality demonstrations will degrade your policy.
- ✓ The task was completed successfully end-to-end (no partial completions in training data)
- ✓ Robot maintained stable balance throughout — no stumbles, oscillations, or compensatory jerks
- ✓ All camera streams have complete frames with no dropped segments
- ✓ Joint state timestamps are continuous (no gaps > 25 ms at 40 Hz recording)
- ✓ Demonstration is smooth and deliberate — not rushed, not over-corrected
- ✓ The object and task scene are visible in at least two camera streams throughout
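The timestamp-continuity check above is easy to automate. A minimal sketch with synthetic timestamps and a hypothetical `has_gaps` helper:

```python
import numpy as np

def has_gaps(timestamps: np.ndarray, max_gap_s: float = 0.025) -> bool:
    """True if any inter-sample interval exceeds the allowed gap
    (one period at the 40 Hz recording rate, plus float tolerance)."""
    return bool(np.any(np.diff(timestamps) > max_gap_s + 1e-9))

good = np.arange(40) / 40.0                    # clean 40 Hz second
bad = np.concatenate([good[:20], good[25:]])   # 5 dropped samples -> 150 ms gap
print(has_gaps(good), has_gaps(bad))           # False True
```

Filtering episodes with a check like this before conversion keeps dropped-telemetry segments out of the training dataset.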