ICRA 2026 Project Page

STONE Dataset: A Scalable Multi-Modal Surround-View 3D Traversability Dataset for Off-Road Robot Navigation

Konyul Park*, Daehun Kim*, Jiyong Oh, Seunghoon Yu, Junseo Park, Jaehyun Park, Hongjae Shin, Hyungchan Cho, Jungho Kim, and Jun Won Choi†
Seoul National University
* Equal contribution    |    † Corresponding author
STONE dataset teaser
Overview

Abstract

Reliable off-road navigation requires accurate estimation of traversable regions and robust perception under diverse terrain and sensing conditions. However, existing datasets lack both scalability and multi-modality, which limits progress in 3D traversability prediction. In this work, we introduce STONE, a large-scale multi-modal dataset for off-road navigation. STONE provides (1) trajectory-guided 3D traversability maps generated by a fully automated, annotation-free pipeline, and (2) comprehensive surround-view sensing with synchronized 128-channel LiDAR, six RGB cameras, and three 4D imaging radars. The dataset covers a wide range of environments and conditions, including day and night, grasslands, farmlands, construction sites, and lakes. Our auto-labeling pipeline reconstructs dense terrain surfaces from LiDAR scans, extracts geometric attributes such as slope, elevation, and roughness, and assigns traversability labels beyond the robot’s trajectory using a Mahalanobis-distance-based criterion. This design enables scalable, geometry-aware ground-truth construction without manual annotation. Finally, we establish a benchmark for voxel-level 3D traversability prediction and provide strong baselines under both single-modal and multi-modal settings.

Presentation

Video

Platform

Robot Setup

STONE robot setup
  • 360° Rotating LiDAR: 1 × Hesai OT128 with 128 channels, a maximum range of 200 m, a field of view of 360° (H) × 40° (V), an angular resolution of 0.1° (H) × 0.125° (V), and a scanning frequency of 10 Hz.
  • Multi-view RGB Cameras: 6 × Basler ACE2 2A1920-51gcPRO with a resolution of 1920 × 1200 and a frame rate of 10 Hz.
  • 4D Imaging Radars: 3 × Continental ARS 548 RDI with a scanning frequency of 20 Hz.
  • Global Navigation Satellite System (GNSS): NovAtel PIM222A dual-antenna GNSS/INS with RTK capability and an update rate of 20 Hz.
  • Inertial Measurement Unit (IMU): EPSON G366P IMU providing inertial measurements at 200 Hz.

The multi-modal system was integrated and operated on Ubuntu 22.04 using the ROS 2 Humble framework.

Dataset

Statistics

  • 43 sequences
  • 50,878 frames
  • 6 cameras + 1 LiDAR + 3 4D radars
  • 3D voxel traversability maps
Pipeline

Automatic Map Generation

1

Dense Surface Reconstruction

Accumulated LiDAR scans are registered and reconstructed into a dense 3D representation of the terrain surface.
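The first stage of this reconstruction can be sketched as accumulating pose-registered scans into a sparse voxel map. This is a minimal illustration, not the pipeline's actual implementation; the function name and voxel size are assumptions.

```python
import numpy as np

def accumulate_voxels(scans, poses, voxel_size=0.2):
    """Accumulate registered LiDAR scans into a sparse voxel map
    (illustrative sketch; parameters are assumptions).

    scans : list of (N_i, 3) point arrays in the sensor frame.
    poses : list of (4, 4) sensor-to-world transforms.
    Returns a dict mapping voxel index -> point count."""
    counts = {}
    for pts, T in zip(scans, poses):
        # Transform points into the world frame.
        world = pts @ T[:3, :3].T + T[:3, 3]
        # Quantize world coordinates to integer voxel indices.
        idx = np.floor(world / voxel_size).astype(int)
        for key in map(tuple, idx):
            counts[key] = counts.get(key, 0) + 1
    return counts

# Two points 0.2 m apart fall into two adjacent voxels.
scan = np.array([[0.05, 0.05, 0.0], [0.25, 0.05, 0.0]])
vox = accumulate_voxels([scan], [np.eye(4)])
```

A dense surface would then be fitted over the occupied voxels; only the accumulation step is shown here.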

2

Geometry-Aware Feature Extraction

We compute geometric cues such as elevation, slope, and roughness.
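One common way to obtain such cues is a PCA plane fit per grid cell: the cell's mean height gives elevation, the fitted plane's normal gives slope, and residuals give roughness. The sketch below illustrates this idea under those assumptions; it is not the paper's exact feature extractor.

```python
import numpy as np

def cell_features(points):
    """Derive elevation, slope, and roughness for one grid cell
    via a PCA plane fit (illustrative sketch).

    points : (N, 3) array of x, y, z coordinates inside the cell.
    Returns (elevation, slope_deg, roughness)."""
    centroid = points.mean(axis=0)
    # Plane normal = eigenvector of the smallest eigenvalue of the
    # cell covariance (np.linalg.eigh sorts eigenvalues ascending).
    cov = np.cov((points - centroid).T)
    _, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]
    # Slope: angle between the plane normal and the vertical axis.
    slope_deg = np.degrees(np.arccos(np.clip(abs(normal[2]), 0.0, 1.0)))
    # Roughness: RMS distance of points to the fitted plane.
    dists = (points - centroid) @ normal
    roughness = np.sqrt(np.mean(dists ** 2))
    return centroid[2], slope_deg, roughness

# A flat, slightly noisy patch yields small slope and roughness.
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(0, 1, 200),
                       rng.uniform(0, 1, 200),
                       0.01 * rng.standard_normal(200)])
elev, slope, rough = cell_features(pts)
```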

3

Trajectory-Guided Labeling

Traversability labels are propagated from the robot's trajectories to surrounding terrain using a Mahalanobis-distance-based criterion.
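The core of a Mahalanobis-distance criterion can be sketched as follows: model the geometric features of cells the robot actually drove over as a Gaussian, then label any cell whose feature vector lies within a distance threshold as traversable. The threshold and feature choice here are assumptions for illustration, not the paper's actual values.

```python
import numpy as np

def label_traversable(cell_feats, traversed_feats, thresh=3.0):
    """Label cells beyond the driven path by Mahalanobis distance to
    the feature statistics of traversed cells (illustrative sketch).

    cell_feats      : (M, D) features of candidate cells, e.g. (slope, roughness).
    traversed_feats : (K, D) features of cells on the robot trajectory.
    Returns a boolean mask: True = traversable."""
    mu = traversed_feats.mean(axis=0)
    # Regularize the covariance so it is safely invertible.
    cov = np.cov(traversed_feats.T) + 1e-6 * np.eye(traversed_feats.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = cell_feats - mu
    # Squared Mahalanobis distance of each candidate to the Gaussian.
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
    return d2 <= thresh ** 2

# Cells geometrically similar to traversed terrain are labeled traversable.
trav = np.array([[2.0, 0.02], [3.0, 0.03], [2.5, 0.025], [2.2, 0.028]])
cand = np.array([[2.4, 0.025],   # similar slope/roughness
                 [40.0, 0.5]])   # steep and rough
mask = label_traversable(cand, trav)
```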

4

Benchmark Tasks

3D voxel-level traversability prediction is evaluated under both single-modal and multi-modal settings.