HUI360: A dataset and baselines for Human Robot Interaction Anticipation

Abstract

As robots increasingly operate in human-populated environments, anticipating human intentions is essential for enabling proactive and socially aware behavior. Automatic anticipation of human–robot interactions is thus emerging as a crucial perception challenge for embodied agents.

To this end, we introduce HUI360, the largest dataset for in-the-wild human-robot interaction anticipation, together with a set of baselines. The dataset was collected from a mobile robot over multiple days within a 3-month period and in several environments. It captures natural, spontaneous behaviors from both passersby and users and covers a diverse range of individuals. This variety enables evaluating and improving the generalization capabilities of interaction anticipation models.

We designed a pipeline, and share its code, for automatic interaction annotation in arbitrary 360° equirectangular videos, along with interfaces for manual refinement. Using this pipeline, we release the HUI360 open set of 1M pre-processed annotations, including detailed 2D poses, facial keypoints, and segmentation masks. These annotations were obtained using state-of-the-art computer vision methods and manually curated to ensure high-quality tracking and interaction annotation. Additionally, we release the raw panoptic 360° images captured from the robot’s egocentric viewpoint (on demand, for research purposes only, in compliance with GDPR).

Finally, we establish benchmark baselines for interaction anticipation, including the first cross-dataset evaluations for this task. To support these evaluations, we also release 6M annotations for an existing in-the-wild outdoor dataset collected from a mobile robot (SSUP-HRI).

Dataset overview

(Recordings split)	HUI360 (Ours - Indoor w/ Shelfy)	HUI360 (SSUP-HRI - Outdoor w/ Trashcan)
Recording duration	71h	26h
Recording duration (after filtering)	11h	—
Individual tracks	4,310	28,000
Interactions with the robot	375	419
Images	621,000	1.4M
Detections	> 1M	6M
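
The counts in the table above give a rough sense of how rare interactions are relative to tracked people. The sketch below computes interactions per track for each split and the share of Shelfy footage kept after filtering. It assumes each interaction involves a single track, which the table does not state, and it treats the rounded figure of 28,000 tracks as exact; the variable and function names are illustrative only.

```python
# Figures copied from the overview table; names are illustrative.
SPLITS = {
    "HUI360 (indoor, Shelfy)": {"tracks": 4310, "interactions": 375},
    "SSUP-HRI (outdoor, Trashcan)": {"tracks": 28000, "interactions": 419},
}


def interaction_ratio(tracks: int, interactions: int) -> float:
    """Interactions per track.

    This equals the positive-class share only if every interaction maps to
    exactly one track (an assumption, not stated in the table).
    """
    return interactions / tracks


for name, counts in SPLITS.items():
    print(f"{name}: {interaction_ratio(**counts):.1%} interactions per track")

# Share of the 71h of Shelfy recordings kept after filtering (11h).
kept_fraction = 11 / 71
print(f"Kept after filtering: {kept_fraction:.1%}")
```

Under these assumptions the ratios come out to about 8.7% for the indoor split and 1.5% for the outdoor split, with about 15.5% of the indoor footage retained after filtering.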

Processed data

Modalities comparison (panels: Base, ViTPose, Sapiens, Masks).

Dataset access

You can access the HUI360 (Skeletons) dataset on Hugging Face.
To access the full dataset, please visit HUI360-Videos and request access with an institutional email; a DTA will then be sent to you for approval. Access to the unanonymized videos is reserved for researchers at approved institutions who agree to the terms of use (including GDPR compliance).


Automatic annotation pipeline

We also provide the pipeline to automatically annotate 360° videos of human-robot interaction from the robot's egocentric viewpoint. You can run it with videos from cameras like the Insta360 attached to a robot or a person.

Interact360
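
When working with annotations on equirectangular frames, it is often useful to convert a pixel position into a direction relative to the camera. The sketch below shows the standard equirectangular mapping; it is not taken from the Interact360 code. It assumes the frame spans 360° horizontally and 180° vertically and that the image center corresponds to the camera's forward direction. The function names and the 3840×1920 resolution in the usage note are hypothetical.

```python
def pixel_to_bearing(u: float, v: float, width: int, height: int) -> tuple[float, float]:
    """Map an equirectangular pixel (u, v) to (azimuth, elevation) in degrees.

    Assumes the image centre is azimuth 0 (taken here as 'forward') and
    elevation 0, with azimuth increasing to the right and elevation upward.
    """
    azimuth = (u / width - 0.5) * 360.0
    elevation = (0.5 - v / height) * 180.0
    return azimuth, elevation


def wrapped_azimuth_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b in degrees, in [-180, 180).

    Needed because a person crossing the left/right image seam jumps from
    about +180 to about -180 degrees without actually moving far.
    """
    return (a - b + 180.0) % 360.0 - 180.0
```

For a hypothetical 3840×1920 frame, `pixel_to_bearing(2880, 960, 3840, 1920)` returns `(90.0, 0.0)`, that is, directly to the right at camera height. `wrapped_azimuth_diff(-179.0, 179.0)` returns `2.0`, so a detection that crosses the seam is treated as a 2° move and not a 358° one.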

HUI360 Recordings with Shelfy

We recorded 71h of multimodal data and kept 11h of recordings with passersby. The main available data are processed from 360° equirectangular videos.
If you are interested in other modalities please contact us (not available for all sessions).

Sensorized Shelfy robot.

SSUP-HRI

Part of the HUI360 dataset is based on the amazing SSUP-HRI dataset from the Cornell IRL team. To access the SSUP-HRI dataset, please visit SSUP-HRI and request access following the instructions in the repository.

BibTeX

@article{TBD,
  author    = {Raphael Lorenzo-Louis and Fabio Amadio and Bertrand Luvison and Serena Ivaldi},
  title     = {HUI360: A dataset and baselines for Human Robot Interaction Anticipation},
  journal   = {TBD},
  year      = {2026},
}