Egocentric Human-Object Interaction Detection: A New Benchmark and Method

Kunyuan Deng, Yi Wang, Lap-Pui Chau

The Hong Kong Polytechnic University


Comparison of human-object interactions (HOIs) from a third-person perspective (top row) and an egocentric perspective (bottom row). Different colors mark the distinct elements of each HOI triplet <human/hand, verb, object>.
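To make the triplet format concrete, here is a minimal Python sketch of an egocentric HOI triplet and a simple rule for matching a predicted triplet to a ground-truth one. The field names, the (x1, y1, x2, y2) box format and the matching rule are all illustrative assumptions and not the benchmark's actual schema or evaluation protocol. The rule follows the common HICO-DET-style convention: the verb and object labels must agree, and both the hand box and the object box must reach IoU >= 0.5 with the ground truth.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """One <human/hand, verb, object> instance (hypothetical schema)."""
    hand_box: Box
    verb: str
    object_box: Box
    object_label: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: HOITriplet, gt: HOITriplet, thresh: float = 0.5) -> bool:
    """Assumed HICO-DET-style rule: labels agree and both boxes overlap >= thresh."""
    return (
        pred.verb == gt.verb
        and pred.object_label == gt.object_label
        and iou(pred.hand_box, gt.hand_box) >= thresh
        and iou(pred.object_box, gt.object_box) >= thresh
    )


# Usage with made-up boxes: a slightly shifted prediction still matches.
gt = HOITriplet((10, 10, 50, 50), "cut", (40, 40, 120, 100), "knife")
pred = HOITriplet((12, 12, 50, 52), "cut", (42, 40, 118, 100), "knife")
print(is_match(pred, gt))  # True
```

Requiring overlap on both boxes means a prediction with the right verb cannot count as correct if it is grounded on the wrong hand or the wrong object.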

Abstract

Qualitative results

Baseline vs. our method

Ground truth vs. our method

We visualize the predictions of our method on keyframes sampled from sequences of consecutive actions. Ground-truth annotations are marked in green and our predictions in red.

We also show results of our method on every frame of each video clip. Although our method is image-based and does not model temporal information, it still performs well across consecutive frames. Ground-truth annotations are marked in green and our predictions in red; for clarity, only the interaction category labels are displayed.

BibTeX

@article{deng2025egocentric,
  title={Egocentric Human-Object Interaction Detection: A New Benchmark and Method},
  author={Deng, Kunyuan and Wang, Yi and Chau, Lap-Pui},
  journal={arXiv preprint arXiv:2506.14189},
  year={2025}
}