Egocentric Human-Object Interaction Detection: A New Benchmark and Method

The Hong Kong Polytechnic University
MP-HOI teaser image

Comparison of human-object interactions from the third-person perspective (top row) and the egocentric perspective (bottom row). Different colors represent distinct elements of each HOI triplet <human/hand, verb, object>.
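As an illustration of the triplet format described above, an HOI prediction can be represented as a simple record. This is a hypothetical sketch; the class and field names are our own and not taken from the paper's code:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class HOITriplet:
    """One <human/hand, verb, object> detection (illustrative only)."""
    human_box: Tuple[int, int, int, int]   # hand/person box, (x1, y1, x2, y2) pixels
    object_box: Tuple[int, int, int, int]  # interacted-object box
    verb: str                              # interaction category, e.g. "hold"
    object_label: str                      # object category, e.g. "cup"

    def as_tuple(self):
        # Return the <human/hand, verb, object> form used in the figure caption.
        return (self.human_box, self.verb, self.object_label)
```

In an egocentric frame the "human" element is typically the wearer's hand rather than a full body, which is why the caption writes it as human/hand.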

Abstract


Qualitative results

Baseline vs. Our Method


Ground Truth vs. Our Method

We visualize the performance of our method on a sequence of keyframes of sequential actions. The ground truths are marked in green, and the predictions of our method are marked in red.



We also show the performance of our method on all frames of each video clip. Although our method is image-based and does not exploit temporal information, it still produces consistent predictions across consecutive frames. The ground truths are marked in green, and the predictions of our method are marked in red. Here, we display only the interaction category labels.
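Since only interaction category labels are displayed per frame, the green-vs-red comparison above reduces to checking whether the predicted label matches the ground truth on each frame. A minimal sketch of that comparison, assuming a simple list-of-labels data format (not the authors' evaluation code):

```python
def frame_accuracy(gt_labels, pred_labels):
    """Fraction of frames whose predicted interaction label matches the GT label."""
    assert len(gt_labels) == len(pred_labels), "one label per frame expected"
    matches = sum(g == p for g, p in zip(gt_labels, pred_labels))
    return matches / len(gt_labels)

# Hypothetical per-frame labels for one clip (illustrative data only).
gt   = ["hold cup", "hold cup", "pour water", "pour water"]
pred = ["hold cup", "hold cup", "hold cup",  "pour water"]
print(frame_accuracy(gt, pred))  # 0.75
```

A per-frame match would be rendered with the prediction agreeing with the green ground-truth label; a mismatch is where the red prediction diverges.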

BibTeX