We visualize the results of our method on sequences of keyframes depicting sequential actions. Ground truth is marked in green, and the predictions of our method are marked in red.
We also show the results of our method on all frames of each video clip. Although our method is image-based and uses no temporal information, it still performs well on consecutive frames. Ground truth is marked in green, and the predictions of our method are marked in red. Here, we display only the interaction category labels.