Abstract: Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, ...
HOI-DETR is a transformer-based framework for detecting hands, hand-held objects, and their interactions in images and video. Built on the Co-DETR architecture, it adds a lightweight interaction ...