TransGOP: Transformer-Based Gaze Object Prediction
Document Type
Conference Proceeding
Publication Title
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract
Gaze object prediction (GOP) aims to predict the location and category of the object a human is looking at. Previous GOP works use CNN-based object detectors to predict the object's location. However, we find that Transformer-based object detectors predict more accurate object locations for the dense objects typical of retail scenarios. Moreover, the long-range modeling capability of the Transformer helps build relationships between the human head and the gaze object, which is important for the GOP task. To this end, this paper introduces Transformers into the field of gaze object prediction and proposes an end-to-end Transformer-based method named TransGOP. Specifically, TransGOP uses an off-the-shelf Transformer-based object detector to localize objects and designs a Transformer-based gaze autoencoder in the gaze regressor to establish long-distance gaze relationships. Moreover, to improve gaze heatmap regression, we propose an object-to-gaze cross-attention mechanism that lets the queries of the gaze autoencoder learn positional knowledge from the object detector's global memory. Finally, to make the whole framework trainable end-to-end, we propose a Gaze Box loss that jointly optimizes the object detector and the gaze regressor by enhancing the gaze heatmap energy inside the gaze object's box. Extensive experiments on the GOO-Synth and GOO-Real datasets demonstrate that TransGOP achieves state-of-the-art performance on all tracks, i.e., object detection, gaze estimation, and gaze object prediction. Our code will be available at https://github.com/chenxiGuo/TransGOP.git.
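To make the two mechanisms named in the abstract concrete, below is a minimal, hypothetical PyTorch-style sketch of (a) object-to-gaze cross-attention, where gaze-autoencoder queries attend to the detector's global memory, and (b) a Gaze Box loss that rewards heatmap energy inside the gaze object's box. All class names, shapes, and the exact loss form are illustrative assumptions, not the authors' implementation; consult the linked repository for the definitive code.

```python
import torch
import torch.nn as nn

class ObjectToGazeCrossAttention(nn.Module):
    """Illustrative sketch: gaze queries attend to the detector's encoder memory."""
    def __init__(self, d_model: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, gaze_queries: torch.Tensor, detector_memory: torch.Tensor) -> torch.Tensor:
        # gaze_queries:    (B, Nq, d) learned queries of the gaze autoencoder
        # detector_memory: (B, HW, d) flattened global memory from the detector encoder
        attended, _ = self.cross_attn(query=gaze_queries,
                                      key=detector_memory,
                                      value=detector_memory)
        # Residual connection + layer norm, as in standard Transformer blocks.
        return self.norm(gaze_queries + attended)


def gaze_box_loss(heatmap: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
    """Illustrative loss: penalize low heatmap energy inside the gaze object's box.

    heatmap: (B, H, W) predicted gaze heatmap in [0, 1]
    box:     (B, 4) gaze-object box as normalized (x1, y1, x2, y2)
    """
    B, H, W = heatmap.shape
    loss = heatmap.new_zeros(())
    for b in range(B):
        x1, y1, x2, y2 = box[b]
        xs, xe = int(x1 * W), max(int(x2 * W), int(x1 * W) + 1)
        ys, ye = int(y1 * H), max(int(y2 * H), int(y1 * H) + 1)
        energy = heatmap[b, ys:ye, xs:xe].mean()
        loss = loss + (1.0 - energy)  # higher in-box energy -> lower loss
    return loss / B
```

Under these assumptions, the gradient of the Gaze Box loss flows into both the gaze regressor (through the heatmap) and, when the box comes from detector predictions, into the object detector as well, which is one plausible way the two branches could be optimized jointly.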
First Page
10180
Last Page
10188
DOI
10.1609/aaai.v38i9.28883
Publication Date
3-25-2024
Recommended Citation
B. Wang et al., "TransGOP: Transformer-Based Gaze Object Prediction," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 9, pp. 10180-10188, Mar. 2024.
The definitive version is available at https://doi.org/10.1609/aaai.v38i9.28883