Computer Vision Faculty Publications

A Spatial-Temporal Deformable Attention Based Framework for Breast Lesion Detection in Videos

Chao Qin, Mohamed Bin Zayed University of Artificial Intelligence
Jiale Cao, Tianjin University
Huazhu Fu, A-Star, Institute of High Performance Computing
Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence
Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract

Detecting breast lesions in videos is crucial for computer-aided diagnosis. Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation. We argue that such a strategy struggles to effectively perform deep feature aggregation and ignores useful local information. To tackle these issues, we propose a spatial-temporal deformable attention based framework, named STNet. STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion. This module enables deep feature aggregation in each stage of both the encoder and the decoder. To further accelerate detection, we introduce an encoder feature shuffle strategy for multi-frame prediction during inference. In this strategy, we share the backbone and encoder features, and shuffle the encoder features for the decoder to generate predictions for multiple frames. Experiments on the public breast lesion ultrasound video dataset show that STNet achieves state-of-the-art detection performance while running inference twice as fast. The code and model are available at https://github.com/AlfredQin/STNet.
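
The multi-frame prediction idea in the abstract (encode each frame once, then reorder the cached encoder features for the decoder) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The function name `predict_all_frames`, the `encode` and `decode` callables, and the "target frame first, remaining frames as references" ordering are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Feature = TypeVar("Feature")
Prediction = TypeVar("Prediction")


def predict_all_frames(
    frames: Sequence[Frame],
    encode: Callable[[Frame], Feature],
    decode: Callable[[List[Feature]], Prediction],
) -> List[Prediction]:
    """Hypothetical multi-frame prediction with shared encoder features."""
    # Run the (expensive) backbone + encoder once per frame and cache the result.
    features = [encode(frame) for frame in frames]

    predictions = []
    for target in range(len(features)):
        # Assumed "shuffle": reorder the cached features so the target frame
        # comes first and the remaining frames act as references.
        shuffled = [features[target]] + features[:target] + features[target + 1:]
        predictions.append(decode(shuffled))
    return predictions


if __name__ == "__main__":
    encoder_calls = []

    def toy_encode(frame: str) -> str:
        # Stand-in for backbone + encoder; records how often it is called.
        encoder_calls.append(frame)
        return frame.upper()

    def toy_decode(features: List[str]) -> str:
        # Stand-in decoder: reports the target feature and its references.
        return f"{features[0]} <- {','.join(features[1:])}"

    print(predict_all_frames(["a", "b", "c"], toy_encode, toy_decode))
    print(len(encoder_calls))  # one encoder call per frame, not per prediction
```

In this sketch the encoder cost grows with the number of frames rather than with the number of predictions, which is the kind of saving the abstract attributes to sharing backbone and encoder features.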

First Page

479

Last Page

488

DOI

10.1007/978-3-031-43895-0_45

Publication Date

10-1-2023

Keywords

Breast lesion detection, Multi-frame prediction, Spatial-temporal deformable attention, Ultrasound videos, Computer aided diagnosis, Decoding, Feature extraction, Medical imaging, Signal encoding, Ultrasonic applications

Comments

IR conditions: non-described

Recommended Citation

C. Qin et al., "A Spatial-Temporal Deformable Attention Based Framework for Breast Lesion Detection in Videos," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14221 LNCS, pp. 479-488, Oct 2023.

The definitive version is available at https://doi.org/10.1007/978-3-031-43895-0_45

Additional Links

https://doi.org/10.1007/978-3-031-43895-0_45
