Document Type


Publication Title



Following unprecedented success on natural language tasks, Transformers have been successfully applied to several computer vision problems, achieving state-of-the-art results and prompting researchers to reconsider the supremacy of convolutional neural networks (CNNs) as de facto operators. Capitalizing on these advances in computer vision, the medical imaging field has also witnessed growing interest in Transformers, which can capture global context, in contrast to CNNs with their local receptive fields. Inspired by this transition, in this survey we attempt to provide a comprehensive review of the applications of Transformers in medical imaging, covering aspects ranging from recently proposed architectural designs to unsolved issues. Specifically, we survey the use of Transformers in medical image segmentation, detection, classification, reconstruction, synthesis, registration, clinical report generation, and other tasks. For each of these applications, we develop a taxonomy, identify application-specific challenges and provide insights for solving them, and highlight recent trends. Further, we provide a critical discussion of the field's current state as a whole, identifying key challenges and open problems and outlining promising future directions. We hope this survey will ignite further interest in the community and provide researchers with an up-to-date reference regarding applications of Transformer models in medical imaging. Finally, to cope with the rapid development in this field, we intend to regularly update the latest relevant papers and their open-source implementations at © 2022, CC BY-SA.


Publication Date



Computer vision; Convolutional neural networks; Image segmentation; Medical imaging; Surveys; Clinical report generation; Computer vision problems; Convolutional neural network; Imaging fields; Medical image analysis; Natural languages; Report generation; State of the art; Transformer; Vision transformer; Deep neural networks; Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)


Preprint: arXiv

  • Archived with thanks to arXiv
  • Preprint License: CC BY-SA
  • Uploaded 24 March 2022