Guest Editorial Introduction to the Special Section on Transformer Models in Vision

Document Type

Editorial

Publication Title

IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Transformer models have achieved outstanding results on a variety of language tasks, such as text classification, machine translation, and question answering. This success in the field of Natural Language Processing (NLP) has sparked interest in the computer vision community in applying these models to vision and multi-modal learning tasks. However, visual data has a unique structure, requiring a rethinking of network designs and training methods. As a result, Transformer models and their variants have been successfully applied to image recognition, object detection, segmentation, image super-resolution, video understanding, image generation, text-to-image synthesis, and visual question answering, among other applications.

First Page

12721

Last Page

12725

DOI

10.1109/TPAMI.2023.3306164

Publication Date

10-3-2023

Keywords

Special issues and sections, Transformers, Text categorization, Machine translation, Natural language processing

Comments

IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

When accepted for publication, set statement to accompany deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged
