Computer Vision Faculty Publications

How to Train Vision Transformer on Small-scale Datasets?

Hanan Gani, Mohamed Bin Zayed University of Artificial Intelligence
Muzammal Naseer, Mohamed Bin Zayed University of Artificial Intelligence
Mohammad Yaqub, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

BMVC 2022 - 33rd British Machine Vision Conference Proceedings

Abstract

Vision Transformer (ViT), a radically different architecture from convolutional neural networks, offers multiple advantages, including design simplicity, robustness, and state-of-the-art performance on many vision tasks. However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases. Therefore, successful training of such models is mainly attributed to pre-training on large-scale datasets such as ImageNet with 1.2M images or JFT with 300M images. This hinders the direct adaptation of Vision Transformer to small-scale datasets. In this work, we show that self-supervised inductive biases can be learned directly from small-scale datasets and serve as an effective weight initialization scheme for fine-tuning. This allows these models to be trained without large-scale pre-training or changes to the model architecture or loss functions. We present thorough experiments to successfully train monolithic and non-monolithic Vision Transformers on five small datasets, including CIFAR10/100, CINIC10, SVHN, and Tiny-ImageNet, and two fine-grained datasets: Aircraft and Cars. Our approach consistently improves the performance of Vision Transformers while retaining their properties such as attention to salient regions and higher robustness. Our code and pre-trained models are available at: https://github.com/hananshafi/vits-for-small-scale-datasets.
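
The abstract describes two stages: self-supervised training on the small dataset itself, then supervised fine-tuning that starts from the resulting weights. It does not spell out the mechanics, so the following is only a minimal sketch of the weight-initialization step, not the authors' implementation. It assumes that backbone parameters are transferred by name while the task-specific head keeps its fresh initialization. Plain Python dicts stand in for framework state dicts, and every name here (`init_from_pretrained`, the `head.` prefix, the parameter keys and values) is hypothetical.

```python
# Sketch only: dicts of lists stand in for real model state dicts.
# All function names, parameter keys, and values are hypothetical.

def init_from_pretrained(pretrained, fresh, skip_prefixes=("head.",)):
    """Return a copy of `fresh` whose backbone entries come from `pretrained`.

    A parameter is transferred only if its name exists in both dicts, has the
    same length, and does not start with one of `skip_prefixes`.
    """
    merged = {name: list(values) for name, values in fresh.items()}
    transferred = []
    for name, values in pretrained.items():
        if name.startswith(skip_prefixes):
            continue  # task-specific layers keep their fresh initialization
        if name in merged and len(merged[name]) == len(values):
            merged[name] = list(values)
            transferred.append(name)
    return merged, sorted(transferred)


# Stage 1 output: weights after self-supervised training on the small dataset.
ssl_weights = {
    "patch_embed.weight": [0.12, -0.40, 0.33],
    "blocks.0.attn.weight": [0.05, 0.07],
    "head.weight": [9.0, 9.0],  # head used only by the pretext task
}

# Stage 2 input: a freshly initialized classifier with the same backbone.
fresh_weights = {
    "patch_embed.weight": [0.0, 0.0, 0.0],
    "blocks.0.attn.weight": [0.0, 0.0],
    "head.weight": [0.01, -0.01],  # classification head, not transferred
}

init_weights, transferred = init_from_pretrained(ssl_weights, fresh_weights)
print(transferred)
```

Supervised fine-tuning would then start from `init_weights` rather than from a random initialization. The model architecture and the supervised loss are left untouched, which matches the abstract's claim that neither needs to change.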

Publication Date

11-21-2022

Keywords

Computer vision, Convolutional neural networks, Large dataset, Network architecture, Training aircraft

Comments

Open Access version from BMVC

Free to distribute

Uploaded: May 30, 2024

Recommended Citation

H. Gani et al., "How to Train Vision Transformer on Small-scale Datasets?," BMVC 2022 - 33rd British Machine Vision Conference Proceedings, Nov 2022.

Additional Links

BMVC 2022 link: https://bmvc2022.mpi-inf.mpg.de/0731.pdf

Included in

Artificial Intelligence and Robotics Commons
