Document Type
Conference Proceeding
Publication Title
BMVC 2022 - 33rd British Machine Vision Conference Proceedings
Abstract
A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing computation. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance relative to the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/.
Publication Date
11-24-2022
Keywords
Computer vision, Convolution, Semantics
Recommended Citation
K. Islam et al., "Face Pyramid Vision Transformer," BMVC 2022 - 33rd British Machine Vision Conference Proceedings, Nov 2022.
Comments
IR conditions: non-described
Open Access version available on BMVC