ARTriViT: Automatic Face Recognition System Using ViT-Based Siamese Neural Networks with a Triplet Loss

Document Type

Conference Proceeding

Publication Title

IEEE International Symposium on Industrial Electronics


Computer-based face recognition and other biometric techniques are now mature and trustworthy technologies that play a crucial role in many access control scenarios. Face recognition must nevertheless contend with a variety of difficulties, including those related to angle, lighting, position, facial expression, noise, resolution, occlusion, and the scarcity of samples from each class. In this study, we propose a triplet loss-based Siamese network that uses a vision transformer as the feature extractor instead of traditional convolutional layers. Our Siamese network takes a pair of face images as input, extracts features from each image, and compares them with similarity indexes to perform face recognition on the Celeb-DF (version 2) dataset. The proposed model performs well relative to state-of-the-art (SOTA) methods on this dataset. The trained model and code will be available at:
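The triplet loss named in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance metric, margin, or decision threshold, so the Euclidean distance, the values margin=0.2 and threshold=0.5, and the function names below are all assumptions made for illustration. The toy vectors stand in for embeddings that a vision transformer would produce.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0).

    The margin value is an assumption, not taken from the paper.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def same_identity(emb_a, emb_b, threshold=0.5):
    """Hypothetical verification rule: two faces match if their embeddings are close.

    The threshold value is an assumption, not taken from the paper.
    """
    return euclidean_distance(emb_a, emb_b) < threshold


# Toy embeddings standing in for vision transformer outputs.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]   # same identity, close to the anchor
negative = [3.0, 4.0]   # different identity, far from the anchor

loss = triplet_loss(anchor, positive, negative)
match = same_identity(anchor, positive)
```

The loss is zero once the negative sits farther from the anchor than the positive by at least the margin, which is the property that lets a distance threshold separate identities at inference time.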



Publication Date



Face Recognition, Siamese Neural Network, Triplet Loss, Vision Transformers

