Computer Vision Faculty Publications

FLIP: Cross-domain Face Anti-spoofing with Language Guidance

Koushik Srivatsan, Mohamed Bin Zayed University of Artificial Intelligence
Muzammal Naseer, Mohamed Bin Zayed University of Artificial Intelligence
Karthik Nandakumar, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

Proceedings of the IEEE International Conference on Computer Vision

Abstract

Face anti-spoofing (FAS) or presentation attack detection is an essential component of face recognition systems deployed in security-critical applications. Existing FAS methods have poor generalizability to unseen spoof types, camera sensors, and environmental conditions. Recently, vision transformer (ViT) models have been shown to be effective for the FAS task due to their ability to capture long-range dependencies among image patches. However, adaptive modules or auxiliary loss functions are often required to adapt pre-trained ViT weights learned on large-scale datasets such as ImageNet. In this work, we first show that initializing ViTs with multimodal (e.g., CLIP) pre-trained weights improves generalizability for the FAS task, which is in line with the zero-shot transfer capabilities of vision-language pre-trained (VLP) models. We then propose a novel approach for robust cross-domain FAS by grounding visual representations with the help of natural language. Specifically, we show that aligning the image representation with an ensemble of class descriptions (based on natural language semantics) improves FAS generalizability in low-data regimes. Finally, we propose a multimodal contrastive learning strategy to boost feature generalization further and bridge the gap between source and target domains. Extensive experiments on three standard protocols demonstrate that our method significantly outperforms the state-of-the-art methods, achieving better zero-shot transfer performance than five-shot transfer of "adaptive ViTs". Code: https://github.com/koushiksrivats/FLIP
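The idea of aligning an image representation with an ensemble of class descriptions can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked in the abstract for that). It assumes image and text embeddings have already been produced by some vision-language encoder; here they are replaced by synthetic NumPy vectors. The function names, the embedding dimension, the number of prompts per class, and the temperature value of 0.07 are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def ensemble_class_embeddings(prompt_embs):
    """Average several prompt embeddings per class into one class vector.

    prompt_embs: array of shape (num_classes, num_prompts, dim)
    returns:     array of shape (num_classes, dim), unit length
    """
    normed = l2_normalize(prompt_embs)
    return l2_normalize(normed.mean(axis=1))


def classify(image_embs, class_embs, temperature=0.07):
    """Softmax over cosine similarities between images and class vectors.

    The temperature is an assumed, illustrative value.
    """
    sims = l2_normalize(image_embs) @ class_embs.T  # (batch, num_classes)
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


# Synthetic stand-ins for encoder outputs (hypothetical data, not CLIP features).
rng = np.random.default_rng(0)
dim = 8
real_dir = np.zeros(dim)
real_dir[0] = 1.0
spoof_dir = np.zeros(dim)
spoof_dir[1] = 1.0

# Four noisy "prompt" embeddings per class: index 0 = real, index 1 = spoof.
prompt_embs = np.stack([
    real_dir + 0.1 * rng.standard_normal((4, dim)),
    spoof_dir + 0.1 * rng.standard_normal((4, dim)),
])

# One noisy "image" embedding near each class direction.
image_embs = np.stack([real_dir, spoof_dir]) + 0.1 * rng.standard_normal((2, dim))

class_embs = ensemble_class_embeddings(prompt_embs)
probs = classify(image_embs, class_embs)
print(probs)
```

Averaging several prompt embeddings per class, rather than relying on a single description, is meant to reduce sensitivity to the wording of any one prompt; the sketch covers only this similarity-based classification step and not the multimodal contrastive learning strategy mentioned in the abstract.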

First Page

19628

Last Page

19639

DOI

10.1109/ICCV51070.2023.01803

Publication Date

1-1-2023

Recommended Citation

K. Srivatsan et al., "FLIP: Cross-domain Face Anti-spoofing with Language Guidance," Proceedings of the IEEE International Conference on Computer Vision, pp. 19628 - 19639, Jan 2023.

The definitive version is available at https://doi.org/10.1109/ICCV51070.2023.01803

