A Multi-Head Approach with Shuffled Segments for Weakly-Supervised Video Anomaly Detection
Document Type
Conference Proceeding
Publication Title
Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2024
Abstract
Weakly-supervised video anomaly detection (WS-VAD) is a challenging task because coarse video-level annotations are insufficient to train fine-grained (segment or frame-level) detection algorithms. Multiple instance learning (MIL) powered by a ranking loss between the highest scoring segments of normal and anomaly videos has become the de facto standard for WS-VAD. However, ranking loss is not robust to noisy segment-level labels (induced from the video-level labels), and such noise is inherent in WS settings. In this work, we propose a new variant of the MIL method that utilizes a margin loss to achieve WS-VAD. The margin loss enables effective training of an anomaly scoring head based on noisy segment-level labels with high data imbalance (large number of normal segments and very few anomalous segments). We also introduce a self-supervised learning paradigm via stochastic shuffling of segments from multiple videos to mimic event changes during training. This forces the model to learn the boundaries between different virtual events (through a boundary localization head) and to localize the center of virtual events (through a center localization head). The efficacy of the proposed multi-head approach in successfully localizing anomalies is demonstrated through experiments on two large-scale VAD datasets (UCF-Crime and XD-Violence).
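The segment-shuffling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how segments are sampled or how the boundary and center targets are defined, so everything below is an assumption made for illustration: the function name shuffle_segments, the min_len and max_len parameters, the choice of one contiguous chunk per source video, and the binary label encoding are all hypothetical and may differ from the authors' method.

```python
import random


def shuffle_segments(videos, min_len=2, max_len=4, seed=0):
    """Build one pseudo-video from contiguous chunks of several source videos.

    Assumption (not from the paper): each source video contributes exactly one
    contiguous chunk, and each chunk is treated as one "virtual event".

    videos: list of videos, each a list of per-segment features (any objects).
    Returns (segments, boundary, center), three lists of equal length.
    """
    rng = random.Random(seed)
    segments, boundary, center = [], [], []

    # Visit the source videos in a random order.
    order = list(range(len(videos)))
    rng.shuffle(order)

    for vid in order:
        src = videos[vid]
        # Random chunk length, capped by the length of the source video.
        length = min(rng.randint(min_len, max_len), len(src))
        start = rng.randint(0, len(src) - length)
        chunk = src[start:start + length]

        for i, seg in enumerate(chunk):
            segments.append(seg)
            # Boundary target: first segment of every chunk except the very first.
            boundary.append(1 if i == 0 and len(segments) > 1 else 0)
            # Center target: the middle segment of the chunk.
            center.append(1 if i == length // 2 else 0)

    return segments, boundary, center


# Usage with placeholder features tagged by their source video.
videos = [[("a", i) for i in range(5)],
          [("b", i) for i in range(6)],
          [("c", i) for i in range(4)]]
segs, bnd, ctr = shuffle_segments(videos, seed=1)
for seg, b, c in zip(segs, bnd, ctr):
    print(seg, "boundary" if b else "", "center" if c else "")
```

In this sketch, the boundary list would serve as a target for a boundary localization head and the center list as a target for a center localization head; how the actual targets and losses are constructed is not stated in the abstract.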
First Page
132
Last Page
142
DOI
10.1109/WACVW60836.2024.00022
Publication Date
1-1-2024
Recommended Citation
S. Almarri et al., "A Multi-Head Approach with Shuffled Segments for Weakly-Supervised Video Anomaly Detection," Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2024, pp. 132-142, Jan 2024.
The definitive version is available at https://doi.org/10.1109/WACVW60836.2024.00022