Action Knowledge Graph for Violence Detection Using Audiovisual Features

Document Type

Conference Proceeding

Publication Title

Digest of Technical Papers - IEEE International Conference on Consumer Electronics

Abstract

Detecting violent content in video frames is a core task in violence detection. Combining visual and audio cues is often the most effective way to identify violent behavior, as the two modalities complement each other. However, studies that examine the fusion of these cues for violence detection are limited, and the existing approaches are computationally expensive. To address this problem, we investigated various methods for integrating visual and audio information and proposed a Fused Vision-based Action Knowledge Graph (FV-AKG) for violence detection using audiovisual information. We designed a network with three parallel branches, named integrated, specialized, and scoring, that capture and integrate the distinct relationships between audio and video samples. In the integrated branch, FV-AKG captures long-range dependencies based on similarity priors, while in the specialized branch, proximity priors are used to model local positional relationships. In addition, the scoring branch indicates how close the predictions are to reality. We used two key operations during model training, aggregation and update, each with its own learnable weights. In the aggregation operation, long-range dependencies are compiled from global vertices, whereas in the update operation, nonlinear transforms are used to compute new representations. We thoroughly investigated the possibilities of temporal context modeling using graphs and found that FV-AKG is the best option for real-time violence detection. Our experiments showed that FV-AKG outperforms current state-of-the-art (SoTA) methods on the XD-Violence dataset.
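
The abstract does not give the graph formulation, so the following is only a minimal sketch of how a similarity prior, a proximity prior, and an aggregation-then-update step could look on a sequence of snippet features. Every name here (`similarity_adjacency`, `proximity_adjacency`, `graph_layer`, `sigma`) is hypothetical, the exponential weighting and ReLU are assumed choices, and a single fixed weight matrix stands in for the separate learnable weights the abstract mentions. It is not the authors' implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(m):
    """Scale each row so that it sums to 1."""
    return [[v / sum(row) for v in row] for row in m]


def similarity_adjacency(feats):
    """Similarity prior (assumed form): exp(cosine similarity), so alike
    snippets are linked no matter how far apart they are in time."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in feats]
    sims = [[math.exp(sum(a * b for a, b in zip(fi, fj)) / (ni * nj))
             for fj, nj in zip(feats, norms)]
            for fi, ni in zip(feats, norms)]
    return row_normalize(sims)


def proximity_adjacency(n, sigma=1.0):
    """Proximity prior (assumed form): weights decay with temporal distance |i - j|."""
    return row_normalize([[math.exp(-abs(i - j) / sigma) for j in range(n)]
                          for i in range(n)])


def graph_layer(feats, adjacency, weight):
    """Aggregate neighbor features (A X), then update with a ReLU transform of (A X) W."""
    aggregated = matmul(adjacency, feats)
    return [[max(0.0, v) for v in row] for row in matmul(aggregated, weight)]


# Toy input: four snippets with 2-dimensional features (illustrative values only).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
weight = [[1.0, -1.0], [0.5, 1.0]]  # stand-in for a learned matrix

integrated = graph_layer(feats, similarity_adjacency(feats), weight)
specialized = graph_layer(feats, proximity_adjacency(len(feats)), weight)
```

In this sketch the similarity graph links snippets 0 and 3 strongly because their features are identical, even though they are far apart in time, while the proximity graph favors immediate neighbors regardless of content.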

DOI

10.1109/ICCE59016.2024.10444158

Publication Date

1-1-2024

