Captionomaly: A Deep Learning Toolbox for Anomaly Captioning in Social Surveillance Systems

Document Type


Publication Title

IEEE Transactions on Computational Social Systems


Real-time monitoring of video streams is attracting considerable attention, with ongoing efforts to automate the process fully. Reporting, however, remains tedious: it requires manual inspection of several hours of daily footage, and the repetitive nature of the task strains operators mentally and makes errors likely. There is a need for an automated system that can monitor video streams in social systems in real time and report the anomalies it detects. In this article, we provide a tool that aims to automate anomaly detection and reporting. We combine anomaly detection and video captioning models into a pipeline that reports anomalies in descriptive form. We formulate a new set of labels by writing descriptive captions for videos collected from the UCF-Crime (University of Central Florida-Crime) dataset. The anomaly detection model is trained on UCF-Crime, and the captioning model is trained on the newly created labeled set, UCF-Crime video description (UCFC-VD). The tool performs the combined task of anomaly detection and captioning. Automated anomaly captioning would be useful for efficient reporting of video surveillance data in different social scenarios. The tool was tested and evaluated using several techniques. Source code and dataset:
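
The two-stage idea in the abstract (score video segments for anomalies, then caption only the flagged ones) can be illustrated with a minimal Python sketch. This is not the authors' released code. The names `AnomalyReport`, `report_anomalies`, `score_fn`, `caption_fn`, and the `threshold` value are hypothetical placeholders chosen for illustration; the real anomaly detection and captioning models would be plugged in as the two callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class AnomalyReport:
    start: int         # index of the first segment in the anomalous run
    end: int           # index of the last segment in the run (inclusive)
    peak_score: float  # highest anomaly score observed within the run
    caption: str       # description produced by the captioning stage


def report_anomalies(
    segments: Sequence[Any],
    score_fn: Callable[[Any], float],
    caption_fn: Callable[[List[Any]], str],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[AnomalyReport]:
    """Caption every consecutive run of segments scored at or above threshold."""
    scores = [score_fn(segment) for segment in segments]
    reports: List[AnomalyReport] = []
    run_start = None
    # The trailing sentinel score closes a run that reaches the last segment.
    for i, score in enumerate(scores + [float("-inf")]):
        if score >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run = list(segments[run_start:i])
            reports.append(
                AnomalyReport(
                    start=run_start,
                    end=i - 1,
                    peak_score=max(scores[run_start:i]),
                    caption=caption_fn(run),
                )
            )
            run_start = None
    return reports


if __name__ == "__main__":
    # Stand-in data: each "segment" is just its own anomaly score.
    demo_segments = [0.1, 0.2, 0.9, 0.8, 0.1, 0.7]
    for report in report_anomalies(
        demo_segments,
        score_fn=lambda s: s,
        caption_fn=lambda run: f"{len(run)} segment(s) flagged",
    ):
        print(report)
```

Grouping consecutive flagged segments before captioning is a design choice of this sketch: it yields one description per event rather than one per segment. The source does not state how the tool groups segments.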

First Page


Last Page




Publication Date



Anomaly detection, Computational modeling, Deep learning, surveillance, Task analysis, toolbox, Training, UCF-Crime, video captioning, Video surveillance, Visualization


IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

When accepted for publication, set statement to accompany deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged