A multi-resolution fusion approach for human activity recognition from video data in tiny edge devices
Human Activity Recognition (HAR) is the automatic recognition of Activities of Daily Living (ADL) from human motion data captured in various data modalities by wearable and ambient sensors. Advances in Deep Learning, especially Convolutional Neural Networks (CNNs) and sequential models, have revolutionized HAR from video data sources. Although these models are highly accurate and exploit both spatial and temporal information, they require substantial computation and memory resources, which makes them unsuitable for edge or wearable applications. Tiny Machine Learning (TinyML) aims to reduce the compute and memory requirements of such models so that they can run on always-on, resource-constrained devices, thereby reducing communication latency and network traffic in HAR frameworks. In this paper, we propose a two-stream multi-resolution fusion architecture for HAR from the video data modality. The context stream takes a resized image as input, and the fovea stream takes the cropped center portion of the resized image as input, reducing the overall dimensionality. We evaluated two quantization methods, Post-Training Quantization (PTQ) and Quantization Aware Training (QAT), to optimize these models for deployment on edge devices, and tested their performance on two challenging video datasets: KTH and UCF11. We performed ablation studies to validate the performance of the two-stream model. We deployed the proposed architecture on commercial resource-constrained devices and monitored its inference latency and power consumption. The results indicate that the proposed architecture clearly outperforms the other relevant single-stream models tested in this work in terms of accuracy, precision, recall, and F1-score while also reducing the overall model size.
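The two-stream input construction described above (a resized frame for the context stream, and the center crop of that resized frame for the fovea stream) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame is a single-channel image stored as nested lists, the resize uses nearest-neighbour sampling, and the sizes `resized=(64, 64)` and `fovea=(32, 32)` as well as the helper names `resize_nearest`, `center_crop`, and `make_streams` are hypothetical choices made only for this example.

```python
from typing import List, Tuple

# Hypothetical representation: one grayscale frame as rows of pixel values.
Frame = List[List[float]]


def resize_nearest(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Nearest-neighbour resize (assumed stand-in; the paper's interpolation is not stated here)."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def center_crop(frame: Frame, crop_h: int, crop_w: int) -> Frame:
    """Return the central crop_h x crop_w region of a frame."""
    h, w = len(frame), len(frame[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed frame size")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in frame[top:top + crop_h]]


def make_streams(
    frame: Frame,
    resized: Tuple[int, int] = (64, 64),  # hypothetical context-stream input size
    fovea: Tuple[int, int] = (32, 32),    # hypothetical fovea-stream crop size
) -> Tuple[Frame, Frame]:
    """Build (context_input, fovea_input) for a two-stream model."""
    context = resize_nearest(frame, *resized)
    # The fovea crop is taken from the *resized* frame, as the abstract describes.
    fovea_input = center_crop(context, *fovea)
    return context, fovea_input


if __name__ == "__main__":
    # A synthetic 120 x 160 frame whose pixel value encodes its position.
    frame = [[float(r * 160 + c) for c in range(160)] for r in range(120)]
    context, fov = make_streams(frame)
    print(len(context), len(context[0]), len(fov), len(fov[0]))
```

With these assumed sizes, the fovea input holds a quarter of the context input's pixels, which illustrates how cropping the center of the resized frame keeps the per-stream dimensionality small.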
Convolutional Neural Network, Deep learning, Human Activity Recognition, Resource-constrained devices, TinyML
S. Nooruddin et al., "A multi-resolution fusion approach for human activity recognition from video data in tiny edge devices," Information Fusion, vol. 100, Dec 2023.
The definitive version is available at https://doi.org/10.1016/j.inffus.2023.101953