A multi-resolution fusion approach for human activity recognition from video data in tiny edge devices

Document Type

Article

Publication Title

Information Fusion

Abstract

Human Activity Recognition (HAR) is the process of automatically recognizing Activities of Daily Living (ADL) from human motion data captured in various modalities by wearable and ambient sensors. Advances in Deep Learning, especially Convolutional Neural Networks (CNNs) and sequential models, have revolutionized HAR from video data sources. Although these models are highly accurate and exploit both spatial and temporal information, they require substantial compute and memory resources, making them unsuitable for edge or wearable applications. Tiny Machine Learning (TinyML) aims to optimize such models for always-on, resource-constrained devices, reducing communication latency and network traffic for HAR frameworks. In this paper, we propose a two-stream multi-resolution fusion architecture for HAR from the video data modality. The context stream takes a resized image as input, while the fovea stream takes the cropped center portion of that resized image, reducing the overall dimensionality. We applied two quantization methods, Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT), to optimize these models for deployment on edge devices, and evaluated performance on two challenging video datasets: KTH and UCF11. We performed ablation studies to validate the two-stream model's performance. We deployed the proposed architecture on commercial resource-constrained devices and monitored inference latency and power consumption. The results indicate that the proposed architecture clearly outperforms the other relevant single-stream models tested in this work in terms of accuracy, precision, recall, and F1-score, while also reducing the overall model size.
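The abstract's two-stream input scheme, a low-resolution "context" view of the whole frame plus a full-detail "fovea" crop of its center, can be sketched as follows. This is a minimal illustration only: the function name, the nearest-neighbour downsampling, and the sizes (`ctx_size`, `fov_size`) are assumptions, not the paper's actual preprocessing pipeline or dimensions.

```python
import numpy as np

def make_two_stream_inputs(frame, ctx_size=64, fov_size=32):
    """Split one video frame into the two stream inputs.

    context: the whole frame downsampled to ctx_size x ctx_size
    fovea:   the central fov_size x fov_size portion of that resized image
    Sizes are illustrative; the paper's exact dimensions are not given here.
    """
    h, w = frame.shape[:2]
    # Context stream: nearest-neighbour downsample of the full frame
    # (a stand-in for whatever resize the authors actually use).
    ys = np.linspace(0, h - 1, ctx_size).astype(int)
    xs = np.linspace(0, w - 1, ctx_size).astype(int)
    context = frame[np.ix_(ys, xs)]
    # Fovea stream: center crop of the resized image, so the total input
    # dimensionality stays well below that of the original frame.
    top = (ctx_size - fov_size) // 2
    left = (ctx_size - fov_size) // 2
    fovea = context[top:top + fov_size, left:left + fov_size]
    return context, fovea
```

Feeding each output to its own CNN stream and fusing the features late in the network is the general pattern the abstract describes; the fusion point and layer details are left to the paper itself.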

DOI

10.1016/j.inffus.2023.101953

Publication Date

12-2023

Keywords

Convolutional Neural Network, Deep learning, Human Activity Recognition, Resource-constrained devices, TinyML

Comments

IR Deposit conditions:

OA version (pathway b) Accepted version

24-month embargo

License: CC BY-NC-ND

Must link to publisher version with DOI
