Multi-level feature fusion for multimodal human activity recognition in Internet of Healthcare Things

Document Type


Publication Title

Information Fusion


Human Activity Recognition (HAR) has become a crucial element of smart healthcare applications due to the rapid adoption of wearable sensors and mobile technologies. Most existing HAR frameworks handle only a single data modality, which degrades the reliability and recognition accuracy of the system when data come from heterogeneous sources. In this article, we propose a multi-level feature fusion technique for multimodal human activity recognition that uses a multi-head Convolutional Neural Network (CNN) with a Convolutional Block Attention Module (CBAM) to process visual data, and a Convolutional Long Short-Term Memory (ConvLSTM) network to handle time-sensitive multi-source sensor information. The architecture extracts channel- and spatial-dimension features from visual information through three CNN branches combined with CBAM. The ConvLSTM network captures temporal features from the multiple sensors' time-series data for efficient activity recognition. An open-access multimodal HAR dataset, the UP-Fall detection dataset, is used in experiments and evaluations to measure the performance of the developed fusion architecture. Finally, we deployed an Internet of Things (IoT) system to test the proposed fusion network in real-world smart healthcare application scenarios. The experimental results show that the developed multimodal HAR framework surpasses existing state-of-the-art methods across multiple performance metrics.
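The fusion idea described above can be sketched in miniature: CBAM-style channel and spatial attention refines a visual feature map, a temporal summary stands in for the ConvLSTM branch, and the two are concatenated into one fused vector. This is a minimal NumPy sketch under assumed shapes and random weights, not the authors' implementation; the `fuse` function, the weight matrices `w1`/`w2`, and the mean-over-time summary are all illustrative simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    # feat: (C, H, W). Average- and max-pool over the spatial dims,
    # pass each through a shared 2-layer MLP (w1, w2), sum, sigmoid.
    avg = feat.mean(axis=(1, 2))                      # (C,)
    mx = feat.max(axis=(1, 2))                        # (C,)
    scale = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0) +
                    w2 @ np.maximum(w1 @ mx, 0.0))    # (C,)
    return feat * scale[:, None, None]

def spatial_attention(feat):
    # Pool over the channel axis, then gate each spatial location.
    # (Real CBAM applies a learned 7x7 conv to the stacked pooled
    # maps; a plain sum stands in for that conv here.)
    avg = feat.mean(axis=0)                           # (H, W)
    mx = feat.max(axis=0)                             # (H, W)
    return feat * sigmoid(avg + mx)[None, :, :]

def fuse(visual_feat, sensor_seq, w1, w2):
    # Multi-level fusion sketch: CBAM-refined visual features are
    # flattened and concatenated with a temporal summary of the
    # sensor sequence (a stand-in for the ConvLSTM hidden state).
    v = spatial_attention(channel_attention(visual_feat, w1, w2))
    t = sensor_seq.mean(axis=0)                       # (D,)
    return np.concatenate([v.ravel(), t])

rng = np.random.default_rng(0)
C, H, W, T, D, r = 8, 4, 4, 20, 6, 2                  # r: reduction ratio
fused = fuse(rng.standard_normal((C, H, W)),          # visual feature map
             rng.standard_normal((T, D)),             # sensor time series
             rng.standard_normal((r, C)),             # w1: C -> C/r
             rng.standard_normal((C, r)))             # w2: C/r -> C
print(fused.shape)                                    # (C*H*W + D,)
```

In the paper's full architecture, the flattened fused vector would feed a classification head; here the sketch only shows how the two modality-specific feature streams meet at a single fusion point.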

First Page


Last Page




Publication Date



Convolutional block attention module, Convolutional long short-term memory, Human activity recognition, Internet of things, Multi-head convolutional neural network


IR Deposit conditions:

OA version (pathway a) Accepted version

24 months embargo

License: CC BY-NC-ND

Must link to publisher version with DOI