Embedded Online Machine Learning

Document Type

Conference Proceeding

Publication Title

2021 International Conference Engineering and Telecommunication (En&T 2021)

Abstract

The paper presents research on a set of 'classical' machine learning algorithms for embedded online learning from microbatches (batches of 128 samples or fewer) on ARM processor boards under hard memory limits, with a tiny memory footprint and single-CPU execution without multithreading. We propose mathematical improvements to the algorithms as well as further programming optimizations. For evolving data streams, we present adaptations of the Gradient Boosting Decision Trees (GBDT) learning algorithm for classification tasks, the eXtreme Gradient Boosting (XGBoost, XGB) and Random Forest (RF) learning algorithms for supervised anomaly detection tasks, and the Extended Isolation Forest (EIF) learning algorithm for unsupervised anomaly detection tasks. In this scenario, as new data arrives over time, the relationship between the class and the features may shift, resulting in concept drift. For each algorithm, the proposed technique generates new ensemble members from microbatches and/or batches of data as they become available. A maximum ensemble size is specified, but learning does not stop once that size is reached: the ensemble is continually refreshed with members trained on recent data so that it stays consistent with the current concept. We tested our technique on real-world data and compared it to the original batch-incremental learning algorithms for data streams. Our implementations achieve an inference speedup of up to several times, and in some cases also improve prediction quality by 0.1-0.3 in terms of the $F_1$ measure.
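Below is a minimal, self-contained Python sketch of the batch-incremental ensemble scheme summarized in the abstract: each incoming microbatch trains one new ensemble member, and a fixed maximum ensemble size causes the oldest member to be evicted so the model keeps tracking the current concept. This is an illustration of the general idea, not the authors' implementation; the class and parameter names (MicrobatchEnsemble, max_members, the choice of a decision tree base learner) are assumptions for the example.

```python
# Sketch of bounded batch-incremental ensemble learning under concept drift.
# Not the paper's implementation; names and base learner are illustrative.
from collections import deque

import numpy as np
from sklearn.tree import DecisionTreeClassifier


class MicrobatchEnsemble:
    def __init__(self, max_members=10, max_depth=6):
        # deque with maxlen evicts the oldest member automatically,
        # implementing the fixed maximum ensemble size.
        self.members = deque(maxlen=max_members)
        self.max_depth = max_depth

    def partial_fit(self, X, y):
        """Train one new member on a microbatch (<= 128 samples in the paper)."""
        tree = DecisionTreeClassifier(max_depth=self.max_depth)
        tree.fit(X, y)
        self.members.append(tree)

    def predict(self, X):
        """Majority vote over the current ensemble members."""
        votes = np.stack([m.predict(X) for m in self.members])
        # For binary labels {0, 1}: predict 1 when more than half agree.
        return (votes.mean(axis=0) > 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = MicrobatchEnsemble(max_members=10)
    for step in range(50):
        X = rng.normal(size=(128, 8))
        drift = 0.0 if step < 25 else 1.0          # simulated concept drift
        y = (X[:, 0] + drift * X[:, 1] > 0).astype(int)
        if model.members:
            acc = (model.predict(X) == y).mean()   # test-then-train evaluation
            print(f"step {step:2d}  accuracy {acc:.2f}")
        model.partial_fit(X, y)
</pre>
```

The loop evaluates each microbatch before training on it (prequential, test-then-train evaluation, a common convention for data streams); the bounded deque is what lets learning continue indefinitely at the maximum ensemble size, as described in the abstract.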

DOI

10.1109/EnT50460.2021.9681738

Publication Date

January 24, 2022

Keywords

Anomaly detection, ARM architecture, Classification, Ensembles, Gradient Boosting, Online Optimization

