Predicting of Acute Kidney Injury using Electronic Health Records

Acute Kidney Injury (AKI) affects more than 13 million people annually and increases patients’ risk of death. More severe AKI also raises the cost of a patient’s treatment. Early prediction of AKI could enable clinicians to focus preventive treatment on at-risk patients. The adoption of Electronic Health Records (EHR) in medical institutions allows healthcare professionals to access patients’ health information more efficiently and develop personalized treatment trajectories, thereby improving healthcare quality. An EHR-based AKI risk prediction algorithm could therefore let clinicians devote more time to treating patients directly instead of reviewing large volumes of information from patients’ visits. In this thesis, we used the publicly available EHR database MIMIC-IV v2.0 to develop an AKI risk prediction framework for patients admitted to Intensive Care Units (ICU). The framework includes an algorithm for detecting AKI from creatinine values and urine output, as well as the prediction of next-day AKI onset from data collected on the first day in the ICU. The prediction task was implemented at three granularity levels: predicting AKI onset of any stage, predicting AKI onset of stages 2 and 3, and predicting AKI onset of stage 3, which is the most severe case. Because the data are imbalanced, we experimented with several balancing techniques. In addition to the classical machine learning approach with manual feature selection, we also explored an LSTM-based approach to AKI prediction. Given the variety of data available for each patient, it is challenging to assess which information is the best predictor. For this reason, the text classification model made its predictions from unstructured textual data.
The extreme gradient boosting (XGBoost) machine learning algorithm, trained on fewer than 10,000 patients in an imbalanced data setting, outperformed the deep learning text classification model. The latter, in turn, showed the ability to capture meaningful information from the text.
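
The three granularity levels described in the abstract can be read as three binary labels derived from a single next-day AKI stage. The following is a minimal sketch of that reading, not the thesis implementation: the task names, the function names, and the encoding of "no AKI" as stage 0 are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical task names. Each maps to the lowest AKI stage that counts
# as a positive label for that prediction task.
TASK_MIN_STAGE: Dict[str, int] = {
    "any_stage": 1,      # AKI onset of any stage
    "stage_2_or_3": 2,   # AKI onset of stages 2 and 3
    "stage_3": 3,        # AKI onset of stage 3 (most severe)
}


def binary_targets(next_day_stage: int) -> Dict[str, int]:
    """Turn a next-day AKI stage (assumed 0 = no AKI, 1-3 = stage) into three labels."""
    if next_day_stage not in (0, 1, 2, 3):
        raise ValueError(f"unexpected AKI stage: {next_day_stage}")
    return {
        task: int(next_day_stage >= min_stage)
        for task, min_stage in TASK_MIN_STAGE.items()
    }


def positive_rate(stages: List[int], task: str) -> float:
    """Fraction of positive labels for one task, a quick check of class imbalance."""
    if not stages:
        raise ValueError("stages must not be empty")
    labels = [binary_targets(stage)[task] for stage in stages]
    return sum(labels) / len(labels)
```

Under this encoding the positive class can only shrink as the task moves from any stage to stage 3, which is one way to see why class imbalance matters more for the stricter tasks.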

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfillment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Dr. Mohammad Yaqub, Dr. Le Song

Online access provided for MBZUAI patrons