Loan Predication Using Federated Learning

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Samuel Horvath

Second Advisor

Dr. Bin Gu


"Breakthroughs in data-driven technology now allow most financial service providers to reevaluate their risk assessment mechanisms, particularly with regard to loan defaults. Banks and financial institutions have always been vulnerable to loan defaults, which are influenced by a number of variables, such as relationships between and within customers. This thesis introduces an approach that uses Vertical Federated Learning (VFL) to combine two different datasets, LendingClub and Berka, for loan default prediction. VFL allows a global model to be trained on multiple datasets while keeping data privacy intact. We implement federated learning on the shared dataset using deep learning architectures such as ANN and DNN. The LendingClub dataset contains most of the data describing loan applicants across the United States, from firmographic to transactional and behavioral, while the Berka dataset adds a further dimension drawn from, among other things, European banking transactions, which enriches the predictive features. VFL makes this kind of cross-dataset collaboration possible while protecting data privacy. The results support the effectiveness of VFL, which improved prediction performance relative to its centralized counterpart. We then present a comparative analysis of four machine learning models evaluated on loan default prediction in this federated learning setting. The models are evaluated on a set of metrics, including accuracy, precision, recall, and F1-score, with attention to the class imbalance inherent in the dataset. Finally, the model among the four that achieves the best prediction performance in this federated setting is identified.
The results were largely in line with expectations: every model showed predictive ability, and some models outperformed the others on particular aspects of the data. ANN achieved the best overall performance, with the highest accuracy and the ability to capture complex nonlinear relationships among the features. In terms of accuracy, the results show how VFL might help maximize the predictive power of larger, more varied datasets. For instance, to manage risk and reduce the intensity of defaults in its portfolio, a financial institution could make more informed decisions about credit disbursement with the help of machine learning models such as ANN, CNN, and LSTM."
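
The vertical federated setup described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a plain logistic regression in place of the ANN and DNN models, two hypothetical parties whose synthetic features stand in for LendingClub and Berka, rows that are already aligned across parties, and an unencrypted exchange of partial scores and residuals (a real deployment would add entity alignment and encryption or secure aggregation). The names `Party`, `train_vfl`, and `precision_recall_f1` are illustrative only.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class Party:
    """One data holder with a private vertical slice of the features."""

    def __init__(self, features):
        self.features = features  # raw rows stay local to this party
        self.weights = [0.0] * len(features[0])

    def partial_scores(self):
        # Only these per-sample partial scores leave the party.
        return [sum(w * x for w, x in zip(self.weights, row))
                for row in self.features]

    def apply_gradient(self, residuals, lr):
        # Each party updates its own weights from the shared residuals.
        n = len(self.features)
        for j in range(len(self.weights)):
            grad = sum(r * row[j] for r, row in zip(residuals, self.features)) / n
            self.weights[j] -= lr * grad


def train_vfl(parties, labels, epochs=300, lr=0.5):
    """Coordinator loop: sum partial scores, send residuals back."""
    bias = 0.0
    for _ in range(epochs):
        partials = [p.partial_scores() for p in parties]
        probs = [sigmoid(bias + sum(s)) for s in zip(*partials)]
        residuals = [p - y for p, y in zip(probs, labels)]
        for party in parties:
            party.apply_gradient(residuals, lr)
        bias -= lr * sum(residuals) / len(labels)
    return bias


def predict(parties, bias):
    partials = [p.partial_scores() for p in parties]
    return [1 if sigmoid(bias + sum(s)) >= 0.5 else 0 for s in zip(*partials)]


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Synthetic, imbalanced stand-in data (assumption: not the real datasets).
random.seed(0)
party_a_rows, party_b_rows, labels = [], [], []
for _ in range(200):
    a = [random.gauss(0, 1), random.gauss(0, 1)]  # features held by party A
    b = [random.gauss(0, 1)]                      # feature held by party B
    score = 1.5 * a[0] - 1.0 * a[1] + 2.0 * b[0] - 1.5 + random.gauss(0, 0.3)
    party_a_rows.append(a)
    party_b_rows.append(b)
    labels.append(1 if score > 0 else 0)  # 1 = default (minority class)

parties = [Party(party_a_rows), Party(party_b_rows)]
bias = train_vfl(parties, labels)
preds = predict(parties, bias)

# Evaluated on the training rows for brevity; a real study would hold out data.
accuracy = sum(1 for t, p in zip(labels, preds) if t == p) / len(labels)
precision, recall, f1 = precision_recall_f1(labels, preds)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Because defaults are the minority class in this toy data, precision, recall, and F1 are reported alongside accuracy, mirroring the metric set named in the abstract.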


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc degree in Machine Learning

Advisors: Samuel Horvath, Bin Gu

Online access available for MBZUAI patrons