A Transformer-Based Supervised Machine Learning Approach for Detecting SCAMs

Date of Award


Document Type


Degree Name

Master of Science in Machine Learning


Machine Learning

First Advisor

Dr. Agathe Guilloux

Second Advisor

Dr. Samuel Horvath


"Small Colloidal Aggregating Molecules (SCAMs), formed by small molecules with low aqueous solubility, often introduce false-positive results in bioassays, thereby compromising the reliability of reported bioactivities. These false positives can mislead researchers in drug discovery and development, leading to wasted resources and potential setbacks in finding effective treatments. In this study, we aim to mitigate this issue by developing a supervised machine learning model through fine-tuning the pre-trained transformer model known as ChemBERTa. The primary objective of this model is to discern instances where small molecules are prone to forming SCAMs. These predictions are instrumental in early-stage screening of potential drug candidates, facilitating the identification of molecules that might yield false-positive outcomes in high-throughput screens due to colloidal aggregation. Despite encountering challenges related to class imbalance, our transformer model exhibits remarkable reliability in predicting colloidal aggregate formation, surpassing the performance of state-of-the-art models."


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Machine Learning

Advisors: Agathe Guilloux, Samuel Horvath

Online access available for MBZUAI patrons