A Transformer-Based Supervised Machine Learning Approach for Detecting SCAMs
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Machine Learning
Department
Machine Learning
First Advisor
Dr. Agathe Guilloux
Second Advisor
Dr. Samuel Horvath
Abstract
"Small Colloidal Aggregating Molecules (SCAMs), formed by small molecules with low aqueous solubility, often introduce false-positive results in bioassays, thereby compromising the reliability of reported bioactivities. These false positives can mislead researchers in drug discovery and development, leading to wasted resources and potential setbacks in finding effective treatments. In this study, we aim to mitigate this issue by developing a supervised machine learning model through fine-tuning the pre-trained transformer model known as ChemBERTa. The primary objective of this model is to discern instances where small molecules are prone to forming SCAMs. These predictions are instrumental in early-stage screening of potential drug candidates, facilitating the identification of molecules that might yield false-positive outcomes in high-throughput screens due to colloidal aggregation. Despite encountering challenges related to class imbalance, our transformer model exhibits remarkable reliability in predicting colloidal aggregate formation, surpassing the performance of state-of-the-art models."
Recommended Citation
F. Alharthi, "A Transformer-Based Supervised Machine Learning Approach for Detecting SCAMs," Apr 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Machine Learning
Advisors: Agathe Guilloux, Samuel Horvath
Online access available for MBZUAI patrons