Efficient and Accurate Phase Recognition in Videos of Cataract Surgeries

Date of Award


Document Type


Degree Name

Master of Science in Computer Vision


Computer Vision

First Advisor

Dr. Mohammad Yaqub

Second Advisor

Dr. Karthik Nandakumar


Cataracts are the leading cause of eye disease globally, affecting 65.2 million of the estimated 2.2 billion people who live with some form of vision impairment. The condition is characterized by gradual clouding of the eye’s lens, primarily due to aging. Phacoemulsification cataract surgery is currently recognized as the benchmark treatment because of its reduced risk of postoperative complications. Despite its benefits, challenges persist: proficient surgeons are scarce, and practical approaches for assessing surgical skill and providing feedback are lacking. In response, deep learning solutions can provide intraoperative assessment, postoperative analysis, and systematic feedback on surgeon performance. The first step in many of these solutions is accurate and efficient phase recognition, in which a deep learning model classifies each frame of a cataract surgery video into a single phase. However, previous state-of-the-art methods have not thoroughly considered the trade-off between efficiency and effectiveness, and most achieve better performance at the cost of an inefficient architecture. In our research, we introduce a new paradigm for phase recognition that uses selective state spaces to balance efficiency and effectiveness. Our proposed method, CataMamba, is a two-stage architecture that first extracts rich visual features from the frames of the surgery video and then models the temporal relations between them using Mamba blocks. We demonstrate that our technique balances efficiency and effectiveness on two cataract surgery datasets with different numbers of phases, Cataract-101 and CATARACTS, where it either outperforms or performs comparably to current leading methods. These findings highlight the potential of our method within the emerging field of surgical phase recognition.
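
To make the two-stage idea in the abstract concrete, the following is a minimal sketch and not the CataMamba implementation. It assumes that stage one has already turned each video frame into a feature vector, then runs a toy selective state-space recurrence over those vectors and assigns one phase index per frame. Every name here (selective_scan, predict_phases, random_params), the random weights, the feature size, and the number of phases are hypothetical choices made for illustration. The step size is a single scalar per frame, which is a simplification, and a full Mamba block contains further components that are omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softplus(x):
    """Smooth positive function that keeps the step size above zero."""
    return math.log1p(math.exp(x))


def selective_scan(frames, w_delta, w_b, w_c, a):
    """Causal selective state-space recurrence over per-frame features.

    frames: T feature vectors of length D (stage-one outputs, assumed given).
    w_delta (length D), w_b and w_c (N x D): hypothetical projections that
    make the step size, input matrix, and output matrix depend on the frame.
    a: N negative decay rates.
    """
    d_model, n_state = len(frames[0]), len(a)
    state = [[0.0] * n_state for _ in range(d_model)]
    outputs = []
    for x in frames:
        # Input-dependent ("selective") parameters for this frame.
        delta = softplus(sum(w * v for w, v in zip(w_delta, x)))
        b_t = matvec(w_b, x)
        c_t = matvec(w_c, x)
        decay = [math.exp(delta * a_n) for a_n in a]
        # State update: h = exp(delta * a) * h + delta * b_t * x
        for d in range(d_model):
            for n in range(n_state):
                state[d][n] = decay[n] * state[d][n] + delta * b_t[n] * x[d]
        # Readout: y[d] = sum over n of c_t[n] * h[d][n]
        outputs.append(
            [sum(c * h for c, h in zip(c_t, state[d])) for d in range(d_model)]
        )
    return outputs


def predict_phases(frames, params):
    """Assign exactly one phase index to every frame."""
    hidden = selective_scan(
        frames, params["w_delta"], params["w_b"], params["w_c"], params["a"]
    )
    predictions = []
    for h in hidden:
        logits = matvec(params["w_out"], h)
        predictions.append(max(range(len(logits)), key=logits.__getitem__))
    return predictions


def random_params(d_model, n_state, num_phases, seed=0):
    """Untrained random weights, used only so that the sketch runs."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_delta": [rng.uniform(-0.5, 0.5) for _ in range(d_model)],
        "w_b": rand_matrix(n_state, d_model),
        "w_c": rand_matrix(n_state, d_model),
        "a": [-(n + 1.0) for n in range(n_state)],
        "w_out": rand_matrix(num_phases, d_model),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # 20 frames with 8-dimensional stand-in features; 5 phases is arbitrary.
    frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    params = random_params(d_model=8, n_state=4, num_phases=5)
    print(predict_phases(frames, params))
```

The recurrence keeps a fixed-size state per feature channel and visits each frame once, so the cost of the temporal stage in this sketch grows linearly with video length. Because the weights are random and untrained, the printed phase indices carry no meaning; only the data flow from frame features to per-frame phase labels is illustrated.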


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision

Advisors: Mohammad Yaqub, Karthik Nandakumar

Online access available for MBZUAI patrons