Enhancing Contextual Learning in Person Search with Weighted Receptive Field and Graph Edge Convolutional Network Integration

Date of Award


Document Type


Degree Name

Master of Science in Computer Vision


Computer Vision

First Advisor

Dr. Rao Anwer

Second Advisor

Dr. Fahad Khan


As the volume of digital images and video grows, so does the demand to detect and re-identify individuals within large collections. Applications range from surveillance security to personalized user experiences, and they call for robust and efficient person search techniques. The task remains difficult: occlusion, lighting variations, and scalability are persistent challenges, and appearance varies across space and time while different people can look strongly alike. This thesis addresses these challenges with an improved end-to-end person search framework based on OIMNet++, building on recent advances in computer vision and deep learning. The framework makes three enhancements. First, we introduce a novel weighted receptive field method, applied to the CNN backbone, which helps the network handle occlusion and capture more intricate and global features, improving recognition in complex scenes. Second, we aggregate features across multiple feature hierarchies so that the network learns multi-scale representations; this is particularly useful given the variability in the scale at which individuals appear. Third, we integrate a Graph Edge Convolutional (GEC) module into the re-identification subnetwork. The GEC module operates on graph representations of feature maps, which overcomes the limitation of relying solely on the local position of human body parts and allows the network to match individuals across different postures, making it more robust in dynamic scenarios. In our experiments, the resulting model, named GECPS (Graph Edge Convolutional Network for Person Search), surpasses the baseline: it achieves a 3% higher mean average precision on the PRW benchmark dataset and a 1.3% improvement on the CUHK-SYSU dataset.
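To make the idea of a graph built from a feature map more concrete, the following is a minimal sketch of a generic edge convolution, not the GEC module of the thesis, whose exact design is not given in this abstract. It assumes each spatial location of a feature map becomes a graph node, connects every node to its k nearest neighbours in feature space, and aggregates edge features by max pooling. The function names `knn_indices` and `edge_conv`, the tensor sizes, and the single linear layer with ReLU are all illustrative assumptions.

```python
import numpy as np


def knn_indices(x, k):
    """Return, for each of the N nodes in x (N, C), the indices of its
    k nearest neighbours in feature space, excluding the node itself."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)  # (N, N)
    np.fill_diagonal(d2, np.inf)  # a node is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def edge_conv(x, weight, k):
    """Generic edge convolution (illustrative assumption, not the thesis's GEC).

    x:      (N, C) node features
    weight: (2C, C_out) weights of a single linear layer
    Returns (N, C_out) features after max aggregation over each node's edges.
    """
    idx = knn_indices(x, k)                      # (N, k)
    neighbours = x[idx]                          # (N, k, C)
    centre = np.broadcast_to(x[:, None, :], neighbours.shape)
    # Edge feature: the node itself plus its offset to each neighbour.
    edge_feat = np.concatenate([centre, neighbours - centre], axis=-1)
    hidden = np.maximum(edge_feat @ weight, 0.0)  # linear layer + ReLU
    return hidden.max(axis=1)                     # max over the k edges


# Hypothetical sizes: an 8x4 feature map with 16 channels becomes 32 nodes.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 4, 16))
nodes = feature_map.reshape(-1, 16)
weight = rng.standard_normal((32, 24)) * 0.1
out = edge_conv(nodes, weight, k=5)
```

Because neighbours are chosen by feature similarity rather than by grid position, a node can be linked to a similar-looking region elsewhere in the map, which is one way a graph formulation can reduce dependence on the fixed local position of body parts.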


Thesis submitted to the Deanship of Graduate and Postdoctoral Studies

In partial fulfilment of the requirements for the MSc degree in Computer Vision

Advisors: Rao Anwer, Fahad Khan

Online access available for MBZUAI patrons