Context Matters: Distilling Knowledge Graph for Enhanced Object Detection

Publication Title

IEEE Transactions on Multimedia


The human visual system not only recognizes individual objects but also comprehends the contextual relationships between them in real-world scenes, an ability that is highly advantageous for object detection. In practical applications, however, such contextual information is often unavailable. Previous attempts to compensate by deriving contextual priors from cross-modal data, such as language and statistics, are sub-optimal because of a semantic gap. To overcome this challenge, we integrate context into an object detector through knowledge distillation. Our approach represents context as a knowledge graph that describes the relative location and semantic relevance of different visual concepts. Leveraging recent advances in graph representation learning with Transformers, we exploit the contextual information among objects using edge encoding and graph attention. Specifically, each image region propagates and aggregates representations from its highly similar neighbors to form the knowledge graph in the Transformer encoder. Extensive experiments and a thorough ablation study on the challenging MS-COCO, Pascal VOC, and LVIS benchmarks demonstrate the superiority of our method.
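
The neighbor aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `aggregate_regions`), the top-k cosine-similarity rule for picking "highly similar neighbors", the additive `edge_bias` term standing in for edge encoding, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def aggregate_regions(features, edge_bias, k=2):
    """Update each region feature by attending over its k most similar regions.

    features:  list of n region feature vectors, each of length d.
    edge_bias: n x n matrix added to the attention logits (hypothetical
               stand-in for an edge encoding, e.g. relative location).
    Returns a new list of n feature vectors.
    """
    n = len(features)
    d = len(features[0])
    updated = []
    for i in range(n):
        # Rank the other regions by similarity and keep the top k (plus self).
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
        neighbors = [i] + others[:k]

        # Scaled dot-product attention logits with an additive edge term.
        scores = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
            + edge_bias[i][j]
            for j in neighbors
        ]

        # Numerically stable softmax over the selected neighbors only.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Weighted sum of neighbor features gives the updated representation.
        updated.append(
            [sum(w * features[j][c] for w, j in zip(weights, neighbors)) for c in range(d)]
        )
    return updated
```

In this sketch each updated feature is a convex combination of the region's own feature and those of its selected neighbors, so a strongly negative `edge_bias[i][j]` effectively removes region `j` from region `i`'s context.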

Knowledge distillation, knowledge graph, object detection
