Open-Vocabulary Object Detection and its Application in Robotic Navigation
Date of Award
4-30-2024
Document Type
Thesis
Degree Name
Master of Science in Computer Vision
Department
Computer Vision
First Advisor
Dr. Ian Reid
Second Advisor
Dr. Shijian Lu
Abstract
This thesis explores the role of open-vocabulary object detection in enhancing robotic navigation. The first phase of our research addresses a fundamental challenge in open-vocabulary object detection: training detectors that can recognize a wide array of novel classes without direct supervision. Traditional self-training approaches often rely on image-level weak supervision to generate pseudo object boxes for training, which yields noisy, base-class-biased pseudo boxes and diminishes the detectors' effectiveness. To counter this, we introduce Debiased Curriculum Self-Training (DCS), a technique that refines the generation of pseudo object boxes through progressive pseudo-label filtering (PPF) and adaptive pseudo-label selection (APS). PPF systematically eliminates mismatched detections early in training, when the detector's bias toward base classes is most pronounced, while APS merges class-aware and class-agnostic pseudo-labeling methods, giving precedence to class-aware labeling as the detector's ability to detect novel classes matures. Without resorting to complex mechanisms, DCS markedly enhances detection performance across multiple open-vocabulary benchmarks.
In the second phase, we develop a mapping method for robotic visual navigation, a key step in enabling an agent to comprehend its environment. We prefer topological mapping, which is less resource-intensive than metric mapping, and go beyond the conventional image-as-node approach by constructing an object-level map from image segments. This refines the map's granularity and enhances its interconnectivity. To improve the map's semantic clarity and enable the agent to navigate using more reliable landmarks, we incorporate the DCS model to supplement the map's semantic information and design a novel planning strategy that considers the semantic difference between nodes. Our experimental results in a simulator indicate superior navigation outcomes with this mapping method. This integration not only demonstrates the practical applicability of our initial research but also paves the way for our future research on robotic navigation in dynamic settings.
Recommended Citation
H. Zhang, "Open-Vocabulary Object Detection and its Application in Robotic Navigation," Apr. 2024.
Comments
Thesis submitted to the Deanship of Graduate and Postdoctoral Studies
In partial fulfilment of the requirements for the M.Sc. degree in Computer Vision
Advisors: Ian Reid, Shijian Lu
Online access available for MBZUAI patrons