Task-Oriented High-Order Context Graph Networks for Few-Shot Human-Object Interaction Recognition
IEEE Transactions on Systems, Man, and Cybernetics: Systems
Few-shot human-object interaction (FS-HOI) recognition aims to infer new interactions between human actions and surrounding objects from only a few available instances. It helps alleviate the long-tail and combinatorial explosion problems in human-object interaction (HOI). However, existing FS-HOI methods focus only on modeling the relationships between labeled and unlabeled samples in the Euclidean domain, neglecting the rich relational structures of the visual information among labeled samples and between human actions and objects. Accordingly, we tackle the few-shot HOI task in the non-Euclidean domain and present a graph-based model, namely, the task-oriented high-order context graph network (THCG-Net). It contains a task attention module (TA-Module) and a high-order context graph module (HG-Module). In the TA-Module, an attention mechanism uses task information to build a task-oriented space; embedding the visual features into this space captures the information that is discriminative for the current task (episode). The HG-Module constructs a task-level graph and takes the context information as high-order knowledge, which provides discriminative guidance for propagating visual information. It captures the discriminability among different categories while adaptively highlighting the commonality of related categories, which effectively transfers knowledge to related categories. Extensive experimental results on two benchmark datasets, HICO-FS and TUHOI-FS, are provided. They demonstrate that THCG-Net significantly outperforms state-of-the-art approaches, indicating its effectiveness in recognizing various human actions and surrounding objects in few-shot scenarios.
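As a rough illustration of the two ideas named in the abstract (task-conditioned attention followed by propagation over a task-level graph), the following is a minimal sketch only. It is not the THCG-Net architecture: the abstract does not specify the attention form, the graph construction, or how context enters as high-order knowledge. Everything here is an assumption made for illustration, including the function names `task_attention` and `propagate`, the use of the episode mean as the task summary, the distance-based edge weights, and the toy feature values.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def task_attention(features):
    """Reweight feature dimensions using a task-level summary.

    Assumption: the task summary is the mean feature of the episode, and
    the attention weights are a softmax over that mean.
    """
    dim = len(features[0])
    task_mean = [sum(f[d] for f in features) / len(features) for d in range(dim)]
    weights = softmax(task_mean)  # one weight per dimension, sums to 1
    # Scale by dim so that uniform weights leave the features unchanged.
    return [[f[d] * weights[d] * dim for d in range(dim)] for f in features]

def propagate(features):
    """One round of message passing over a fully connected task graph.

    Assumption: edge weights are a row-wise softmax of negative squared
    Euclidean distances, so similar samples exchange more information.
    """
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        scores = [
            -sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            for j in range(n)
        ]
        adj = softmax(scores)  # row-normalised adjacency, self-loop included
        updated.append(
            [sum(adj[j] * features[j][d] for j in range(n)) for d in range(dim)]
        )
    return updated

# Toy episode with made-up values: two samples of one class, one of another.
episode = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
embedded = task_attention(episode)
updated = propagate(embedded)
```

In this toy setting, each updated feature is a weighted average of the attended features, so samples that were already close pull each other closer while remaining separated from the dissimilar sample.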
Few-shot learning, graph neural networks (GNNs), human-object interaction (HOI), meta-learning
Z. Ji, P. An, X. Liu, Y. Pang, L. Shao and Z. Zhang, "Task-Oriented High-Order Context Graph Networks for Few-Shot Human-Object Interaction Recognition," in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 9, pp. 5443-5455, Sept. 2022, doi: 10.1109/TSMC.2021.3125343.