Learning Attentive and Hierarchical Representations for 3D Shape Recognition
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR). Unlike existing multi-view based methods, HEAR develops a unified framework to address both multi-view redundancy and single-view incompleteness. Specifically, HEAR first employs a hybrid attention (HA) module, which consists of a view-agnostic attention (VAA) block and a view-specific attention (VSA) block. These two blocks jointly explore distinct but complementary spatial saliency of local features for each single-view image. Subsequently, a multi-granular view pooling (MVP) module aggregates the multi-view features at different granularities in a coarse-to-fine manner. The resulting feature set implicitly has hierarchical relations and is therefore projected into a hyperbolic space via a hyperbolic embedding. A hierarchical representation is then learned by hyperbolic multi-class logistic regression based on hyperbolic geometry. Experimental results show that HEAR outperforms state-of-the-art approaches on three 3D shape recognition tasks: generic 3D shape retrieval, 3D shape classification, and sketch-based 3D shape retrieval.
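The projection of Euclidean features into hyperbolic space mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which model of hyperbolic space HEAR uses; the code below assumes the Poincaré ball, a common choice in hyperbolic neural networks, and the function names `exp_map_origin` and `poincare_distance` are hypothetical. It is not the authors' implementation.

```python
import math


def exp_map_origin(v, c=1.0):
    """Map a Euclidean (tangent) vector v at the origin onto the
    Poincare ball of curvature -c (assumed model, see lead-in)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    sqrt_c = math.sqrt(c)
    # tanh squashes the norm so the result always lies inside the ball
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y):
    """Geodesic distance between two points of the unit Poincare ball (c = 1)."""
    def sq_norm(u):
        return sum(a * a for a in u)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    # Clamp to 1.0 to guard acosh against floating-point round-off
    return math.acosh(max(1.0, 1.0 + 2.0 * diff / denom))


# Hypothetical 2D features standing in for pooled multi-view descriptors
coarse = exp_map_origin([0.3, 0.4])
fine = exp_map_origin([3.0, 4.0])
print(poincare_distance([0.0, 0.0], coarse))
print(poincare_distance(coarse, fine))
```

Under this assumed model, points near the origin behave like coarse, general nodes and points near the boundary like fine, specific ones, which is why hyperbolic space suits features with hierarchical relations.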
3D shape recognition, Hyperbolic neural networks, Multi-granularity view aggregation, View-agnostic/specific attentions
J. Chen, J. Qin, Y. Shen, L. Liu, F. Zhu, and L. Shao, "Learning Attentive and Hierarchical Representations for 3D Shape Recognition", in 16th European Conf. on Computer Vision (ECCV 2020), vol. 12360 LNCS, pp. 105-122, Aug. 2020. doi:10.1007/978-3-030-58555-6_7