VMAN: A Virtual Mainstay Alignment Network for Transductive Zero-Shot Learning
Document Type
Article
Publication Title
IEEE Transactions on Image Processing
Abstract
Transductive zero-shot learning (TZSL) extends conventional ZSL by leveraging (unlabeled) unseen images for model training. A typical ZSL method learns embedding weights that map the feature space to the semantic space. However, in most existing methods the learned weights are dominated by seen images and therefore do not adapt well to unseen images. In this paper, to align the (embedding) weights for better knowledge transfer between seen and unseen classes, we propose the virtual mainstay alignment network (VMAN), which is tailored for the transductive ZSL task. Specifically, VMAN is cast as a tied encoder-decoder network, so only one set of linear mapping weights needs to be learned. To explicitly learn the weights in VMAN, for the first time in ZSL, we propose to generate virtual mainstay (VM) samples for each seen class; these serve as new training data and, to some extent, prevent the weights from being shifted toward seen images. Moreover, a weighted reconstruction scheme is proposed and incorporated into the model training phase, in both the semantic and feature spaces. In this way, the manifold relationships of the VM samples are well preserved. To further align the weights so that they adapt to more unseen images, a novel instance-category matching regularization is proposed for model re-training. VMAN is thus modeled as a nested minimization problem and is solved by a Taylor approximate optimization paradigm. In comprehensive evaluations on four benchmark datasets, VMAN achieves superior performance under the (Generalized) TZSL setting.
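For readers unfamiliar with tied encoder-decoder embeddings, a minimal objective of this form is sketched below. It assumes a visual feature matrix X (columns are samples), the corresponding class-semantic matrix S, and a single linear mapping W shared between encoder and decoder, as in generic semantic-autoencoder formulations; it is an illustrative sketch only, not the exact VMAN objective, which further involves VM samples, weighted reconstruction, and instance-category matching regularization.

\min_{W} \; \| X - W^{\top} W X \|_F^2 \; + \; \lambda \, \| W X - S \|_F^2

Because the decoder simply reuses the transpose W^{\top} of the encoder, only one weight matrix is learned, which is the "tied" property the abstract refers to.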
First Page
4316
Last Page
4329
DOI
10.1109/TIP.2021.3070231
Publication Date
4-9-2021
Keywords
Transductive, virtual sample generation, zero-shot learning
Recommended Citation
G. -S. Xie, X. -Y. Zhang, Y. Yao, Z. Zhang, F. Zhao and L. Shao, "VMAN: A Virtual Mainstay Alignment Network for Transductive Zero-Shot Learning," in IEEE Transactions on Image Processing, vol. 30, pp. 4316-4329, 2021, doi: 10.1109/TIP.2021.3070231.
Additional Links
IEEE Link: https://doi.org/10.1109/TIP.2021.3070231
Comments
IR Deposit conditions:
OA version (pathway a): Accepted version
No embargo
When accepted for publication, set statement to accompany deposit (see policy)
Must link to publisher version with DOI
Publisher copyright and source must be acknowledged