VMAN: A Virtual Mainstay Alignment Network for Transductive Zero-Shot Learning

Document Type


Publication Title

IEEE Transactions on Image Processing


Transductive zero-shot learning (TZSL) extends conventional ZSL by leveraging unlabeled unseen-class images during model training. A typical ZSL method learns embedding weights that map the visual feature space to the semantic space. However, in most existing methods the learned weights are dominated by seen images and therefore do not adapt well to unseen images. In this paper, to align the embedding weights for better knowledge transfer between seen and unseen classes, we propose the virtual mainstay alignment network (VMAN), which is tailored for the transductive ZSL task. Specifically, VMAN is cast as a tied encoder-decoder network, so only a single set of linear mapping weights needs to be learned. To learn these weights explicitly, and for the first time in ZSL, we propose to generate virtual mainstay (VM) samples for each seen class; these serve as additional training data and, to some extent, prevent the weights from being biased toward seen images. Moreover, a weighted reconstruction scheme is proposed and incorporated into the training phase, in both the semantic and feature spaces, so that the manifold relationships of the VM samples are well preserved. To further align the weights to adapt to more unseen images, a novel instance-category matching regularization is proposed for model re-training. VMAN is thus formulated as a nested minimization problem and is solved with a Taylor-approximation-based optimization paradigm. In comprehensive evaluations on four benchmark datasets, VMAN achieves superior performance under both the TZSL and generalized TZSL settings.
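
The abstract describes the model only at a high level. As an illustration of the tied encoder-decoder idea it mentions (one set of linear weights shared between the feature-to-semantic encoder and the semantic-to-feature decoder), the sketch below fits such a mapping with plain gradient descent and classifies by nearest semantic prototype. This is a minimal sketch under assumed data shapes; the function names, the loss weight lam, and the nearest-prototype prediction are illustrative assumptions, and it does not reproduce VMAN's virtual mainstay sample generation, weighted reconstruction, or instance-category matching regularization.

import numpy as np

def fit_tied_linear_mapping(X, S, lam=0.1, lr=1e-3, n_iters=500, seed=0):
    # X: (n, d) visual features; S: (n, k) per-sample class semantic vectors.
    # A single weight matrix W of shape (k, d) is shared by the encoder
    # (X @ W.T -> semantic space) and the tied decoder (S @ W -> feature space),
    # illustrating the "only one linear mapping" idea from the abstract.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = S.shape[1]
    W = 0.01 * rng.standard_normal((k, d))
    for _ in range(n_iters):
        r_enc = S - X @ W.T    # semantic-space reconstruction residual, (n, k)
        r_dec = X - S @ W      # feature-space reconstruction residual, (n, d)
        # Gradient of ||S - X W^T||_F^2 + lam * ||X - S W||_F^2, averaged over samples.
        grad = (-2.0 * r_enc.T @ X - 2.0 * lam * S.T @ r_dec) / n
        W -= lr * grad         # plain gradient descent on the joint loss
    return W

def nearest_prototype_predict(X_test, W, prototypes):
    # Project test features into the semantic space and assign the class whose
    # attribute/prototype vector is most similar (cosine similarity).
    s_hat = X_test @ W.T
    s_hat = s_hat / (np.linalg.norm(s_hat, axis=1, keepdims=True) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    return np.argmax(s_hat @ p.T, axis=1)

Example usage on random toy data (shapes only): X = np.random.randn(200, 64); S = np.random.randn(200, 16); W = fit_tied_linear_mapping(X, S). Because the decoder reuses W rather than learning a second matrix, the number of learned parameters is halved, which is the main practical consequence of the tied design the abstract emphasizes.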

First Page


Last Page




Publication Date



transductive, virtual sample generation, zero-shot learning


IR Deposit conditions:

OA version (pathway a): Accepted version

No embargo

When accepted for publication, a set statement must accompany the deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged