Machine Learning Faculty Publications

Distilling a Powerful Student Model via Online Knowledge Distillation

Shaojie Li, Xiamen University
Mingbao Lin, Xiamen University
Yan Wang, Pinterest Inc.
Yongjian Wu, Tencent
Yonghong Tian, Peking University
Ling Shao, Inception Institute of Artificial Intelligence & Mohamed bin Zayed University of Artificial Intelligence
Rongrong Ji, Xiamen University

Document Type

Article

Publication Title

IEEE Transactions on Neural Networks and Learning Systems

Abstract

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores other students' information, while the latter increases the computational complexity during deployment. In this article, we propose a novel method for online knowledge distillation, termed feature fusion and self-distillation (FFSD), which comprises two key components, feature fusion and self-distillation, that address the above problems in a unified framework. Unlike previous works, where all students are treated equally, FFSD splits them into a leader student set and a common student set. The feature fusion module then converts the concatenation of feature maps from all common students into a fused feature map, and this fused representation is used to assist the learning of the leader student. To enable the leader student to absorb more diverse information, we design an enhancement strategy that increases the diversity among students. In addition, a self-distillation module converts the feature map of a deeper layer into one matching a shallower layer, and the shallower layers are encouraged to mimic these transformed feature maps, which helps the students generalize better. After training, we simply adopt the leader student, which outperforms the common students, without increasing the storage or inference cost. Extensive experiments on CIFAR-100 and ImageNet demonstrate the superiority of FFSD over existing works. The code is available at https://github.com/SJLeo/FFSD.
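The two training-time modules described in the abstract can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the authors' implementation (which is in the linked repository). It assumes feature maps of shape (channels, height, width), models the fusion module as channel-wise concatenation followed by a 1x1 convolution, models the deep-to-shallow transform as a 1x1 channel projection plus nearest-neighbour upsampling, and uses mean-squared error as a stand-in for whatever mimicry loss the paper actually uses. All function names and sizes are illustrative, and the weights are random here, whereas in training they would be learned.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution: mix the channels of x (C_in, H, W) with weight (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def fuse_common_features(common_maps: list[np.ndarray], weight: np.ndarray) -> np.ndarray:
    """Feature fusion (assumed form): concatenate the common students' maps
    along the channel axis, then project back to C channels with a 1x1 conv."""
    stacked = np.concatenate(common_maps, axis=0)  # (k*C, H, W)
    return conv1x1(stacked, weight)                # (C, H, W)


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def deep_to_shallow(deep_map: np.ndarray, weight: np.ndarray, factor: int) -> np.ndarray:
    """Self-distillation transform (assumed form): project the deeper map's
    channels to the shallower layer's width, then upsample to its resolution."""
    return upsample_nearest(conv1x1(deep_map, weight), factor)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean-squared error, used here as a stand-in mimicry loss."""
    return float(np.mean((a - b) ** 2))


# --- Toy usage with random tensors (all sizes are hypothetical) ---
rng = np.random.default_rng(0)
C, H, W, k = 8, 4, 4, 3  # channels, height, width, number of common students

# Feature fusion: k common students -> one fused map that guides the leader.
common = [rng.standard_normal((C, H, W)) for _ in range(k)]
fusion_w = rng.standard_normal((C, k * C)) / np.sqrt(k * C)
fused = fuse_common_features(common, fusion_w)
leader = rng.standard_normal((C, H, W))
fusion_loss = mse(leader, fused)  # leader is pushed toward the fused representation

# Self-distillation: a deeper map (16 channels, half resolution) becomes
# a target for a shallower layer, which is then trained to mimic it.
deep = rng.standard_normal((16, H // 2, W // 2))
sd_w = rng.standard_normal((C, 16)) / 4.0
target = deep_to_shallow(deep, sd_w, factor=2)
shallow = rng.standard_normal((C, H, W))
self_distill_loss = mse(shallow, target)

print(f"fusion loss: {fusion_loss:.3f}, self-distillation loss: {self_distill_loss:.3f}")
```

In this sketch the fusion weights and the deep-to-shallow projection only produce training targets; at deployment only the leader student's network would be kept, which is consistent with the abstract's claim of no added storage or inference cost.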

DOI

10.1109/TNNLS.2022.3152732

Publication Date

3-7-2022

Keywords

Computational modeling, Feature fusion, Informatics, knowledge distillation, Knowledge engineering, Memory management, Message passing, online distillation, Optimization, self-distillation, Training

Comments

IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

When accepted for publication, set statement to accompany deposit (see policy)

Must link to publisher version with DOI

Publisher copyright and source must be acknowledged

Recommended Citation

S. Li et al., "Distilling a Powerful Student Model via Online Knowledge Distillation," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2022.3152732.
