A Fast Knowledge Distillation Framework for Visual Recognition

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)


While Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as supervised classification and self-supervised representation learning, the main drawback of a vanilla KD framework is its mechanism that consumes the majority of the computational overhead on forwarding through the giant teacher networks, making the entire learning procedure inefficient and costly. The recently proposed solution ReLabel suggests creating a label map for the entire image. During training, it receives the cropped region-level label by RoI aligning on a pre-generated entire label map, which allows for efficient supervision generation without having to pass through the teachers repeatedly. However, as the pre-trained teacher employed in ReLabel is from the conventional multi-crop scheme, there are various mismatches between the global label-map and region-level labels in this technique, resulting in performance deterioration compared to the vanilla KD. In this study, we present a Fast Knowledge Distillation (FKD) framework that replicates the distillation training phase and generates soft labels using the multi-crop KD approach, meanwhile training faster than ReLabel since no post-processes such as RoI align and softmax operations are used. When conducting multi-crop in the same image for data loading, our FKD is even more efficient than the traditional image classification framework. On ImageNet-1K, we obtain 80.1% Top-1 accuracy on ResNet-50, outperforming ReLabel by 1.2% while being faster in training and more flexible to use. On the distillation-based self-supervised learning task, we also show that FKD has an efficiency advantage. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

First Page


Last Page




Publication Date



Crops, Deterioration, Personnel training, Supervised learning, Vision, Computational overheads, Label maps, Learning procedures, Performance deterioration, Soft labels, Supervised classification, Teachers', Training phasis, Visual recognition, Visual tasks, Distillation, Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG)


IR conditions: non-described