Document Type


Publication Title



Cancer is one of the leading causes of death worldwide, and head and neck (H&N) cancer is among the most prevalent types. Positron emission tomography (PET) and computed tomography (CT) are used to detect and segment the tumor region. In clinical practice, tumor segmentation is extremely time-consuming and prone to error. Machine learning, and deep learning in particular, can help automate this process, yielding results as accurate as those of a clinician. In this study, we develop a vision transformer-based method to automatically delineate H&N tumors and compare its results to leading convolutional neural network (CNN)-based models. We use multi-modal data from CT and PET scans for this task. We show that the selected transformer-based model can achieve results on a par with CNN-based ones. With cross-validation, the model achieves a mean Dice similarity coefficient (DSC) of 0.736, a mean precision of 0.766, and a mean recall of 0.766. This DSC is only 0.021 lower than that of the winning model of the 2020 competition. This indicates that transformer-based models are a promising research direction. © 2022, CC BY-NC-SA.
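
For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how the Dice similarity coefficient, precision, and recall are commonly computed for a pair of binary segmentation masks. It is not the authors' evaluation code: the function name `segmentation_scores`, the flattened-list input, and the convention of returning 1.0 when a denominator is zero are all assumptions made for illustration.

```python
def segmentation_scores(pred, truth):
    """Return (dsc, precision, recall) for two equal-length binary masks.

    `pred` and `truth` are flat sequences of 0/1 voxel labels
    (hypothetical input format; a 3D volume would be flattened first).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)

    # Assumed convention: an empty denominator scores 1.0.
    dsc = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dsc, precision, recall


# Toy example: 2 overlapping voxels, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
dsc, precision, recall = segmentation_scores(pred, truth)
```

The mean values quoted above would then be averages of such per-case scores over the cross-validation folds; the exact averaging scheme is not stated here.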


Publication Date



Computerized tomography, Convolutional neural networks, Deep learning, Medical imaging, Modal analysis, Positron emission tomography, Tumors, Automatic segmentations, Cancer segmentation, Causes of death, Convolutional neural network, CT, Head-and-neck cancer, Head-and-neck tumor, HECKTOR, Multi-modal data, Transformer-based segmentation, Diseases, Computer Vision and Pattern Recognition (cs.CV), Image and Video Processing (eess.IV), Machine Learning (cs.LG)


Preprint: arXiv

Archived with thanks to arXiv

Preprint License: CC BY-NC-SA 4.0

Uploaded 25 March 2022