Machine Learning Faculty Publications

On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study

Chunpeng Ma, Fujitsu Ltd.
Aili Shen, Amazon
Hiyori Yoshikawa, Fujitsu Ltd.
Tomoya Iwakura, Fujitsu Ltd.
Daniel Beck, University of Melbourne
Timothy Baldwin, University of Melbourne & Mohamed bin Zayed University of Artificial Intelligence

Document Type

Article

Publication Title

ACM Transactions on Asian and Low-Resource Language Information Processing

Abstract

Combining different input modalities beyond text is a key challenge for natural language processing. Previous work has been inconclusive as to the true utility of images as a supplementary information source for text classification tasks, motivating this large-scale human study of labelling performance given text-only, images-only, or both text and images. To this end, we create a new dataset accompanied by a novel annotation method, Japanese Entity Labeling with Dynamic Annotation, to deepen our understanding of the effectiveness of images for multi-modal text classification. By performing careful comparative analysis of human performance and the performance of state-of-the-art multi-modal text classification models, we gain valuable insights into differences between human and model performance, and the conditions under which images are beneficial for text classification.

DOI

10.1145/3565572

Publication Date

3-10-2023

Keywords

Datasets, multi-modality, natural language processing, neural networks, text classification

Comments

IR Deposit conditions:

OA version (pathway a) Accepted version

No embargo

Publisher copyright and source must be acknowledged

Must link to publisher version with statement that this is the definitive version and DOI

Must state that the version on the repository is the author's version

Set statement to accompany deposit (see policy)

Recommended Citation

C. Ma et al., “On the effectiveness of images in multi-modal text classification: An annotation study,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 22, no. 3, pp. 1–19, 2023. doi:10.1145/3565572

Additional Links

https://doi.org/10.1145/3565572
