MultiSpanQA: A Dataset for Multi-Span Question Answering
Document Type
Conference Proceeding
Abstract
Most existing reading comprehension datasets focus on single-span answers, which can be extracted as a single contiguous span from a given text passage. Multi-span questions, i.e., questions whose answer is a series of multiple discontiguous spans in the text, are common in real life but are less studied. In this paper, we present MultiSpanQA, a new dataset that focuses on questions with multi-span answers. Raw questions and contexts are extracted from the Natural Questions (Kwiatkowski et al., 2019) dataset. After multi-span re-annotation, MultiSpanQA consists of over 6,000 multi-span questions in its basic version, and over 19,000 examples in its expanded version, which additionally includes unanswerable questions and questions with single-span answers. We introduce new metrics for evaluating multi-span question answering, and establish several baselines using advanced models. Finally, we propose a new model which beats all baselines and achieves state-of-the-art performance on our dataset. © 2022 Association for Computational Linguistics.
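The abstract mentions new metrics for multi-span question answering evaluation but does not define them. As a rough illustration only, the sketch below shows one plausible span-level exact-match precision/recall/F1 over sets of predicted and gold answer spans; the paper's actual metrics (e.g., any partial-match variants) may differ, and all names here are hypothetical rather than taken from the dataset's official tooling.

```python
# Hypothetical sketch of a span-level exact-match F1 for multi-span QA.
# Not the paper's official metric; names and normalization are illustrative.
from collections import Counter
from typing import List, Tuple


def exact_match_f1(pred_spans: List[str], gold_spans: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 treating each answer span as one unit."""
    pred_counts = Counter(s.strip().lower() for s in pred_spans)
    gold_counts = Counter(s.strip().lower() for s in gold_spans)
    # Multiset overlap: predicted spans that exactly match a gold span.
    overlap = sum((pred_counts & gold_counts).values())
    precision = overlap / sum(pred_counts.values()) if pred_counts else 0.0
    recall = overlap / sum(gold_counts.values()) if gold_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A question whose answer consists of two discontiguous spans.
    gold = ["Barack Obama", "Donald Trump"]
    pred = ["Barack Obama", "Joe Biden"]
    print(exact_match_f1(pred, gold))  # (0.5, 0.5, 0.5)
```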
First Page
1250
Last Page
1260
DOI
10.18653/v1/2022.naacl-main.90
Publication Date
7-2022
Keywords
Advanced modeling, Multi-spans, Question Answering, Question-answering evaluation, Reading comprehension, State of the art
Recommended Citation
H. Li, M. Tomko, M. Vasardani, and T. Baldwin, "MultiSpanQA: A Dataset for Multi-Span Question Answering", in Proceedings of the 2022 Conf. of the North American Chapter of the Assoc. for Computational Linguistics: Human Language Technologies (NAACL 2022), July 2022, pp. 1250-1260, doi: 10.18653/v1/2022.naacl-main.90
Comments
IR deposit conditions: not described