Natural Language Processing Faculty Publications

NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links

Natalia Loukachevitch, Lomonosov Moscow State University
Ekaterina Artemova, HSE University
Tatiana Batura, Novosibirsk State University
Pavel Braslavski, HSE University
Vladimir Ivanov, Innopolis University
Suresh Manandhar, Madan Bhandari University of Science and Technology
Alexander Pugachev, HSE University
Igor Rozhkov, Lomonosov Moscow State University
Artem Shelmanov, Lomonosov Moscow State University & Mohamed bin Zayed University of Artificial Intelligence
Elena Tutubalina, HSE University
Alexey Yandutov, Lomonosov Moscow State University

Document Type

Article

Publication Title

Language Resources and Evaluation

Abstract

This paper describes NEREL, a Russian news dataset suited for three tasks: nested named entity recognition, relation extraction, and entity linking. Compared to flat entities, nested named entities provide a richer and more complete annotation while also increasing the coverage of relation annotation and entity linking. Relations between nested named entities may cross entity boundaries to connect to shorter entities nested within longer ones, which makes such relations harder to detect. NEREL is currently the largest Russian dataset annotated with entities and relations: it comprises 29 named entity types and 49 relation types. At the time of writing, the dataset contains 56K named entities and 39K relations annotated in 933 person-oriented news articles. NEREL is annotated with relations at three levels: (1) within nested named entities, (2) within sentences, and (3) across sentence boundaries. We provide a benchmark evaluation of current state-of-the-art methods on all three tasks. The dataset is freely available at https://github.com/nerel-ds/NEREL.
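
The three relation levels named in the abstract can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration only: the `Entity` class, the `relation_level` function, the English example sentence, and the labels are assumptions, not NEREL's release format, tag set, or annotation guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int     # character offset, inclusive
    end: int       # character offset, exclusive
    label: str     # illustrative label, not necessarily a NEREL type
    sentence: int  # index of the sentence that contains the span


def contains(outer: Entity, inner: Entity) -> bool:
    """True if inner lies inside outer and the two spans are not identical."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def relation_level(head: Entity, tail: Entity) -> str:
    """Classify a relation by where its two arguments sit (assumed scheme)."""
    if contains(head, tail) or contains(tail, head):
        return "nested"          # level 1: inside a nested named entity
    if head.sentence == tail.sentence:
        return "sentence"        # level 2: within one sentence
    return "cross-sentence"      # level 3: crosses a sentence boundary


def make_entity(text: str, phrase: str, label: str, sentence: int) -> Entity:
    """Build an Entity from the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label, sentence)


# Hypothetical two-sentence example (sentence indices assigned by hand).
text = "The mayor of Moscow met Ivanov. He later flew to Kazan."
mayor = make_entity(text, "mayor of Moscow", "PROFESSION", 0)
moscow = make_entity(text, "Moscow", "CITY", 0)   # nested inside `mayor`
ivanov = make_entity(text, "Ivanov", "PERSON", 0)
kazan = make_entity(text, "Kazan", "CITY", 1)

print(relation_level(mayor, moscow))   # argument nested in the other
print(relation_level(ivanov, mayor))   # both arguments in sentence 0
print(relation_level(ivanov, kazan))   # arguments in different sentences
```

With flat annotation only the longer span "mayor of Moscow" would be kept, so a relation whose argument is the inner span "Moscow" could not be recorded; this is the coverage gain the abstract attributes to nested entities.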

DOI

10.1007/s10579-023-09674-z

Publication Date

9-21-2023

Keywords

Entity linking, Named entity recognition, Nested entities, Nested relations, Relation extraction

Comments

IR conditions: non-described

Recommended Citation

N. Loukachevitch et al., "NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links," in Lang Resources & Evaluation, Sept 2023. doi:10.1007/s10579-023-09674-z

Additional Links

Publisher link: https://doi.org/10.1007/s10579-023-09674-z
