Natural Language Processing Faculty Publications

Introduction to Language Identification

Tommi Jauhiainen, Helsingin Yliopisto
Marcos Zampieri, George Mason University
Timothy Baldwin, Mohamed Bin Zayed University of Artificial Intelligence
Krister Lindén, Helsingin Yliopisto

Document Type

Article

Publication Title

Synthesis Lectures on Human Language Technologies

Abstract

Language identification (LI) is the task of predicting the language(s) of a text or speech input. The main difference between LI of text and LI of speech is that the characters that make up a text are discrete, whereas speech input is usually a continuous signal. As a result, text and speech call for different styles of mathematical methods, and there has traditionally been little methodological overlap between the two. In this book, we focus on the language identification of digital text, although we do touch on applications to speech where the speech signal has been transcribed into a sequence of (discrete) phones. Recognizing the language(s) a text is written in comes naturally to a human reader familiar with those language(s). Table 1.1 presents excerpts from Wikipedia articles on Natural Language Processing (NLP) in four European languages, each labeled with the language it is written in. Without referring to the labels, readers of this book will certainly recognize at least one language, and many are likely to identify all four, even if they cannot read every excerpt.
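Because a text is a sequence of discrete characters, a model can identify its language by counting character sequences directly. The sketch below is a minimal illustration under stated assumptions, not the method presented in this chapter. It uses a multinomial naive Bayes classifier over character trigrams with add-one smoothing, a technique swapped in here for illustration. The three training snippets in `SAMPLES` are invented toy data, and the names `char_ngrams` and `NgramLanguageIdentifier` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split text into overlapping character n-grams, padding with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical toy multinomial naive Bayes classifier over character n-grams."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter[str]] = {}
        self.totals: dict[str, int] = {}
        self.vocab: set[str] = set()

    def train(self, samples: dict[str, str]) -> None:
        """Build one n-gram frequency profile per language label."""
        for lang, text in samples.items():
            grams = Counter(char_ngrams(text, self.n))
            self.counts[lang] = grams
            self.totals[lang] = sum(grams.values())
            self.vocab.update(grams)

    def log_prob(self, text: str, lang: str) -> float:
        """Add-one smoothed log-likelihood of text under one language's profile."""
        counts = self.counts[lang]
        # The extra +1 reserves probability mass for n-grams never seen in training.
        denom = self.totals[lang] + len(self.vocab) + 1
        return sum(math.log((counts[g] + 1) / denom)
                   for g in char_ngrams(text, self.n))

    def predict(self, text: str) -> str:
        """Return the label with the highest log-likelihood (uniform prior assumed)."""
        return max(self.counts, key=lambda lang: self.log_prob(text, lang))


# Invented toy training snippets (assumption); real systems train on far more text.
SAMPLES = {
    "en": "the language of the text is predicted from the characters in the text and this is the task",
    "fi": "kielen tunnistaminen on tehtävä jossa tekstin kieli päätellään tekstin merkeistä ja tämä on helppoa",
    "de": "die sprache des textes wird aus den zeichen im text bestimmt und das ist die aufgabe",
}

if __name__ == "__main__":
    lid = NgramLanguageIdentifier(n=3)
    lid.train(SAMPLES)
    for sentence in ["this is the text", "tämä on teksti", "das ist die sprache"]:
        print(sentence, "->", lid.predict(sentence))
```

The classifier omits a class prior, which amounts to assuming every language is equally likely a priori. Add-one smoothing keeps a single unseen trigram from driving a language's score to negative infinity, which matters when the training text is as small as it is here.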

First Page

1

Last Page

17

DOI

10.1007/978-3-031-45822-4_1

Publication Date

January 2, 2024

Keywords

Natural language processing systems, Speech recognition

Comments

IR conditions: non-described

Recommended Citation

T. Jauhiainen et al., "Introduction to Language Identification," Synthesis Lectures on Human Language Technologies, vol. Part F2039, pp. 1-17, Jan 2024.

The definitive version is available at https://doi.org/10.1007/978-3-031-45822-4_1
