Document Type
Article
Publication Title
Speech Communication
Abstract
Automatic Speech Recognition (ASR) systems can be trained to achieve remarkable performance given large amounts of manually transcribed speech, but large labeled data sets can be difficult or expensive to acquire for all languages of interest. In this paper, we review the research literature to identify models and ideas that could lead to fully unsupervised ASR, including unsupervised sub-word and word modeling, unsupervised segmentation of the speech signal, and unsupervised mapping from speech segments to text. The objective of the study is to identify the limitations of what can be learned from speech data alone and to understand the minimum requirements for speech recognition. Identifying these limitations would help optimize the resources and efforts in ASR development for low-resource languages. © 2022 The Author(s)
First Page
76
Last Page
91
DOI
10.1016/j.specom.2022.02.005
Publication Date
April 2022
Keywords
Mapping, Speech, Automatic speech recognition, Automatic speech recognition system, Cross-modal, Cross-modal mapping, Data set, Labeled data, Large amounts, Performance, Speech segmentation, Unsupervised automatic speech recognition, Speech recognition
Recommended Citation
H. Aldarmaki, A. Ullah, S. Ram, and N. Zaki, “Unsupervised automatic speech recognition: A review,” Speech Communication, vol. 139, pp. 76–91, 2022. doi:10.1016/j.specom.2022.02.005
Comments
Hybrid Gold Open Access
Archived, thanks to Elsevier ScienceDirect
License: CC BY-NC-ND 4.0
Uploaded 29 November 2023