Natural Language Processing Dissertations and Theses

Rebooting Language Models for Speech

Amirbek Djanibekov, Mohamed bin Zayed University of Artificial Intelligence

Date of Award

4-30-2024

Document Type

Thesis

Degree Name

Master of Science in Natural Language Processing

Department

Natural Language Processing

First Advisor

Hanan Aldarmaki

Second Advisor

Gus Xia

Abstract

Integrating speech directly into the text domain has significantly improved on the traditional two-step process of converting speech to text and then processing the text. Recent publications even showcase the integration of Large Language Models for context recognition in the speech modality. Most methods either employ the output of an intermediate layer of a pre-trained model or place speech hidden representations directly in place of text embeddings. An alternative is to query text information from the context of the speech representations, which presents opportunities for efficiency improvements such as reduced storage needs and parameter-efficient computation. In this study, we propose a new training protocol for speech that utilizes speech codes from the neural encodec model in Automatic Speech Recognition and Automatic Speech Translation tasks, and that reframes sequence classification objectives as generative ones. Our experiments on the LibriSpeech dataset reveal that the proposed method is effective, though it encounters some challenges in accurately matching the target text. By evaluating the model’s performance against established benchmarks, we infer that the generated outputs correlate highly with the semantic representation of the gold-standard labels.

Comments

Thesis submitted to the Deanship of Graduate and Postdoctoral Studies in partial fulfilment of the requirements for the M.Sc. degree in Natural Language Processing. Advisors: Hanan Aldarmaki, Gus Xia. Embargo period: 2 years.

Recommended Citation

A. Djanibekov, "Rebooting Language Models for Speech," Apr 2024.

This document is currently not available here.
