Machine Learning Faculty Publications

Interpreting Song Lyrics With An Audio-Informed Pre-Trained Language Model

Yixiao Zhang, Centre for Digital Music, Queen Mary University of London, United Kingdom
Junyan Jiang, Music X Lab, NYU Shanghai, China & Mohamed bin Zayed University of Artificial Intelligence
Gus Xia, Music X Lab, NYU Shanghai, China & Mohamed bin Zayed University of Artificial Intelligence
Simon Dixon, Centre for Digital Music, Queen Mary University of London, United Kingdom

Document Type

Article

Publication Title

arXiv

Abstract

Lyric interpretations can help people understand songs and their lyrics quickly, and can also make it easier to manage, retrieve and discover songs efficiently from the growing mass of music archives. In this paper, we propose BART-fusion, a novel model for generating lyric interpretations from lyrics and music audio that combines a large-scale pre-trained language model with an audio encoder. We employ a cross-modal attention module to incorporate the audio representation into the lyrics representation to help the pre-trained language model understand the song from an audio perspective, while preserving the language model’s original generative performance. We also release the Song Interpretation Dataset, a new large-scale dataset for training and evaluating our model. Experimental results show that the additional audio information helps our model to understand words and music better, and to generate precise and fluent interpretations. An additional experiment on cross-modal music retrieval shows that interpretations generated by BART-fusion can also help people retrieve music more accurately than with the original BART. © 2022, CC BY.
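
The cross-modal attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single attention head, the residual addition, the feature dimensions, and every name (cross_modal_attention, w_q, w_k, w_v) are assumptions chosen for illustration. Lyric token states act as queries, audio frame features act as keys and values, and the attended audio is added back onto the lyric states so the text representation is kept alongside the audio signal.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(text, audio, w_q, w_k, w_v):
    """Single-head attention from lyric tokens (queries) to audio frames (keys/values).

    text:  (n_tokens, d_model) lyric representation
    audio: (n_frames, d_model) audio representation
    Returns the fused representation and the attention weights.
    The residual connection is an assumption made for this sketch.
    """
    q = text @ w_q                               # (n_tokens, d_model)
    k = audio @ w_k                              # (n_frames, d_model)
    v = audio @ w_v                              # (n_frames, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (n_tokens, n_frames)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    attended = weights @ v                       # (n_tokens, d_model)
    return text + attended, weights              # residual keeps the text signal


# Toy inputs: 5 lyric tokens and 8 audio frames, both with 16 features (hypothetical sizes).
rng = np.random.default_rng(0)
d_model = 16
text = rng.normal(size=(5, d_model))
audio = rng.normal(size=(8, d_model))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))

fused, weights = cross_modal_attention(text, audio, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

With the residual form used here, setting the value projection to zero returns the lyric representation unchanged, which is one simple way a fusion layer could leave a language model's original behaviour intact when the audio contributes nothing.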

DOI

10.48550/arXiv.2208.11671

Publication Date

8-24-2022

Keywords

Audio acoustics, Computational linguistics, Large dataset

Comments

Preprint: arXiv

Archived with thanks to arXiv

Preprint License: CC BY 4.0

Uploaded 27 September 2022

Recommended Citation

Y. Zhang, J. Jiang, G. Xia, and S. Dixon, "Interpreting Song Lyrics With An Audio-Informed Pre-Trained Language Model", 2022, doi:10.48550/arXiv.2208.11671

Included in

Artificial Intelligence and Robotics Commons