Controllable Music Inpainting with Mixed-Level and Disentangled Representation
Document Type
Conference Proceeding
Publication Title
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Abstract
Music inpainting, the task of completing a missing part of a piece given its surrounding context, is an important problem in automated music generation. In this study, we contribute a controllable inpainting model by combining the high expressivity of mixed-level, disentangled music representations with the strong predictive power of masked language modeling. The model enables flexible user control over both the time scope (inpainted length and location) and the semantic features that composers often consider during composition, such as rhythm patterns and chords. The key model design is to simultaneously predict disentangled representations over different time ranges. This design aims to mirror the thought process of a professional composer, who can take into account the flow of various semantic features at different hierarchies in parallel. Objective results show that our model produces much higher-quality music than the baseline, and subjective evaluation further indicates that it can generate melodies similar to human compositions.
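The abstract's core idea can be illustrated with a minimal sketch: treating inpainting as masked prediction over bar-level, disentangled features, where user-controlled features (e.g. chords and rhythm patterns) stay visible as conditions while the masked feature must be predicted. All names and the data layout below are hypothetical assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch (not the paper's code): music inpainting framed as
# masked prediction over bar-level, disentangled feature representations.
MASK = "<MASK>"

def build_inpainting_input(bars, start, end, keep=("chord", "rhythm")):
    """Mask non-kept features of bars[start:end], keeping user-controlled
    semantic features (e.g. chord, rhythm pattern) as conditions."""
    masked = []
    for i, bar in enumerate(bars):
        bar = dict(bar)  # copy so the original piece is untouched
        if start <= i < end:
            for feat in bar:
                if feat not in keep:
                    bar[feat] = MASK  # the model must fill this in
        masked.append(bar)
    return masked

piece = [
    {"melody": "C4 E4 G4", "chord": "C",  "rhythm": "x-x-"},
    {"melody": "D4 F4 A4", "chord": "Dm", "rhythm": "x--x"},
    {"melody": "E4 G4 B4", "chord": "Em", "rhythm": "xx--"},
]
# Inpaint bar 1 only; its chord and rhythm remain fixed as controls.
print(build_inpainting_input(piece, 1, 2))
```

In this framing, widening `start`/`end` changes the time scope of the inpainted region, while changing `keep` changes which semantic features the user pins down versus leaves for the model to generate.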
DOI
10.1109/ICASSP49357.2023.10096446
Publication Date
5-5-2023
Keywords
Music generation, Music representation learning, Semantics, Predictive models, Signal processing, Rhythm, Acoustics, Multiple signal classification, Mirrors
Recommended Citation
S. Wei, Z. Wang, W. Gao and G. Xia, "Controllable Music Inpainting with Mixed-Level and Disentangled Representation," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10096446.
Comments
IR conditions: non-described