Controllable Music Inpainting with Mixed-Level and Disentangled Representation

Document Type

Conference Proceeding

Publication Title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings


Music inpainting, the task of completing a missing part of a piece given some context, is an important task in automated music generation. In this study, we contribute a controllable inpainting model that combines the high expressivity of mixed-level, disentangled music representations with the strong predictive power of masked language modeling. The model enables flexible user control over both the time scope (the length and location of the inpainted segment) and the semantic features that composers often consider during composition, such as rhythm patterns and chords. The key model design is to simultaneously predict disentangled representations of different time ranges. This design aims to mirror the thought process of a professional composer, who can take into account the flow of various semantic features at different hierarchical levels in parallel. Results show that our model produces much higher-quality music than the baseline, and a subjective evaluation shows that it generates much better results than the baseline and can produce melodies similar to human compositions.
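The time-scope control described in the abstract (choosing where the inpainted segment starts and how long it is) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `MASK`, `mask_span`, and `masked_positions` are hypothetical, and the piece is assumed to be a plain list of measure-level items. The sketch only shows how a user-chosen span might be marked for a masked model to fill in; the model itself is not shown.

```python
from typing import List, Optional

# Hypothetical placeholder marking a measure that should be inpainted.
MASK = None


def mask_span(measures: List[str], start: int, length: int) -> List[Optional[str]]:
    """Return a copy of `measures` with `length` items from `start` replaced by MASK."""
    if start < 0 or length < 0 or start + length > len(measures):
        raise ValueError("span must lie inside the piece")
    return measures[:start] + [MASK] * length + measures[start + length:]


def masked_positions(masked: List[Optional[str]]) -> List[int]:
    """Indices a masked model would be asked to predict; the rest is context."""
    return [i for i, m in enumerate(masked) if m is MASK]


# An assumed 8-measure piece; inpaint 3 measures starting at measure 2.
piece = ["m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7"]
masked = mask_span(piece, start=2, length=3)
targets = masked_positions(masked)
```

In this sketch the unmasked measures on both sides of the span play the role of the context, and changing `start` and `length` corresponds to the user moving or resizing the region to be inpainted.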



Publication Date



Keywords

Music generation, Music representation learning, Self-supervised learning
