This page contains audio samples for our paper “MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation”, submitted to ICASSP 2023. Code is here.
In this section, the accompaniment of each song was first removed with GSEP (https://studio.gaudiolab.io), provided by GAUDIO Lab, Inc.
We observed that an STFT/iSTFT basis gives perceptually better output than the learnable encoder-decoder framework originally used in Conv-TasNet and many other works on speech separation. Since we did not use a mixture consistency loss when training the models in this comparison, the model outputs were loudness-normalized to -27 LUFS to prevent the output scale from exploding.
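To illustrate the normalization step, here is a minimal sketch of scaling a signal to a fixed target level. It is not the code used for these samples. For simplicity it measures level as plain RMS in dB, which is only a stand-in for a true LUFS measurement (ITU-R BS.1770 additionally applies K-weighting and gating). The function names `rms_level_db` and `normalize_to_target` are hypothetical.

```python
import math

TARGET_LEVEL_DB = -27.0  # target level stated on this page (-27 LUFS)


def rms_level_db(samples):
    """Simplified loudness proxy: RMS level in dB relative to full scale.

    Assumption: this is NOT an ITU-R BS.1770 LUFS measurement
    (no K-weighting, no gating); it only illustrates the idea.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the power so that digital silence does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def normalize_to_target(samples, target_db=TARGET_LEVEL_DB):
    """Apply one constant gain so the measured level equals target_db."""
    gain_db = target_db - rms_level_db(samples)
    gain = 10.0 ** (gain_db / 20.0)  # convert dB to a linear amplitude factor
    return [s * gain for s in samples]


# Example: one second of a 440 Hz sine at 16 kHz with an arbitrarily large
# scale, mimicking a separator output whose amplitude is unconstrained.
sample_rate = 16000
loud = [5.0 * math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
        for n in range(sample_rate)]
normalized = normalize_to_target(loud)
```

Because a single gain is applied to the whole signal, the waveform shape is unchanged and only its overall scale is fixed, so outputs from different models can be compared at the same level.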