This page contains audio samples for our paper “MedleyVox: An Evaluation Dataset for Multiple Singing Voices Separation”, submitted to ICASSP 2023. Code is here.

Pop samples

In this section, the accompaniment of each song was first removed using GSEP, provided by GAUDIO Lab, Inc.


Main vs. rest

Duet audio samples in MedleyVox

Unison audio samples in MedleyVox

Main vs. rest audio samples in MedleyVox

STFT vs. learnable basis

We observed that an STFT/iSTFT basis gives perceptually better output than the learnable encoder-decoder framework, which was originally used in Conv-TasNet and in much of the speech separation literature. Because we did not use a mixture consistency loss when training the models in this comparison, nothing constrained the scale of their outputs, so the outputs were loudness-normalized to -27 LUFS to keep the output scale from exploding.
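As a rough illustration of the normalization step, the sketch below rescales a signal to a target level with a single gain. It is not the code used for these samples. It assumes plain RMS level in dBFS as a stand-in for integrated loudness. A true LUFS measurement (ITU-R BS.1770) also applies K-weighting and gating, which need a dedicated loudness meter. The function names `rms_dbfs` and `normalize_level` and the test signal are hypothetical.

```python
import numpy as np


def rms_dbfs(x: np.ndarray) -> float:
    """RMS level of a signal in dB relative to full scale (amplitude 1.0)."""
    rms = np.sqrt(np.mean(np.square(x)))
    # Floor the RMS so that digital silence does not produce log10(0).
    return 20.0 * np.log10(max(rms, 1e-12))


def normalize_level(x: np.ndarray, target_db: float = -27.0) -> np.ndarray:
    """Apply one gain so the RMS level of x equals target_db.

    Assumption: RMS level approximates loudness here; this is not a
    BS.1770 (LUFS) measurement.
    """
    gain_db = target_db - rms_dbfs(x)
    return x * 10.0 ** (gain_db / 20.0)


# Hypothetical separator output whose scale has drifted far above full scale.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
exploded = 50.0 * np.sin(2.0 * np.pi * 220.0 * t)

normalized = normalize_level(exploded)
```

Since the gain is a single scalar, the waveform shape is unchanged and only the playback level differs, so outputs of different models can be compared at a matched level.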