BS-Mamba: Band-split Structured State Space Duality for Music Source Separation

0. Contents

  1. Abstract
  2. Samples of MUSDB18 test set


1. Abstract

In recent years, with the ongoing development of deep learning, data-driven approaches have achieved remarkable success in music source separation. However, RNN-based methods have limited modeling capacity, and Transformer-based methods have high computational complexity, which limits their practical application. Owing to its robust performance and reduced computational requirements, Mamba has recently attracted significant attention. In this paper, we propose BS-Mamba, a state space based model for music source separation. We adopt BS-Roformer as the foundational framework and replace its RoPE Transformer with Mamba blocks. Specifically, we design a T-F Mamba structure that combines a Mamba block with multi-head self-attention to broaden contextual modeling capability. Experimental results show that BS-Mamba achieves performance comparable to BS-Roformer with only 50% of its parameters. The proposed model achieves a signal-to-distortion ratio (SDR) of 9.96 dB on the MUSDB18-HQ dataset without using extra data.
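For readers unfamiliar with the metric, the sketch below shows the basic SDR definition, 10 * log10(||s||^2 / ||s - s_hat||^2), on a synthetic signal. It is an illustration only: the function name `sdr_db` and the `eps` stabilizer are assumptions made here, and the abstract does not say which SDR variant or evaluation toolkit produced the reported figure, so this code should not be expected to reproduce it.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Basic signal-to-distortion ratio in dB for two equal-length sequences.

    Hypothetical helper for illustration; eps avoids division by zero
    and log of zero when either energy term vanishes.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the target source.
    signal_energy = sum(r * r for r in reference)
    # Energy of the residual between target and separated estimate.
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Synthetic example: one second of a 440 Hz sine at 44.1 kHz as the "true" stem,
# and an estimate that is the same stem scaled to 0.9 of its amplitude.
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
estimate = [0.9 * r for r in reference]

print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

In this example the residual has one tenth of the reference amplitude, so the energy ratio is 100 and the SDR is about 20 dB. Higher values mean the separated stem is closer to the ground-truth stem.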



2. Samples of MUSDB18 test set

BKS - Bulldozer

mixture:
Stems SCNet BS-Mamba Label
Vocals
Bass
Drums
Other


Enda Reilly - Cur An Long Ag Seol

mixture:
Stems SCNet BS-Mamba Label
Vocals
Bass
Drums
Other


Louis Cressy Band - Good Time

mixture:
Stems SCNet BS-Mamba Label
Vocals
Bass
Drums
Other