Abstract
We propose Schrödinger Bridge Mamba (SBM), a new training-inference framework motivated by the inherent compatibility between the Schrödinger Bridge (SB) training paradigm and the selective state-space model (Mamba) architecture. We exemplify SBM with an implementation for speech enhancement. Experiments on a joint denoising and dereverberation task using four benchmark datasets demonstrate that SBM, with only one inference step, outperforms strong baselines that use either 1-step or iterative inference, and achieves the best real-time factor (RTF). The integration of the SB paradigm with state-space models points to a promising direction for developing new deep generative models, with strong potential for application in a broad range of generative tasks beyond audio.
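The real-time factor mentioned above is conventionally the wall-clock processing time divided by the duration of the input audio, so values below 1 mean faster than real time. The snippet below is a minimal sketch of that measurement, assuming a generic `enhance` callable (hypothetical, standing in for any model's forward pass); it is not the authors' benchmarking code, and real measurements would also fix hardware, batch size, and warm-up runs.

```python
import time
from typing import Callable, Sequence


def real_time_factor(enhance: Callable[[Sequence[float]], object],
                     audio: Sequence[float],
                     sample_rate: int) -> float:
    """Return RTF = processing time / audio duration (both in seconds).

    `enhance` is a hypothetical stand-in for a speech enhancement model;
    for a 1-step method it would be called once per utterance.
    """
    if sample_rate <= 0 or len(audio) == 0:
        raise ValueError("audio must be non-empty and sample_rate positive")
    duration_s = len(audio) / sample_rate
    start = time.perf_counter()
    enhance(audio)  # single inference call being timed
    elapsed_s = time.perf_counter() - start
    return elapsed_s / duration_s


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at 16 kHz
    print(real_time_factor(lambda x: x, one_second, 16000))
```

An iterative sampler that calls the network N times per utterance would, all else equal, scale the measured RTF roughly by N, which is why single-step inference is attractive for latency.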
DNS Real Recordings

DNS With Reverb

DNS No Reverb

VoiceBank-DEMAND