SB-SENet: Diffusion Model Based on Schrödinger Bridge for Speech Enhancement


Huaifeng Zhang1, Guigeng Li1, Peifei Wu1, Yong Gao2, Hao Zhang1*

1College of Electronic Engineering, Ocean University of China, Qingdao, 266100, China

2College of Shipbuilding Engineering, Qingdao Innovation and Development Base of Harbin Engineering University, Qingdao, 266000, China

Abstract.

Score-based generative models and diffusion models are increasingly applied to speech enhancement and have demonstrated remarkable performance. However, the mixed samples these models start from, which combine speech and Gaussian noise, lack accurate structural information; this makes inference harder and degrades speech quality. This paper introduces SB-SENet, a novel generative model for speech enhancement based on the Schrödinger bridge. The Schrödinger bridge constructs the optimal transport path from an initial probability distribution to a target probability distribution by minimizing a Kullback-Leibler divergence cost, which corresponds to solving an entropy-regularized optimal transport problem. Here, the path carries the noisy speech distribution toward the clean speech distribution to produce the predicted sample. Unlike diffusion models, which first learn the forward diffusion process from the noisy speech sample to a Gaussian distribution, SB-SENet directly learns the nonlinear diffusion process from the noisy speech sample to the clean speech sample, preserving more structural information about the initial sample. The SB-SENet model uses a Transformer to capture the characteristic features of time-series signals and a U-Net to fuse multi-scale information. The loss function is built on a score-based generative framework and incorporates phase loss, magnitude loss, and metric loss to gradually reduce the difference between the predicted sample and the clean speech sample. Experimental results show that the speech enhanced by SB-SENet reaches a PESQ score of 3.79, achieving state-of-the-art speech quality compared with recent speech enhancement models.
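The abstract names three loss terms (phase, magnitude, and metric) without giving their form. The following is only a rough sketch of how such terms could be combined, not the authors' formulation: the function names, the weights `w_mag`, `w_pha`, `w_met`, the wrapped phase difference, and the normalized quality score `predicted_score` are all assumptions made for illustration.

```python
import cmath
import math


def wrap_phase(delta):
    """Wrap a phase difference into [-pi, pi) so that 2*pi jumps are not penalized."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def magnitude_loss(pred, clean):
    """Mean squared error between the magnitudes of complex spectrogram bins."""
    return sum((abs(p) - abs(c)) ** 2 for p, c in zip(pred, clean)) / len(clean)


def phase_loss(pred, clean):
    """Mean absolute wrapped phase difference between complex spectrogram bins."""
    diffs = (wrap_phase(cmath.phase(p) - cmath.phase(c)) for p, c in zip(pred, clean))
    return sum(abs(d) for d in diffs) / len(clean)


def metric_loss(predicted_score, target_score=1.0):
    """Squared gap between a hypothetical normalized quality score and its maximum."""
    return (target_score - predicted_score) ** 2


def total_loss(pred, clean, predicted_score, w_mag=1.0, w_pha=1.0, w_met=1.0):
    """Weighted sum of the three terms; the weights are illustrative placeholders."""
    return (w_mag * magnitude_loss(pred, clean)
            + w_pha * phase_loss(pred, clean)
            + w_met * metric_loss(predicted_score))


# Toy complex bins standing in for a spectrogram.
clean_bins = [1 + 0j, 0 + 1j, -1 + 0j]
pred_bins = [0.9 + 0.1j, 0.1 + 1.1j, -1 - 0.05j]
loss_value = total_loss(pred_bins, clean_bins, predicted_score=0.8)
```

Wrapping the phase difference matters near the branch cut: in the third bin, two phases close to +pi and -pi describe nearly the same angle, and the wrapped difference stays small instead of approaching 2*pi.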





This page is for research demonstration purposes only.

SB-SENet Model Architecture

Network structure of SB-SENet. Key components include: (a) Magnitude Modulation Module, (b) Schrödinger Bridge-based SDE Solver, (c) Transformer Network, and (d) U-Net Network.
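Component (b) relies on a stochastic process that connects noisy speech to clean speech. As a minimal sketch of that idea, assuming the simplest possible reference process (a Brownian bridge with constant diffusion `sigma`, which may well differ from the schedule SB-SENet actually uses), an intermediate state between a paired noisy and clean sample can be drawn as follows. The function name `bridge_sample` and its parameters are hypothetical.

```python
import math
import random


def bridge_sample(noisy, clean, t, sigma=0.5, rng=random):
    """Draw x_t from a Brownian bridge pinned at `noisy` (t=0) and `clean` (t=1).

    The mean interpolates linearly between the endpoints, and the standard
    deviation sigma * sqrt(t * (1 - t)) vanishes at both ends.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    std = sigma * math.sqrt(t * (1.0 - t))
    return [(1.0 - t) * n + t * c + std * rng.gauss(0.0, 1.0)
            for n, c in zip(noisy, clean)]


# Toy waveform frames standing in for real speech features.
noisy_frame = [0.8, -0.3, 0.5]
clean_frame = [0.6, -0.1, 0.2]
midpoint = bridge_sample(noisy_frame, clean_frame, t=0.5, rng=random.Random(0))
```

Because the variance is zero at both endpoints, every intermediate state stays anchored to the noisy input rather than dissolving into pure Gaussian noise, which is the structural-information argument made in the abstract.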

Metric comparison in denoising experiments

| Model | Year | Input | Size (M) | PESQ↑ | CSIG↑ | CBAK↑ | COVL↑ | STOI↑ |
|---|---|---|---|---|---|---|---|---|
| Noisy | - | - | - | 1.97 | 3.35 | 2.44 | 2.63 | 0.91 |
| SEGAN | 2017 | Waveform | 43.2 | 2.16 | 3.48 | 2.94 | 2.80 | 0.92 |
| TSTNN | 2021 | Waveform | 0.92 | 2.96 | 4.10 | 3.77 | 3.52 | 0.95 |
| DEMUCS | 2021 | Waveform | 33.5 | 3.07 | 4.31 | 3.40 | 3.63 | 0.95 |
| SE-Conformer | 2021 | Waveform | - | 3.13 | 4.45 | 3.55 | 3.82 | 0.95 |
| MetricGAN | 2019 | Magnitude | - | 2.86 | 3.99 | 3.18 | 3.42 | - |
| MetricGAN+ | 2021 | Magnitude | 2.60 | 3.15 | 4.14 | 3.16 | 3.64 | 0.93 |
| PFPL | 2020 | Complex | - | 3.15 | 4.18 | 3.60 | 3.67 | 0.95 |
| DPT-FSNet+ | 2021 | Complex | 0.91 | 3.33 | 4.58 | 3.72 | 4.00 | 0.96 |
| TridentSE+ | 2023 | Complex | 3.03 | 3.47 | 4.70 | 3.81 | 4.10 | 0.96 |
| DB-AIAT | 2021 | Complex+Magnitude | 2.81 | 3.31 | 4.61 | 3.75 | 3.96 | 0.96 |
| CMGAN | 2022 | Complex+Magnitude | 1.83 | 3.41 | 4.63 | 3.94 | 4.12 | 0.96 |
| SCP-CMGAN | 2022 | Complex+Magnitude | 1.93 | 3.52 | 4.75 | 3.97 | 4.25 | 0.96 |
| PHASEN | 2020 | Magnitude+Phase | 20.9 | 2.99 | 4.21 | 3.55 | 3.61 | - |
| MP-SENet | 2023 | Magnitude+Phase | 2.26 | 3.50 | 4.73 | 3.95 | 4.22 | 0.96 |
| SEMamba | 2024 | Magnitude+Phase | 6.49 | 3.55 | 4.77 | 3.95 | 4.29 | 0.96 |
| SB-SENet (Ours) | 2025 | Complex | 68.5 | 3.79 | 4.72 | 3.73 | 4.36 | 0.95 |
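To make the comparison easier to read, the short script below computes the margin between SB-SENet and the strongest baseline on each metric. The scores are copied from the table; only the three baselines that hold the best baseline value on at least one metric are included, and the helper name `margin_over_best_baseline` is introduced here for illustration.

```python
METRICS = ("PESQ", "CSIG", "CBAK", "COVL", "STOI")

# Scores copied from the comparison table (higher is better for every metric).
ROWS = {
    "SCP-CMGAN": (3.52, 4.75, 3.97, 4.25, 0.96),
    "MP-SENet": (3.50, 4.73, 3.95, 4.22, 0.96),
    "SEMamba": (3.55, 4.77, 3.95, 4.29, 0.96),
    "SB-SENet": (3.79, 4.72, 3.73, 4.36, 0.95),
}


def margin_over_best_baseline(metric, ours="SB-SENet"):
    """Return ours minus the best baseline score for one metric, rounded to 2 decimals."""
    i = METRICS.index(metric)
    best = max(scores[i] for name, scores in ROWS.items() if name != ours)
    return round(ROWS[ours][i] - best, 2)


margins = {m: margin_over_best_baseline(m) for m in METRICS}
for metric, margin in margins.items():
    print(f"{metric}: {margin:+.2f}")
```

The margins are positive for PESQ and COVL and slightly negative for CSIG, CBAK, and STOI, so the state-of-the-art claim applies to the speech quality measured by PESQ rather than to every column.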

Output Speech of Ablation Experiments

Ten groups of speech samples are selected from each of speakers p232 and p257.
name Clean Noise -MGloss -PESQloss -PESQweight -Trans -SBVE SB-SENet
p232_101
p232_102
p232_103
p232_104
p232_105
p232_106
p232_107
p232_108
p232_109
p232_110
p257_421
p257_422
p257_423
p257_424
p257_425
p257_426
p257_427
p257_428
p257_429
p257_430

Output Speech of Comparison Experiments

Ten groups of speech samples are randomly selected from each of speakers p232 and p257.
name Clean Noise CDiffuSE SB-SENet MetricGAN+ SEGAN SEMamba-PCS SEMamba SGMSE+ Spectral-Mask MP-SENet
p232_005
p232_013
p232_095
p232_121
p232_151
p232_162
p232_186
p232_369
p232_403
p232_414
p257_058
p257_139
p257_152
p257_194
p257_199
p257_272
p257_338
p257_367
p257_402
p257_403