Table of Contents

Title Page

Abstract

Abstract (in Korean)

Contents

Chapter 1. Introduction 23

1.1. Introduction 23

1.2. Outline of the Dissertation 29

Chapter 2. Improved Speech Enhancement Considering Speech PSD Uncertainty 32

2.1. Previous MMSE clean speech estimator considering speech PSD uncertainty 32

2.1.1. MMSE clean speech estimator based on speech PSD uncertainty 32

2.1.2. Conventional parameter estimation for speech enhancement 36

2.2. Proposed complete speech enhancement framework reflecting speech PSD uncertainty 40

2.2.1. Speech presence probability estimation considering speech PSD uncertainty 41

2.2.2. Noise PSD estimation incorporating speech PSD uncertainty 43

2.2.3. Improved speech PSD estimation 43

2.2.4. Proposed gain function with speech PSD uncertainty 45

2.3. Experimental Results 50

2.3.1. Performance for noise PSD and speech power spectrum estimation 51

2.3.2. Performance for speech enhancement 56

2.3.3. Performance comparisons with a real-time deep learning-based speech enhancement system 59

2.4. Conclusion 60

Chapter 3. iDeepMMSE: An Improved Deep Learning Approach to MMSE Speech and Noise Power Spectrum Estimation for Speech Enhancement 63

3.1. Deep Xi and DeepMMSE Approaches for Speech Enhancement 63

3.1.1. Signal model 63

3.1.2. Deep Xi 64

3.1.3. DeepMMSE 66

3.2. Proposed Improved DeepMMSE Method 67

3.3. Experiments 71

3.3.1. Dataset 71

3.3.2. Experimental setup 72

3.3.3. Experimental results 73

3.4. Conclusion 75

Chapter 4. Improved Speech Spatial Covariance Matrix Estimation for Online Multi-channel Speech Enhancement 77

4.1. MMSE Multi-channel Speech Enhancement 77

4.1.1. Signal model 77

4.1.2. MWF and MVDR-Wiener filter factorization 78

4.1.3. Speech and noise SCM estimation 80

4.2. Proposed speech SCM estimation 84

4.3. Experiments 89

4.3.1. Experimental settings 89

4.3.2. Experimental results 91

4.3.3. Ablation study 92

4.4. Conclusion 93

Chapter 5. DNN-based Parameter Estimation for MVDR Beamforming and Post-Filtering 95

5.1. MVDR beamforming and post-filtering 95

5.2. DNN-based parameter estimation for beamforming and post-filtering 98

5.2.1. DNN-based parameter estimation for MVDR beamforming 98

5.2.2. Deep Xi framework for post-filtering 101

5.3. Experiments 103

5.3.1. Experimental settings 103

5.3.2. Experimental results 104

5.4. Conclusion 105

Chapter 6. Conclusions 107

References 110

Appendix A. Derivation of the MMSE speech power spectrum estimator 128

Table 2.1. The logarithmic errors of the noise PSD estimator in [1] and the proposed method averaged over 7 noise types for various SNRs analyzed in Sec. III.B 54

Table 2.2. The logarithmic errors of the speech power spectrum estimators used as the inputs to the TCS speech PSD estimation for the baseline [2] and proposed systems... 55

Table 2.3. The noise reduction (NR), segmental SNR (SSNR) improvement, PESQ scores and STOI for the method in [2] and the proposed method averaged over 7 types... 55

Table 2.4. Performance comparison of the proposed system with the baseline [2] and a deep learning-based speech enhancement system, ERNN [3], for the VoiceBank-... 61

Table 3.1. The performance of speech enhancement for the causal and non-causal versions of the DEMUCS [5], DeepMMSE with TCN and Conformer, and proposed iDeep-... 74

Table 3.2. The performance of speech enhancement for the DeepMMSE with TCN structure [6], and the causal and non-causal versions of DeepMMSE with Conformer... 74

Table 3.3. The PESQ scores of DeepMMSE and proposed iDeepMMSE depending on the SNR for the Deep Xi dataset. 75

Table 4.1. The average PESQ scores for the different algorithms depending on the noise types. 92

Table 4.2. The average eSTOIs (x100) for the different algorithms depending on the noise types. 92

Table 4.3. The average SISDRs (in dB) for the different algorithms depending on the noise types. 93

Table 4.4. The PESQ scores, eSTOIs, and SISDRs averaged over all noise types for the proposed method with DNN-based a priori SPP estimation by replacing proposed... 94

Table 5.1. The performance of multi-channel speech enhancement for the previous approaches and the proposed methods on the CHiME-4 dataset. 104

Figure 2.1. Block diagram of the baseline speech enhancement system in [2] considering speech PSD uncertainty. 35

Figure 2.2. Block diagram of the proposed speech enhancement framework incorporating the speech PSD uncertainty into every component. 40

Figure 2.3. Examples for the empirical pdf of |S|² and the exponential model in (2.34). 46

Figure 2.4. Empirically determined values and modeled function for the equivalent noise level ΛndB. 48

Figure 2.5. Spectral gains Gp, Gb, and Wiener gain as functions of |Y|² when Φs/Φn = 5 dB, µΦs = -20.17 dB, σΦs = 13.97 dB, and Q = 10. 49

Figure 2.6. Spectral gains Gp, Gb, and Wiener gain as functions of Φs/Φn when Φn = -30 dB, µΦs = -20.17 dB, σΦs = 13.97 dB, and Q = 10. 50

Figure 2.7. The values of the parameters for hyperhyperprior used in the experiments. 52

Figure 2.8. The logarithmic errors of the noise PSD estimator in [1] and the proposed method for various noise types and SNRs. The graphs with red and gray are for the... 53

Figure 2.9. Example spectrograms of noisy and clean speech and of speech enhanced by the baseline [2] and the proposed methods for the street noise with 5 dB SNR. 56

Figure 2.10. The NR, SSNR improvement, and the PESQ scores averaged over 7 types of noises depending on the SNR replacing the sub-modules of the baseline speech... 58

Figure 3.1. Block diagrams of the baseline DeepMMSE and proposed iDeepMMSE frameworks. 67

Figure 3.2. Conformer architecture. 71

Figure 4.1. Block diagram of the proposed multi-channel speech enhancement system. 86

Figure 5.1. Block diagram of the proposed multi-channel speech enhancement system. 98

Algorithm 1. Proposed Multi-channel Speech Enhancement Algorithm With Improved Speech SCM Estimation 89