Title Page
Abstract
Abstract in Korean
Preface
Contents
Chapter 1. Introduction 17
Chapter 2. Related Works 20
2.1. Voice Conversion 20
2.2. Zero-shot Voice Style Transfer 21
2.3. Diffusion Probabilistic Models 23
Chapter 3. Method 24
3.1. Decoupled Denoising Diffusion Models 24
3.2. DDDM-VC 26
3.2.1. Speech Disentanglement 26
3.2.2. Speech Resynthesis 27
3.2.3. Prior Mixup 29
Chapter 4. Experiment and Result 32
4.1. Experimental Setup 32
4.1.1. Datasets 32
4.1.2. Preprocessing 32
4.1.3. Training 33
4.2. Evaluation Metrics 33
4.2.1. Subjective Metrics 33
4.2.2. Objective Metrics 34
4.3. Many-to-Many Voice Conversion 35
4.4. Zero-shot Voice Conversion 36
4.5. One-shot Speaker Adaptation 36
4.6. Zero-shot Cross-lingual Voice Conversion 37
4.7. Ablation Study 38
4.7.1. Prior Mixup 38
4.7.2. Disentangled Denoiser 39
4.7.3. Normalized F0 40
4.7.4. Data-driven Prior 40
Chapter 5. Conclusion 41
Chapter 6. Broader Impact and Limitation 42
6.1. Broader Impact 42
6.2. Limitation 42
Appendices 43
Appendix A. Implementation Details 43
A.1. DDDM-VC 43
A.2. Baseline Models 44
A.3. Computational Cost and Inference Speed 45
A.4. Vocoder 45
Appendix B. Speech Resynthesis on LibriTTS 49
Appendix C. Zero-shot Voice Conversion on LibriTTS 51
Appendix D. One-shot Speaker Adaptation 53
Appendix E. Style Control 55
Appendix F. t-SNE Visualization 56
Appendix G. Evaluation Details 58
Appendix H. Comparison with PPG-based Voice Conversion Model 61
Appendix I. Comparison with Traditional Voice Conversion Methods 63
Appendix J. Audio Mixing 64
J.1. DDDM-Mixer 64
J.2. Implementation Details 65
J.3. Evaluation 66
Appendix K. Text-to-Speech 67
K.1. DDDM-TTS 67
K.2. Text-to-Vec (TTV) 67
K.3. Implementation Details 69
K.4. Evaluation 69
Bibliography 71
Figure 1.1. Speech synthesis in DDDM and a standard diffusion model. Although a single denoiser with the same parameters is used for all denoising steps in standard diffusion models,... 18
Figure 3.1. Overall framework of DDDM-VC 25
Figure 3.2. (a) Speech resynthesis from disentangled speech representations (training). (b) Voice conversion from converted speech representations (inference). (c) Prior mixup for... 28
Figure 4.1. CER results for zero-shot cross-lingual VC on unseen languages from CSS10 multi-lingual dataset. 37
Figure D.1. EER and SECS results according to training steps of fine-tuning for one-shot speaker adaptation. 53
Figure D.2. CER and WER results according to training steps of fine-tuning for one-shot speaker adaptation. 53
Figure F.1. t-SNE visualization of content and speaker representation. 57
Figure G.1. Screenshots of the Amazon MTurk MOS survey. Participants are paid $0.08 per HIT for nMOS and sMOS. 58
Figure J.1. Overall framework of DDDM-Mixer 65
Figure K.1. Overall framework of DDDM-TTS 68