A joint model for pronunciation assessment and mispronunciation detection and diagnosis [electronic resource] = 자동발음평가-발음오류검출 통합 모델 / Hyungshin Ryu

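Thesis: A joint model for pronunciation assessment and mispronunciation detection and diagnosis [electronic resource] = 자동발음평가-발음오류검출 통합 모델

Author
Hyungshin Ryu
Publication
Seoul : Seoul National University Graduate School, 2023.8
Call number
Available in electronic form only
Physical description
1 online resource : PDF
Reading room: Electronic resources
Control number
KDMT12024000006536
Notes
Thesis (Master's) -- Seoul National University Graduate School, Linguistics Major, 2023.8. Advisor: Minhwa Chung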

Table of contents

Title Page

Abstract

Contents

Chapter 1. Introduction

Chapter 2. Related work

2.1. Acoustic models

2.1.1. Self-supervised learning model

2.2. Acoustic features

2.2.1. Goodness-of-Pronunciation measure

2.3. The limitation of APA and MDD works

Chapter 3. Methodology

3.1. Proposed method

3.1.1. Pre-trained self-supervised learning model

3.1.2. Phone recognition model

3.1.3. Joint model

3.2. Experiment settings

3.2.1. Datasets

3.2.2. Evaluation metrics

3.2.3. Implementation and experiment details

Chapter 4. Results

4.1. Comparison of results on APA, MDD, and joint models of different architectures

4.2. Comparison of results on auxiliary phone recognition

4.3. Comparison of results on self-supervised learning model

4.4. Results Analysis

4.4.1. Analysis on model pronunciation assessment

4.4.2. Analysis on model mispronunciation detection and diagnosis

Chapter 5. Discussion

5.1. Correlation analysis on human assessments and model assessments

5.2. Analysis on multi-task learning loss weight

Chapter 6. Conclusion

References

Appendix

Abstract in Korean

List of Tables

Table 1. Summary of the datasets used for experiments

Table 2. Experiment results for APA task with regard to multi-task learning for the architecture CE

Table 3. Experiment results for APA task with regard to multi-task learning for the architecture RMSE

Table 4. Experiment results for APA task with regard to multi-task learning for the architecture RMSE+GOP

Table 5. Experiment results for MDD task with regard to multi-task learning for the architecture CE

Table 6. Experiment results for MDD task with regard to multi-task learning for the architecture RMSE

Table 7. Experiment results for MDD task with regard to multi-task learning for the architecture RMSE+GOP

Table 8. Experiment results for APA task with regard to using a fine-tuned model as backbone architecture for the joint model (RMSE+GOP)

Table 9. Experiment results for MDD task with regard to using a fine-tuned model as backbone architecture for the joint model (RMSE+GOP)

Table 10. Experiment results for APA task with regard to using different backbone self-supervised learning model for the joint model (RMSE+GOP)

Table 11. Experiment results for MDD task with regard to using different backbone self-supervised learning model for the joint model (RMSE+GOP)

Table 12. Experiment results for APA task with regard to different multi-task learning loss weight for Joint-CAPT-SSL (RMSE+GOP)

Table 13. Experiment results for MDD task with regard to different multi-task learning loss weight for Joint-CAPT-SSL (RMSE+GOP)

List of Figures

Figure 1. Illustration of the original Wav2vec2.0 from Baevski et al. (2020)

Figure 2. Illustration of XLSR from Conneau et al. (2021)

Figure 3. Illustration of HuBERT from Hsu, Bolte, et al. (2021)

Figure 4. Illustration of WavLM from S. Chen et al. (2022)

Figure 5. The training process of the proposed method

Figure 6. The architecture of the joint model (CE/RMSE)

Figure 7. The architecture of the joint model (RMSE+GOP)

Figure 8. Confusion matrices of APA-SSL (RMSE+GOP)

Figure 9. Confusion matrices of Joint-CAPT-SSL (RMSE+GOP)

Figure 10. False Rejection rate of each consonant on correct pronunciations for Joint-CAPT-SSL (RMSE+GOP)

Figure 11. False Rejection rate of each vowel on correct pronunciations for Joint-CAPT-SSL (RMSE+GOP)

Figure 12. False Acceptance rate of each consonant on mispronunciations for Joint-CAPT-SSL (RMSE+GOP)

Figure 13. False Acceptance rate of each vowel on mispronunciations for Joint-CAPT-SSL (RMSE+GOP)

Figure 14. Correlation between pronunciation scores of four aspects and the number of mispronunciations predicted by Joint-CAPT-SSL (RMSE+GOP)

Figure 15. Confusion matrices of APA-L1 (RMSE+GOP)

Figure 16. Confusion matrices of Joint-CAPT-L1 (RMSE+GOP)

Figure 17. False Rejection rate of each consonant on correct pronunciations for MDD-SSL (RMSE+GOP)

Figure 18. False Rejection rate of each vowel on correct pronunciations for MDD-SSL

Figure 19. False Acceptance rate of each consonant on mispronunciations for MDD-SSL (RMSE+GOP)

Figure 20. False Acceptance rate of each vowel on mispronunciations for MDD-SSL

Figure 21. Correlation between pronunciation scores of four aspects and the number of mispronunciations predicted by Joint-CAPT-L1 (CE)

Abstract

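Empirical studies show that, in the assessment of non-native pronunciation, the pronunciation scores given by expert raters correlate very strongly with phone-level errors. However, existing Computer-assisted Pronunciation Training (CAPT) systems treat Automatic Pronunciation Assessment (APA) and Mispronunciation Detection and Diagnosis (MDD) as independent tasks and focus only on improving the performance of each model separately. Motivated by the strong correlation between the two tasks, this study proposes a novel architecture that trains APA and MDD simultaneously using multi-task learning. Specifically, cross-entropy loss and RMSE loss are tested for the APA task, while the MDD loss is fixed to CTC loss. The backbone acoustic model is a pre-trained self-supervised learning model, which is optionally fine-tuned on phone recognition before multi-task learning to obtain richer acoustic information. In addition to the acoustic model, the Goodness-of-Pronunciation (GOP) feature is used as an additional input.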
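The multi-task objective described in this paragraph can be illustrated with a minimal sketch. The record does not give the thesis's exact formula, so the weighted-sum form, the `apa_weight` parameter, the function names, and all numeric values below are assumptions made for illustration only. The CTC loss is represented by a placeholder number, since in practice it would be computed by a deep learning framework.

```python
import math


def rmse_loss(predicted, target):
    """Root-mean-square error between predicted and reference scores."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


def joint_loss(apa_loss, mdd_loss, apa_weight=0.5):
    """Weighted sum of the APA and MDD losses.

    The convex-combination form and the default weight are assumptions,
    not the weighting scheme reported in the thesis.
    """
    if not 0.0 <= apa_weight <= 1.0:
        raise ValueError("apa_weight must lie in [0, 1]")
    return apa_weight * apa_loss + (1.0 - apa_weight) * mdd_loss


# Hypothetical utterance-level scores, for illustration only.
predicted_scores = [7.5, 8.0, 6.0, 9.0]
reference_scores = [8.0, 8.0, 7.0, 9.0]

apa = rmse_loss(predicted_scores, reference_scores)
ctc = 1.25  # placeholder standing in for a CTC loss value from the MDD head
total = joint_loss(apa, ctc, apa_weight=0.5)
print(f"APA (RMSE) loss: {apa:.4f}, joint loss: {total:.4f}")
```

Changing `apa_weight` shifts the emphasis between the two tasks, which is the kind of trade-off examined in the thesis's section on multi-task learning loss weight.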

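Experimental results show that the joint model markedly outperformed the single-task APA and MDD models. Specifically, on the Speechocean762 dataset, the average Pearson correlation coefficient over the four scores used in the APA task increased by 0.041, and the F1 score on the MDD task increased by 0.003. Among the joint-model architectures tried, the model trained with the RMSE/CTC losses using the Robust Wav2vec2.0 acoustic model and the GOP feature performed best. Analysis of the model confirmed that the joint model distinguishes infrequently occurring scores and mispronunciations more accurately than the individual models.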
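The two metrics named in this paragraph, the Pearson correlation coefficient and the F1 score, can be sketched as follows. This is a generic illustration of the standard definitions. The function names and sample inputs are hypothetical, and the choice of detected mispronunciations as the positive class for F1 is an assumption, since the record does not state how the thesis counts true and false positives.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def f1_score(true_pos, false_pos, false_neg):
    """F1 as the harmonic mean of precision and recall.

    Counts are assumed to refer to detected mispronunciations.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical human and model scores, for illustration only.
human_scores = [9.0, 7.0, 8.0, 5.0, 6.0]
model_scores = [8.5, 7.5, 8.0, 5.5, 6.5]
print(f"PCC: {pearson(human_scores, model_scores):.3f}")
print(f"F1:  {f1_score(true_pos=8, false_pos=2, false_neg=2):.3f}")
```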

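Interestingly, in the joint model, the degree of performance improvement on each sub-task was proportional to the magnitude of the correlation coefficient between each pronunciation score and the mispronunciation labels. In addition, as the joint model's performance improved, the correlation between the model's predicted pronunciation scores and its predicted mispronunciations became stronger. These results show that the joint model exploited the linguistic correlation between pronunciation scores and phone errors to improve performance on the APA and MDD tasks and that, as a result, the joint model exhibits a pattern similar to expert raters' actual assessments of non-native speakers.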
