Table of Contents
Title page
Abstract
Korean Abstract
Acknowledgements
Contents
Chapter 1. Introduction 19
1.1. Speech Recognition 20
1.2. Non-native Speech Recognition 21
1.3. Related works 22
1.4. Thesis Organization 24
Chapter 2. ASR System: An Overview 25
2.1. Feature Extraction 25
2.2. Stochastic Modeling of Speech 27
2.2.1. Hidden Markov Model 28
2.2.2. Decoding algorithm - Viterbi 31
2.3. Acoustic Model 32
2.4. Pronunciation Model 33
2.5. Language Model 34
2.6. Experiments and Results 35
2.6.1. Speech database 35
2.6.2. Baseline ASR system 36
2.6.3. Performance evaluation of the baseline ASR system 37
Chapter 3. Pronunciation Model Adaptation 39
3.1. The state-of-the-art in pronunciation adaptation method 39
3.2. Pronunciation adaptation for non-native speech 41
3.2.1. Phoneme recognition and alignment sequence 42
3.2.2. Deriving rules using a decision tree and adapting a dictionary 45
3.3. Example of pronunciation modeling and optimization of a dictionary 46
3.3.1. Phoneme recognition and alignment sequence for native and non-native speech 46
3.3.2. Deriving rules using a decision tree and adapting a dictionary 50
3.4. Experiments and Results 53
3.5. Discussion 55
Chapter 4. Confusability Reduction of Multiple Pronunciation Dictionary 57
4.1. Confusability measure 58
4.1.1. Levenshtein distance 58
4.1.2. Modified Levenshtein distance 60
4.2. Example of confusability reduction 61
4.3. Experiments and Results 62
4.4. Discussion 66
Chapter 5. Combined Method 67
5.1. Decomposition of pronunciation variability for non-native speech 67
5.1.1. Data-driven pronunciation variability analysis 67
5.1.2. Context-independent and context-dependent pronunciation variability 68
5.2. Combination of acoustic and pronunciation model adaptation for non-native speech 68
5.2.1. Acoustic model adaptation 70
5.2.2. Combined method 71
5.3. Experiments and Results 72
Chapter 6. Conclusion and Future Work 75
6.1. Conclusion 75
6.2. Future work 77
References 78
Table 2.1: List of Korean phonemes for native and non-native ASR. 36
Table 2.2: Comparison of the average word error rates (%) of the baseline ASR system using the dictionaries obtained by canonical (CC_Dict), knowledge-based (KB_Dict), and hand-labeled (HL_Dict) transcriptions. 38
Table 3.1: Example of three reference sequences obtained by canonical, knowledge-based, and hand-labeled transcriptions, and an alternative phonetic sequence after recognizing a Korean utterance: "그래서 여러 가지로 의미가 깊은 달이기 때문입니다," which in English means "This is because this month has several deep meanings." 47
Table 3.2: The rule pattern is obtained using Eq. (3.1) for the sentence in Table 3.1. 49
Table 3.3: Comparison of the average word error rate (%) of the non-native and native ASR systems employing the dictionaries adapted by either non-native rules or native rules. 54
Table 3.4: Comparison of the average word error rate (%) of the non-native and native ASR systems employing the dictionaries adapted by the combination of non-native rules and native rules. 55
Table 4.1: Example of confusability measure (CM) scores for all the pronunciation variations obtained by the indirect data-driven method, where a Korean word "멍해" meaning "stupid" is transcribed as /m v N h E/. 61
Table 4.2: Performance evaluation of an ASR system with a) the baseline dictionary, b) a multiple dictionary prior to reduction, and c) optimized multiple pronunciation dictionary by the proposed confusability reduction method. 65
Table 5.1: Comparison of the WERs (%) of the baseline ASR system, an ASR system with adapted acoustic models (adapted-AM), an ASR system with adapted pronunciation model with the baseline acoustic models (adapted-PM), and an ASR system... 73
Figure 1.1: The speech chain 20
Figure 1.2: The motivation of handling non-native speech recognition. 22
Figure 1.3: Three major approaches of handling non-native speech for ASR. 23
Figure 2.1: The overall structure of the construction for the continuous speech recognition system. 26
Figure 2.2: An example of left-to-right HMM model 30
Figure 2.3: The Viterbi algorithm 31
Figure 2.4: An example of pronunciation models for the word "학교": a) single pronunciation model and b) multiple pronunciation model. 33
Figure 3.1: Procedure for the proposed pronunciation variation modeling method based on an indirect data-driven approach applied to native and non-native speech. 43
Figure 3.2: Example of decision tree building to derive pronunciation variation rules for a phone 'k.' 51
Figure 4.1: Comparison of the average WER (%) of the non-native ASR systems using the multiple pronunciation dictionary optimized (a) by the Levenshtein distance and (b) by the modified Levenshtein distance according to different CM thresholds. 63
Figure 5.1: The procedure of the proposed combination adaptation method. 69
Figure 5.2: An example of a) a decision tree for the phoneme /p/ and b) a decision tree for the phoneme /o/ and /v/ for acoustic model adaptation. 71
Figure 6.1: The summary of evaluations for proposed methods. 76