
Table of Contents

Title Page

Abstract

Abstract (in Korean)

Acknowledgements

Contents

Chapter 1. Introduction 19

1.1. Speech Recognition 20

1.2. Non-native Speech Recognition 21

1.3. Related Works 22

1.4. Thesis Organization 24

Chapter 2. ASR System: An Overview 25

2.1. Feature Extraction 25

2.2. Stochastic Modeling of Speech 27

2.2.1. Hidden Markov Model 28

2.2.2. Decoding Algorithm: Viterbi 31

2.3. Acoustic Model 32

2.4. Pronunciation Model 33

2.5. Language Model 34

2.6. Experiments and Results 35

2.6.1. Speech database 35

2.6.2. Baseline ASR system 36

2.6.3. Performance evaluation of the baseline ASR system 37

Chapter 3. Pronunciation Model Adaptation 39

3.1. The state of the art in pronunciation adaptation methods 39

3.2. Pronunciation adaptation for non-native speech 41

3.2.1. Phoneme recognition and alignment sequence 42

3.2.2. Deriving rules using a decision tree and adapting a dictionary 45

3.3. Example of pronunciation modeling and optimization of a dictionary 46

3.3.1. Phoneme recognition and alignment sequence for native and non-native speech 46

3.3.2. Deriving rules using a decision tree and adapting a dictionary 50

3.4. Experiments and Results 53

3.5. Discussion 55

Chapter 4. Confusability Reduction of Multiple Pronunciation Dictionary 57

4.1. Confusability measure 58

4.1.1. Levenshtein distance 58

4.1.2. Modified Levenshtein distance 60

4.2. Example of confusability reduction 61

4.3. Experiments and Results 62

4.4. Discussion 66

Chapter 5. Combined Method 67

5.1. Decomposition of pronunciation variability for non-native speech 67

5.1.1. Data-driven pronunciation variability analysis 67

5.1.2. Context-independent and context-dependent pronunciation variability 68

5.2. Combination of acoustic and pronunciation model adaptation for non-native speech 68

5.2.1. Acoustic model adaptation 70

5.2.2. Combined method 71

5.3. Experiments and Results 72

Chapter 6. Conclusion and Future Work 75

6.1. Conclusion 75

6.2. Future Work 77

References 78

List of Tables

Table 2.1: List of Korean phonemes for native and non-native ASR. 36

Table 2.2: Comparison of the average word error rates (%) of the baseline ASR system using the dictionaries obtained by canonical (CC_Dict), knowledge-based (KB_Dict), and hand-labeled (HL_Dict) transcriptions. 38

Table 3.1: Example of three reference sequences obtained by canonical, knowledge-based, and hand-labeled transcriptions, and an alternative phonetic sequence after recognizing a Korean utterance: "그래서 여러 가지로 의미가 깊은 달이기 때문입니다," which in English means "This is because this month has several deep meanings." 47

Table 3.2: The rule pattern obtained using Eq. (3.1) for the sentence in Table 3.1. 49

Table 3.3: Comparison of the average word error rate (%) of the non-native and native ASR systems employing the dictionaries adapted by either non-native rules or native rules. 54

Table 3.4: Comparison of the average word error rate (%) of the non-native and native ASR systems employing the dictionaries adapted by the combination of non-native rules and native rules. 55

Table 4.1: Example of confusability measure (CM) scores for all the pronunciation variations obtained by the indirect data-driven method, where the Korean word "멍해" meaning "stupid" is transcribed as /m v N h E/. 61

Table 4.2: Performance evaluation of an ASR system with a) the baseline dictionary, b) a multiple pronunciation dictionary prior to reduction, and c) a multiple pronunciation dictionary optimized by the proposed confusability reduction method. 65

Table 5.1: Comparison of the WERs (%) of the baseline ASR system, an ASR system with adapted acoustic models (adapted-AM), an ASR system with an adapted pronunciation model and the baseline acoustic models (adapted-PM), and an ASR system... 73

List of Figures

Figure 1.1: The speech chain 20

Figure 1.2: The motivation for handling non-native speech recognition. 22

Figure 1.3: Three major approaches to handling non-native speech for ASR. 23

Figure 2.1: The overall structure of the continuous speech recognition system. 26

Figure 2.2: An example of a left-to-right HMM 30

Figure 2.3: The Viterbi algorithm 31

Figure 2.4: An example of pronunciation models for the Korean word "학교" (school): a) single pronunciation model and b) multiple pronunciation model. 33

Figure 3.1: Procedure for the proposed pronunciation variation modeling method based on an indirect data-driven approach applied to native and non-native speech. 43

Figure 3.2: Example of decision tree building to derive pronunciation variation rules for a phone 'k.' 51

Figure 4.1: Comparison of the average WER (%) of the non-native ASR systems using the multiple pronunciation dictionary optimized (a) by the Levenshtein distance and (b) by the modified Levenshtein distance, according to different CM thresholds. 63

Figure 5.1: The procedure of the proposed combined adaptation method. 69

Figure 5.2: An example of a) a decision tree for the phoneme /p/ and b) a decision tree for the phonemes /o/ and /v/ for acoustic model adaptation. 71

Figure 6.1: A summary of the evaluations of the proposed methods. 76