Title Page
Contents
Abstract 9
1. Introduction 11
2. Related Works 13
2.1. Speech Recognition Technologies 13
2.2. Speaker Distance Detection Technologies 15
2.3. Lip-Reading Technologies 16
2.4. Multimodal Speech Recognition Technologies 17
3. Lip-Reading based on Visual for Speech Recognition 19
3.1. System Architecture 19
3.2. Pre-Processing 20
3.2.1. Skin Color Segmentation 20
3.2.2. Morphological Operation 22
3.2.3. Blob Detection 23
3.2.4. Maximum Morphological Gradient Combination 24
3.3. Speaker Detection Estimation using Face Images 27
3.3.1. Haar-like Features and AdaBoost Algorithm 27
3.3.2. User Distance Estimation 30
3.4. Lip Detection 31
3.4.1. Lip Region Extraction 31
3.4.2. Lip Feature Extraction 32
4. Speech Recognition using HTK 35
4.1. Feature Extraction 35
4.1.1. LPCC 35
4.1.2. MFCC 38
4.2. Speech Recognition 40
5. Speech Recognition System using Detection of Speaker Distance for Mobile Environment 41
5.1. Endpoint Detection in Audio and Video Signals 41
5.2. System Architecture 44
5.3. System Configuration 45
5.4. Developed System 47
6. Experiments and Results 48
6.1. Experiments on Speaker Distance Detection 48
6.1.1. Experimental Environment 48
6.1.2. Performance Evaluation 49
6.2. Experiments on Speech Recognition 49
6.2.1. Experimental Environment 49
6.2.2. Performance Evaluation on Speech Recognition 50
6.3. Experiments on Multimodal Speech Recognition 50
7. Conclusions 52
References 53
Abstract (in Korean) 61
Fig. 2. (a) Original Image (b) Skin Color Segmentation (c) Skin Color Histogram Analysis 21
Fig. 3. (a) Result of morphological operations (b) Face Blob Detection 24
Fig. 4. Morphological gradient 25
Fig. 5. (a) original image (b) morphological gradient of MMGC image (c) morphological gradient of gray image 25
Fig. 6. Result of pre-processing procedure of the face detection 26
Fig. 7. Examples of the Haar-like features 27
Fig. 8. Detection of positive sub-windows using the cascade 28
Fig. 9. (a) positive samples (b) negative samples 29
Fig. 10. Geometric relationship between eyes and lip 31
Fig. 11. Extraction of LPCC Vectors 35
Fig. 12. Extraction of MFCC Vectors 38
Fig. 13. Speech waveform of 'one, two, three, four' with endpoints detected manually (nbm, nem) and automatically in the audio signal (nba, nea) and in the video signal (nbv, nev), at SNRs of 0 dB (top) and -15 dB (bottom) 43
Fig. 14. Multimodal speech recognition system. 44
Fig. 15. Structure of Android NDK and Java native Library 46
Fig. 16. (a) initial screen (b) face image and speech input (c) recognition result output 47
Fig. 17. The Result of Distance Estimation 48
Fig. 18. Average Improvement Rate According to the Emphasis Rate of Speech 51