A study on multimodal speech recognition using speaker distance for mobile environments / Byung Hun Oh

Document type: Thesis
Title: A study on multimodal speech recognition using speaker distance for mobile environments (Korean parallel title: 모바일 환경에서 사용자 거리를 이용한 멀티모달 음성 인식에 관한 연구)
Author: Byung Hun Oh
Publication: Seoul: Sungkyunkwan University Graduate School, 2013.2
Call number: TM 621.39 -13-191
Physical description: 50 p. ; 30 cm
Reading room: Electronic resources
Control number: KDMT1201325428
Notes: Thesis (Master's) -- Sungkyunkwan University Graduate School, Dept. of Electrical and Computer Engineering, 2013.2. Advisor: Kwang-Seok Hong

Table of Contents

Title Page

Contents

Abstract 9

1. Introduction 11

2. Related Works 13

2.1. Speech Recognition Technologies 13

2.2. Speaker Distance Detection Technologies 15

2.3. Lip-Reading Technologies 16

2.4. Multimodal Speech Recognition Technologies 17

3. Lip-Reading based on Visual for Speech Recognition 19

3.1. System Architecture 19

3.2. Pre-Processing 20

3.2.1. Skin Color Segmentation 20

3.2.2. Morphological Operation 22

3.2.3. Blob Detection 23

3.2.4. Maximum Morphological Gradient Combination 24

3.3. Speaker Detection Estimation using Face Images 27

3.3.1. Haar-like Features and Adaboost Algorithm 27

3.3.2. User Distance Estimation 30

3.4. Lip Detection 31

3.4.1. Lip Region Extraction 31

3.4.2. Lip Feature Extraction 32

4. Speech Recognition using HTK 35

4.1. Feature Extraction 35

4.1.1. LPCC 35

4.1.2. MFCC 38

4.2. Speech Recognition 40

5. Speech Recognition System using Detection of Speaker Distance for Mobile Environment 41

5.1. Endpoint Detection in Audio and Video Signals 41

5.2. System Architecture 44

5.3. System Configuration 45

5.4. Developed System 47

6. Experimental and Results 48

6.1. Experimental on Speaker Distance Detection 48

6.1.1. Experimental Environment 48

6.1.2. Performance Evaluation 49

6.2. Experimental on Speech Recognition 49

6.2.1. Experimental Environment 49

6.2.2. Performance Evaluation on Speech Recognition 50

6.3. Experimental on Multimodal Speech Recognition 50

7. Conclusions 52

References 53

Korean Abstract 61

List of Tables

Table 1. Distance estimation results 49

Table 2. Speech recognition results 50

Table 3. Improvement speech recognition results 51

List of Figures

Fig. 2. (a) Original Image (b) Skin Color Segmentation (c) Skin Color Histogram Analysis 21

Fig. 3. (a) Result of morphological operations (b) Face Blob Detection 24

Fig. 4. Morphological gradient 25

Fig. 5. (a) original image (b) morphological gradient of MMGC image (c) morphological gradient of gray image 25

Fig. 6. Result of pre-processing procedure of the face detection 26

Fig. 7. Examples of the Haar-like features 27

Fig. 8. Detection of positive sub-windows using the cascade 28

Fig. 9. (a) positive samples (b) negative samples 29

Fig. 10. Geometric relationship between eyes and lip 31

Fig. 11. Extraction of LPCC Vectors 35

Fig. 12. Extraction of MFCC Vectors 38

Fig. 13. Speech waveform of 'one, two, three, four' and endpoints detected manually (nbm, nem) and detected automatically in audio signal (nba, nea) and in video signal (nbv, nev) with SNRs of 0 dB (top) and -15 dB (bottom) 43

Fig. 14. Multimodal speech recognition system. 44

Fig. 15. Structure of Android NDK and Java native Library 46

Fig. 16. (a) initial screen (b) face image and speech input (c) recognition result output 47

Fig. 17. The Result of Distance Estimation 48

Fig. 18. Average Improvement Rate According to the Emphasis Rate of Speech 51

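Abstract

With the recent spread of smartphones and tablet PCs, a variety of applications based on real-time speech recognition are being developed. However, recognition performance varies with the place and surrounding environment in which the recognizer is used and with how accurately the endpoints of the speech signal are detected, and it also differs with the distance between the speaker and the device. This thesis therefore proposes an algorithm that improves the speech recognition rate in mobile environments. It estimates the speaker's distance from face images captured in real time by the camera, then uses that estimate to emphasize, according to distance, the speech signal located by an endpoint detection algorithm that draws on both lip-image movement and the audio. In experiments, a speech recognizer using the proposed method performed 13% better than the existing speech recognizer.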
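
The two steps named in the abstract, estimating distance from the face image and emphasizing the speech signal according to that distance, can be illustrated with a minimal sketch. This is not the thesis's algorithm, which this record does not describe in detail. It assumes a pinhole camera model for distance and a gain that grows linearly with distance (the inverse-distance law for sound pressure). Every constant and function name here is hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; all constants are assumptions, not thesis values.
ASSUMED_FACE_WIDTH_M = 0.15   # assumed average real-world face width
ASSUMED_FOCAL_PX = 600.0      # assumed camera focal length, in pixels
REFERENCE_DISTANCE_M = 0.3    # assumed distance at which no emphasis is applied
MAX_GAIN = 4.0                # cap so distant speakers do not amplify noise unboundedly


def estimate_distance_m(face_width_px):
    """Pinhole-camera estimate: distance = focal * real_width / pixel_width."""
    if face_width_px <= 0:
        raise ValueError("face_width_px must be positive")
    return ASSUMED_FOCAL_PX * ASSUMED_FACE_WIDTH_M / face_width_px


def emphasis_gain(distance_m):
    """Gain proportional to distance, clamped to the range [1, MAX_GAIN]."""
    return min(max(distance_m / REFERENCE_DISTANCE_M, 1.0), MAX_GAIN)


def emphasize(samples, face_width_px):
    """Scale float samples in [-1, 1] by the distance-dependent gain, then clip."""
    gain = emphasis_gain(estimate_distance_m(face_width_px))
    return [min(max(s * gain, -1.0), 1.0) for s in samples]
```

Under these assumptions, a detected face half as wide in pixels implies roughly twice the distance and therefore about twice the gain. A real system would calibrate the focal length and reference distance per device.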
