
Abstract

This paper proposes an audio event classification method using deep neural networks (DNNs). The method applies a feed-forward neural network (FFNN) to generate, for each frame, probabilities for ten audio events (dog barks, engine idling, and so on). For each frame, the mel-scale filter bank features of consecutive frames are concatenated to form the input vector of the FFNN. The frame-level probabilities are accumulated per event, and the event with the highest accumulated probability is taken as the classification result. On the same dataset, the best accuracy reported in previous studies was about 70%, obtained with a support vector machine (SVM). The proposed method reaches a best accuracy of 79.23% on the UrbanSound8K dataset when 80 mel-scale filter bank features from each of 7 consecutive frames (560 in total) are used as the input vector of an FFNN with two hidden layers and 2,000 neurons per hidden layer. This configuration uses the rectified linear unit as its activation function.
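
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes (80 features, 7 frames, two hidden layers of 2,000 units, ten events) come from the abstract, while the centered context window, the edge padding, the softmax output layer, summation as the accumulation rule, and all function names are assumptions. The weights are random placeholders, so the predicted label is meaningless until the network is trained.

```python
import numpy as np

N_MEL = 80        # mel-scale filter bank features per frame (from the abstract)
CONTEXT = 7       # consecutive frames per input vector (from the abstract)
N_HIDDEN = 2000   # neurons per hidden layer (from the abstract)
N_EVENTS = 10     # number of audio event classes (from the abstract)


def stack_context(features, context=CONTEXT):
    """Concatenate each frame with its neighbours: (T, n_mel) -> (T, context * n_mel).

    Assumption: the window is centered on the frame, and clip edges are
    padded by repeating the first and last frame.
    """
    half = context // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    n_frames = features.shape[0]
    return np.hstack([padded[i:i + n_frames] for i in range(context)])


def init_params(rng, sizes):
    """Random placeholder weights and zero biases; a real system would train these."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def frame_probabilities(x, params):
    """Forward pass: ReLU hidden layers, then a softmax over events (assumed)."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)              # rectified linear unit
    w, b = params[-1]
    logits = h @ w + b
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def classify_clip(features, params):
    """Accumulate frame-level probabilities and pick the highest-scoring event."""
    probs = frame_probabilities(stack_context(features), params)
    accumulated = probs.sum(axis=0)                 # assumption: plain summation
    return int(np.argmax(accumulated)), accumulated


# Hypothetical usage on a synthetic 50-frame clip.
rng = np.random.default_rng(0)
params = init_params(rng, [N_MEL * CONTEXT, N_HIDDEN, N_HIDDEN, N_EVENTS])
clip = rng.normal(size=(50, N_MEL))
label, scores = classify_clip(clip, params)
```

Summing probabilities gives every frame an equal vote; summing log-probabilities would be another plausible reading of "accumulated", and the abstract does not say which one was used.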

Articles in this issue

Title Authors Pages
Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model 김광호, 이동현, 임민규, 김지환 pp.3-9

Performance Comparison of Deep Feature Based Speaker Verification Systems 김대현, 성우경, 김홍국 pp.9-16

Improvement of convergence speed in FDICA algorithm with weighted inner product constraint of unmixing matrix 전성일, 배건성 pp.17-25

Audio Event Classification Using Deep Neural Networks 임민규, 이동현, 김광호, 김지환 pp.27-33

A Speech Waveform Forgery Detection Algorithm Based on Frequency Distribution Analysis 허희수, 소병민, 양일호, 유하진 pp.35-40

Korean Semantic Similarity Measures for the Vector Space Models Lee, Young-In, Lee, Hyun-jung, Koo, Myoung-Wan, Cho, Sook Whan pp.49-55

Comparison of Voice Characteristics Before and After High-Caffeine Intake 이아름, 김은연, 유현지, 최예린 pp.59-65

Comparison of Self-Reporting Voice Evaluations between Professional and Non-Professional Voice Users with Voice Disorders by Severity and Type 김재옥 pp.67-76

Acoustic Analysis of Voice Change According to Extent of Thyroidectomy 강영애, 구본석 pp.77-83

Acoustic Features of Oral Vowels in the Esophagus Speakers 윤은미, 목은희, 판후응옥먼, 홍기환 pp.85-92

Comparison of Aerodynamic Variables according to the Execution Methods of KayPENTAX Phonatory Aerodynamic System Model 6600 고혜주, 최홍식, 임성은, 최예린 pp.93-99

A Study on the Vowel Duration of the Buckeye Corpus 정혜정, 윤규철 pp.103-110

Perceptual Boundary on a Synthesized Korean Vowel /o/-/u/ Continuum by Chinese Learners of Korean Language 윤지현, 김은경, 성철재 pp.111-121

Gender difference in the sound change of lexical pitch accents of South Kyungsang Korean Lee, Hyunjung pp.123-130

Effects of syllable structure and prominence on the alignment and the scaling of the phrase-initial rising tone in Seoul Korean : A preliminary study Kim, Sahyang pp.139-145

An Analysis of Short and Long Syllables of Sino-Korean Words Produced by College Students with Kyungsang Dialect 양병곤 pp.131-138
