Title Page
Abstract (Korean)
ABSTRACT
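Table of Contents
I. Introduction 13
1.1. Research Background 13
1.2. Research Content 14
II. Theoretical Background 16
2.1. Efficient Market Hypothesis 16
2.2. Text Mining 17
2.3. Sentiment Analysis 17
2.4. Regression Analysis 18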
2.5. Random Forest 19
2.6. XGBoost (Extreme Gradient Boosting) 20
2.7. Bi-LSTM (Bidirectional Long Short-Term Memory) 21
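III. Research Data 22
3.1. News Data Collection 22
3.2. News Data Preprocessing 24
3.3. Stock Data Collection 24
3.4. Machine Learning-Based Sentiment Analysis 25
3.5. Analysis Data 29
IV. Analysis Results 31
4.1. Sentiment Dictionary Test 31
4.2. Performance Evaluation by Model 34
4.3. Regression Analysis 36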
4.4. Random Forest 39
4.5. XGBoost 42
4.6. Performance Ranking by Model 45
V. Conclusions and Recommendations 49
References 51
Table 1.1. Tools used and content collected 14
Table 3.1. Company selection criteria 22
Table 3.2. Selected companies 23
Table 3.3. Number of daily articles by stock 25
Table 3.4. Method of calculating up ratio and down ratio 26
Table 3.5. Sentiment score by word 27
Table 3.6. Sentiment score calculation and binary classification 28
Table 3.7. A fraction of collected data 29
Table 3.8. A fraction of final data 30
Table 4.1. Confusion matrix for sentiment score test by article 31
Table 4.2. Result of sentiment dictionary pre-test 33
Table 4.3. Hyperparameters of the model 35
Table 4.4. RMSE and R² of linear regression with and without sentiment score 36
Table 4.5. Paired t-test of mean RMSE and R² with and without sentiment score, linear regression (train dataset) 38
Table 4.6. Paired t-test of mean RMSE and R² with and without sentiment score, linear regression (test dataset) 38
Table 4.7. RMSE and R² of random forest with and without sentiment score 39
Table 4.8. Paired t-test of mean RMSE and R² with and without sentiment score, random forest (train dataset) 41
Table 4.9. Paired t-test of mean RMSE and R² with and without sentiment score, random forest (test dataset) 41
Table 4.10. RMSE and R² of XGBoost with and without sentiment score 42
Table 4.11. Paired t-test of mean RMSE and R² with and without sentiment score, XGBoost (train dataset) 44
Table 4.12. Paired t-test of mean RMSE and R² with and without sentiment score, XGBoost (test dataset) 44
Table 4.13. Mean of RMSE and R² by dataset, model 45
Table 4.14. Descriptive statistics of test dataset by model on RMSE and R² 46
Table 4.15. Result of ANOVA by model on RMSE mean 47
Table 4.16. Result of post-hoc test by model on RMSE mean 47
Table 4.17. Result of ANOVA by model on R² mean 48
Table 4.18. Result of post-hoc test by model on R² mean 48
Figure 2.1. Structure of Random Forest 19
Figure 2.2. Structure of XGBoost (Extreme Gradient Boosting) 20
Figure 2.3. Structure of Bi-LSTM (Bidirectional Long Short-Term Memory) 21
Figure 3.1. Data preprocessing process 24
Figure 4.1. Performance evaluation process of sentiment dictionary 32
Figure 4.2. Sentiment dictionary test model 32
Figure 4.3. Boxplot of RMSE, R² by model 46