Table of Contents

Title page

Abstract (Korean)

ABSTRACT

Table of Contents

I. Introduction

1.1. Research background

1.2. Research scope

II. Theoretical Background

2.1. Efficient market hypothesis

2.2. Text mining

2.3. Sentiment analysis

2.4. Regression analysis

2.5. Random Forest

2.6. XGBoost (Extreme Gradient Boosting)

2.7. Bi-LSTM (Bidirectional Long Short-Term Memory)

III. Research Data

3.1. News data collection

3.2. News data preprocessing

3.3. Stock data collection

3.4. Machine learning-based sentiment analysis

3.5. Analysis dataset

IV. Analysis Results

4.1. Sentiment dictionary test

4.2. Performance evaluation by model

4.3. Regression analysis

4.4. Random Forest

4.5. XGBoost

4.6. Performance ranking by model

V. Conclusions and Suggestions

References

Table 1.1. Tools used and content collected

Table 3.1. Company selection criteria

Table 3.2. Selected companies

Table 3.3. Number of daily articles by stock

Table 3.4. Method for calculating the up ratio and down ratio

Table 3.5. Sentiment score by word

Table 3.6. Sentiment score calculation and binary classification

Table 3.7. Excerpt of the collected data

Table 3.8. Excerpt of the final data

Table 4.1. Confusion matrix for the per-article sentiment score test

Table 4.2. Results of the sentiment dictionary pre-test

Table 4.3. Model hyperparameters

Table 4.4. RMSE and R² of linear regression with and without the sentiment score

Table 4.5. Paired t-test of mean RMSE and R² with and without the sentiment score, linear regression (training dataset)

Table 4.6. Paired t-test of mean RMSE and R² with and without the sentiment score, linear regression (test dataset)

Table 4.7. RMSE and R² of Random Forest with and without the sentiment score

Table 4.8. Paired t-test of mean RMSE and R² with and without the sentiment score, Random Forest (training dataset)

Table 4.9. Paired t-test of mean RMSE and R² with and without the sentiment score, Random Forest (test dataset)

Table 4.10. RMSE and R² of XGBoost with and without the sentiment score

Table 4.11. Paired t-test of mean RMSE and R² with and without the sentiment score, XGBoost (training dataset)

Table 4.12. Paired t-test of mean RMSE and R² with and without the sentiment score, XGBoost (test dataset)

Table 4.13. Mean RMSE and R² by dataset and model

Table 4.14. Descriptive statistics of RMSE and R² on the test dataset by model

Table 4.15. ANOVA results for mean RMSE by model

Table 4.16. Post-hoc test results for mean RMSE by model

Table 4.17. ANOVA results for mean R² by model

Table 4.18. Post-hoc test results for mean R² by model

Figure 2.1. Structure of Random Forest

Figure 2.2. Structure of XGBoost (Extreme Gradient Boosting)

Figure 2.3. Structure of Bi-LSTM (Bidirectional Long Short-Term Memory)

Figure 3.1. Data preprocessing procedure

Figure 4.1. Performance evaluation process for the sentiment dictionary

Figure 4.2. Sentiment dictionary test model

Figure 4.3. Boxplots of RMSE and R² by model