Title Page
Contents
Abstract in Korean 7
Abstract 10
Part I. Introduction 13
A. Background 13
Part II. Literature Review 16
A. Statistical Arbitrage 16
B. Machine Learning Models 21
(1) Logistic Regression 21
(2) Artificial Neural Networks 25
(3) Random Forests 30
(4) Gradient-Boosted Trees 32
(5) Ensemble Methods 36
Part III. Methodology 40
A. Data 40
(1) Data Description 40
(2) Feature Generation 43
(3) Asset Ranking & Simulated Trades 45
(4) Software 46
B. Model Training & Tuning 47
(1) Model Training 47
(2) Logistic Regression Model Specification 48
(3) Deep Neural Network Model Specification 51
(4) Distributed Random Forest Model Specification 54
(5) Extreme Gradient-Boosted Tree Model Specification 56
(6) Simple Soft Voting Ensemble Model Specification 57
(7) Weighted Soft Voting Ensemble Model Specification 59
C. Portfolio Concentration Selection 62
Part IV. Empirical Results 65
A. Asset Selection Comparison 65
(1) Asset Selection Frequencies 65
(2) Bankruptcies, Mergers and Acquisitions 67
B. Performance Metrics 69
(1) Fixed Portfolio Concentration Performance Metrics 69
(2) Dynamic Portfolio Concentration Performance Metrics 78
C. Cumulative Performance, Periodicity, & Holding Periods 81
(1) Fixed Portfolio Concentration Metrics over Time 81
(2) Dynamic Portfolio Concentration Metrics over Time 82
Part V. Discussion 85
A. Significant Findings 85
B. Limitations and Future Research 86
C. Model Implementation 89
Part VI. Conclusion 92
Bibliography 95
Appendices 100
Appendix A. Full Portfolio Concentration Returns 100
Appendix B. Grid Search of Box Constraints 108
Appendix C. Moving Average Analysis 113
[Table 1] Variable Significances of Logistic Regression Model 50
[Table 2] Top Selected Asset Frequencies by Machine Learning Model 66
[Table 3] Unlistings Included in Portfolios 68
[Table 4] Annualized Return by Fixed Number of Assets 71
[Table 5] Annualized Performance of Optimal Fixed Concentration Models 77
[Table 6] Annualized Performance of Dynamic Concentration Models 79
[Table 7] Performance of Dynamic Concentration Models after Transaction Fees 80
[Figure 1] Feed-forward neural net with two inputs, two outputs and one hidden layer 27
[Figure 2] Topology of a deep neural network with two hidden layers 28
[Figure 3] Ensemble framework with n component learners 36
[Figure 4] Variable Importances of Deep Learning Neural Network Model 53
[Figure 5] Variable Importances of Distributed Random Forest Model 55
[Figure 6] Variable Importances of Extreme Gradient-Boosted Tree Model 57
[Figure 7] Annualized Returns of ML Models by Portfolio Concentrations 72
[Figure 8] Annualized Downside Deviation of ML Models by Portfolio Concentrations 75
[Figure 9] Relative Cumulative Return of Fixed Portfolio Concentration Models 81
[Figure 10] Relative Cumulative Return of Dynamic Portfolio Concentration Models 83