Table of Contents

Title Page

Summary

Table of Contents

Chapter 1. Introduction

Section 1. Research Background

Section 2. Research Objectives

Chapter 2. Related Work

Section 1. Structure of Android Apps and Analysis Methods

1. Permissions

2. API (Application Programming Interface)

Section 2. Feature Engineering

1. Feature Selection

2. Feature Extraction

Section 3. Machine Learning Algorithms

1. Linear Regression

2. Random Forest

3. SVM (Support Vector Machine)

4. KNN

5. MLP (Multi-Layer Perceptron)

Section 4. Model Performance Evaluation Methods

Chapter 3. Research Methodology

Section 1. Dataset Construction and Preprocessing

Section 2. Selection of the Proposed Features

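Chapter 4. Experiments

Section 1. Experimental Environment

Section 2. Model Configuration

Section 3. Comparison of Malicious App Detection Performance

1. Confusion Matrix Comparison

2. Model Training Time Comparison

3. ROC AUC Comparison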

Chapter 5. Conclusion and Future Work

Section 1. Research Conclusions

Section 2. Future Work

References

Abstract

Table 1. Structure of APK

Table 2. Runtime Permissions

Table 3. Protection levels of Android Permission

Table 4. Confusion Matrix

Table 5. (ANOVA) Feature selection Top 10 of the permissions

Table 6. (RFE) Feature selection Top 10 of the permissions

Table 7. (Lasso) Feature selection Top 10 of the permissions

Table 8. (ANOVA) Feature selection Top 20 of the permissions

Table 9. (RFE) Feature selection Top 20 of the permissions

Table 10. (Lasso) Feature selection Top 20 of the permissions

Table 11. (ANOVA) Feature selection Top 10 of the APIs

Table 12. (RFE) Feature selection Top 10 of the APIs

Table 13. (Lasso) Feature selection Top 10 of the APIs

Table 14. (ANOVA) Feature selection Top 20 of the APIs

Table 15. (RFE) Feature selection Top 20 of the APIs

Table 16. (Lasso) Feature selection Top 20 of the APIs

Table 17. Duplicated feature selection among the top 10 permissions

Table 18. Duplicated feature selection among the top 20 permissions

Table 19. Duplicated feature selection among the top 10 APIs

Table 20. Duplicated feature selection among the top 20 APIs

Table 21. Result of selected features

Table 22. Experimental environment

Table 23. Parameter values of the machine learning models

Table 24. Accuracy (average of 5 runs)

Table 25. Precision (average of 5 runs)

Table 26. Recall (average of 5 runs)

Table 27. F1 score (average of 5 runs)

Table 28. Learning time (average of 5 runs, in seconds)

Fig. 1. Filter Method

Fig. 2. Wrapper Method

Fig. 3. Embedded Method

Fig. 4. Workflow of Proposed Method

Fig. 5. Comparison of Accuracy

Fig. 6. Comparison of Precision

Fig. 7. Comparison of Recall

Fig. 8. Comparison of F1 score

Fig. 9. Comparison of Learning time by machine learning model

Fig. 10. ROC Curve, ROC AUC by machine learning model