Table of Contents

Title Page

Korean Abstract 10

Ⅰ. Introduction 11

Ⅱ. Related Work 15

2.1. Studies on Binary Classification Prediction Models With and Without Labels 15

2.2. Studies on Techniques for Addressing Data Imbalance 16

Ⅲ. Research Method 18

3.1. Data Definition 18

3.2. Data Preprocessing 19

3.2.1. Data Resampling Using a Generative Algorithm 20

3.3. Model Evaluation Metrics 25

Ⅳ. Proposed Algorithms 27

4.1. Anomaly Detection Using an LSTM Autoencoder 27

4.2. Anomaly Detection Using Self-supervised TabNet 30

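Ⅴ. Experiments and Comparative Performance Analysis 32

5.1. Experimental Setup 32

5.1.1. Feature Selection Using SHAP 32

5.1.2. Feature Selection Using the TabNet Encoder 41

5.2. Experimental Results 43

5.2.1. Comparison of Unsupervised Learning and the Proposed Algorithms 43

5.2.2. Comparison Under Varying Weights on Recent Data 45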

Ⅵ. Conclusion and Implications 48

References 50

ABSTRACT 55

〈Table 1〉 Confusion Matrix (1) 26

〈Table 2〉 Confusion Matrix (2) 26

[Figure 1] Functions of the fraud detection system 12

[Figure 2] Conversion of the 'Time' variable into the time of day 19

[Figure 3] Application of log scaling to the variance of 'Amount' 19

[Figure 4] Structure of the Conditional Tabular GAN 21

[Figure 5] Distribution between 'Time' and 'Amount' for shared data 22

[Figure 6] Distribution between 'Time' and 'Amount' for fraud 0.3 23

[Figure 7] Distribution between 'Time' and 'Amount' for fraud 0.7 24

[Figure 8] Finding the optimal decision threshold 29

[Figure 9] Structure of the LSTM Autoencoder 29

[Figure 10] Self-supervised learning structure of TabNet 31

[Figure 11] Analysis using LIME 34

[Figure 12] Analysis using SHAP 34

[Figure 13] Feature importance of shared data using SHAP 35

[Figure 14] Feature importance of fraud 0.3 using SHAP 36

[Figure 15] Feature importance of fraud 0.7 using SHAP 37

[Figure 16] ROC curves for random forest classifier 39

[Figure 17] Confusion matrix of random forest classifier 39

[Figure 18] Performance comparison of feature selection techniques 40

[Figure 19] Results of TabNet with SHAP-based feature selection 41

[Figure 20] Global explainability of TabNet 42

[Figure 21] Local explainability of TabNet 42

[Figure 22] Comparison with unsupervised learning models 44

[Figure 23] Comparison of models considering time sequence 46

[Figure 24] Comparison of model performances considering momentum 47