Table of Contents

Title Page

Table of Contents

Abstract (Korean) 9

ABSTRACT 10

I. Introduction 11

II. Research Subject 13

2.1. Data for Analysis 13

2.1.1. Data Description 13

2.1.2. Data Preprocessing 16

2.2. Evaluation Metrics 18

2.2.1. Mean Squared Error 18

2.2.2. Concordance Index 18

2.2.3. Adjusted Coefficient of Determination 19

2.2.4. Precision-Recall Curve 20

III. Research Methods 22

3.1. Convolutional Neural Network 22

3.2. Ensemble Learning 24

3.3. Quadratic Programming 25

3.4. Representation Learning 26

3.5. Self-Supervised Learning 27

IV. Methods for Improving Performance 28

4.1. Quadratic Programming Model 28

4.2. Self-Supervised Learning Model 30

V. Experimental Results 32

5.1. Experimental Results for Quadratic Programming 32

5.2. Experimental Results for Self-Supervised Learning 34

VI. Conclusion and Suggestions 39

References 40

Table 2.1. Data set 13

Table 2.2. Descriptive statistics of SMILES, sequence and binding affinity 15

Table 5.1. Comparison of models via 3-fold and 5-fold cross validations 33

Table 5.2. Comparison of self-supervised learning and semi-supervised learning models via 3-fold cross validations 35

Table 5.3. Comparison of self-supervised learning and semi-supervised learning models via 5-fold cross validations 36

Figure 2.1. Histogram of binding affinity values 15

Figure 2.2. Histograms of length of SMILES (a) and length of protein sequence (b) 15

Figure 2.3. Confusion matrix 21

Figure 3.1. CNN architecture and training process 23

Figure 3.2. Structure of representation learning 26

Figure 4.1. Similarity-based CNN model 29

Figure 4.2. Quadratic programming in ensemble learning process 29

Figure 4.3. Self-supervised learning model process 31

Figure 4.4. The overall process 31

Figure 5.1. Boxplot for 3-fold cross validation 37

Figure 5.2. Boxplot for 5-fold cross validation 38