Title Page
Contents
ABSTRACT 9
Abstract (in Korean) 10
CHAPTER 1. Introduction 12
CHAPTER 2. Related Work 16
2.1. Lightweight Vision Transformer 16
2.2. Neural Architecture Search 17
2.3. Multi-Scale Feature Representation 18
CHAPTER 3. Our Efficient CNN-Transformer Architecture 20
3.1. Mobile Stems for Multiple Branches 21
3.2. Multi-patch Embedding and Transformer Encoder 23
3.3. Multi-Scale Feature Interaction 25
CHAPTER 4. Results and Discussion 27
4.1. Experimental Setup 27
4.2. Experiments on CIFAR 28
4.3. Experiments on ImageNet 29
4.4. Discussion 31
CHAPTER 5. Conclusion 32
REFERENCES 33
[Table 3-1] Optimized Hyperparameters in Our Multi-Branch Configuration 26
[Table 4-1] Quantitative Comparisons on CIFAR-10 and CIFAR-100 28
[Table 4-2] Ablation Study on CIFAR-100 29
[Table 4-3] Quantitative Comparisons on ImageNet-1K 30
[Figure 1-1] Efficient CNN-Transformer Architecture 13
[Figure 3-1] System Overview 20
[Figure 3-2] CNN-Transformer Blocks in Our Hybrid Neural Network 22
[Figure 4-1] Qualitative Comparisons with Attention Maps 31