Title Page
Contents
ABSTRACT 9
Abstract (in Korean) 10
CHAPTER 1. Introduction 12
CHAPTER 2. Related Work 16
2.1. Lightweight Vision Transformer 16
2.2. Neural Architecture Search 17
2.3. Multi-Scale Feature Representation 18
CHAPTER 3. Our Efficient CNN-Transformer Architecture 20
3.1. Mobile Stems for Multiple Branches 21
3.2. Multi-patch Embedding and Transformer Encoder 23
3.3. Multi-Scale Feature Interaction 25
CHAPTER 4. Results and Discussion 27
4.1. Experimental Setup 27
4.2. Experiments on CIFAR 28
4.3. Experiments on ImageNet 29
4.4. Discussion 31
CHAPTER 5. Conclusion 32
REFERENCES 33
[Table 3-1] Optimized Hyperparameters in Our Multi-Branch Configuration 26
[Table 4-1] Quantitative Comparisons on CIFAR-10 and CIFAR-100 28
[Table 4-2] Ablation Study on CIFAR-100 29
[Table 4-3] Quantitative Comparisons on ImageNet-1K 30
[Figure 1-1] Efficient CNN-Transformer Architecture 13
[Figure 3-1] System Overview 20
[Figure 3-2] CNN-Transformer Blocks in Our Hybrid Neural Network 22
[Figure 4-1] Qualitative Comparisons with Attention Maps 31