
Table of Contents

Title Page

ABSTRACT

Contents

Chapter 1. Introduction 12

1.1. Object Recognition 12

1.2. Visual Attention-based Object Recognition 13

Chapter 2. Unified Visual Attention Model 16

2.1. Tile-based Parallel Object Recognition 16

2.2. Unified Visual Attention Model 17

2.3. Familiarity Based Top-Down Attention 20

2.3.1. Familiarity of Keypoints 21

2.3.2. Familiarity of Keypoint Clusters 22

2.3.3. Familiarity Map Generation 23

2.4. ROI Selection 28

2.5. UVAM based Object Recognition 29

2.5.1. Object Recognition without Visual Attention 31

2.5.2. Applying the Unified Visual Attention Model to Object Recognition 32

2.5.3. Keypoint Extraction 34

2.5.4. Keypoint Matching 34

2.5.5. Keypoint Clustering 36

Chapter 3. UVAM Evaluation 39

3.1. Performance of Visual Attention Model 39

3.2. Robustness to Target Object Type 42

3.3. Robustness to Background Clutter 44

3.4. Failure Mode of the UVAM 46

3.5. Robustness on Natural Images 47

Chapter 4. Feature Matching with Locality Sensitive Hashing 49

4.1. Locality Sensitive Hashing (LSH) 49

4.2. Database Compression using Huffman Coding 52

4.3. Temporal Locality 53

Chapter 5. Object Recognition Processor 54

5.1. Feature Extraction Cluster (FEC) 55

5.2. Fine-grained Task Scheduling 57

5.3. Intelligent Inference Engine (IIE) 60

5.4. Feature Matching Processor (FMP) 64

5.4.1. Hash Function and Hash Tables 65

5.4.2. Parallel Huffman Decoder 66

5.4.3. Query Re-ordering Buffer 67

5.5. Power Mode Controller (PMC) 67

5.6. Hierarchical Star + Ring Network-on-Chip 71

5.6.1. Intelligent Bandwidth Regulation 72

5.6.2. UVAM based Object Recognition 73

5.6.3. Feature Extraction Task Execution Time 74

5.6.4. Neuro-Fuzzy Workload Prediction 75

5.6.5. Weighted Round Robin Bandwidth Regulation 76

Chapter 6. Visual Attention Engine 78

6.1. Saliency based Visual Attention Algorithm 79

6.2. Architecture 81

6.2.1. Cellular Neural Networks 81

6.2.2. Overall Architecture 84

6.2.3. VAE Cell 87

6.2.4. Processing Element 89

6.2.5. Data I/O 91

6.2.6. Controller 92

6.3. Operation 93

6.3.1. CNN Operation 94

6.3.2. Visual Attention 96

Chapter 7. Chip Implementation 98

7.1. Implementation Methodology 98

7.2. Implementation Results 100

Chapter 8. Chip Evaluation 102

8.1. Test Platform 102

8.2. Augmented Reality Headset Demonstration 106

Chapter 9. Conclusion 108

References 110

Summary 114

Curriculum Vitae 117

List of Tables

TABLE 1. PERFORMANCE OF UNIFIED VISUAL ATTENTION MODEL ON NATURAL IMAGES. 48

TABLE 2. HUFFMAN CODE MAPPING. 52

TABLE 3. VOLTAGE AND FREQUENCY SCALING OF PPL POWER MODES. 70

TABLE 4. VAE EXECUTION TIME SUMMARY. 97

List of Figures

Figure 1. Attention recognition loop of the unified visual attention map. 14

Figure 2. (a) 3 steps of SIFT-based object recognition and (b) tile-based object recognition. 16

Figure 3. Outline of the Unified Visual Attention Model. 18

Figure 4. Usefulness of bottom-up saliency-based attention for (a) a scene conditioned for visual pop-out and (b) a scene with salient background clutter. The circles mark target objects in the scene, and the arrows mark the point of highest saliency in the scene. 19

Figure 5. PDF of the familiarity of keypoints extracted from target objects and those extracted from distractors. 22

Figure 6. Conceptualization of the F-Map generation process. Single keypoint matches and inconclusive cluster matches (2 keypoints) are viewed as evidence of a target object and are represented as positive valued ellipses on the F-Map.... 25

Figure 7. Overview of FF and FB F-map generation. The feed-forward process executes a reduced version of object recognition on the entire reduced resolution input image. The feedback loop executes detailed object recognition on a small ROI of the full resolution input image. 26

Figure 8. Example of the operation of the UVAM on a scene with many salient distractors. Three target objects are successfully recognized after 64 iterations of the attention-recognition feedback loop.... 27

Figure 9. SIFT based object recognition without visual attention. 29

Figure 10. Average execution times of each stage of SIFT feature extraction and matching without visual attention for scenes with low, medium, and high keypoint density. The contribution of keypoint clustering to total execution time is negligible and is not shown. 31

Figure 11. SIFT based object recognition flow with the UVAM applied. The light grey region depicts the reduced flow for FF F-Map generation, the dark grey region the flow for detailed object recognition, and the medium grey region the steps that are shared between the two flows. 33

Figure 12. (a) Percentage of correct matches and (b) execution time of keypoint matching using approximate nearest neighbor search with varying values of ε when compared to exact nearest neighbor search (ε = 0).... 35

Figure 13. Objects and background images used for test image generation. (a) Since we are interested in the performance of the attention model, a subset of 75 of the more easily recognizable objects were chosen from the COIL-100 object database to reduce the impact of the limitations of... 39

Figure 14. Performance summary of the proposed UVAM compared to different configurations of visual attention. (a) The number of analyzed keypoints, and (b) the execution times of each configuration are compared. 41

Figure 15. Complementary operation of the bottom-up S-Map and the top-down F-Map. 43

Figure 16. Comparison of the performance of bottom-up saliency based visual attention, and the UVAM for three scenes with varying amounts of salient clutter. The number of attended ROIs increases proportionally to the amount of salient clutter for bottom-up saliency based visual... 45

Figure 17. Testing of the UVAM on natural images. The results are similar to those for the synthesized test images. 48

Figure 18. (a) Database preprocessing for LSH (b) vector querying using LSH. 50

Figure 19. PDF of symbols in SIFT descriptor vectors. 52

Figure 20. Overlapping buckets among 8 consecutive queries. 53

Figure 21. Block diagram of the proposed heterogeneous many-core processor. 54

Figure 22. The Vector Processing Element. 55

Figure 23. The Scalar Processing Element. 56

Figure 24. Fine-grained task scheduling of SIFT feature recognition on heterogeneous PEs. 57

Figure 25. Latency reduction of tile-based pipelining. 58

Figure 26. Workload balancing through SPE sharing. 59

Figure 27. The Intelligent Inference Engine. 60

Figure 28. Parameterized Gaussian membership function (PGMF) circuit. 61

Figure 29. Parameterized sweep of PGMF output. 61

Figure 30. Perturbation learning in the Intelligent Inference Engine. 63

Figure 31. Feature matching processor block diagram. 64

Figure 32. FMP matching results. 65

Figure 33. Parallel Huffman decoder. 66

Figure 34. Workload prediction of the Intelligent Inference Engine. 69

Figure 35. Measured minimum VDD of PPL versus frequency. 70

Figure 36. Hierarchical Star + Ring Network-on-Chip. 71

Figure 37. Execution time balancing by bandwidth regulation. 72

Figure 38. Task diagram of UVAM based object recognition. 73

Figure 39. Predicting NF using the IIE. 75

Figure 40. Global router architecture. 76

Figure 41. Bandwidth regulation results. 77

Figure 42. Detailed steps of the saliency based visual attention algorithm. 80

Figure 43. Comparison of CNN implementations: (a) conventional analog, (b) conventional digital, and (c) the time-multiplexed PE topology. 82

Figure 44. Block diagram of the Visual Attention Engine. 84

Figure 45. VAE cell - PE interconnections. 85

Figure 46. Bit slice of a VAE cell's register file. 87

Figure 47. Bit slice of a VAE cell's shift register. 88

Figure 48. Processing element circuit and read/write buses. 89

Figure 49. Pipelined PE operation of VAE. 90

Figure 50. Data I/O path of the VAE. 91

Figure 51. Simplified programmer's model of the VAE. 93

Figure 52. Procedure for convolution with a 3x3 kernel using the VAE. 95

Figure 53. Gabor-type filter impulse response. 97

Figure 54. (a) Design partitioning and (b) hierarchical implementation flow. 98

Figure 55. Chip photograph and summary. 100

Figure 56. Shmoo plot of chip. 101

Figure 57. Test platform block diagram. 102

Figure 58. Test platform photograph. 103

Figure 59. (a) Power efficiency and (b) effective power efficiency comparison. 103

Figure 60. Measured dynamic voltage and frequency control of the PPL. 104

Figure 61. Augmented reality headset. 106

Figure 62. Augmented reality headset display snapshots. 107

Abstract

Object recognition over a wide range of objects is a very difficult task; even on today's PCs it is hard to achieve in real time. Algorithms based on local feature points, such as the Scale Invariant Feature Transform (SIFT), reach recognition rates as high as 95% in constrained experiments, but their complexity is so high that processing a single 640x480 image takes more than one second on a PC. Because embedded platforms such as mobile devices are slower than PCs, reaching the 30 frames per second required for real-time operation demands that algorithmic and hardware approaches be pursued together.

This work spans an efficient algorithm for low-power real-time object recognition, a system-on-chip optimized for that algorithm, and a system demonstration integrating the two. The algorithm, named the Unified Visual Attention Model (UVAM), detects only the regions of the input image that are useful for object recognition and performs recognition selectively on them, achieving the same accuracy as conventional recognition with less computation. In particular, to overcome the weakness of known saliency-based visual attention models in scenes with cluttered backgrounds, a new familiarity measure derived from the trained database is proposed. Experiments show that the unified model, which considers both saliency and familiarity, matches the accuracy of conventional Scale Invariant Feature Transform (SIFT) based object recognition (95% true positives on the test set used) while running 2.7 times faster.
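The following is a minimal software sketch of the attention-recognition loop described above: a bottom-up saliency map (S-Map) and a top-down familiarity map (F-Map) are combined into a unified attention map, the most promising region of interest (ROI) is attended, and the attended region is then suppressed so later iterations explore elsewhere. The equal-weight sum, the ROI size, and the inhibition scheme are illustrative assumptions, not the weighting specified in the thesis; the random maps stand in for the real saliency and familiarity computations covered in Chapters 2 and 6.

```python
import numpy as np

def select_next_roi(s_map, f_map):
    """Pick the next ROI center from the unified attention map.

    s_map: 2-D bottom-up saliency map; f_map: 2-D top-down familiarity map
    (positive where matched keypoints suggest a target). The equal-weight
    sum is an assumption for illustration only.
    """
    attention = s_map + f_map
    y, x = np.unravel_index(np.argmax(attention), attention.shape)
    return y, x

def inhibit_return(att_map, center, roi_size=64):
    """Suppress the attended region so the next iteration looks elsewhere."""
    y, x = center
    h = roi_size // 2
    out = att_map.copy()
    out[max(0, y - h):y + h, max(0, x - h):x + h] = out.min()
    return out

# Toy run with random maps standing in for real S-Map / F-Map generation.
rng = np.random.default_rng(0)
s_map = rng.random((480, 640))
f_map = np.zeros((480, 640))
for _ in range(4):
    roi = select_next_roi(s_map, f_map)
    # ... run detailed SIFT extraction/matching on this ROI, update f_map ...
    s_map = inhibit_return(s_map, roi)
    print("attend ROI centered at", roi)
```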

An object recognition chip based on the Unified Visual Attention Model was implemented and verified. The proposed chip contains 36 parallel processing elements (PEs) for fast SIFT object recognition, dedicated blocks that execute the Unified Visual Attention Model, and a dynamic voltage and frequency scaling control block for low power. The 36 parallel PEs form a heterogeneous multi-core structure of 4 vector processing elements (VPEs) with 20-way SIMD datapaths and 32 scalar processing elements (SPEs) with 1-way datapaths. The VPEs handle tasks with high pixel-level parallelism such as image filtering, while the SPEs handle sequential tasks such as histogram generation, enabling more efficient processing than integrating only a single type of PE. The feature matching processor (FMP), which performs SIFT matching, runs a locality sensitive hashing (LSH) algorithm to drastically reduce computation, and data caching and compression techniques reduce the required memory bandwidth by 61%. The Network-on-Chip (NoC) connecting the processor's IP blocks applies a routing algorithm that predicts the workload and assigns priorities to the IPs accordingly, yielding a 12% performance improvement in SIFT feature computation. As a result, the chip achieves more than 46% higher power efficiency than previous object recognition chips, realizing 30 fps object recognition of VGA images at 345 mW.
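As a rough software analogue of the bucket lookup the FMP performs in hardware, the sketch below builds a random-projection LSH index over SIFT-like 128-D descriptors: each table hashes a descriptor to the sign pattern of a few random projections, and a query only does exact distance checks against the small candidate set pulled from matching buckets. The hash parameters (16 bits, 4 tables) and the hyperplane scheme are arbitrary illustrative choices, not the hash functions or table sizes used in the thesis.

```python
import numpy as np

class SimpleLSH:
    """Minimal random-projection LSH index for SIFT-like descriptor vectors."""

    def __init__(self, dim=128, n_bits=16, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per table; the sign pattern of the
        # projections forms the bucket key.
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]

    def _key(self, planes, vec):
        return tuple((planes @ vec > 0).astype(np.uint8))

    def add(self, idx, vec):
        for planes, table in zip(self.planes, self.tables):
            table.setdefault(self._key(planes, vec), []).append(idx)

    def query(self, vec, db):
        # Gather candidates from matching buckets, then run exact distance
        # checks only on that small set instead of the whole database.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._key(planes, vec), []))
        if not candidates:
            return None
        cand = np.fromiter(candidates, dtype=int)
        dists = np.linalg.norm(db[cand] - vec, axis=1)
        return cand[np.argmin(dists)]

# Toy usage: index 10,000 random 128-D descriptors, then query one of them back.
rng = np.random.default_rng(1)
db = rng.random((10000, 128)).astype(np.float32)
lsh = SimpleLSH()
for i, v in enumerate(db):
    lsh.add(i, v)
print(lsh.query(db[42], db))  # prints 42
```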

A system demonstration was built on the proposed algorithm and hardware. The fabricated chip was integrated with a commercial ARM processor, an FPGA, and high-capacity memory to form a verification platform capable of object recognition. On this platform, a demonstration headset system for augmented reality was developed. By integrating the complete object recognition system into the headset, the feasibility of low-power real-time object recognition on a portable platform was demonstrated.