
Table of Contents

Title Page

ABSTRACT

Contents

List of Abbreviations 14

List of Symbols 16

Chapter 1. Introduction 18

1.1. Motivation and Background 18

1.2. Problem Description and Objective 20

1.3. Contributions 20

1.4. Disposition 21

Chapter 2. Literature Review 23

2.1. Computer Vision System 23

2.1.1. Image Classification 25

2.1.2. Object Detection 27

2.1.3. Attention Mechanisms 30

2.2. Traditional Methods 34

2.3. Machine Learning-based Methods 35

Chapter 3. Eye Detection Network 38

3.1. Network Architecture 38

3.1.1. Shrinking Module 38

3.1.2. Inception Module 40

3.1.3. Convolutional Triplet Attention Module 41

3.1.4. Detection Module 42

3.2. Loss Function 44

3.3. Experiments 46

3.3.1. Proposed Datasets for Eye Detection 46

3.3.2. Experimental Setup 48

3.3.3. Experimental Results 49

3.4. Ablation Studies 52

Chapter 4. Eye Classification Network 54

4.1. Network Architecture 54

4.1.1. Backbone Module 55

4.1.2. Classification Module 56

4.2. Loss Function 57

4.3. Experiments 57

4.3.1. Datasets 57

4.3.2. Experimental Setup 57

4.3.3. Experimental Results 58

4.4. Ablation Studies 60

Chapter 5. Real-time Eye Status Monitoring 62

5.1. YOLO5Face Detector 62

5.1.1. Network Architecture 64

5.1.2. Dataset 64

5.1.3. Experimental Results 65

5.2. Integrated System 65

5.3. Real-time Analysis 67

Chapter 6. Conclusion 72

6.1. Conclusions 72

6.2. Future Works 74

Appendix A. Publications 75

A.1. Journal 75

A.2. Conference 75

Bibliography 81

List of Tables

TABLE 3.1. Comparison results of the eye detection network on the individual proposed datasets. 50

TABLE 3.2. Ablation studies of the eye detection network on the CEW dataset. The red color indicates the best result. 53

TABLE 4.1. Comparison results of the eye classification network with popular classification networks on the CEW and MRL Eye datasets.... 58

TABLE 4.2. Comparison results of the eye classification network with different optimizers on the CEW and MRL Eye datasets. 60

TABLE 4.3. Comparison results of eye classification networks with the GAP and fully connected (FC) layers on the CEW and MRL Eye datasets. 61

TABLE 5.1. The variants of the YOLO5Face network. 65

TABLE 5.2. Comparison of YOLO5Face networks and existing face detectors on the WIDER FACE validation dataset. 66

TABLE 5.3. The speed performance of the system in real-time testing on PC and Jetson Nano device. The red color indicates the best result. 70

TABLE 5.4. The hardware specifications of the system in (Saurav et al., 2022) and the proposed system. 70

List of Figures

FIGURE 1.1. Overview of the proposed driver eye status monitoring system. 19

FIGURE 2.1. The simple human vision system. 24

FIGURE 2.2. The computer vision system for image classification. 24

FIGURE 2.3. Typical CNN architecture. 25

FIGURE 2.4. The VGG16 architecture for image classification. 26

FIGURE 2.5. Confusion matrix. 27

FIGURE 2.6. SSD architecture for object detection. 29

FIGURE 2.7. The visualization of IoU. 29

FIGURE 2.8. The SE architecture. 32

FIGURE 2.9. The BAM architecture. 33

FIGURE 2.10. The CBAM architecture. 34

FIGURE 3.1. The proposed eye detection network architecture. 38

FIGURE 3.2. An illustration of a convolution layer. 39

FIGURE 3.3. The operation of the max pooling layer. 39

FIGURE 3.4. The architecture of the CReLU block. 40

FIGURE 3.5. The architecture of the inception layer. 40

FIGURE 3.6. The architecture of the CTA module. 41

FIGURE 3.7. The geometry of the predicted bounding box, anchor bounding box, and ground-truth bounding box. 43

FIGURE 3.8. NMS visualization in the eye detection. 45

FIGURE 3.9. The annotation generation process using Python code. 47

FIGURE 3.10. The interface of the LabelImg annotation tool on the FEI dataset. 47

FIGURE 3.11. A sample annotation and the corresponding XML file. 48

FIGURE 3.12. The proposed dataset distribution. 49

FIGURE 3.13. The qualitative results on five proposed datasets. 51

FIGURE 3.14. Several detection mistakes on the five proposed datasets. 52

FIGURE 4.1. The proposed eye classification network architecture. 54

FIGURE 4.2. The operation of the average pooling layer. 55

FIGURE 4.3. The global average pooling layer. 56

FIGURE 4.4. The qualitative results of the proposed eye classification network on the CEW and MRL Eye datasets. 59

FIGURE 4.5. The confusion matrices on CEW and MRL Eye datasets. 60

FIGURE 5.1. The real-time testing setup with all devices. 62

FIGURE 5.2. The YOLO5Face architecture and submodules. 63

FIGURE 5.3. The hardware used in the real-time eye status monitoring system. 67

FIGURE 5.4. Block diagram of the proposed driver eye status monitoring system. 67

FIGURE 5.5. The qualitative results of the real-time eye status monitoring system with VGA video live-stream on Jetson Nano device... 68

FIGURE 5.6. The qualitative results of the real-time eye status monitoring system with VGA video live-stream on CPU-based PC with... 69

FIGURE 5.7. The qualitative results of the real-time eye status monitoring system with VGA video recording in the car on CPU-based PC. 69

FIGURE 5.8. The speed comparison between the proposed method and others in real-time testing on VGA (640 × 480) resolution. 71

Abstract

Traffic accidents account for the highest death rate among accident categories, and driver drowsiness is one of their major causes. Many studies have addressed this issue and developed driver assistance tools to reduce the risk, mainly by analyzing driver behavior, vehicle behavior, and driver physiology. This research proposes a driver eye status monitoring system based on lightweight convolutional neural network (CNN) architectures. The overall system consists of three stages: face detection, eye detection, and eye classification. In the first stage, the system uses a small real-time face detector based on the YOLOv5 network, named YOLO5Face (YOLO5nFace). The second stage exploits a compact CNN architecture combined with an inception network and a Convolutional Triplet Attention mechanism. Finally, the system uses a simple classification network to classify the eye status as open or closed. Additionally, this work provides datasets for the eye detection task comprising 10,659 images and 21,318 labels.
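A minimal sketch of the three-stage cascade described above is shown below. The functions detect_faces, detect_eyes, and classify_eye are hypothetical stubs standing in for YOLO5Face, the eye detection network, and the eye classification network; the thesis's actual models are not reproduced here, only the flow of crops from one stage to the next.

```python
import numpy as np

# Hypothetical stand-ins for the three trained networks in the abstract.
# Each returns fixed placeholder values; in the real system these would
# be CNN inferences (YOLO5Face, eye detector, open/closed classifier).
def detect_faces(frame: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Stage 1: return face bounding boxes as (x, y, w, h)."""
    return [(100, 80, 200, 200)]  # placeholder face box

def detect_eyes(face_crop: np.ndarray) -> list[tuple[int, int, int, int]]:
    """Stage 2: return eye boxes relative to the face crop."""
    return [(30, 50, 40, 20), (130, 50, 40, 20)]  # placeholder eyes

def classify_eye(eye_crop: np.ndarray) -> str:
    """Stage 3: label the eye crop as 'open' or 'closed'."""
    return "open"  # placeholder decision

def monitor_frame(frame: np.ndarray) -> list[str]:
    """Run the face -> eyes -> open/closed cascade on one frame."""
    statuses = []
    for fx, fy, fw, fh in detect_faces(frame):
        face = frame[fy:fy + fh, fx:fx + fw]
        for ex, ey, ew, eh in detect_eyes(face):
            eye = face[ey:ey + eh, ex:ex + ew]
            statuses.append(classify_eye(eye))
    return statuses

if __name__ == "__main__":
    dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)  # VGA frame
    print(monitor_frame(dummy_frame))  # e.g. ['open', 'open']
```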

In real-time testing, the system reached its best results of 33.12 FPS (frames per second) on an Intel® Core™ i7-4770 CPU @ 3.40 GHz with 8 GB of RAM (a personal computer, PC) and 25.11 FPS on a 128-core Nvidia Maxwell GPU with 4 GB of RAM (a Jetson Nano device). This speed is comparable to that of previous techniques and shows that the proposed method can be applied in real-time driver eye monitoring systems.
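For reference, per-stream FPS figures like those above are commonly obtained by timing the processing loop over a fixed number of frames. The helper measure_fps below is a generic sketch under that assumption, not the thesis's measurement code; monitor_frame refers to the pipeline sketch above.

```python
import time

import numpy as np

def measure_fps(process, num_frames: int = 100) -> float:
    """Average frames per second of `process` over dummy VGA frames."""
    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy VGA frame
    start = time.perf_counter()
    for _ in range(num_frames):
        process(frame)
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Example usage (trivial stand-in workload; substitute monitor_frame):
print(f"{measure_fps(lambda f: None):.2f} FPS")
```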