Abstract

As chemical regulations have recently been strengthened, a very large volume of chemical hazard information is being produced and updated at irregular intervals, and it is published in text form on the websites of ECHA, NCIS, and KOSHA. Manually collecting the required information from such large hazard information databases is costly, time-consuming, and prone to human error. To address these problems and collect hazard information efficiently, this study developed a hazard information collection algorithm that automatically gathers chemical hazard information from the large body of data on the ECHA website. The collection process for items in the ECHA data, including the GHS classification and labelling of human health and environmental hazards, toxicity values by endpoint, and exposure duration, was formalized as an algorithm, and the algorithm was then implemented in code as a crawler that applies robotic process automation (RPA) with web crawling techniques. During development, unstandardized data were found, owing to the variety of hazard information held in the ECHA database, and it was determined that a verification procedure was needed to confirm that the crawler collects hazard information accurately. The accuracy of the algorithm was verified by comparing hazard information compiled manually for 25 chemical substances with the hazard information database generated by the crawler. The algorithm was then supplemented and corrected for the hazard items in which errors occurred.
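
The verification step described above can be illustrated with a minimal sketch. It is not the study's implementation: the record layout (substance identifier mapped to hazard items), the item names, and the `normalize` and `compare_records` helpers are all assumptions made for illustration, and the sample values are placeholders rather than data from ECHA.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: substance id -> {hazard item name -> collected value}
Records = Dict[str, Dict[str, str]]
Mismatch = Tuple[str, str, str, str]  # (substance, item, manual value, crawled value)


def normalize(value: str) -> str:
    """Collapse whitespace and case so formatting differences are not counted as errors."""
    return " ".join(value.split()).lower()


def compare_records(manual: Records, crawled: Records) -> Tuple[float, List[Mismatch]]:
    """Compare crawler output against manually compiled records, item by item.

    Returns the share of matching items and the list of mismatched items,
    which point to the parts of the collection logic that need correction.
    """
    mismatches: List[Mismatch] = []
    total = 0
    for substance, items in manual.items():
        crawled_items = crawled.get(substance, {})
        for item, expected in items.items():
            total += 1
            actual = crawled_items.get(item, "")  # a missing item counts as a mismatch
            if normalize(expected) != normalize(actual):
                mismatches.append((substance, item, expected, actual))
    accuracy = (total - len(mismatches)) / total if total else 1.0
    return accuracy, mismatches


if __name__ == "__main__":
    # Placeholder records, not real substance data.
    manual = {"substance-A": {"ghs_classification": "Acute Tox. 4", "exposure_duration": "96 h"}}
    crawled = {"substance-A": {"ghs_classification": "acute  tox. 4", "exposure_duration": "48 h"}}
    accuracy, mismatches = compare_records(manual, crawled)
    print(f"accuracy={accuracy:.2f}")
    for substance, item, expected, actual in mismatches:
        print(f"{substance} / {item}: manual={expected!r} crawler={actual!r}")
```

Listing mismatches per hazard item, rather than reporting a single accuracy figure, mirrors the way the study uses the comparison to decide which parts of the algorithm to supplement.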