Title: Segmenting Text in Japanese Historical Document Images using Convolutional Neural Networks
Language: English
Keywords: neural network; text segmentation; historical document
Type: Conference Paper
Identifier: http://id.nii.ac.jp/1001/00201009/
Full text: https://ipsj.ixsq.nii.ac.jp/ej/?action=repository_action_common_download&item_id=201102&item_no=1&attribute_id=1&file_no=1
Copyright (c) 2019 by the Information Processing Society of Japan
Authors: Hung Tuan Nguyen (Tokyo University of Agriculture and Technology), Cuong Tuan Nguyen (Tokyo University of Agriculture and Technology), Asanobu Kitamoto (National Institute of Informatics), Masaki Nakagawa (Tokyo University of Agriculture and Technology)

Abstract:
Historical document analysis and recognition face many challenges, such as damage, fading, show-through, anomalous deformation, varied backgrounds, and limited resources. These challenges raise the demand for preprocessing historical document images. In this paper, we propose deep neural networks, named Pixel Segmentation Networks (PSNet), for text segmentation in Pre-Modern Japanese Text (PMJT) historical document images. The proposed networks segment text pixels from raw document images with various background styles and image sizes, which helps the later steps of historical document analysis and recognition. To prepare training patterns, we applied the Otsu binarization method locally to every single character and extracted the pixel-level labels of all training document images. To evaluate the proposed networks, we used the following two metrics: pixel-level accuracy (PlA) and the ratio of the intersection over the union of the true text region and its detected region (IoU). Since there is a great imbalance between the number of background pixels and that of text pixels, we normalize the measurements with a weighting parameter based on the frequencies of background and text pixels.
Then, we conducted experiments on the PMJT database, which is randomly split into a training set of 1,556 images, a validation set of 333 images, and a test set of 333 images. The experiments show a best PlA of 98.75%, a frequency-weighted PlA of 95.27%, an IoU of 87.89%, and a frequency-weighted IoU of 97.68% when all 1,556 images are used for training. Moreover, the performance of CED-PSNet12 degrades by only about 2 percentage points even when fewer than 100 images (1/16 of the original training set) are used.
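The label-preparation step described in the abstract, applying Otsu binarization locally to each character, can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names and the assumption that dark ink is the text class are ours.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the gray level that maximizes between-class variance."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    probs = hist / gray.size
    bins = np.arange(256)
    w0 = np.cumsum(probs)            # probability of the "dark" class at each cut
    w1 = 1.0 - w0                    # probability of the "bright" class
    mu = np.cumsum(probs * bins)     # cumulative mean intensity
    mu_t = mu[-1]                    # global mean intensity
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    between[np.isnan(between)] = 0.0  # cuts where one class is empty
    return int(np.argmax(between))

def binarize_character(crop):
    """Binarize one character crop locally: dark ink -> 1 (text), rest -> 0."""
    t = otsu_threshold(crop)
    return (crop <= t).astype(np.uint8)
```

Applying this per character crop rather than globally is what makes the labeling robust to the varying backgrounds the abstract mentions, since each crop gets its own threshold.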
Published in: じんもんこん2019論文集 (Proceedings of Jinmonkon 2019, the IPSJ Symposium on Humanities and Computers), 2019, pp. 253-260. Record dates: 2019-12-07, 2019-12-04.
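The reported metrics (PlA, IoU, and a frequency-weighted IoU) can be sketched for the binary text/background case as below. This is a minimal illustration assuming the common FCN-style frequency weighting; the abstract does not spell out the paper's exact weighting scheme, so that choice and the function name are our assumptions.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """pred, gt: binary arrays of the same shape (1 = text, 0 = background)."""
    pla = float((pred == gt).mean())  # pixel-level accuracy over all pixels
    ious, freqs = [], []
    for c in (0, 1):
        inter = np.sum((gt == c) & (pred == c))
        union = np.sum((gt == c) | (pred == c))
        ious.append(inter / union if union else 1.0)
        freqs.append(np.sum(gt == c) / gt.size)  # class frequency in ground truth
    iou_text = ious[1]                           # IoU of the text class
    # Frequency-weighted IoU: each class's IoU weighted by how often it occurs.
    fw_iou = float(sum(f * i for f, i in zip(freqs, ious)))
    return pla, iou_text, fw_iou
```

Because background pixels vastly outnumber text pixels in document images, the plain PlA can look high even when text pixels are missed, which is why the weighted variants are reported alongside it.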