Item type |
Symposium(1) |
Publication date |
2019-12-07 |
Title |
Segmenting Text in Japanese Historical Document Images using Convolutional Neural Networks |
Language |
en |
Keywords |
Subject Scheme |
Other |
Subject |
neural network; text segmentation; historical document |
Resource type |
conference paper |
Resource type identifier |
http://purl.org/coar/resource_type/c_5794 |
Author affiliations |
Tokyo University of Agriculture and Technology |
Tokyo University of Agriculture and Technology |
National Institute of Informatics |
Tokyo University of Agriculture and Technology |
Author names |
Hung, Tuan Nguyen
Cuong, Tuan Nguyen
Asanobu, Kitamoto
Masaki, Nakagawa
|
Abstract |
Historical document analysis and recognition faces many challenges, such as damage, fading, show-through, anomalous deformation, varied backgrounds, and limited resources. These challenges raise the demand for preprocessing historical document images. In this paper, we propose deep neural networks, named Pixel Segmentation Networks (PSNet), for text segmentation in Pre-Modern Japanese Text (PMJT) historical document images. The proposed networks segment text pixels from raw document images with various background styles and image sizes, which helps the later steps of historical document analysis and recognition. To prepare training patterns, we applied the Otsu binarization method locally to every single character and extracted pixel-level labels for all training document images. To evaluate the proposed networks, we used the following two metrics: pixel-level accuracy (PlA) and the ratio of intersection over union between the true text region and its detected region (IoU). Since there is a great imbalance between the number of background pixels and that of text pixels, we normalize both measurements with a weighting parameter based on the frequencies of background and text pixels. We then conducted experiments on the PMJT database, which was randomly split into a training set of 1,556 images, a validation set of 333 images, and a test set of 333 images. The experiments show a best PlA of 98.75%, a frequency-weighted PlA of 95.27%, an IoU of 87.89%, and a frequency-weighted IoU of 97.68% when all 1,556 images are used for training. Moreover, the performance of CED-PSNet12 degrades by only around 2 percentage points even when fewer than 100 images, 1/16 of the original training set, are used. |
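As a rough illustration of the label-preparation step above, the following Python/OpenCV sketch applies Otsu's threshold locally to each character patch and merges the results into a page-level text/background mask. This is a minimal sketch under stated assumptions, not the authors' code: the function name character_labels and the availability of per-character bounding boxes (char_boxes) from the dataset's ground truth are ours.

import cv2
import numpy as np

def character_labels(page_gray, char_boxes):
    """Build a pixel-level text/background label map for one page.

    page_gray: uint8 grayscale page image.
    char_boxes: iterable of (x, y, w, h) character bounding boxes
        (assumed available from the dataset's ground truth).
    Returns a uint8 mask with 1 for text pixels, 0 for background.
    """
    labels = np.zeros(page_gray.shape, dtype=np.uint8)
    for x, y, w, h in char_boxes:
        patch = page_gray[y:y + h, x:x + w]
        # Otsu's threshold computed locally, per character patch;
        # THRESH_BINARY_INV marks dark ink pixels as foreground (255).
        _, binary = cv2.threshold(
            patch, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        labels[y:y + h, x:x + w] |= (binary > 0).astype(np.uint8)
    return labels

Because character boxes can overlap on densely written pages, the bitwise OR keeps a pixel marked as text whenever any character's local threshold selects it.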
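The four reported numbers can be read as standard confusion-matrix quantities, with the frequency-weighted variants scaling each class by its share of pixels. Below is a minimal numpy sketch under that reading, assuming binary masks with 0 for background and 1 for text (the function name and dictionary keys are ours, not the paper's).

import numpy as np

def segmentation_metrics(pred, target, num_classes=2):
    """Pixel-level metrics from the confusion matrix of two label maps.

    pred, target: integer arrays of the same shape with values in
        {0 (background), 1 (text)}.
    """
    k = num_classes
    # cm[i, j] = number of pixels of true class i predicted as class j.
    idx = (k * target.ravel() + pred.ravel()).astype(np.int64)
    cm = np.bincount(idx, minlength=k * k).reshape(k, k)

    tp = np.diag(cm).astype(float)        # correctly labelled pixels per class
    t = cm.sum(axis=1).astype(float)      # true pixels per class
    p = cm.sum(axis=0).astype(float)      # predicted pixels per class
    freq = t / t.sum()                    # class frequencies (the weights)

    class_acc = tp / np.maximum(t, 1)     # per-class accuracy
    iou = tp / np.maximum(t + p - tp, 1)  # per-class intersection over union

    return {
        "PlA": tp.sum() / t.sum(),        # plain pixel-level accuracy
        "freq_weighted_PlA": (freq * class_acc).sum(),
        "IoU": iou.mean(),
        "freq_weighted_IoU": (freq * iou).sum(),
    }

Frequency weighting ties each class's contribution to how often it occurs, which is the normalization the abstract describes for the heavy background/text pixel imbalance.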
Bibliographic information |
じんもんこん2019論文集 (Jinmonkon 2019 Proceedings),
Vol. 2019,
pp. 253-260,
Issued 2019-12-07
|
Publisher |
情報処理学会 (Information Processing Society of Japan) |