軽量なレイアウト認識モデルを活用した 大規模なOCRテキストデータの構造化及び成果物の分析
https://ipsj.ixsq.nii.ac.jp/records/2006234
| Name / File | License | Action |
|---|---|---|
| Available for download from December 13, 2026. | Copyright (c) 2025 by the Information Processing Society of Japan | |
| Non-members: ¥660, IPSJ members: ¥330, CH members: ¥0, DLIB members: ¥0 | | |
| Item type | Symposium(1) |
|---|---|
| Publication date | 2025-12-06 |
| Title (ja) | 軽量なレイアウト認識モデルを活用した 大規模なOCRテキストデータの構造化及び成果物の分析 |
| Title (en) | Structuring and analyzing the results of large-scale OCR text data obtained using a lightweight layout recognition model |
| Language | jpn |
| Keywords (subject scheme: Other) | OCR; structuring; text data; large-scale data resources; layout recognition |
| Resource type identifier | http://purl.org/coar/resource_type/c_5794 |
| Resource type | conference paper |
| Author affiliation | National Diet Library |
| Author affiliation | National Diet Library |
| Author affiliation (en) | National Diet Library |
| Author affiliation (en) | National Diet Library |
| Author name | 青池,亨; 木下,貴文 |
| Author name (en) | Toru Aoike; Takafumi Kinoshita |
| Abstract (description type: Other) | The National Diet Library (NDL) has made a considerable effort both to use optical character recognition (OCR) in converting its collection into digital text and to develop OCR technology. Given the sheer volume of the materials that must be handled, even as more advanced methods that yielded more sophisticated results became available, the difficulty of performing large-scale reprocessing on materials that had already undergone OCR processing was a significant challenge. The results of this study, which was made using materials for which copyright protection had expired, show that the usability of large volumes of existing OCR text data can be improved in a fast and resource-efficient manner by applying post-processing with a lightweight layout recognition model. In addition, the results of this study were applied in the development of an experimental feature that has been added to the Next Digital Library in the form of a text mode that displays only the structured text data. |
| Bibliographic information | Proceedings of じんもんこん2025, Vol. 2025, pp. 431-436, 6 pages, issued 2025-12-06 |
| Publisher | Information Processing Society of Japan (情報処理学会) |