Structuring and analyzing the results of large-scale OCR text data obtained using a lightweight layout recognition model

青池,亨; 木下,貴文; Toru Aoike; Takafumi Kinoshita


https://ipsj.ixsq.nii.ac.jp/records/2006234

Name / File	License
IPSJ-CH2025059.pdf (1.6 MB), available for download from December 13, 2026	Copyright (c) 2025 by the Information Processing Society of Japan
Price: non-members ¥660; IPSJ members ¥330; CH members ¥0; DLIB members ¥0

Item type

Symposium(1)

Publication date

2025-12-06

Title

軽量なレイアウト認識モデルを活用した大規模なOCRテキストデータの構造化及び成果物の分析

Title (English)

Structuring and analyzing the results of large-scale OCR text data obtained using a lightweight layout recognition model

Language

jpn

Keywords

Subject scheme

Other

Subject

OCR; structuring; text data; large-scale data resources; layout recognition

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation

National Diet Library (both authors)

Author names

青池,亨
木下,貴文

Author names (English)

Toru Aoike
Takafumi Kinoshita

Abstract

Description type

Other

Description

The National Diet Library (NDL) has made considerable efforts both to convert its collection into digital text using optical character recognition (OCR) and to develop OCR technology. Because of the sheer volume of material involved, however, large-scale reprocessing of materials that had already undergone OCR remained a significant challenge, even as more advanced methods yielding better results became available. This study, conducted on materials whose copyright protection has expired, shows that the usability of large volumes of existing OCR text data can be improved quickly and resource-efficiently by post-processing it with a lightweight layout recognition model. The results were also used to develop an experimental text mode, added to the Next Digital Library, that displays only the structured text data.


Bibliographic information

じんもんこん2025論文集 (Proceedings of Jinmoncom 2025)

Vol. 2025, pp. 431-436, 6 pages, published 2025-12-06

Publisher

Information Processing Society of Japan


Versions

Ver.1

2025-12-03 01:56:43.095756
