Centaurus: A Just-in-time Parallel-parser Generator for Ad Hoc Data Processing

Shigeyuki, Sato; Hiroka, Ihara; Kenjiro, Taura

https://ipsj.ixsq.nii.ac.jp/records/202966

Name / File	License	Action
IPSJ-TPRO1301007.pdf (27.7 kB)	Copyright (c) 2020 by the Information Processing Society of Japan
Open access

Item type

Trans(1)

Publication date

2020-01-29

Title

Centaurus: A Just-in-time Parallel-parser Generator for Ad Hoc Data Processing

Language

eng

Keywords

Subject scheme

Other

Subject

[Presentation abstract; Unrefereed Presentation Abstract]

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliation

Graduate School of Information Science and Technology, The University of Tokyo

Author affiliation

Center for Technology Innovation, Hitachi Ltd.

Author affiliation

Graduate School of Information Science and Technology, The University of Tokyo

Author names

Shigeyuki, Sato
Hiroka, Ihara
Kenjiro, Taura

Abstract

Description type

Other

Description

Handling data in text formats such as XML, JSON, and CSV is important because such data appear very often in data exchange. Typically only parts of these data are used afterwards, so ingesting all of them into a database is not worthwhile. It is therefore desirable to match and extract the relevant parts in a lightweight, ad hoc manner. Line-oriented regular-expression tools such as grep, sed, and awk are the classical choice for this purpose. They are, however, not powerful enough for the text formats commonly used for data exchange because, in general, they cannot recognize nested structures. To support lightweight ad hoc data processing, we present Centaurus, a just-in-time parallel-parser generator library. By generating native scannerless LL(*) parsers dynamically, the library lets users process input data in parallel merely by calling Python functions with LL(*) grammars and Python actions. This presentation describes the design and implementation of Centaurus and reports its experimental performance on data filtering.
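
The limitation of line-oriented tools noted in the abstract can be illustrated with a small sketch. It does not use Centaurus, whose API is not given in this record; Python's standard `json` module stands in for a structure-aware parser, and the sample document and the function names (`linewise_ids`, `top_level_id`, `owner_id`) are hypothetical, chosen only for illustration.

```python
import json
import re

# Hypothetical sample input: a pretty-printed JSON document with nesting.
DOC = """\
{
  "id": 1,
  "owner": {
    "id": 42,
    "name": "alice"
  },
  "tags": ["x", "y"]
}
"""


def linewise_ids(text):
    """grep-style extraction: apply one regex to each line independently."""
    pattern = re.compile(r'"id":\s*(\d+)')
    found = []
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            found.append(int(match.group(1)))
    return found


def top_level_id(text):
    """Structure-aware extraction: parse first, then select by position."""
    return json.loads(text)["id"]


def owner_id(text):
    """Select the nested id, which the line-wise regex cannot single out."""
    return json.loads(text)["owner"]["id"]


if __name__ == "__main__":
    print(linewise_ids(DOC))   # both ids, with no record of nesting depth
    print(top_level_id(DOC))   # only the outer id
    print(owner_id(DOC))       # only the nested id
```

The line-wise version returns every `"id"` it sees because each line is matched in isolation, so the nesting depth is lost. Once the input is parsed, the outer and nested fields can be addressed separately.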

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA11464814

Bibliographic information

IPSJ Transactions on Programming (PRO)

Vol. 13, No. 1, p. 18, issued 2020-01-29

ISSN

Source identifier type

ISSN

Source identifier

1882-7802

Publisher

Information Processing Society of Japan (情報処理学会)
