Item type |
Trans(1) |
Publication date |
2020-01-29 |
Title |
|
|
Title |
Centaurus: A Just-in-time Parallel-parser Generator for Ad Hoc Data Processing |
Language |
|
|
Language |
eng |
Keywords |
|
|
Subject scheme |
Other |
|
Subject |
[Presentation Abstract, Unrefereed Presentation Abstract] |
Resource type |
|
|
Resource type identifier |
http://purl.org/coar/resource_type/c_6501 |
|
Resource type |
journal article |
Author affiliation |
|
|
|
Graduate School of Information Science and Technology, The University of Tokyo |
Author affiliation |
|
|
|
Center for Technology Innovation, Hitachi Ltd. |
Author affiliation |
|
|
|
Graduate School of Information Science and Technology, The University of Tokyo |
Author names |
Shigeyuki, Sato
Hiroka, Ihara
Kenjiro, Taura
|
Abstract |
|
|
Description type |
Other |
|
Description |
It is important to handle data in text formats such as XML, JSON, and CSV because such data appear very often in data exchange. Typically only parts of the data are used afterwards, so ingesting all of it into a database is not worthwhile. It is therefore desirable to match and extract the relevant parts in a lightweight, ad hoc manner. Line-oriented regular-expression tools such as grep, sed, and awk are the classic choice for this purpose. They are, however, not powerful enough for the text formats commonly used in data exchange, because they cannot in general recognize nested structures. To support lightweight ad hoc data processing, we present Centaurus, a just-in-time parallel-parser generator library. By dynamically generating native scannerless LL(*) parsers, the library lets users process input data in parallel merely by calling Python functions with LL(*) grammars and Python actions. This presentation describes the design and implementation of Centaurus and reports its experimental performance on data filtering. |
Bibliographic record ID |
|
|
Source identifier type |
NCID |
|
Source identifier |
AA11464814 |
Bibliographic information |
情報処理学会論文誌プログラミング(PRO) (IPSJ Transactions on Programming)
Vol. 13,
No. 1,
p. 18,
Issue date 2020-01-29
|
ISSN |
|
|
Source identifier type |
ISSN |
|
Source identifier |
1882-7802 |
Publisher |
|
|
Language |
ja |
|
Publisher |
情報処理学会 (Information Processing Society of Japan) |