Construction of the Web Archive Using User Cache

若菜勇気; 長谷川大; 佐久田博司

https://ipsj.ixsq.nii.ac.jp/records/88018

Name / File	License	Action
IPSJ-GN13086009.pdf (1.0 MB)	Copyright (c) 2013 by the Information Processing Society of Japan
Open access

Item type

SIG Technical Reports(1)

Publication date

2013-01-09

Title (original Japanese)

ユーザキャッシュを利用したWebアーカイブの構築

Title (English)

Construction of the Web Archive Using User Cache

Language

jpn

Keywords

Subject scheme

Other

Subject

Web, network applications

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

Graduate School of Science and Engineering, Aoyama Gakuin University

Author affiliation

Department of Science and Engineering, Aoyama Gakuin University

Author affiliation

Department of Science and Engineering, Aoyama Gakuin University

Author name

若菜勇気

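Abstract (translated from the Japanese)

Description type

Other

Description

Organizations are building web archives to preserve web pages on the Internet, which change daily, for future generations. Web archives are built with crawlers that explore web pages automatically. However, because the crawlers used by current web archives collect pages by following static links, they cannot archive deep Web content that is dynamically generated in the browser or on the server. This paper therefore targets web pages that are difficult to archive with a crawler alone and proposes a web archive that integrates local user caches with the archive collected by a crawler. User caches hold much deep Web content, including dynamically generated web content, so the proposed method can build a web archive with a higher collection rate. To show the usefulness of the system, we compared the number of content items obtained from web pages containing deep Web content against an archive built with a conventional crawler alone. The results confirmed that the system can archive, among other items, image files generated by the APIs of external sites and text files dynamically generated on the server.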

Abstract (English)

Description type

Other

Description

To preserve web content on the Internet, which changes every day, for posterity, many organizations are working on archiving it. Web archiving has been carried out with web crawlers. Conventional web crawlers, however, only find web pages by following links written in HTML files and can collect only static web content. Therefore, the so-called Deep Web content, which is dynamically generated in web browsers or on servers, is not archived by the crawlers. In this paper, to archive the Deep Web along with static content, we propose a novel archiving system that integrates content retrieved by a web crawler with content from user caches. User caches store Deep Web content that was dynamically generated when users accessed it. Therefore, by using user caches, the system can create a web archive with higher reproducibility. To evaluate archiving performance, we compared our system with a conventional crawler on the number of content items successfully archived from a web page that contains Deep Web content. As a result, we confirmed that our proposed system could collect a larger number of content items, especially image files generated through the APIs of external sites and text files generated on the server side.
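
The integration step described in the abstract can be pictured with a small sketch. This is not the authors' implementation, which the record does not describe; it only illustrates, under stated assumptions, how content collected by a crawler and content found in a user cache might be merged by URL and how the number of items gained from the cache might be counted. The `Resource` type, the function names, the source labels, the example URLs, and the rule that the crawler copy wins on a URL collision are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One archived item (hypothetical structure, not from the paper)."""
    url: str
    source: str  # "crawler" or "user_cache" (hypothetical labels)
    body: bytes


def merge_archives(crawled, cached):
    """Combine crawler output with user-cache entries, keyed by URL.

    Assumption: when both sources hold the same URL, the crawler copy wins.
    """
    merged = {r.url: r for r in cached}
    merged.update({r.url: r for r in crawled})
    return merged


def added_by_cache(crawled, merged):
    """Return, sorted, the URLs present only because of the user cache."""
    crawled_urls = {r.url for r in crawled}
    return sorted(url for url in merged if url not in crawled_urls)


# Made-up example data: two static items found by the crawler, plus two
# dynamically generated items that exist only in the user cache.
crawled = [
    Resource("https://example.com/index.html", "crawler", b"<html>...</html>"),
    Resource("https://example.com/logo.png", "crawler", b"png-bytes"),
]
cached = [
    Resource("https://example.com/index.html", "user_cache", b"<html>...</html>"),
    Resource("https://api.example.org/chart?id=1", "user_cache", b"generated-image"),
    Resource("https://example.com/feed.txt", "user_cache", b"generated-text"),
]

merged = merge_archives(crawled, cached)
extra = added_by_cache(crawled, merged)
print(f"crawler only: {len(crawled)} items, merged: {len(merged)} items")
print("gained from user cache:", extra)
```

Keying on the URL keeps the comparison simple: the difference between the merged set and the crawler-only set corresponds to the kind of count the abstract reports, although the paper's actual matching and deduplication rules may differ.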

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA1155524X

Bibliographic information

SIG Technical Reports: Groupware and Network Services (GN)

Vol. 2013-GN-86, No. 9, pp. 1-7, issued 2013-01-09

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

Information Processing Society of Japan
