IT System Failure Point Localization Technique Based on Alert Scoring －Enhanced Network Failure Analysis using Storage I/O Path－

Reiko Kondo (近藤 玲子); Kazutaka Ogihara (荻原 一隆); Takeshi Kodama (児玉 武司); Hitoshi Ueno (上野 仁); Takashi Shiraishi (白石 崇)

https://ipsj.ixsq.nii.ac.jp/records/2002154

Name / File	License	Action
IPSJ-IOT25069006.pdf (1.8 MB) Available for download from December 31, 2999.	Copyright (c) 2025 by the Institute of Electronics, Information and Communication Engineers. This SIG report is only available to those in membership of the SIG.
IOT: members: ¥0, DLIB: members: ¥0

Item type

SIG Technical Reports(1)

Publication date

2025-05-22

Title

アラートスコアリングによるITシステム障害の被疑箇所分析－ストレージI/O通信経路を用いたネットワーク障害分析機能の強化－

Title (English)

IT System Failure Point Localization Technique Based on Alert Scoring －Enhanced Network Failure Analysis using Storage I/O Path－

Language

jpn

Keywords

Subject scheme

Other

Subject

ICM

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_18gh

Resource type

technical report

Author affiliation

エフサステクノロジーズ株式会社

Author affiliation

エフサステクノロジーズ株式会社

Author affiliation

エフサステクノロジーズ株式会社

Author affiliation

エフサステクノロジーズ株式会社

Author affiliation

富士通株式会社

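Author affiliation (English)

FSAS TECHNOLOGIES INC.

Author affiliation (English)

FSAS TECHNOLOGIES INC.

Author affiliation (English)

FSAS TECHNOLOGIES INC.

Author affiliation (English)

FSAS TECHNOLOGIES INC.

Author affiliation (English)

FUJITSU LIMITED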

Author name

近藤,玲子
荻原,一隆
児玉,武司
上野,仁
白石,崇

Author name (English)

Reiko Kondo
Kazutaka Ogihara
Takeshi Kodama
Hitoshi Ueno
Takashi Shiraishi

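Abstract

Description type

Other

Description

As virtualization has made systems more complex, a single failure can propagate alerts across multiple devices through the dependencies among virtual and physical devices, so that many devices raise various kinds of alerts at the same time. When so many alerts occur, it is difficult to estimate from them the suspected location that is the root of the failure, and failure handling tends to take a long time. To address this, we have proposed a technique that estimates the suspected location by scoring alerts, exploiting the fact that a failure affects dependent devices and thereby propagates alerts. In our previous report, we carried out experiments using anomaly detection results as the alert information and evaluated the possibility of estimating the suspected failure location at the stage when an anomaly is detected. In this paper, we focus on network (NW) failures, which had not been covered so far, and propose a technique for estimating the suspected failure location that integrates NW alerts in addition to CPU, memory, and storage alerts. For NW failure estimation, we propose a method that estimates the suspected location of an NW failure by using the communication paths of storage I/O, even when workload communication paths such as inter-VM communication are difficult to identify, and we verify its effectiveness through experiments. With the proposed technique, even when multiple kinds of alerts are raised from a large number of components, operators need only work through the estimated failure locations in priority order, starting from the most probable one, without distinguishing between devices or alert types; a reduction in failure recovery time can therefore be expected.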

Abstract (English)

Description type

Other

Description

As virtualization makes systems more complex, a single failure can propagate alerts to multiple devices through the dependencies between virtual and physical devices, and various kinds of alerts may occur simultaneously in many devices. When such a large number of alerts are generated, it is difficult to estimate the root failure location from them, and the response time tends to be longer. To address this, we have proposed a technique that estimates the failure location by scoring alerts, using the fact that a failure affects dependent devices and propagates alerts. In the previous report, we carried out experimental verification using anomaly detection results as the alert information and evaluated the possibility of estimating the failure location at the anomaly detection stage. In this paper, we focus on network (NW) failures, which were not targeted in our previous reports, and propose a failure location estimation technique that integrates NW alerts in addition to CPU, memory, and storage alerts. In particular, for cases where it is difficult to identify a workload communication path such as inter-VM communication, we propose a method that estimates the NW failure location by using the storage I/O communication path, and we verify its effectiveness through experiments. With the proposed technology, even when multiple kinds of alerts are issued from a large number of components, the operator can take action in order of priority, starting from the most probable failure location, without distinguishing between devices and types of alerts; this is expected to reduce the time required to recover from a failure.
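
The abstract does not give the scoring formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each VM depends on every element of its storage I/O path, and it scores a candidate by the overlap between the components a fault there would affect and the components that are actually alerting (a Jaccard-style ratio chosen here for simplicity). The topology, the component names, and the functions `downstream` and `rank_suspects` are all hypothetical.

```python
from collections import defaultdict


def downstream(dependents, node):
    """Return `node` plus every component that directly or transitively depends on it."""
    seen, stack = {node}, [node]
    while stack:
        for nxt in dependents.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def rank_suspects(depends_on, alerting):
    """Rank components by how well a fault at each one would explain the alerts.

    depends_on: mapping component -> components it depends on (assumed input).
    alerting:   set of components currently raising alerts.
    """
    # Invert the dependency map so we can walk from a candidate to its dependents.
    dependents = defaultdict(set)
    nodes = set(depends_on)
    for comp, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(comp)
            nodes.add(dep)

    scores = {}
    for cand in nodes:
        affected = downstream(dependents, cand)
        explained = len(affected & alerting)
        # Assumed score: explained alerts divided by the union of affected and
        # alerting components, so silent dependents and unexplained alerts both
        # lower the score.
        scores[cand] = explained / len(affected | alerting)
    # Highest score first; ties are broken by name to keep the order stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical storage I/O paths: VM -> host, switch, storage it traverses.
io_paths = {
    "vm1": ["host1", "sw1", "storage1"],
    "vm2": ["host2", "sw1", "storage1"],
    "vm3": ["host3", "sw2", "storage1"],
}
# Hypothetical situation: only vm1 and vm2 raise storage I/O alerts.
alerting = {"vm1", "vm2"}

ranking = rank_suspects(io_paths, alerting)
for name, score in ranking[:3]:
    print(f"{name}: {score:.2f}")
```

In this toy topology the switch `sw1` ranks first even though it raises no alert itself, because it is the only component whose dependents are exactly the alerting VMs. This mirrors the abstract's point that storage I/O paths can stand in for workload paths that are hard to identify; a real system would need weights per alert type and a dependency model taken from the actual environment.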

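Bibliographic record ID

Source identifier type

NCID

Source identifier

AA12326962

Bibliographic information

研究報告インターネットと運用技術（IOT） (IPSJ SIG Technical Reports: Internet and Operation Technology)

Vol. 2025-IOT-69, No. 6, pp. 1-6, issued 2025-05-22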

ISSN

Source identifier type

ISSN

Source identifier

2188-8787

Notice

SIG Technical Reports are nonrefereed and hence may later appear in any journals, conferences, symposia, etc.

Publisher

情報処理学会 (Information Processing Society of Japan)

Versions

Ver.1

2025-05-14 01:00:45.568254
