情報学広場: Information Processing Society of Japan (IPSJ) Electronic Library

SETSUBUN: Revisiting Membership Inference Game for Evaluating Synthetic Data Generation

https://ipsj.ixsq.nii.ac.jp/records/239368

Name / File	License	Action
IPSJ-JNL6509014.pdf (1.3 MB) Available for download from September 15, 2026.	Copyright (c) 2024 by the Information Processing Society of Japan
Non-members: ¥0, IPSJ members: ¥0, Journal members: ¥0, DLIB members: ¥0

Item type

Journal(1)

Publication date

2024-09-15

Title

SETSUBUN: Revisiting Membership Inference Game for Evaluating Synthetic Data Generation

Language

eng

Keywords

Subject scheme

Other

Subject

[Special Issue: Cybersecurity Technologies for Securing the Supply Chain] synthetic data generation, membership inference, privacy protection, evaluation framework

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

journal article

Author affiliations

NTT Social Informatics Laboratories / Osaka University
NTT Social Informatics Laboratories
NTT Social Informatics Laboratories
NTT Social Informatics Laboratories / Osaka University
NTT Social Informatics Laboratories
NTT Social Informatics Laboratories
Osaka University

Authors

Takayuki, Miura
Masanobu, Kii
Toshiki, Shibahara
Kazuki, Iwahana
Tetsuya, Okuda
Atsunori, Ichikawa
Naoto, Yanai

Abstract

Description type

Other

Description

Synthetic data generation techniques are promising for anonymizing high-dimensional tabular datasets, and their privacy protection can be evaluated by membership inference attacks. However, the existing evaluation framework has limitations from two perspectives: (1) it cannot evaluate the worst case because a target sample is chosen randomly; and (2) the decision criterion of an adversary's inference is a black box since the adversary conducts membership inference by using machine learning models. In this paper, we propose a framework to overcome the above limitations in a simple and clear fashion. To cope with limitation (1), we introduce a statistical distance to choose a vulnerable target sample. To cope with limitation (2), we propose two interpretable inference methods. One is a method with typical statistics scores, and the other is a method with the Euclidean distance from the target sample. We conduct extensive experiments on two datasets and five synthesis algorithms to confirm the effectiveness of our framework. The experiments show that our framework enables us to evaluate privacy in synthetic data generation techniques more tightly.
------------------------------
This is a preprint of an article intended for publication in the Journal of
Information Processing (JIP). This preprint should not be cited. This
article should be cited as: Journal of Information Processing Vol. 32 (2024) (online)
DOI: http://dx.doi.org/10.2197/ipsjjip.32.757
------------------------------
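
As a rough illustration of the distance-based inference idea mentioned in the abstract, the sketch below guesses that a target record was in the training data when some synthetic record lies within a threshold of it in Euclidean distance. This is not the authors' implementation: the function names, the threshold rule, the toy data, and in particular the target-selection heuristic (largest mean distance to the other records, standing in for the paper's statistical distance, which the abstract does not specify) are all assumptions made for illustration.

```python
import math


def min_distance_to_synthetic(target, synthetic):
    """Smallest Euclidean distance from the target record to any synthetic record."""
    return min(math.dist(target, row) for row in synthetic)


def infer_membership(target, synthetic, threshold):
    """Guess 'member' (True) when a synthetic record lies within `threshold` of the target.

    The fixed-threshold decision rule is an assumption for this sketch.
    """
    return min_distance_to_synthetic(target, synthetic) <= threshold


def pick_outlier_target(records):
    """Hypothetical target selection: index of the record that is, on average,
    farthest from all other records. This is only a stand-in for the paper's
    statistical distance, whose definition is not given in the abstract."""
    def mean_dist(i):
        others = [r for j, r in enumerate(records) if j != i]
        return sum(math.dist(records[i], r) for r in others) / len(others)

    return max(range(len(records)), key=mean_dist)


# Toy data (made up for illustration): three clustered records and one outlier.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
target_idx = pick_outlier_target(records)

# Two toy "synthetic" datasets: one has a record near the outlier, one does not.
synthetic_with = [(0.05, 0.02), (4.9, 5.1)]
synthetic_without = [(0.05, 0.02), (0.1, 0.1)]

guess_with = infer_membership(records[target_idx], synthetic_with, threshold=0.5)
guess_without = infer_membership(records[target_idx], synthetic_without, threshold=0.5)
```

The decision criterion here is a single readable comparison, which is the sense in which such a rule is interpretable compared with a learned attack model.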

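Bibliographic record ID

Source identifier type

NCID

Source identifier

AN00116647

Bibliographic information

情報処理学会論文誌 (IPSJ Journal)

Vol. 65, No. 9, issue date 2024-09-15

ISSN

Source identifier type

ISSN

Source identifier

1882-7764

Publisher

Language

ja

Publisher

Information Processing Society of Japan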

Versions

Ver.1

2025-01-19 08:18:25.589317
