IPSJ Electronic Library (情報学広場)


On Security of Randomness in Synthetic Data Generation (合成データ生成のランダム性に内在する安全性の評価)

https://ipsj.ixsq.nii.ac.jp/records/214437

Name / File	License	Action
IPSJCSS2021037.pdf (1.2 MB)	Copyright (c) 2021 by the Information Processing Society of Japan
Open access

Item type

Symposium(1)

Publication date

2021-10-19

Title

合成データ生成のランダム性に内在する安全性の評価

Title (en)

On Security of Randomness in Synthetic Data Generation

Language

jpn

Keywords

Subject scheme

Other

Subject

synthetic data generation, privacy protection, differential privacy, generative models

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Author affiliation (all five authors)

NTT 社会情報研究所 (NTT Social Informatics Laboratories)

Author names

三浦, 尭之
紀伊, 真昇
芝原, 俊樹
市川, 敦謙
千田, 浩司

Author names (English)

Miura, Takayuki
Kii, Masanobu
Shibahara, Toshiki
Ichikawa, Atsunori
Chida, Koji

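Abstract (translated from Japanese)

Description type

Other

Description

As data utilization grows, many privacy-protection techniques have been proposed to protect the privacy of individuals contained in the data being used. In recent years, privacy protection using synthetic data generation has attracted particular attention. Conventional synthetic data generation guarantees theoretical security by adding noise to the values needed for data generation, that is, the generation parameters, so that they become differentially private. However, when synthetic data generation is used to protect the privacy of data held by enterprises and similar organizations, it is common to publish only the generated data and to discard the generation parameters without publishing them. In this case, because the process of generating data from the generation parameters involves randomness, estimating the generation parameters and the original data from the generated data is considered a hard problem even when the synthetic data generation is not differentially private. In this paper, we examine how much privacy protection this randomness provides in synthetic data generation that is not differentially private. As a first step toward a theoretical evaluation, we evaluate, based on the concept of differential privacy, the security satisfied by data generation following a normal distribution whose generation parameters are the mean and the variance. We also indicate a direction for applying the evaluation method to high-dimensional data.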

Abstract (English)

Description type

Other

Description

With the increasing demand for data utilization, many techniques have been proposed to protect the privacy of individuals in the data. In recent years, privacy protection techniques based on synthetic data generation have attracted much attention. Conventional synthetic data generation guarantees theoretical security by making the generation parameters, which are required for data generation, differentially private.
However, when enterprises use synthetic data generation to protect their data, they typically generate synthetic data from the generation parameters and then discard the parameters without disclosing them. In addition, since synthetic data generation has its own randomness, it is not easy to estimate the generation parameters and the original data from the generated data. In this paper, we theoretically discuss the difficulty of this estimation. As a first step in the theoretical evaluation, we evaluate the security of synthetic data generation from a normal distribution with the mean and the variance of the original data, referring to the concept of differential privacy. We also show a future direction of privacy-preserving data generation for high-dimensional data.
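
The setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `generate_synthetic`, the sample values, and the choice of the population standard deviation are assumptions made here for illustration. The sketch fits a mean and a variance to the original data, draws synthetic values from the corresponding normal distribution, and returns only the samples. No noise is added to the parameters, so this is the non-differentially-private case whose inherent randomness the paper evaluates.

```python
import random
import statistics


def generate_synthetic(original, n_samples, seed=None):
    """Draw n_samples values from a normal distribution fitted to `original`.

    Hypothetical helper for illustration only; the generation parameters
    (mean and standard deviation) are never returned to the caller.
    """
    rng = random.Random(seed)
    mu = statistics.fmean(original)      # generation parameter: mean
    sigma = statistics.pstdev(original)  # generation parameter: sqrt of population variance
    # Only the sampled values leave this function; mu and sigma are discarded.
    return [rng.gauss(mu, sigma) for _ in range(n_samples)]


# Made-up example data, not taken from the paper.
original = [52.0, 47.5, 61.2, 55.8, 49.9, 58.3]
synthetic = generate_synthetic(original, n_samples=6, seed=0)
print(synthetic)
```

An observer who sees only `synthetic` has to estimate the mean and variance from a finite random sample, which is the estimation difficulty the abstract refers to.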

Bibliographic information

Proceedings of the Computer Security Symposium 2021 (コンピュータセキュリティシンポジウム2021論文集)

pp. 268-275, issue date 2021-10-19

Publisher

情報処理学会 (Information Processing Society of Japan)
