Article | Proceedings of the Annual International Conference on Privacy, Security, and Trust | 2012

Exploring Re-Identification Risks in Public Domains

by Aditi Ramachandran, Lisa Singh, Edward Porter and Frank Nagle

Abstract

Re-identification of sensitive data has been studied extensively, but the emergence of online social networks and the popularity of digital communications have increased the ability to use public data for re-identification. This work begins by presenting two case studies of sensitive data re-identification. We conclude that targeted re-identification using traditional variables is not only possible, but fairly straightforward given the large amount of public data available. However, our first case study also indicates that large-scale re-identification is less likely. We then consider methods for agencies such as the Census Bureau to identify variables that cause individuals to be vulnerable without testing all combinations of variables. We show the effectiveness of different strategies on a Census Bureau data set and on a synthetic data set.

Citation:

Ramachandran, Aditi, Lisa Singh, Edward Porter, and Frank Nagle. "Exploring Re-Identification Risks in Public Domains." Proceedings of the Annual International Conference on Privacy, Security, and Trust (2012).