With the aim of informing sound policy about data sharing and privacy, we
describe successful re-identification of patients in an Australian
de-identified open health dataset. As in prior studies of similar datasets, a
few mundane facts often suffice to isolate an individual. Some people can be
identified by name based on publicly available information. Decreasing the
precision of the unit-record level data, or perturbing it statistically, makes
re-identification gradually harder at a substantial cost to utility. We also
examine the value of related datasets in improving the accuracy and confidence
of re-identification. Our re-identifications were performed on a 10% sample
dataset, but a related open Australian dataset allows us to infer with high
confidence that some individuals in the sample have been correctly
re-identified. Finally, we examine the combination of the open datasets with
some commercial datasets that are known to exist but are not in our possession.
We show that they would further increase the ease of re-identification.
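
To make the uniqueness argument concrete, the following is a minimal sketch, not the method used in this study. It counts how many records are isolated by a handful of attributes, and how coarsening one attribute changes that count. The field names (`birth_year`, `sex`, `event_year`), the helper functions, and the five toy records are all hypothetical and are not drawn from the dataset.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` values occurs exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)


def coarsen_birth_year(records, width):
    """Copy the records with birth_year rounded down to a multiple of `width`."""
    return [{**r, "birth_year": r["birth_year"] // width * width} for r in records]


# Hypothetical toy records; the real data has many more attributes per patient.
records = [
    {"birth_year": 1971, "sex": "F", "event_year": 2010},
    {"birth_year": 1974, "sex": "F", "event_year": 2010},
    {"birth_year": 1971, "sex": "M", "event_year": 2010},
    {"birth_year": 1988, "sex": "M", "event_year": 2012},
    {"birth_year": 1989, "sex": "M", "event_year": 2012},
]
keys = ["birth_year", "sex", "event_year"]

exact = unique_fraction(records, keys)                           # precise birth year
coarse = unique_fraction(coarsen_birth_year(records, 10), keys)  # decade only

print(f"unique with exact birth year: {exact:.0%}")
print(f"unique with birth decade:     {coarse:.0%}")
```

In this toy example every record is unique under the exact attributes, and reducing birth year to a decade leaves only one of the five unique. The same reduction also discards detail an analyst might need, which is the utility cost noted above.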