RESPONSIBLE OPEN DATA
HOW TO guide to data anonymisation
Data is everywhere. In the digital age, our data footprints include information on what we do, where we go, who we know, what we have, what we like and how we feel. We generate this information while we work, walk, interact, speak, protest or search online. All of it can be used to shape services, products and cities, and to promote transparency and accountability.
We have seen data improve care, public transport routing, policing and advertising. But we have also seen data stolen or manipulated, and data processes that infringe upon fundamental rights and privacy, discriminate or simply go wrong.
Realising the potential of increased data gathering, sharing and publishing therefore requires an understanding of where and how data can improve outcomes, but also of the risks involved in the process. The implications of data sharing change depending on who is at the receiving end and how they access the data.
René Magritte (1964) Le Fils de l'Homme
The same piece of health data that can be crucial to saving someone's life when made available to their doctor may also mean they are passed over for a job or see their insurance costs increase.
This poses an important challenge for data sharing and open data in general. Choosing never to share data and to keep it only in the sphere where it is gathered (medical data in a medical setting, to follow the previous example) is an option, but it may mean we lose the possibility of exploring health risks more generally. For example, data on the health of people in one area, made available to environmental services, may reveal the previously unknown presence of toxic substances.
Similarly, linking search data with medical data may reveal undisclosed side effects. In other domains, linking crime data with social service data may help identify previously unknown trends, and employment data can help improve public transport by better predicting flows and demand. In less crucial settings, data sharing can help improve commercial services by matching supply and demand more efficiently, or by generating demand. Profiling and targeting also mean that messages can be tailored to a specific audience, which can be valuable in emergencies (specific messages to older people, or to those already rescued, for instance) but problematic in politics.
RESPONSIBLE DATA SHARING
One of the approaches to responsible data sharing involves the use of anonymisation techniques, “sanitising” databases to remove personal traits from the data before it is shared. Anonymisation involves masking or removing information that could directly or indirectly identify individuals, in such a way that the information in the database does not enable re-identification and cannot be used to learn anything about these individuals beyond what one already knew a priori.
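As an illustration of masking and removing, the following is a minimal Python sketch. The records, the field names (`name`, `email`, `age`, `postcode`, `diagnosis`) and the helper functions are all hypothetical, chosen only for this example. It removes direct identifiers and coarsens two indirect ones; on its own this does not guarantee that re-identification is impossible.

```python
# Hypothetical records; every field name and value here is an illustrative assumption.
DIRECT_IDENTIFIERS = {"name", "email"}


def generalise_age(age, width=10):
    """Replace an exact age with a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record):
    """Remove direct identifiers and coarsen indirect ones (age, postcode)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["age"] = generalise_age(out["age"])
    # Keep only the first three characters of the postcode and mask the rest.
    out["postcode"] = out["postcode"][:3] + "**"
    return out


records = [
    {"name": "Ana Ruiz", "email": "ana@example.org",
     "age": 34, "postcode": "08012", "diagnosis": "asthma"},
    {"name": "Joan Vidal", "email": "joan@example.org",
     "age": 71, "postcode": "28004", "diagnosis": "diabetes"},
]

anonymised = [anonymise(r) for r in records]
for row in anonymised:
    print(row)
```

Which fields count as direct or indirect identifiers depends on the dataset and on who will receive it, so the two sets above would need to be decided case by case.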
Removing personal data from a dataset is easy; what is difficult is removing it while keeping the dataset useful. All anonymisation techniques therefore strive to find an optimal balance between protecting privacy and enhancing security, on the one hand, and preserving the utility of the information in meaningful ways, on the other.
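The trade-off can be made concrete with a small Python sketch. It assumes a hypothetical list of ages and uses two simple proxies that are not taken from the guide itself: the size of the smallest group of people sharing the same age band (larger groups make individuals harder to single out) and the average distance between each true age and the midpoint of its band (larger distances mean less useful data).

```python
from collections import Counter

# Hypothetical ages, chosen only for illustration.
ages = [22, 27, 31, 36, 43, 48, 52, 57]


def band(age, width):
    """Return the (low, high) age band of the given width containing `age`."""
    low = (age // width) * width
    return (low, low + width - 1)


def smallest_group(ages, width):
    """Privacy proxy: size of the smallest group sharing one band."""
    counts = Counter(band(a, width) for a in ages)
    return min(counts.values())


def mean_error(ages, width):
    """Utility-loss proxy: average distance from true age to band midpoint."""
    total = 0.0
    for a in ages:
        low, high = band(a, width)
        total += abs(a - (low + high) / 2)
    return total / len(ages)


for width in (5, 10, 20):
    print(f"band width {width}: smallest group = {smallest_group(ages, width)}, "
          f"mean error = {mean_error(ages, width)}")
```

With this sample, widening the bands increases the smallest group size and the mean error at the same time, which is the tension described above: each step that makes individuals harder to single out also blurs the data.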
In our work with ODI, we have developed a series of documents to help organisations anonymise data. We will soon release:
Including academic departments and organisations that, like Eticas, can help you anonymise.
READING GUIDE and general introduction
Threat modelling
Anonymisation methods and techniques
A project in collaboration with the Open Data Institute