As organisations shift towards data-driven practices, the legal obligations around sharing data have become a common barrier. Data protection is an important aspect of data security: the rights of the people represented in a dataset must be maintained and protected. Following the appropriate data protection principles reduces the risk of harm to individuals, and to groups or communities. One way of protecting data is anonymisation.
Anonymising data aligns with the eighth Principle for Digital Development, which addresses privacy and security. The aims of this Principle are to protect confidential information and the identities of individuals, who might include staff, collaborators or other data subjects. Protecting people’s personal data helps to prevent them from becoming the target of fraud, scams or identity theft. In turn, this protects the organisation from reputational damage and financial loss. Data security is also a legal requirement and helps to protect valuable data assets from being accessed by unauthorised third parties.
Below is a series of guidelines and recommendations developed by the Open Data Institute for anonymising data effectively, in compliance with national and international data standards and security requirements:
Determine the Lawful and Ethical Foundation
Data protection regulations are designed to allow personal data to be collected, stored and processed while reducing the risk to personal security and other negative impacts. The definition of personal data varies across countries but, in general, it refers to any data that is linked to an individual and can be used to identify them.
In the early stages of anonymisation, it is important to consider the wider implications of collecting and sharing data. Ask questions that help you identify and understand the limitations and biases that may be present in the data or in the collection methodology, so as to promote sound data security practices.
All anonymisation methods should be carried out in a way that retains as much of the original dataset's value as possible whilst protecting the privacy of the people captured in the data. Use cases are a good way of capturing the various user processes, and should be used to set key objectives that keep the core goals of the activity in view.
Data protection does not require anonymisation to remove every risk in a given dataset; however, you must mitigate the risk of individuals being identified. It is also important to understand the difference between personal and non-personal data. For example, farm location information is non-personal data, as it relates to a farm rather than to an individual. You can therefore retain location information in certain situations whilst still complying with data protection legislation. There may still be risks in collecting, storing and sharing information like this. It can therefore be classed as sensitive data, and should undergo similar anonymisation practices if the dataset is ultimately to be shared or made open access.
There are various types of anonymisation, and the specific methodology is often determined by the type of data being used. In most cases, multiple techniques are combined to reduce the risk of identification. Some of the most popular anonymisation methodologies are outlined below:
- Generalisation – this approach dilutes the recognisable features of a dataset by altering the scale or order of a field, e.g. changing city-level information to regional level.
- Randomisation – adding ‘noise’ to a dataset or reordering values. It is important to ensure that the overall patterns and insights of the data are maintained.
- Pseudonymisation – this is not a type of anonymisation but is a useful security measure to consider. Replacing identifiers with pseudonyms can reduce the risk; however, with enough information from the wider dataset, indirect identification may still be possible.
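To make the three techniques above more concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions rather than a recommended implementation: the field names, the city-to-region lookup table, the noise scale and the secret key are all hypothetical, and the keyed hash (HMAC-SHA256) is just one possible way of generating pseudonyms.

```python
import hashlib
import hmac
import random

# Hypothetical lookup table: precise city -> broader region.
CITY_TO_REGION = {
    "Leeds": "Yorkshire and the Humber",
    "Manchester": "North West",
    "Bristol": "South West",
}


def generalise_city(record, mapping=CITY_TO_REGION):
    """Generalisation: swap a precise city for its broader region."""
    out = dict(record)  # copy so the original record is left untouched
    city = out.pop("city")
    out["region"] = mapping.get(city, "Other")
    return out


def add_noise(values, scale, rng):
    """Randomisation: add zero-mean Gaussian noise to each numeric value.

    A small scale keeps overall patterns (e.g. the mean) roughly intact.
    """
    return [value + rng.gauss(0, scale) for value in values]


def pseudonymise(identifier, secret_key):
    """Pseudonymisation: replace an identifier with a keyed hash.

    The same identifier always maps to the same pseudonym, so records
    can still be linked. Anyone holding the key can recompute the
    mapping, so the key must be stored separately and securely.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


if __name__ == "__main__":
    record = {"name": "Example Person", "city": "Leeds", "yield_kg": 1200.0}
    key = b"replace-with-a-real-secret"  # assumption: key managed elsewhere

    safer = generalise_city(record)
    safer["name"] = pseudonymise(safer["name"], key)
    safer["yield_kg"] = add_noise([safer["yield_kg"]], 25.0, random.Random())[0]
    print(safer)
```

Because a pseudonym is deterministic for a given key, this step alone does not make the data anonymous, which matches the caveat in the list above.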
Anonymisation is successful once the risk of re-identification is sufficiently low. Resilience testing is used to determine the level of risk that remains.
Once anonymisation is complete, write a plan of action so that, if any issues or negative impacts arise, the risks to users can be mitigated quickly.
Finally, once all of the above steps have been completed and the dataset is ready to be published, you should publish the risk assessment and the anonymisation practices used alongside it.
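The guidance does not name a specific resilience test. As one possible illustration, the sketch below checks k-anonymity, a commonly used measure: it counts how many records share each combination of indirectly identifying fields (quasi-identifiers) and flags combinations that occur fewer than k times. The function names, field names and threshold are assumptions for this example, and a real assessment would usually consider more than this single measure.

```python
from collections import Counter


def _group_key(record, quasi_identifiers):
    """Build the tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in quasi_identifiers)


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same quasi-identifier values (the dataset's k). Returns 0 if empty."""
    if not records:
        return 0
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return min(groups.values())


def risky_records(records, quasi_identifiers, k):
    """Return records whose quasi-identifier combination appears fewer
    than k times, i.e. those most exposed to re-identification."""
    groups = Counter(_group_key(r, quasi_identifiers) for r in records)
    return [r for r in records if groups[_group_key(r, quasi_identifiers)] < k]


if __name__ == "__main__":
    # Hypothetical, already-generalised records.
    data = [
        {"region": "North West", "age_band": "30-39"},
        {"region": "North West", "age_band": "30-39"},
        {"region": "South West", "age_band": "40-49"},
    ]
    fields = ["region", "age_band"]
    print("k =", smallest_group_size(data, fields))
    print("needs further treatment:", risky_records(data, fields, k=2))
```

Records flagged this way could be generalised further or suppressed before the check is run again.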
Here are some helpful resources to provide you with more information: