Plantwise Blog

Anonymisation is an important aspect of data security and protects individuals and communities from potential risk or harm (Image by ThisIsEngineering, Pexels)

As more organisations shift their focus to data-driven practices, the legal obligations around sharing data have become a common barrier. Data protection is an important aspect of data security: the rights of the people represented in a dataset need to be maintained and protected. By following the appropriate data protection principles, the risk of harm to individuals, and even to groups or communities, can be avoided. One way of protecting data is anonymisation.

Anonymisation of data aligns with the eighth Principle for Digital Development, which is to address privacy and security. The aims of this Principle are to protect confidential information and the identities of individuals, which might include staff, collaborators or other data subjects. Protecting people’s personal data helps to prevent them from becoming the target of fraud, scams or identity theft. In turn, this protects the organisation from reputational damage and financial loss. Data security is also required by law and helps to protect valuable data assets from being accessed by third parties.

Below is a series of guidelines and recommendations developed by the Open Data Institute for anonymising data effectively, in line with national and international data standards and security requirements:

Determine the Lawful and Ethical Foundation

Data protection regulations are designed to allow personal data to be collected, stored and processed while reducing the risk to personal security and other negative impacts. The definition of personal data varies across countries, but in general it covers any data that relates to an individual and can be used to identify them.

At the beginning of the anonymisation process, it is important to consider the wider implications of collecting and sharing data. You should ask questions that identify the limitations and biases that may be present in the data or the collection methodology, so as to promote effective data security practices.

Objectives

All anonymisation methods should be carried out in a way that preserves as much of the original dataset's value as possible whilst protecting the privacy of the people captured in the data. Use cases are a good way of capturing how different users will work with the data, and should be used to create a set of key objectives that keep the core goals of the activity in view.

Personal information is collected everywhere: on the apps, websites and devices we use. How this information is handled and processed is extremely important, and handling it poorly poses a serious risk to individuals (Image by Scott Webb, Pexels)

Assess Risks

Data protection does not directly require anonymisation to remove all risks in a given dataset; however, you must mitigate the risk of individuals being identified. It is also important to understand the difference between personal and non-personal data. For example, farm location information is non-personal data because it relates to a farm, not an individual. You can therefore retain location information in certain situations whilst complying with data protection legislation. There may still be risks in collecting, storing and sharing information like this. As such, it can be classed as sensitive data and should undergo similar anonymisation practices if the dataset is ultimately to be shared or made open access.

Anonymisation

There are various types of anonymisation that can be carried out; the specific methodology is often determined by the type of data being used. Most of the time, multiple techniques are combined to reduce the risk of identification. Some of the most popular anonymisation methodologies are outlined below:

  • Generalisation – this approach dilutes the recognisable features of a dataset by altering the scale or order of a field, e.g. changing city information to regional.
  • Randomisation – adding ‘noise’ to a dataset or reordering values. It is important to ensure that the overall patterns and insights of the data are maintained.
  • Pseudonymisation – this is not a type of anonymisation but is a useful security measure to consider. Using pseudonyms to replace identifiers can reduce the risk; however, with enough information from the wider dataset, indirect identification may still be possible.
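As a rough illustration of the three approaches above, the short Python sketch below applies each one to a tiny made-up dataset. Everything in it is an assumption for demonstration purposes: the record fields, the city-to-region lookup, the noise scale and the function names are hypothetical and do not come from the Open Data Institute guidance.

```python
import hashlib
import hmac
import random

# Hypothetical city-to-region lookup, for illustration only.
CITY_TO_REGION = {
    "City A": "Region North",
    "City B": "Region North",
    "City C": "Region South",
}


def generalise_city(city, lookup=CITY_TO_REGION):
    """Generalisation: replace a city with the coarser region it belongs to."""
    return lookup.get(city, "Unknown region")


def add_noise(values, scale, seed=None):
    """Randomisation: add zero-mean Gaussian noise to each numeric value."""
    rng = random.Random(seed)
    return [value + rng.gauss(0, scale) for value in values]


def pseudonymise(identifier, secret_key):
    """Pseudonymisation: replace an identifier with a keyed hash (HMAC-SHA256).

    The digest is truncated to 16 hex characters for readability only.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Made-up records, for illustration only.
records = [
    {"name": "Farmer One", "city": "City A", "yield_kg": 1200.0},
    {"name": "Farmer Two", "city": "City B", "yield_kg": 950.0},
    {"name": "Farmer Three", "city": "City C", "yield_kg": 1430.0},
]

# Placeholder key: a real key would need to be generated and stored securely.
secret_key = b"replace-with-a-securely-stored-key"

noisy_yields = add_noise([r["yield_kg"] for r in records], scale=25.0, seed=42)

anonymised = [
    {
        "farmer_id": pseudonymise(r["name"], secret_key),
        "region": generalise_city(r["city"]),
        "yield_kg": round(noisy, 1),
    }
    for r, noisy in zip(records, noisy_yields)
]
```

The noise scale is a trade-off: too small and individual values remain recognisable, too large and the overall patterns in the data are lost. Note also that anyone holding the secret key can recompute the pseudonyms, which is one reason pseudonymised data is still treated as personal data.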

Resilience Testing

Anonymisation can be considered successful once the risk of re-identification is sufficiently low. Resilience testing is used to determine how much risk remains.
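One common way to put a number on the remaining risk is k-anonymity, a measure that this post does not name but which fits the idea of resilience testing: a dataset is k-anonymous if every combination of potentially identifying fields is shared by at least k records. The sketch below is a minimal example under that assumption; the field names, sample records and function name are hypothetical.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that share the
    same combination of quasi-identifier values (0 for an empty dataset)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Made-up records, for illustration only.
sample = [
    {"region": "Region North", "age_band": "30-39"},
    {"region": "Region North", "age_band": "30-39"},
    {"region": "Region South", "age_band": "40-49"},
]

k = smallest_group_size(sample, ["region", "age_band"])
# Here k is 1: the single "Region South" record is unique on these fields,
# so it would be the easiest one to re-identify.
```

A low k suggests that further generalisation or suppression is needed before sharing. What counts as "sufficiently low" risk depends on the sensitivity of the data and how widely it will be shared, so any threshold for k is a judgement call rather than a fixed rule.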

Plan Ahead

A plan of action should be written once anonymisation is complete so that, if any issues or negative impacts arise later, the risks to users can be dealt with before they cause harm.

Publishing

Finally, once all of the above steps have been completed and the dataset is ready to be published, you should also publish the risk assessment and the anonymisation practices used alongside it.

