Data Anonymisation: Managing Personal Data Protection Risk
Data anonymisation generally refers to the process of removing identifying information such that the remaining data does not identify any particular individual. This is an important step in rendering the resultant data, which is no longer personal data, suitable for use in research and data mining. Such analytics can bring greater value to different aspects of our lives, from improving transportation and healthcare services to enhancing public safety.
“There is often greater value in aggregating data instead of looking at specific data points.”
Mr Zack Bana, Co-Founder and Data Protection Officer of Beacon Consulting
There are numerous ways to go about data anonymisation. In the birthday poser, Cheryl says her birthday is a secret, but gives Albert and Bernard two separate sets of clues as to when it might be. Albert is told it could be any one of four months. Bernard is told it could be any one of 10 days, of which only two occur uniquely. This is an example of data reduction, where some values are removed from a data set, usually because those values are not required.
Why Data Anonymisation?
Anonymisation of personal data is carried out to render the resultant data suitable for more uses than its original state would permit under data protection regimes. For example, anonymised data may be used for research and data mining where personal identifiers in the data are unnecessary or undesired. Anonymisation could also be a protection measure against inadvertent disclosures and security breaches.
Limitations and Challenges
There are often conflicting needs for anonymity and data integrity. Stripping data of too many identifiers may not preserve the usefulness of the data, or might deny potential uses for the data. Data anonymised for one specific purpose might not be useful for others because its functionality is reduced.
To manage re-identification risks, organisations should consider whether the entities receiving the anonymised data are likely to possess or have access to information that can inadvertently lead to re-identification. Personal knowledge is also an important factor in assessing re-identification risks, because people who are close to an individual, such as friends or relatives, possess unique personal knowledge about that individual. Although this personal knowledge makes it easier for friends than for strangers to identify an individual from an anonymised dataset, it is unlikely to amount to a high re-identification risk for the dataset.
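One rough way to reason about the risk described above is to check how many records share the same combination of attributes that a recipient might already know from other sources. The sketch below is only an illustration under stated assumptions: the field names (`age_band`, `postal_sector`, `condition`), the sample rows and the helper functions are all hypothetical, and a small group size is a warning sign rather than a complete risk assessment.

```python
from collections import Counter

# Hypothetical, already-generalised records; all names and values are invented.
rows = [
    {"age_band": "30-39", "postal_sector": "12", "condition": "X"},
    {"age_band": "30-39", "postal_sector": "12", "condition": "Y"},
    {"age_band": "40-49", "postal_sector": "12", "condition": "X"},
]

def _key(row, fields):
    """Build the combination of values a recipient might be able to match on."""
    return tuple(row[f] for f in fields)

def smallest_group_size(rows, fields):
    """Size of the smallest group of rows sharing the same values for `fields`."""
    counts = Counter(_key(row, fields) for row in rows)
    return min(counts.values()) if counts else 0

def unique_rows(rows, fields):
    """Rows whose combination of `fields` appears exactly once in the dataset."""
    counts = Counter(_key(row, fields) for row in rows)
    return [row for row in rows if counts[_key(row, fields)] == 1]

# Assumption: the recipient could plausibly know age band and postal sector.
assumed_known = ["age_band", "postal_sector"]
print(smallest_group_size(rows, assumed_known))  # 1
print(unique_rows(rows, assumed_known))          # the single 40-49 record
```

A record that is unique on the attributes a recipient is assumed to hold is a candidate for further generalisation or suppression before the data is released.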
Anonymised data should also be properly safeguarded from unintended recipients, whether they are within or outside the organisation.
Besides anonymisation, other practices that organisations can adopt to minimise the risks of re-identification include:
- Imposing additional enforceable restrictions on the use and subsequent disclosure of the data.
- Implementing processes, including access restrictions, to govern proper use of the anonymised data.
- Implementing processes and measures for the destruction of data as soon as they no longer serve any business or legal purpose.
Data Anonymisation Techniques
The following is a non-exhaustive list of commonly used anonymisation techniques, and examples of how each technique can be used.
- Pseudonymisation: replacing personal identifiers with other references. For example, replacing an individual’s name with a randomly generated tag or reference number.
- Aggregation: displaying values as totals, so that none of the individual values which could identify an individual is shown.
- Replacement: replacing values or a subset of the values with a computed average or a number derived from the values.
- Data suppression: removing values that are not required for the purpose. For example, removing ‘ethnicity’ from a dataset of individuals’ attributes.
- Data recoding or generalisation: banding or grouping of categories into broader categories.
- Data shuffling: mixing up or replacing values with those of the same type so that the information looks similar but is unrelated to the actual details.
- Masking: removing certain details while preserving the look and feel of the data.
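The techniques listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a production approach: the sample records, field names and helper functions are all hypothetical, and real datasets would need the re-identification assessment discussed earlier before release.

```python
import random
import secrets
from statistics import mean

# Hypothetical records; every name and value here is invented for illustration.
records = [
    {"name": "Alice Tan", "age": 34, "ethnicity": "A", "phone": "91234567", "salary": 5200},
    {"name": "Ben Lim", "age": 47, "ethnicity": "B", "phone": "98765432", "salary": 6100},
    {"name": "Chitra Rao", "age": 29, "ethnicity": "A", "phone": "90001111", "salary": 4800},
]

def pseudonymise(rows, field="name"):
    """Pseudonymisation: swap an identifier for a randomly generated tag."""
    tags = {}
    return [{**row, field: tags.setdefault(row[field], "ID-" + secrets.token_hex(4))}
            for row in rows]

def suppress(rows, field):
    """Data suppression: drop a field that the purpose does not require."""
    return [{k: v for k, v in row.items() if k != field} for row in rows]

def generalise_age(rows, width=10):
    """Generalisation: band exact ages into ranges such as '30-39'."""
    out = []
    for row in rows:
        low = (row["age"] // width) * width
        out.append({**row, "age": f"{low}-{low + width - 1}"})
    return out

def mask(value, visible=4, fill="*"):
    """Masking: hide all but the last `visible` characters, keeping the length."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]

def shuffle_field(rows, field, seed=None):
    """Data shuffling: permute one column so values no longer match their rows."""
    values = [row[field] for row in rows]
    random.Random(seed).shuffle(values)
    return [{**row, field: v} for row, v in zip(rows, values)]

def replace_with_average(rows, field):
    """Replacement: substitute each value with the computed average."""
    avg = mean(row[field] for row in rows)
    return [{**row, field: avg} for row in rows]

def aggregate_total(rows, field):
    """Aggregation: report only the total, not the individual values."""
    return sum(row[field] for row in rows)

# Chain several techniques on the same hypothetical dataset.
anonymised = generalise_age(suppress(pseudonymise(records), "ethnicity"))
anonymised = [{**row, "phone": mask(row["phone"])} for row in anonymised]
print(anonymised[0]["age"], anonymised[0]["phone"])  # 30-39 ****4567
print(aggregate_total(records, "salary"))            # 16100
```

Which techniques to combine depends on the purpose: suppression and generalisation reduce detail permanently, whereas pseudonymisation keeps records distinguishable from one another without exposing the original identifier.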
Data Anonymisation in a World of Big Data
Data anonymisation remains a key tenet in personal data protection, safeguarding individual identities while allowing organisations to use data to gain valuable insights in more ways than data protection regimes would otherwise permit. As businesses embrace big data and use analytics for sense-making and predictive analysis, both to serve customers better and to find new growth opportunities, anonymisation backed by robust re-identification assessment and risk management will be important in allowing firms to extract value from data while safeguarding personal data.