Privacy By Design as a System Design Strategy: Part 1
- Sagara Gunathunga
- Head of IAM DevRel - WSO2
We live in a global village today. For instance, the raw materials of a product may originate from several countries, and the product may be designed in one country, assembled in another, and sold all over the world. Banking is an ideal example of this present-day global reality. Data sharing practices in the banking industry used to be very conservative and restrictive, but they have changed significantly. Open banking initiatives in Europe, the UK, and Australia aim to open up banking data and share it with other banks and institutions.
Businesses are required to compete on a global scale. The way you treat your customers and your strategy to retain customers are some of the essential ingredients for business success.
Translating this into technical terms, you need mechanisms to onboard new users, reach prospects, and gain insights into existing customers. The key requirement common to all these capabilities is acquiring and retaining the personal data and behaviors of individuals within your business. The higher the volume of data collected and processed, the higher the chances of reaching customers and retaining them. We already have very sophisticated hardware and software systems that can store and process trillions of data records effectively. Such circumstances are ideal for a business to flourish.
However, this is only one side of the coin. What are the possible impacts on individuals when enterprises store and process their data without any limits or regulations? This has become one of the most important global issues in the contemporary era. Several regulations have been introduced to address these concerns by authorities all over the world, protecting data and giving greater control to individuals regarding the use of their personal and sensitive data.
Some significant privacy regulations include:
- GDPR: General Data Protection Regulation in Europe
- CCPA: California Consumer Privacy Act in California, USA
- DPA: Data Protection Act 2018 in the UK
- Privacy Act 1988 (with recent amendments) in Australia
- POPIA: Protection of Personal Information Act in South Africa
- LGPD: General Data Protection Law in Brazil
- PIPEDA: Personal Information Protection and Electronic Documents Act in Canada
- APPI: Act on the Protection of Personal Information in Japan
If your business already operates in the above-mentioned regions or plans to expand into them, complying with the privacy regulation of each region is compulsory. This involves many time-consuming and costly activities, such as introducing or upgrading hardware, introducing or modifying software systems, recruiting staff to handle privacy-related matters, and training staff. Moreover, these activities have to be repeated each time the business expands to a new region, to comply with that region’s regulation. How does a business overcome these issues? Adopting Privacy By Design (PbD) is an ideal solution: it establishes a set of guidelines and principles for system design that reduces the overhead of supporting each individual privacy regulation.
PbD is a framework consisting of 7 principles, as defined by Dr. Ann Cavoukian, the pioneer of this framework:
- Proactive not reactive, preventative not remedial: Anticipate, identify, and prevent privacy invasive events before they occur
- Privacy as the default setting: Build the maximum degree of privacy into the default settings for any system or business practice
- Privacy embedded into the design: Embed privacy settings into the design and architecture of information technology systems and business practices instead of implementing them as add-ons
- Full functionality, positive-sum not zero-sum: Create a balance between privacy and security because it is possible to have both
- End-to-end security, full lifecycle protection: Embed strong security measures into the complete lifecycle of data to ensure the secure management of the information from beginning to end
- Visibility and transparency, keep it open: Assure stakeholders that privacy standards are open and transparent
- Respect for user privacy, keep it user-centric: Protect the interests of users by offering strong privacy defaults, appropriate notice, and empowering user-friendly options
Understanding these principles is a mandatory exercise for anyone involved in software design. Here are some strategies that you can use to implement the above principles in practice.
Understanding and Identifying Personal Data (PII)
A simple yet important task is to identify the Personally Identifiable Information (PII) required to carry out your business process. You should be able to differentiate PII data from other business data because you are going to handle it through different mechanisms. By definition, PII means any piece of information that can be used to identify an individual uniquely. However, you need to evaluate this definition against the underlying context. For example, a phone number can be considered PII if it’s associated with a particular individual but not if it’s associated with an organization.
The definition varies from one regulation to another, but they all broadly adhere to the core definition above. For example, GDPR uses the term personal data, defined as any information relating to an identified or identifiable natural person (the data subject). An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person. CCPA uses the term personal information, defined as information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household. Examples include unique personal identifiers, IP addresses, electronic network activity information such as browsing and search history, audio, electronic, visual, thermal, and olfactory information, and geolocation data.
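The context-dependence described above can be made explicit in a small field catalog. The following is only a minimal sketch: the `FieldSpec` type, the subject labels, and the field names are hypothetical and do not come from any regulation or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Describes one data field (hypothetical structure for illustration)."""
    name: str
    subject: str       # who the value belongs to: "individual" or "organization"
    identifying: bool  # can the value single out whoever it belongs to?


def is_pii(field: FieldSpec) -> bool:
    """Treat a field as PII only when it can identify a natural person."""
    return field.identifying and field.subject == "individual"


# Assumed example catalog: the same kind of value (a phone number)
# is PII in one context and not in another.
CATALOG = [
    FieldSpec("customer_phone", "individual", True),
    FieldSpec("supplier_switchboard_phone", "organization", True),
    FieldSpec("order_total", "individual", False),
]

pii_fields = [f.name for f in CATALOG if is_pii(f)]
```

A real classification would follow the definitions of the regulations that apply to you; the point of the sketch is only that the decision is recorded per field and per context.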
Separating Personal Data Handling from Other Data
Once you identify PII data required for the business, the next step would be designing a system in such a way that a single component is responsible for storing and processing PII data, and all other components contact this particular component whenever a PII data requirement arises.
Let’s consider the following system, which consists of four components. Each of these components is involved in storing and processing PII data in some capacity. For instance, the sales app contains PII data of customers and sales team members, and the HR app contains PII data of all the employees.
The obvious problem here is that when multiple components process PII data, you need to evaluate every one of them each time you prepare for a particular privacy regulation, which multiplies cost and time. Spreading PII data across components also increases the chance of data breaches. In contrast, when a single component stores and handles all PII data, you only have to worry about that component in terms of privacy regulatory requirements.
The following diagram depicts the same system after moving PII data into a separate component.
An overview of the advantages granted by this design principle includes:
- Reduces development and maintenance cost
- Reduces development and maintenance time
- Reduces system complexity
- Reduces the chance of personal data breaches
- Adapts to future expansions easily
- Improves maintainability
- Eases system and security audits
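As a rough illustration of this separation, the sketch below models a single PII component and one business application that talks to it. The names `PiiVault`, `SalesApp`, `register`, and `fetch` are hypothetical, and the in-memory dictionary stands in for whatever storage a real system would use.

```python
import uuid


class PiiVault:
    """Hypothetical single component that owns all PII in the system."""

    def __init__(self):
        self._records = {}  # opaque identifier -> PII attributes

    def register(self, attributes: dict) -> str:
        """Store PII and hand back an opaque identifier for other components."""
        record_id = uuid.uuid4().hex  # random, carries no personal meaning
        self._records[record_id] = dict(attributes)
        return record_id

    def fetch(self, record_id: str, fields: list) -> dict:
        """Return only the PII fields the caller asks for."""
        record = self._records[record_id]
        return {name: record[name] for name in fields if name in record}


class SalesApp:
    """Business component that keeps identifiers but never PII itself."""

    def __init__(self, vault: PiiVault):
        self._vault = vault
        self.orders = []

    def place_order(self, customer_id: str, item: str) -> None:
        self.orders.append({"customer": customer_id, "item": item})

    def shipping_label(self, order: dict) -> str:
        # PII is requested from the vault only at the moment it is needed.
        pii = self._vault.fetch(order["customer"], ["name", "address"])
        return f"{pii['name']}, {pii['address']}"


vault = PiiVault()
cid = vault.register(
    {"name": "Jane Doe", "address": "1 Example Street", "email": "jane@example.com"}
)
sales = SalesApp(vault)
sales.place_order(cid, "running shoes")
label = sales.shipping_label(sales.orders[0])
```

With this shape, a regulatory review or security audit can concentrate on `PiiVault`, because the order data held by `SalesApp` contains no names, addresses, or emails.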
Anonymization and Pseudonymization of Personal Data
Although they are sometimes used in similar ways, anonymization and pseudonymization are two completely different techniques that can be used in system designs. Pseudonymization is the process of replacing personally identifiable information in data sets with artificial identifiers. After pseudonymization, a particular individual is no longer identifiable without the use of additional information.
There are several uses of pseudonymization related to this discussion:
- Correlate PII data at the inter-component level: For example, in the previously discussed system with 4 applications and a common PII data component, the applications can maintain pseudo-identifiers as user IDs instead of storing any PII data within their systems. Whenever these business applications require actual PII data for business processes, they can request such data from the PII data component by providing a pseudo-identifier.
- Correlate PII data among the elements within a component: Even within one component, it’s an unnecessary risk to duplicate and scatter PII data in several places such as multiple tables. Instead, the best practice is to maintain PII data in a single table or a few tables, and have other tables use pseudo-identifiers to refer to that PII data.
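The second use can be sketched with an in-memory SQLite database. The table and column names (`person`, `purchase`, `pseudo_id`) are assumptions made for this illustration only.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The only table that holds PII (hypothetical schema).
    CREATE TABLE person (
        pseudo_id TEXT PRIMARY KEY,
        full_name TEXT NOT NULL,
        email     TEXT NOT NULL
    );
    -- Business data refers to people by pseudo-identifier only.
    CREATE TABLE purchase (
        id        INTEGER PRIMARY KEY,
        pseudo_id TEXT NOT NULL REFERENCES person(pseudo_id),
        product   TEXT NOT NULL
    );
""")

pid = uuid.uuid4().hex  # artificial identifier with no personal meaning
conn.execute(
    "INSERT INTO person VALUES (?, ?, ?)", (pid, "Jane Doe", "jane@example.com")
)
conn.execute(
    "INSERT INTO purchase (pseudo_id, product) VALUES (?, ?)", (pid, "running shoes")
)

# Reporting queries can read purchases without touching any PII.
purchase_rows = conn.execute("SELECT pseudo_id, product FROM purchase").fetchall()

# PII is joined in only when a business process actually needs it.
receipt = conn.execute(
    "SELECT p.full_name, b.product "
    "FROM purchase b JOIN person p USING (pseudo_id)"
).fetchone()
```

Because the name and email live in one table, access controls, encryption, and deletion requests have a single place to act on.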
Anonymization is the process of removing personally identifiable information from data sets. After anonymization, an individual is no longer identifiable based on the remaining data within the system.
There are a number of practical uses of anonymization in system designs:
- Some business data does not require records of the actual individual. For example, to identify sales patterns of a specific shoe brand, you don’t need the actual name and contact details of the buyer. In such cases, it is possible to apply anonymization.
- Anonymization can also be applied as a further step after pseudonymization, depending on the context. For example, if a customer has asked you to remove their PII data along with all references to it, you may apply anonymization.
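The shoe-brand example can be sketched as follows. The field names and the `PII_FIELDS` set are assumptions for illustration; which fields count as PII depends on your own data and context.

```python
from collections import Counter

# Assumed set of directly identifying fields in this hypothetical data set.
PII_FIELDS = {"name", "email", "phone"}


def anonymize(record: dict) -> dict:
    """Return a copy of the record with directly identifying fields removed."""
    return {key: value for key, value in record.items() if key not in PII_FIELDS}


sales = [
    {"name": "Jane Doe", "email": "jane@example.com", "brand": "Acme", "size": 38},
    {"name": "John Roe", "email": "john@example.com", "brand": "Acme", "size": 42},
    {"name": "Ann Poe", "email": "ann@example.com", "brand": "Zenith", "size": 38},
]

anonymous_sales = [anonymize(record) for record in sales]

# The sales pattern is still available without knowing who bought what.
units_per_brand = Counter(record["brand"] for record in anonymous_sales)
```

Note that dropping direct identifiers is not always enough: if the remaining attributes can be combined to single out a person, the data is not truly anonymous, so the remaining fields deserve a review as well.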
Security Measures for PII Storage
There are a number of aspects to consider when planning storage and hardware infrastructure of PII data. Some of the most important points are given below:
- Apply hashing techniques whenever possible instead of storing sensitive PII data. For example, instead of storing a credit card number you can just store the hash value which is derived from the card number.
- Apply software-level encryption whenever required, and make sure to use encrypted transmission links such as TLS (the successor to SSL).
- Use storage hardware that supports hardware-level encryption wherever possible.
- Establish proper governance and policies that control access to these storage devices.
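The hashing point can be sketched with Python’s standard library. One caveat: card numbers have a small, structured value space, so a plain unkeyed hash can be reversed by trying every candidate number. The sketch therefore uses a keyed hash (HMAC-SHA256), which is a substitution on my part and not something the list above prescribes. The function name `card_fingerprint` and the key handling are hypothetical.

```python
import hashlib
import hmac
import secrets


def card_fingerprint(card_number: str, key: bytes) -> str:
    """Derive a keyed hash of a card number so the number itself is not stored."""
    # Normalize formatting so "4111 1111..." and "4111-1111..." match.
    digits = "".join(ch for ch in card_number if ch in "0123456789")
    return hmac.new(key, digits.encode("ascii"), hashlib.sha256).hexdigest()


# Assumption: in a real system the key would come from a key management
# service or hardware module, not be generated next to the data.
key = secrets.token_bytes(32)

stored = card_fingerprint("4111 1111 1111 1111", key)      # what gets persisted
candidate = card_fingerprint("4111-1111-1111-1111", key)   # later lookup

# Constant-time comparison avoids leaking information through timing.
matches = hmac.compare_digest(stored, candidate)
```

A fingerprint like this supports checks such as "has this card been seen before" but cannot be used to charge the card, so it only fits cases where the original number is never needed again.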
PII Repository Design
When it comes to PII repository design, you need to treat the data retention period and retention policies as first-class requirements. That is, you should not assume that data stored in the system resides there forever unless someone specifically removes it. Instead, you should define a lifetime and retention policy for each data record type. You also need to develop the necessary tools and procedures to remove data once it exceeds its lifetime or once the retention policy no longer justifies keeping it.
In addition, you need to design audit tools within the system itself to identify privacy violations and correct them. For example, after customer data is removed at the customer’s request, the system should be able to produce an audit trail of the removal and generate the necessary reports.
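A retention policy per record type, together with an audit trail of removals, might look like the sketch below. The record types, lifetimes, and the `purge_expired` function are hypothetical; real retention periods come from your legal and business requirements.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record types and lifetimes, for illustration only.
RETENTION = {
    "marketing_contact": timedelta(days=365),
    "support_ticket": timedelta(days=90),
}


def purge_expired(records: list, now: datetime, audit_log: list) -> list:
    """Drop records older than their type's lifetime and log each removal."""
    kept = []
    for record in records:
        age = now - record["created_at"]
        if age > RETENTION[record["type"]]:
            # The audit entry names the record, but copies none of its PII.
            audit_log.append({
                "action": "purged",
                "record_id": record["id"],
                "type": record["type"],
                "at": now.isoformat(),
            })
        else:
            kept.append(record)
    return kept


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "type": "support_ticket", "created_at": now - timedelta(days=120)},
    {"id": 2, "type": "support_ticket", "created_at": now - timedelta(days=30)},
    {"id": 3, "type": "marketing_contact", "created_at": now - timedelta(days=200)},
]

audit_log = []
remaining = purge_expired(records, now, audit_log)
```

Running such a job on a schedule makes expiry the default behavior, and the audit log gives you material for the reports mentioned above without retaining the deleted data.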
Conclusion
In this article, we discussed opportunities and challenges that are common to any enterprise that deals with the privacy concerns of individuals. Instead of treating support for each privacy regulation as a separate effort with its own cost and time, it is possible to address these concerns holistically by applying PbD principles, which saves time and cost while still supporting each privacy regulation. The second part of this article series will explore further points that are also significant in software design: on-demand PII data sharing, transparency, consent management, enabling customer rights, and strong authentication.