How PI and PII Are Managed Between Multiple Geographical Regions
- Ruwan Abeykoon
- Engineer - WSO2
A walkthrough of PI and PII
Personal information (PI) is an individual's digital information, including but not limited to their:
- Health and financial records
- Purchasing habits
A subset of this information can become personally identifiable information (PII) when it can be used to locate someone with a high degree of accuracy. Some examples are:
- Home address
- IP address and time of use
- Government-issued identification (SSN, driver's license, passport number, etc.)
Why are people concerned with protecting PI and PII?
Most people in the industrialized world are connected to the internet in one way or another. They may not access the internet directly, but their PI and PII are captured through various interactions with other people and government/private entities.
Each of these interactions captures some form of PI. For example, your name may be captured by your favorite coffee shop, or your phone number and location may be captured by your taxi-hailing company. These details are captured to serve you and do not have any malicious intent.
There has been a considerable outcry from people who believe that some global platforms are able to capture your PI and PII, and use that information to influence you in a way that may not be in your best interest. Some examples of these are:
- An advertising platform suggesting advertisements of products or services based on your profile information
- Influencing your vote for a political party
- A salesperson calling to sell you a product that you searched for on the internet
The concern about PI and PII applies to an identifiable natural person only. It does not apply to legal persons such as incorporated companies, government institutions, clubs, etc., whose primary reason for existence is to provide a boundary between natural persons and the outside world in a controlled way.
What are the concerns?
Where your personal data is stored is regulated by legislation (such as the GDPR in Europe). The underlying concept is to keep the data of a natural person within the geographical boundary of each legal domain. The data can be accessed in a controlled manner, but should not be stored outside of the region.
Data in flight
Data that is accessed within the region or provided outside the region is also governed based on absolute need. For example, a system should not be able to access a person’s phone number unless there is a need to call or SMS the individual, and the need should benefit that individual.
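The need-based rule above can be illustrated with a minimal sketch. It assumes a hypothetical table that maps each declared purpose to the attributes it may read; the purpose names and the `release_attributes` helper are invented for illustration and are not part of WSO2 Identity Server.

```python
# Hypothetical policy table (an assumption for this sketch):
# which profile attributes each declared purpose is allowed to read.
ALLOWED_ATTRIBUTES = {
    "sms_notification": {"phone_number"},
    "order_delivery": {"name", "home_address"},
    "analytics": set(),  # analytics receives no direct identifiers
}


def release_attributes(profile: dict, purpose: str) -> dict:
    """Return only the attributes that the stated purpose is allowed to see."""
    # An unknown purpose maps to the empty set, so nothing is released.
    allowed = ALLOWED_ATTRIBUTES.get(purpose, set())
    return {key: value for key, value in profile.items() if key in allowed}


profile = {
    "name": "Alice",
    "phone_number": "+1-555-0100",
    "home_address": "1 Main St",
}
sms_view = release_attributes(profile, "sms_notification")
```

Here `sms_view` contains only the phone number, and a caller that states no recognized purpose receives an empty result. Defaulting to "release nothing" keeps the check fail-safe.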
How WSO2 Identity Server can help
WSO2 Identity Server separates data logically into three categories:
- User store - this is where a user’s profile information is kept and where almost all data related to PI and PII is stored.
- Transactional data - this includes the sessions created, and the tokens created on behalf of the user.
- Logs, analytics and reporting data - this includes access logs, analytics-related data, and data used to generate reports. These are forms of logs where the person’s interactions with the system are tracked and audited.
A user store is the location where all the authoritative information about the user profile is held. A user store can be located in the respective geographical location to honor the data residency regulations.
Transactional data and tokens (access tokens and identity tokens) may contain some PI and PII required by the application. However, the data is retained only for as long as the token is valid. An identity provider (IdP) typically flushes out all expired and revoked tokens. IdPs keep these tokens in secure and fairly anonymized records, so that it is not computationally feasible to search for a token that is allocated to a user. This means that there is very little concern about this data.
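The retention behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the `TokenStore` class is hypothetical, and storing a SHA-256 digest instead of the raw token is one possible way to keep records anonymized, not necessarily how WSO2 Identity Server persists tokens.

```python
import hashlib
import time
from typing import Optional


class TokenStore:
    """Hypothetical store that keeps a digest of each token and its expiry time."""

    def __init__(self) -> None:
        self._records = {}  # token digest -> expiry time (epoch seconds)

    @staticmethod
    def _digest(token: str) -> str:
        # Only the digest is stored, so the raw token cannot be read back.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def add(self, token: str, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._records[self._digest(token)] = now + ttl_seconds

    def is_valid(self, token: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._records.get(self._digest(token))
        return expiry is not None and now < expiry

    def revoke(self, token: str) -> None:
        self._records.pop(self._digest(token), None)

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete every expired record and return how many were removed."""
        now = time.time() if now is None else now
        expired = [d for d, expiry in self._records.items() if expiry <= now]
        for digest in expired:
            del self._records[digest]
        return len(expired)
```

A scheduled job could call `purge_expired()` periodically so that no token record outlives its validity period.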
On the flip side, because the token is meant to be used by applications and APIs, the respective application or API can choose to cache or store the token for further processing. The nature of the application may require PI and PII to be present in the token for business purposes. The application and API therefore need appropriate safeguards of their own to fully protect the PI and PII held by the business. This may be done by deploying the application or API replicas in each geographical region.
These are the types of data that fall in this category:
- Access token/identity token
- Authorization code
- Session data (SSO session, login session, login flow session)
- One-time password (OTP)
- Backup code
- Recovery code
Logs, analytics and reporting data
The user’s activities need to be logged, monitored, and analyzed for various business purposes:
- Auditing - to track down who performs which activity at what time. Auditing is typically done for administrative reasons and is required mostly for legal purposes. Those who perform the administrative actions are typically employees of an organization, and are bound by contracts. As such, there is less concern about PI and PII for these types of data.
- Analytics data - to see the growth, traffic patterns, etc. for planning purposes. Here, all the user’s interactions need to be tracked. However, there is no need to track PI and PII for analytical purposes, as the requirement is met by having anonymized user information and statistics on activities. IP addresses can be processed only to extract region by city, and not to pinpoint the street, house or real IP information.
- Reporting data - reporting is a gray area, as it falls between analytical data collection and audit data collection. Some organizations require certain actions of end users to be recorded, stored, and reported for a longer time. Not every action needs to be reported, only a few selected sensitive actions required by certain regulations. Hence, reporting is highly business dependent, and each organization has to decide what kind of activity it has to monitor. All other activities are better treated as audit and analytics data. For example, financial institutions may be legally required to produce a report of every customer transaction for the regulatory authorities, while no record of any other user activity needs to be kept.
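The anonymization of analytics data mentioned above can be sketched as follows. The sketch makes two assumptions: IP addresses are coarsened by zeroing the host part (a simpler stand-in for the city-level lookup described in the text), and user identifiers are replaced with a keyed hash. The function names and the `secret` key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip: str) -> str:
    """Zero the host part: keep a /24 prefix for IPv4 and a /48 prefix for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Replace the user ID with a keyed hash so events can be counted per user."""
    return hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def to_analytics_event(raw: dict, secret: bytes) -> dict:
    """Build the record that is allowed to reach the analytics store."""
    return {
        "user": pseudonymize(raw["user_id"], secret),
        "action": raw["action"],
        "ip_prefix": truncate_ip(raw["ip"]),
    }


event = to_analytics_event(
    {"user_id": "alice", "action": "login", "ip": "203.0.113.77"},
    secret=b"hypothetical-per-region-key",
)
```

The resulting `event` carries the action and a coarse network prefix (`203.0.113.0`), but neither the original user ID nor the full IP address.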
Data protection with WSO2 Identity Server
There are two patterns of data protection that WSO2 Identity Server employs.
- Protection of data at rest
- Protection of data at rest and in flight
Protection of data at rest
Protection of data at rest is the simplest mechanism, where each user store can be held in the respective geographical location.
This pattern is the simplest and least expensive to implement. A user store holding the users who fall under each regulation can be kept in a data center that complies with the respective regulation. The user's information is stored only in the respective user store.
Only user store data is held in the respective geographical region in this pattern. Transactional data such as “token”, “session”, etc. will be stored outside the geographical region. Analytics data and logs may also be stored outside the region. The primary driver for this pattern is cost-effective deployment, while also maintaining the required level of regulatory compliance.
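A minimal sketch of the placement rule in this pattern is shown below. The store URLs, region codes, and the `storage_location` helper are all hypothetical; they only illustrate that profile data is pinned to a regional user store while everything else may go to a shared location.

```python
# Hypothetical store locations (assumptions for this sketch).
REGIONAL_USER_STORES = {
    "EU": "ldaps://userstore.eu.example.internal",
    "US": "ldaps://userstore.us.example.internal",
}
SHARED_STORE = "jdbc:postgresql://shared.example.internal/identity"


def storage_location(data_class: str, user_region: str) -> str:
    """Return where a record is stored under the data-at-rest pattern."""
    if data_class == "user_profile":
        # Profile data must stay in the user's own region. A missing region
        # raises KeyError rather than silently falling back to another store.
        return REGIONAL_USER_STORES[user_region]
    # Tokens, sessions, analytics data, and logs may live outside the region.
    return SHARED_STORE
```

For example, `storage_location("user_profile", "EU")` resolves to the EU user store, while `storage_location("session", "EU")` resolves to the shared store.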
Protection at rest and in flight
This is an extension of the “data at rest” pattern, in which the user store, tokens, transactions, and logs are isolated within each region.
What this pattern adds to the “data at rest” pattern is that all the data is confined to the relevant regional boundaries, even while in flight. Two critical components have to be employed to achieve this pattern.
- Global service which can route the regional traffic to each region. This can be achieved with:
a. Having regional host name records on DNS where the client application is aware of which region they are connected to. The host name is visible on the URL hostname part.
b. Having an intelligent redirect or forwarding mechanism on the global router, where a single global URL is dispatched to a relevant region, based on IP, cookie, a token, or any sort of identifier that can be directly associated with a regional cluster.
- Regionally deployed clusters of WSO2 Identity Server, and the respective data sources/data sinks. Here, no data is shared across regions. Analytical and reporting data, and logs, can be accessed only by navigating to each regional cluster. One can have an aggregator service that collects analytical or reporting data in a central location, in order to satisfy the business need for a single global dashboard for the entire estate.
However, designing a single dashboard has to be carefully evaluated against the need for data protection and the need for business insight, which usually conflict with each other. Designing a single dashboard is outside the scope of WSO2 Identity Server. Hence, you can select purpose-built reporting solutions for this purpose.
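The intelligent redirect option for the global router (option b above) can be sketched as follows. This is an illustration only: the cluster URLs, the `region` cookie and claim names, and both helper functions are assumptions, and the IP-derived region is taken as an input rather than looked up.

```python
from typing import Optional

# Hypothetical regional cluster endpoints (assumptions for this sketch).
REGIONAL_CLUSTERS = {
    "eu": "https://eu.idp.example.com",
    "us": "https://us.idp.example.com",
}


def resolve_region(cookies: dict, token_claims: dict, ip_region: Optional[str]) -> Optional[str]:
    """Pick the first identifier that maps to a known regional cluster."""
    # Order of trust in this sketch: cookie, then token claim, then IP-derived hint.
    for candidate in (cookies.get("region"), token_claims.get("region"), ip_region):
        if candidate in REGIONAL_CLUSTERS:
            return candidate
    return None


def redirect_target(path: str, cookies: dict, token_claims: dict,
                    ip_region: Optional[str] = None) -> Optional[str]:
    """Return the regional URL for a request sent to the single global URL."""
    region = resolve_region(cookies, token_claims, ip_region)
    if region is None:
        # No reliable identifier: the caller should ask the user or reject,
        # rather than guess a region and move data across a boundary.
        return None
    return REGIONAL_CLUSTERS[region] + path
```

Returning `None` when no identifier matches is a deliberate choice in this sketch: sending a request to the wrong region would defeat the purpose of the pattern.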
There are multiple ways to handle regional data protection of PI and PII with WSO2 Identity Server. It is becoming increasingly expensive to implement full isolation of PI and PII across regulatory regions. Businesses have to select the best strategy based on budget, expertise, available resources, and the regulations.
This post describes the concepts of how you can achieve PI and PII isolation to varying degrees in WSO2 Identity Server. However, this could also be generalized to any IdP or any application which handles PI and PII.
We encourage you to find out more about WSO2 Identity Server by clicking these links.
- Working with Databases - Explains the runtime databases that are used.
- Configuring User Store - Shows how user stores are organized and configured.
- Data Dictionary - Describes the data tables used.