Many organizations retain personal data, and it is important that this information be properly protected and/or anonymized. One way to anonymize personal information is de-identification. This article explains what de-identification is, how to de-identify personal data, and why it matters. It starts with a brief overview of ePHI, PHI, and PII.
What is ePHI/PHI/PII?
PHI and ePHI stand for Protected Health Information and Electronic Protected Health Information. PHI is any health information that can be linked to an individual; ePHI is PHI that is stored or transmitted in electronic form.
PII stands for Personally Identifiable Information: any information that can be used to identify an individual, whether or not it relates to health.
PHI is essentially the details of an individual’s past, present, or future physical or mental health condition(s), while PII covers a broader set of personal data elements. The Summary of the HIPAA Privacy Rule lists the following identifying data elements:
- Names
- All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of the ZIP code if, according to the current publicly available data from the Bureau of the Census: (1) the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people; and (2) the initial three digits of a ZIP code for all such geographic units containing 20,000 or fewer people are changed to 000
- All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
- Telephone numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Fax numbers
- Device identifiers and serial numbers
- Email addresses
- Web Universal Resource Locators (URLs)
- Social security numbers
- Internet Protocol (IP) addresses
- Medical record numbers
- Biometric identifiers, including finger and voiceprints
- Health plan beneficiary numbers
- Full-face photographs and any comparable images
- Account numbers
- Certificate/license numbers
- Any other unique identifying number, characteristic, or code, except as permitted under the Privacy Rule’s implementation specification allowing the assignment of a unique code to the set of de-identified health information to permit re-identification by the covered entity (re-identification implementation specification)
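Two items in the list above, ZIP codes and dates, call for generalizing a value rather than simply deleting it. The following is a minimal Python sketch of those two rules, not a definitive implementation. The function names and the `low_population_zip3` parameter are hypothetical, and the set of low-population prefixes used in the example is a made-up placeholder; a real set would have to be derived from current Census Bureau data.

```python
from datetime import date


def generalize_zip(zip_code, low_population_zip3):
    """Keep the first three ZIP digits, or return "000" for sparse areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in low_population_zip3 else prefix


def age_in_years(birth_date, as_of):
    """Whole years elapsed between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def generalize_birth_date(birth_date, as_of):
    """Reduce a birth date to a year, and fold ages over 89 into "90+"."""
    age = age_in_years(birth_date, as_of)
    if age > 89:
        # The year itself would reveal an age over 89, so it is dropped too.
        return {"birth_year": None, "age": "90+"}
    return {"birth_year": birth_date.year, "age": str(age)}


# Placeholder prefixes for illustration only (assumption, not Census data).
sparse_prefixes = {"999"}

print(generalize_zip("80202", sparse_prefixes))  # 802
print(generalize_zip("99901", sparse_prefixes))  # 000
print(generalize_birth_date(date(1980, 6, 15), date(2024, 6, 15)))
print(generalize_birth_date(date(1930, 5, 1), date(2024, 1, 1)))
```

The other items in the list are handled by removing the field outright, so they need no transformation logic of this kind.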
What is the De-Identification of PHI/PII?
Many organizations are subject to regulatory requirements to de-identify and protect personal data relating to employees, customers, vendors, and others.
De-identification of PHI or PII is intended to ensure personal data cannot be linked to an individual. It is achieved by removing certain data elements from a data set so that the information can no longer be used to identify a specific individual. Examples of data elements that would be removed to de-identify a data set of PHI include, but are not limited to, the following:
- Driver License Number
- Phone Number
With that said, the combination of data elements that remain after de-identification may still make it possible to discern an individual’s identity. For example, suppose the name, address, Social Security number, and driver’s license number are removed from a data set, but the following elements are retained: age, gender, height, weight, phone number, ZIP code, education, race, and health status. Taken together, these data elements can still be used to identify an individual. Therefore, any data elements that are not required for data processing should be removed, so that only the minimum necessary data is retained and the risk of identifying individuals is reduced.
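One way to see this risk concretely is to count how many records share each combination of the remaining fields; a combination that appears only once points to a single person. The Python sketch below is a simple uniqueness check in the spirit of k-anonymity, a technique this article does not otherwise cover. The function name, field names, and sample records are all made up for illustration.

```python
from collections import Counter


def group_sizes(records, fields):
    """Count how many records share each combination of values in `fields`."""
    return Counter(tuple(record.get(f) for f in fields) for record in records)


# Made-up sample data: direct identifiers have already been removed.
records = [
    {"age": 34, "gender": "F", "zip3": "802", "health_status": "asthma"},
    {"age": 34, "gender": "F", "zip3": "802", "health_status": "diabetes"},
    {"age": 71, "gender": "M", "zip3": "802", "health_status": "asthma"},
]

sizes = group_sizes(records, ["age", "gender", "zip3"])

# Combinations held by exactly one record are the ones most at risk.
unique_combinations = [combo for combo, count in sizes.items() if count == 1]
print(unique_combinations)  # [(71, 'M', '802')]
print(min(sizes.values()))  # 1
```

Dropping or coarsening fields until no combination is unique is one way to act on the advice above; what minimum group size is acceptable is a judgment this sketch does not make.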
How Do You De-Identify Data?
According to the U.S. Department of Health and Human Services (HHS.gov), there are two methods of de-identification:
- Expert Determination: a formal determination by a qualified statistician
- Safe Harbor: as covered in the previous sections, specific data elements are removed so that an individual cannot be identified
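The second method above, removing specific data elements, can be sketched as a simple field filter. The Python below is a minimal illustration under stated assumptions: the field names in `IDENTIFIER_FIELDS` are hypothetical and cover only part of the identifier list, and the sample record is made up. It also shows the optional re-identification code mentioned in the identifier list, generated randomly so that it is not derived from the individual's data; keeping the lookup table apart from the de-identified data is an assumption of this sketch.

```python
import secrets

# Hypothetical field names; a real mapping depends on the data set's schema.
IDENTIFIER_FIELDS = {
    "name", "street_address", "phone", "email", "ssn",
    "driver_license_number", "medical_record_number",
}


def de_identify(record, lookup):
    """Return a copy of `record` without identifier fields, plus a random code.

    The original record is saved in `lookup` under that code so the data
    holder can re-identify it later; `lookup` must be kept separately.
    """
    code = secrets.token_hex(8)  # random, not derived from the record
    lookup[code] = record
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    cleaned["record_code"] = code
    return cleaned


lookup_table = {}
patient = {
    "name": "Jane Doe",
    "phone": "555-0100",
    "ssn": "000-00-0000",
    "birth_year": 1980,
    "diagnosis": "asthma",
}
cleaned = de_identify(patient, lookup_table)
print(sorted(cleaned))  # ['birth_year', 'diagnosis', 'record_code']
```

A key-based filter like this only catches identifiers that live in their own fields. Free-text fields such as notes can also contain names or numbers and would need separate handling.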
Why is it Important to De-Identify Personal Data?
Safeguarding PHI and ePHI is important for mitigating privacy risks. De-identifying personal information reduces those risks to individuals while also reducing the organization’s exposure to the consequences of a breach (e.g., reputational damage and remediation costs). Further, personal information should be retained for only as long as necessary to fulfill the stated purposes or as required by law or regulations.
For healthcare organizations, de-identifying patient data allows covered entities under HIPAA to share that data with other organizations, for example for medical research and comparative effectiveness studies. As noted in the previous sections, the de-identification of ePHI involves the removal of certain identifiers from patient data. Doing so detaches the identity of the patient from the patient data and effectively renders the health information no longer subject to HIPAA’s requirements. This can also reduce the cost of compliance by reducing the scope of HIPAA compliance assessments and audits.
If your organization is considering the de-identification of personal information, a good starting point is the HIPAA Privacy Rule’s standard for the de-identification of protected health information, found in Section 164.514(a) of the rule. Under this standard, health information is not deemed individually identifiable if it does not identify an individual and there is no reasonable basis to believe it can be used to identify one.
Under this standard, eighteen (18) specific identifiers of individuals (as listed in the first section above), or of their relatives, employers, or household members, must be removed from patient records to meet the HIPAA Privacy Rule’s requirements. Although this list originates from HIPAA, it is useful to any organization seeking to reduce the business risks associated with maintaining any type of personal information, including data that could be used for identity theft.
Removing specific data elements from personal data sets helps ensure that the personal information retained cannot be used to identify an individual. In short, de-identification is an important component of protecting PII and mitigating privacy risks.
Olivia Refile (CISSP, CISA, CRISC, GSEC, ISO Lead Auditor) specializes in SOC examinations for Linford & Co., LLP. She completed her Bachelor of Business Administration, with a concentration in Management Information Systems, at Temple University’s Fox School of Business in 2010. Olivia started her career in IT Risk Management in 2010, specializing in internal and external audits as well as IT security risk assessments. Following her time in risk management, Olivia moved solely into external IT audit and is currently dedicated to performing SOC 1 and SOC 2 examinations.