Data Identifiability and Security Guidance

The UHSRC is charged with ensuring that the confidentiality of all human subject data is adequately protected. Not all data need the same level of protection: some types of data (e.g., sensitive data whose release could harm the subject) require stricter protections than others.

The level of data security required depends on two factors: 

1. Identifiability of the data; and

2. Sensitivity of the data.


Because both the potential harm and the identifiability of the data depend on what data are collected and on the characteristics of the sample, the UHSRC makes data protection and confidentiality judgments on a case-by-case basis.

The following considerations inform these judgments:

  • Identifiability of Data

Data can be directly identifiable, indirectly identifiable (including coded data), or anonymous/de-identified.

Directly identifiable data contain values or variables that, on their own, identify subjects. Direct identifiers include, for example, name, Social Security number, video recordings, voice recordings, EMU EID number, email address, home address, phone number, and medical record number.

Data can be indirectly identifiable in two ways:

1. Data can be coded: the data set contains no direct identifiers, but each record carries a study ID number that is linked to direct identifiers in a separate file called the key. Anyone with access to the key can use the study ID to re-identify subjects.

2. Data can be indirectly identifiable: the data set contains no direct identifiers, but individual values, fields, or variables within it can be combined to re-identify subjects. For example, a data set containing the fields job title and employer could be indirectly identifiable for certain samples.
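The coding procedure described in item 1 can be sketched in a few lines. This is a minimal illustration, not a UHSRC-required method: the record layout, the field names (`name`, `email`, `score`), and the helper `code_dataset` are all hypothetical. It moves the chosen direct identifiers into a key and leaves a coded data set that holds only a random study ID plus the remaining fields.

```python
import secrets

def code_dataset(records, direct_identifiers):
    """Split records into a coded data set and a separate key (hypothetical helper).

    records: list of dicts, one per subject (assumed layout).
    direct_identifiers: field names to move out of the data set into the key.
    """
    coded, key = [], {}
    for record in records:
        # Use a random study ID so the ID itself reveals nothing
        # (e.g., enrollment order); redraw on the rare collision.
        study_id = secrets.token_hex(4)
        while study_id in key:
            study_id = secrets.token_hex(4)
        # The key links each study ID to that subject's direct identifiers.
        key[study_id] = {f: record[f] for f in direct_identifiers if f in record}
        # The coded record keeps only non-identifying fields plus the study ID.
        coded_record = {f: v for f, v in record.items() if f not in direct_identifiers}
        coded_record["study_id"] = study_id
        coded.append(coded_record)
    return coded, key

# Illustrative records only (hypothetical subjects).
records = [
    {"name": "A. Subject", "email": "a@example.edu", "score": 12},
    {"name": "B. Subject", "email": "b@example.edu", "score": 17},
]
coded, key = code_dataset(records, {"name", "email"})
```

In practice the two outputs would be saved to separate files, with the key stored under stricter access controls than the coded data, since the key is what makes re-identification possible.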

Data are considered anonymous if a reasonable person has no way to identify subjects, whether through direct identifiers, study ID codes linked to direct identifiers, or combinations of fields in the data set. De-identified is a term specific to Protected Health Information (PHI); it refers to the removal of 18 specific identifiers listed under the Health Insurance Portability and Accountability Act (HIPAA).
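One rough way to look for re-identification through combined fields is to count how many records share each combination of values; a combination held by only one record (or very few) may single a subject out. The sketch below applies that counting idea, which is the basis of k-anonymity. The helper `small_groups`, the threshold `k=2`, and the example fields are assumptions for illustration. A clean result does not by itself establish that data are anonymous, and it does not replace the UHSRC's case-by-case judgment.

```python
from collections import Counter

def small_groups(records, fields, k=2):
    """Return value combinations of `fields` shared by fewer than k records.

    Hypothetical helper: each returned combination is a potential
    re-identification risk because it matches very few subjects.
    """
    # Count how many records share each combination of the chosen fields.
    combos = Counter(tuple(r.get(f) for f in fields) for r in records)
    return {combo: n for combo, n in combos.items() if n < k}

# Illustrative data only: job title plus employer, as in the example above.
data = [
    {"job_title": "Nurse", "employer": "Hospital A", "score": 3},
    {"job_title": "Nurse", "employer": "Hospital A", "score": 5},
    {"job_title": "Chief of Surgery", "employer": "Hospital A", "score": 4},
]
flagged = small_groups(data, ["job_title", "employer"])
# flagged == {("Chief of Surgery", "Hospital A"): 1}
```

Here the only chief of surgery at Hospital A is flagged, because anyone who knows those two facts could pick out that subject's record even without a name or study ID.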


  • Data Sensitivity

Sensitivity refers to the probability of harm caused by release of the data or a breach of its confidentiality. Sensitive data may include, but are not limited to, information about:

1. Illegal or criminal activity

2. Drug use

3. Sexual behavior or sexuality

4. Job or employment satisfaction

5. Socially unacceptable or socially sanctioned activities

6. Immigration documentation status

7. Political opinion


The UHSRC uses a reasonable person standard in determining whether data are sensitive. Additionally, data that are sensitive for one sample may not be sensitive for another.


For specific questions about data confidentiality, email or call 734-487-3090.

