Table 1 Summary of de-identification methods for individual-level data

From: Methods for the de-identification of electronic health records for genomic research

| De-identification method | Technique | Details |
| --- | --- | --- |
| Masking (applied to direct identifiers) | Suppression/redaction | Direct identifiers are removed from the data or replaced with tags |
| | Random replacement/randomization | Direct identifiers (for example, names and medical record numbers) are replaced with randomly chosen values |
| | Pseudonymization | Direct identifiers are replaced with unique numbers that cannot be reversed |
| Generalization (applied to quasi-identifiers) | Hierarchy-based generalization | Precision of quasi-identifiers is reduced according to a predefined hierarchy |
| | Cluster-based generalization | Individual transactions are grouped empirically or according to predefined utility policies |
| Suppression (applied to records flagged for suppression) | Casewise deletion | The full record is deleted |
| | Quasi-identifier deletion | Only the quasi-identifiers are deleted |
| | Local cell suppression | An optimization scheme is applied to the quasi-identifiers to suppress the fewest values while keeping the re-identification probability below the threshold |
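
To make the three families in the table concrete, the following is a minimal Python sketch, not the method of any particular tool. It assumes hypothetical field names (`name`, `mrn`, `age`, `zip`, `diagnosis`), an illustrative secret key, a two-level hierarchy (10-year age bands, 3-digit ZIP prefixes), and a record being "flagged for suppression" when its quasi-identifier combination is shared by fewer than `k` records. It uses casewise deletion for simplicity; local cell suppression would require an optimization step that is not shown here.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key for illustration only; a real key would be stored securely.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Pseudonymization: keyed one-way hash of a direct identifier.

    Truncating to 16 hex characters is an assumption made for readability;
    it makes collisions unlikely but not impossible.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def generalize_age(age: int, width: int = 10) -> str:
    """Hierarchy-based generalization: exact age -> age band."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Hierarchy-based generalization: full ZIP code -> prefix."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def deidentify(records, k=2):
    """Mask direct identifiers, generalize quasi-identifiers, then apply
    casewise deletion to records in groups smaller than k."""
    masked = [
        {
            "name": "[NAME]",                 # suppression/redaction with a tag
            "mrn": pseudonymize(r["mrn"]),    # pseudonymization
            "age": generalize_age(r["age"]),  # generalization
            "zip": generalize_zip(r["zip"]),  # generalization
            "diagnosis": r["diagnosis"],      # non-identifying payload, kept as-is
        }
        for r in records
    ]
    # Count how many records share each quasi-identifier combination.
    class_sizes = Counter((m["age"], m["zip"]) for m in masked)
    # Casewise deletion: drop the full record if its group is smaller than k.
    return [m for m in masked if class_sizes[(m["age"], m["zip"])] >= k]


# Made-up example records.
records = [
    {"name": "Alice Example", "mrn": "MRN001", "age": 34, "zip": "37203", "diagnosis": "E11"},
    {"name": "Bob Example", "mrn": "MRN002", "age": 38, "zip": "37212", "diagnosis": "I10"},
    {"name": "Carol Example", "mrn": "MRN003", "age": 71, "zip": "90210", "diagnosis": "J45"},
]
result = deidentify(records, k=2)
```

With these made-up records, the first two share the generalized combination (`30-39`, `372**`) and are retained, while the third is unique on its quasi-identifiers and is deleted. Choosing `k` corresponds to choosing a re-identification probability threshold of `1/k` under this simple group-size model, which is itself an assumption of the sketch.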