Methods for the de-identification of electronic health records for genomic research

Table 1 Summary of de-identification methods for individual-level data

De-identification method	Techniques	Details
Masking (applied to direct identifiers)	Suppression/redaction	Direct identifiers are removed from the data or replaced with tags
	Random replacement/randomization	Direct identifiers are replaced with randomly chosen values (for example, for names and medical record numbers)
	Pseudonymization	Direct identifiers are replaced with unique numbers that cannot be reversed
Generalization (applied to quasi-identifiers)	Hierarchy-based generalization	Generalization is based on a predefined hierarchy describing how the precision of quasi-identifiers is reduced
	Cluster-based generalization	Individual transactions are grouped empirically or according to predefined utility policies
Suppression (applied to records flagged for suppression)	Casewise deletion	The full record is deleted
	Quasi-identifier deletion	Only the quasi-identifiers are deleted
	Local cell suppression	An optimization scheme is applied to the quasi-identifiers to suppress the fewest values while keeping the re-identification probability below the threshold
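
As a rough illustration of how one technique from each of the three method families in Table 1 might be chained, the following minimal Python sketch applies pseudonymization, hierarchy-based generalization, and casewise deletion to a toy dataset. Everything in it is an assumption made for illustration and is not taken from the source: the records, the field names (name, mrn, age, zip, dx), the function names, the generalization hierarchy (10-year age bands, 3-digit ZIP prefixes), and the use of a minimum group size k as a stand-in for the re-identification probability threshold (a group of k indistinguishable records gives a probability of at most 1/k).

```python
import secrets
from collections import Counter

# Hypothetical toy records; all field names and values are illustrative only.
RECORDS = [
    {"name": "Ann Lee", "mrn": "1001", "age": 34, "zip": "37203", "dx": "E11"},
    {"name": "Bob Roy", "mrn": "1002", "age": 36, "zip": "37205", "dx": "I10"},
    {"name": "Cy Diaz", "mrn": "1003", "age": 71, "zip": "37916", "dx": "E11"},
]


def pseudonymize(records, id_field="mrn", drop_fields=("name",)):
    """Masking: drop direct identifiers and replace the ID with a random token."""
    out = []
    for rec in records:
        masked = {k: v for k, v in rec.items() if k not in drop_fields}
        # The token is random, not derived from the original value,
        # so it cannot be reversed to recover the identifier.
        masked[id_field] = secrets.token_hex(8)
        out.append(masked)
    return out


def generalize(rec):
    """Hierarchy-based generalization (assumed hierarchy):
    age -> 10-year band, ZIP code -> 3-digit prefix."""
    low = (rec["age"] // 10) * 10
    return {**rec, "age": f"{low}-{low + 9}", "zip": rec["zip"][:3] + "**"}


def casewise_delete(records, quasi_ids=("age", "zip"), k=2):
    """Suppression: delete every full record whose combination of
    quasi-identifier values is shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return [r for r in records if counts[tuple(r[q] for q in quasi_ids)] >= k]


def deidentify(records, k=2):
    """Chain the three steps: mask, generalize, then suppress."""
    generalized = [generalize(r) for r in pseudonymize(records)]
    return casewise_delete(generalized, k=k)


if __name__ == "__main__":
    for row in deidentify(RECORDS):
        print(row)
```

In this toy run the first two records fall into the same generalized group (age 30-39, ZIP 372**) and are retained, while the third record is unique on its quasi-identifiers and is deleted in full. Quasi-identifier deletion or local cell suppression would instead keep that record and blank only some of its values, which preserves more data at the cost of a more involved procedure.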

ISSN: 1756-994X