
Privacy Preserving Data Mining in Electronic Health Records

THeCS (Trusted Healthcare Services)

Objectives: Electronic health records (EHRs) are very valuable for medical research and clinical trials. For example, researchers need EHRs to run clinical trials on new medicines. However, EHRs contain very sensitive data and must not reveal the identity of the patient. EHRs must therefore be anonymized before they are released to clinical investigators. The existing anonymization techniques are not sufficient to protect the privacy of the patients' data. Techniques such as k-anonymity and l-diversity have been shown to be vulnerable to attack: in many cases the anonymized data can be de-anonymized, for instance by linking it with auxiliary information. Moreover, some health data, such as DNA and dental data, cannot be anonymized at all, because the data itself identifies the person.
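One well-known weakness of k-anonymity, the homogeneity problem, can be illustrated with a minimal sketch. The records, field names and helper functions below are hypothetical and chosen only for illustration; they are not part of the project. The table is 3-anonymous on its quasi-identifiers (age range and ZIP prefix), yet everyone in the first group shares the same diagnosis, so knowing that a person falls in that group is enough to learn the sensitive value.

```python
from collections import defaultdict

# Hypothetical, already-generalized records: (age range, ZIP prefix, diagnosis).
RECORDS = [
    ("30-39", "752**", "diabetes"),
    ("30-39", "752**", "diabetes"),
    ("30-39", "752**", "diabetes"),
    ("40-49", "753**", "asthma"),
    ("40-49", "753**", "flu"),
    ("40-49", "753**", "diabetes"),
]


def group_by_quasi_identifiers(records):
    """Map each (age range, ZIP prefix) pair to the diagnoses that share it."""
    groups = defaultdict(list)
    for age_range, zip_prefix, diagnosis in records:
        groups[(age_range, zip_prefix)].append(diagnosis)
    return dict(groups)


def is_k_anonymous(records, k):
    """True if every quasi-identifier group contains at least k records."""
    groups = group_by_quasi_identifiers(records)
    return all(len(diagnoses) >= k for diagnoses in groups.values())


def homogeneous_groups(records):
    """Groups whose members all share one sensitive value (leak despite k-anonymity)."""
    groups = group_by_quasi_identifiers(records)
    return [qi for qi, diagnoses in groups.items() if len(set(diagnoses)) == 1]


if __name__ == "__main__":
    print(is_k_anonymous(RECORDS, 3))
    print(homogeneous_groups(RECORDS))
```

The check passes for k = 3, but the homogeneous group still discloses the diagnosis of each of its members, which is the kind of leakage that motivates looking beyond anonymization.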

The goal of this work package is to propose new techniques that will enable us to build fundamentally novel solutions. In particular, we will propose techniques for searching encrypted data that allow investigators to access EHRs for medical research or clinical trials while preserving the patients' privacy. Our ambition is to go further and build algorithms for privacy-preserving data mining, which will allow knowledge to be extracted from the protected data.

The system consists of four entities: the patients, who receive treatment from a healthcare provider; the healthcare provider, who treats the patients and collects their medical data; the server, which stores the EHRs; and the investigators, who use the EHRs for clinical trials or medical research. An optional fifth entity is a sponsor (e.g. a pharmaceutical company) who finances the medical research or clinical trial. To protect the privacy of the patients' data, EHRs are encrypted before they are stored on the server. The system must provide mechanisms that allow the investigators to search the encrypted EHR database in order to extract patterns from data sets and deduce knowledge from those patterns.
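The idea of searching records that the server cannot read can be sketched with a toy keyword index. This is an illustrative assumption, not the scheme proposed in this work package: the names `search_token` and `Server`, the record identifiers and the keywords are all hypothetical. The healthcare provider derives a keyed token (HMAC-SHA256) for each keyword, the server stores only tokens and record identifiers, and an investigator holding the key submits a token to search. The encrypted record bodies are assumed to be produced separately and are omitted here.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def search_token(key: bytes, keyword: str) -> str:
    """Keyed, deterministic token for a keyword (case-insensitive)."""
    message = keyword.lower().encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class Server:
    """Hypothetical storage server: sees tokens and record ids, never keywords."""

    def __init__(self):
        self._index = defaultdict(set)  # token -> set of record ids

    def store(self, record_id, tokens):
        for token in tokens:
            self._index[token].add(record_id)

    def search(self, token):
        return sorted(self._index.get(token, set()))


if __name__ == "__main__":
    # Key shared by the healthcare provider and authorized investigators (assumption).
    key = secrets.token_bytes(32)
    server = Server()

    # Provider side: index each record under tokens for its keywords.
    server.store("ehr-001", [search_token(key, w) for w in ["diabetes", "metformin"]])
    server.store("ehr-002", [search_token(key, w) for w in ["asthma"]])

    # Investigator side: only the token is sent to the server.
    print(server.search(search_token(key, "diabetes")))
```

A deterministic index like this still reveals to the server which queries repeat and which records match them, so it shows the basic mechanism only and does not by itself provide the level of protection the work package aims for.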

WP Leader: