    Chair of Computer Science VI - Artificial Intelligence and Applied Computer Science

    Anonymization of Text Data

    Anonymization of Medical Documents

    Medical documents such as discharge letters and other reports contain a great deal of sensitive information that can reveal the identity of an individual patient. A single identifying fact, such as a name or an address, is referred to as a PHI (Protected Health Information). All of these PHIs must be removed from medical documents before the documents can be used for scientific work.

    To do so, we use a fully automatic two-step anonymization process. First, we use heuristics to remove PHIs that can be recognized syntactically, such as email addresses, dates and phone numbers. For German discharge letters in particular, we use rules (in the form of regular expressions) that identify whole blocks of PHIs typically found at the beginning and the end of these documents (e.g. “Sehr geehrter Kollege, wir stellen Ihnen den Patienten Max Mustermann, geb. 12.12.2012, wohnhaft in 97074 Würzburg vor”). To find the names of patients and doctors, we combine word lists with rules that match even complex PHIs such as “die Ex-Stief-Mutter Emma des Patienten”, which often describe relations between patients and other persons.
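
    The first step can be illustrated with a minimal Java sketch. It is not the chair's implementation: the class name, the concrete regular expressions and the placeholders @@@EMAIL@@@, @@@DATE@@@ and @@@PHONE@@@ are assumptions made for illustration (only @@@BEGIN@@@ appears in the example further down), and the block rule assumes that the introductory PHI block sits on a single line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of step one: rule-based removal of syntactic PHIs. */
final class SyntacticAnonymizer {

    // Rules are applied in insertion order, so the block rule runs first.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        // Block rule (assumed form): a whole line that introduces the patient
        // and contains a birth date after "geb." is replaced as one block.
        RULES.put(Pattern.compile(
                "(?m)^.*\\bPatient(?:en|in)\\b.*\\bgeb\\.\\s*\\d{1,2}\\.\\d{1,2}\\.\\d{2,4}.*$"),
                "@@@BEGIN@@@");
        // Email addresses (deliberately simple pattern).
        RULES.put(Pattern.compile("[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+"), "@@@EMAIL@@@");
        // German-style dates such as 24.12.2017.
        RULES.put(Pattern.compile("\\b\\d{1,2}\\.\\d{1,2}\\.\\d{2,4}\\b"), "@@@DATE@@@");
        // Phone numbers with an area code starting with 0, e.g. "0931 3112345".
        RULES.put(Pattern.compile("\\b0\\d{2,5}[ /-]\\d{3,}\\b"), "@@@PHONE@@@");
    }

    /** Applies every rule in order and returns the text with placeholders. */
    static String anonymize(String text) {
        String result = text;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result)
                    .replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(anonymize(
                "Termin am 24.12.2017, Tel. 0931 3112345, Mail max@example.org."));
    }
}
```

    Running the broad block rule before the narrower ones means that a date inside an introductory block is swallowed together with the block instead of leaving a partially anonymized sentence behind.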

    In the second step, we use data taken directly from the hospital information system (KIS, Krankenhausinformationssystem) to remove all patient-related data that is already known, such as names, birth dates, phone numbers and addresses. For this purpose we implemented an anonymization service that is called during the export process. Thanks to this service, identifying patient data never needs to be stored persistently in files or databases, which increases the security of the system.
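
    The idea behind the second step can be sketched in Java as follows. This is a sketch under assumptions, not the actual service: the class name, the method signature and the map of known values to placeholders are hypothetical, and the way the values are fetched from the KIS is left out entirely. The map exists only in memory for the duration of the call, which mirrors the point that nothing identifying has to be persisted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of step two: removing values already known from the KIS. */
final class KnownDataAnonymizer {

    /**
     * Replaces every whole-word occurrence of a known value by its placeholder.
     * placeholderByValue maps a known value (e.g. a surname) to a placeholder.
     * Matching is exact; case-insensitive matching is omitted for brevity.
     */
    static String anonymize(String text, Map<String, String> placeholderByValue) {
        List<String> values = new ArrayList<>();
        for (String value : placeholderByValue.keySet()) {
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        if (values.isEmpty()) {
            return text;
        }
        // Longest values first, so "Max Mustermann" wins over "Max".
        values.sort(Comparator.comparingInt(String::length).reversed());

        StringBuilder alternation = new StringBuilder();
        for (String value : values) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(value)); // treat values as literals
        }
        // Lookarounds keep "Max" from matching inside "Maximal" and also work
        // for words containing umlauts.
        Pattern known = Pattern.compile(
                "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])");

        // Single pass over the text, so inserted placeholders are never re-scanned.
        Matcher matcher = known.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String placeholder = placeholderByValue.get(matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(placeholder));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> known = new java.util.HashMap<>();
        known.put("Mustermann", "@@@NAME@@@");
        known.put("12.12.2012", "@@@DATE@@@");
        System.out.println(anonymize("Herr Mustermann, geb. 12.12.2012.", known));
    }
}
```

    A literal whole-word match like this misses inflected forms such as a genitive "Mustermanns", which is one reason why a lookup of this kind complements the rule-based first step rather than replacing it.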

    The system is written in plain Java and is very fast: we can anonymize a dataset of 1.9 million discharge letters in about 5 hours, which is roughly 105 letters per second. If anonymization errors are found, we can therefore fix them and quickly recompute the entire dataset.

    Example

    Normal Text

    Sehr geehrter Herr Kollege,

    wir berichten Ihnen über den Patienten Max Mustermann, geb. 12.12.2012, wohnhaft in 99999 Musterstadt, der sich am 24.12.2017 in unserer Praxis vorstellte.

    Herr Mustermann wurde von seiner Schwester Erika in unsere Praxis eingeliefert und klagte über schwere Bauchschmerzen.

    ...

    Wir überlassen den Patienten in ihre weitere Behandlung und verbleiben mit freundlichen Grüßen,

    Dr. Med. Leber Wurst

    Anonymized Text

    Sehr geehrter Herr Kollege,

    @@@BEGIN@@@

    @@@NAME@@@ wurde von seiner Schwester @@@NAME@@@ in unsere Praxis eingeliefert und klagte über schwere Bauchschmerzen.

    ...

    Wir überlassen den Patienten in ihre weitere Behandlung und verbleiben mit freundlichen Grüßen,

    @@@DOCTOR@@@