De-identification of Norwegian Health Record Notes: An Experimental Approach

The conversion of paper-based health records to electronic health records creates new opportunities within medical research, medical education and patient treatment. However, for research to be ethically sound, electronic health records have to be de-identified or anonymized before disclosure. Manual de-identification is time-consuming and costly, and thus limits the number of health records that can be disclosed for research purposes. The aim of this project was to develop an application for automatic or semi-automatic de-identification of Norwegian free-text clinical notes. As no directly related studies have been performed on Norwegian clinical notes, our approach was highly experimental. We employed several methods and techniques and evaluated them in different combinations to find the best match. The combination that obtained the best evaluation results constitutes our final de-identification application. The application is based on pattern matching techniques and a simple statistical method. It produces de-identified output, which is evaluated against a manually annotated reference standard consisting of 225 clinical notes. Our best system configurations recognized 77% of the 3320 sensitive identifiers in total (recall), with a precision of 68%. Most of the non-sensitive content remained intact, with a fall-out of 5%.
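
To make the three reported measures concrete, the following is a minimal sketch of how recall, precision and fall-out can be computed when system output is compared with a reference standard. It assumes token-level binary labels (True = sensitive); the function name `evaluate` and the ten-token toy data are hypothetical illustrations and are not taken from the thesis or its evaluation set.

```python
def evaluate(gold, predicted):
    """Compare predicted labels with a reference standard.

    gold, predicted: equal-length lists of booleans, True = sensitive.
    This token-level setup is an assumption made for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # sensitive, removed
    fn = sum(1 for g, p in pairs if g and not p)      # sensitive, missed
    fp = sum(1 for g, p in pairs if not g and p)      # harmless, removed
    tn = sum(1 for g, p in pairs if not g and not p)  # harmless, kept

    return {
        # Share of sensitive items that were recognized.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Share of removed items that were actually sensitive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of non-sensitive items that were wrongly removed.
        "fallout": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical toy data: 4 sensitive tokens and 6 non-sensitive tokens.
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

scores = evaluate(gold, predicted)
print(scores)
```

In this reading, high recall protects patient privacy, while high precision and low fall-out preserve the non-sensitive content that makes the notes useful for research.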

Publisher

Institutt for datateknikk og informasjonsvitenskap