Show simple item record

dc.contributor.advisor: Nytrø, Øystein [nb_NO]
dc.contributor.author: Olafsen, Stian [nb_NO]
dc.date.accessioned: 2014-12-19T13:31:18Z
dc.date.available: 2014-12-19T13:31:18Z
dc.date.created: 2010-09-02 [nb_NO]
dc.date.issued: 2008 [nb_NO]
dc.identifier: 347084 [nb_NO]
dc.identifier: ntnudaim:3583 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/250299
dc.description.abstract: In 2004, a lexicon-based deidentification tool was developed at the Norwegian EHR Research Centre (NSEP). The tool was never properly tested because suitable data material was not available. In 2007, an annotated data set of genuine encounter notes from Norwegian primary health care was created, with features well suited to analyzing deidentification performance. This project grew out of those two efforts, together with the aim of taking deidentification one step further. The key questions were: which types of data occur in the data set, and how does the lexicon tool handle them? Which changes or additions should be implemented to improve overall performance? To answer these questions, we analyzed the lexicon-based deidentification tool, the sensitive data, and the deidentified output from different test runs. To interpret and quantify the results, we used true/false positives/negatives together with precision, recall and F-measure, which are standard metrics in the deidentification field. Our tool achieved an overall F-measure of 66 %. The annotated data set was found to contain 5 292 instances of personal health information (PHI), distributed over eighteen categories. When deidentifying with respect to individual PHI categories, performance varied widely, with the best categories reaching recall values of up to 91 %. We found that our lexicon-based deidentification tool could not compete with the results reported by the comparison projects. In its defence, however, our tool had to handle a wider variety of PHI categories than the other tools, many of which it was not designed to handle at all. Unless PHI ambiguity is handled more gracefully and the local context is interpreted, we found that a purely lexicon-based approach is not sufficient for handling all types of PHI. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.subject: ntnudaim [no_NO]
dc.subject: MIT informatikk [no_NO]
dc.subject: Kunstig intelligens og læring [no_NO]
dc.title: Deidentification of Electronic Patient Records: A Lexicon-based Approximation [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 97 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
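The abstract reports results as precision, recall and F-measure, computed from true/false positives/negatives. The sketch below shows how these metrics relate. It assumes the standard definitions and the balanced F-measure (β = 1); the record does not state which β the thesis used. The function name `precision_recall_f` and the counts in the usage line are hypothetical illustrations, not values from the thesis. True negatives do not enter any of the three metrics.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0) -> tuple[float, float, float]:
    """Return (precision, recall, F-beta) from raw detection counts.

    Hypothetical helper for illustration only.
    precision = TP / (TP + FP): share of flagged items that really are PHI.
    recall    = TP / (TP + FN): share of true PHI instances that were flagged.
    F-beta weights recall beta times as much as precision; beta = 1 gives
    the harmonic mean of precision and recall (balanced F-measure).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid division by zero
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts for illustration only, not results from the thesis.
p, r, f = precision_recall_f(tp=80, fp=20, fn=40)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")  # precision=0.80 recall=0.67 F=0.73
```

With the balanced F-measure, a tool with high precision but low recall (or the reverse) is still penalized. This is why per-category recall figures, such as the 91 % in the abstract, can coexist with a much lower overall F-measure.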

