Automated Annotation of Events Related to Central Venous Catheterization in Norwegian Clinical Notes

Berg, Ingrid Andås


Master's thesis


Permanent link

http://hdl.handle.net/11250/253686

Publication date

2014


Collections

Institutt for datateknologi og informatikk

Abstract

Health personnel are required to use electronic health records (EHRs) for documentation and communication. Clinical notes in such records contain valuable information, but it is often recorded as narrative text, which makes it difficult to retrieve and extract. One such problem is obtaining an overview of the number of patient days for patients with a central venous catheter (CVC); the risk of infection increases with the number of patient days. The present study examines the utility of applying named entity recognition (NER) to extract CVC-related events from clinical notes. No previous studies have examined this application for Norwegian clinical notes. Conditional random fields (CRFs) were used to build models based on different feature sets, each a combination of word window, stem, synonym, and International Classification for Nursing Practice (ICNP) axis features. A corpus manually annotated with CVC event types was used to train and test the models using three-fold cross-validation. Sixteen combinations of features were tested. A factorial analysis, with the three cross-validation folds as blocks, was conducted to determine which features had the greatest effect on performance. Word window, ICNP axis, and the interaction between them were found to affect performance significantly. Stem had an effect on recall, but no such effect was found for precision. An interaction effect between synonym and ICNP axis was found to affect precision. For the best feature combination, accumulated scores over the different label types gave a precision of 56.29 %, a recall of 39.4 %, and an F-measure of 46.33 %. Overlapping labels, errors in the corpus, and the manual annotation are sources of error in the study. Thus, further research is necessary before firm conclusions can be drawn from the present findings.
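The abstract does not name the CRF toolkit or give exact feature definitions, so the following is only a minimal Python sketch under stated assumptions. It shows how per-token features for the four feature types could be switched on and off, how sixteen combinations arise if each feature type is treated as a two-level (on/off) factor, and how precision, recall, and F-measure follow from counts pooled over label types (assuming "accumulated" means micro-averaging). The lookup tables `SYNONYMS` and `ICNP_AXIS`, the `crude_stem` function, and the window size of 2 are hypothetical placeholders, not the thesis's resources. The dict-per-token output is the kind of input accepted by CRF libraries such as sklearn-crfsuite.

```python
from itertools import product

# Hypothetical lookup tables for illustration only; the thesis's actual
# synonym list and ICNP axis mapping are not given in the abstract.
SYNONYMS = {"cvk": "cvc", "sentralt_venekateter": "cvc"}
ICNP_AXIS = {"seponert": "action", "cvk": "means"}

# The four feature types named in the abstract, each assumed to be on/off.
FEATURES = ("window", "stem", "synonym", "icnp_axis")


def crude_stem(word: str) -> str:
    """Placeholder suffix stripper for illustration; not a real Norwegian stemmer."""
    for suffix in ("ene", "er", "en", "et", "e"):
        # Only strip if at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def token_features(tokens, i, enabled, window=2):
    """Build a feature dict for tokens[i] using only the enabled feature types."""
    word = tokens[i].lower()
    feats = {"word": word}
    if "window" in enabled:
        # Neighbouring words within +/- window positions; pad at sentence edges.
        for offset in range(-window, window + 1):
            if offset == 0:
                continue
            j = i + offset
            key = f"word[{offset:+d}]"
            feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    if "stem" in enabled:
        feats["stem"] = crude_stem(word)
    if "synonym" in enabled:
        # Map known synonyms to a canonical form; otherwise keep the word.
        feats["synonym"] = SYNONYMS.get(word, word)
    if "icnp_axis" in enabled:
        feats["icnp_axis"] = ICNP_AXIS.get(word, "none")
    return feats


def feature_combinations():
    """All 2**4 = 16 on/off settings of the four feature types."""
    return [
        frozenset(name for name, on in zip(FEATURES, flags) if on)
        for flags in product((False, True), repeat=len(FEATURES))
    ]


def prf(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative
    counts pooled over all label types (micro-averaging)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With F = 2PR/(P + R), the reported precision of 56.29 % and recall of 39.4 % give about 46.3 %, consistent with the reported F-measure up to rounding.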

Publisher

Institutt for datateknikk og informasjonsvitenskap