Improving Performance of Biomedical Information Retrieval using Document-Level Field Boosting and BM25F Weighting
Corpora of biomedical information typically contain large amounts of ambiguous data, because proteins and genes can be referred to by many different terms, which makes information retrieval difficult. This thesis investigates several methods for increasing the precision and recall of searches in the biomedical domain, including scoring documents with the BM25F model and using Named Entity Recognition (NER) to identify biomedical entities in the text. We implemented a prototype to test these approaches and found that combining several methods, including using three different NER models at once, yields a significant increase (up to 11.5%) in mean average precision (MAP) over our baseline result.
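For readers unfamiliar with BM25F, the following is a minimal sketch of field-weighted scoring in its commonly described simplified form, not the thesis's implementation. The function name `bm25f_score`, the field names, the field weights, the sample documents, and the parameter values `k1` and `b` are all hypothetical and chosen only for illustration. The non-negative IDF variant used here is also an assumption.

```python
import math


def bm25f_score(query_terms, doc, field_weights, avg_field_len,
                doc_freq, n_docs, k1=1.2, b=0.75):
    """Score one document with a simplified BM25F (illustrative sketch).

    doc maps a field name to a list of tokens. Per-field term
    frequencies are length-normalised, weighted, and summed into one
    pseudo-frequency before the BM25 saturation step is applied.
    """
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for field, weight in field_weights.items():
            tokens = doc.get(field, [])
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Length normalisation relative to the average field length.
            norm = 1.0 - b + b * len(tokens) / avg_field_len[field]
            pseudo_tf += weight * tf / norm
        if pseudo_tf == 0.0:
            continue
        df = doc_freq.get(term, 0)
        # Assumed IDF variant: the "+ 1" keeps the value non-negative.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * pseudo_tf / (k1 + pseudo_tf)
    return score


# Hypothetical two-document corpus with a title field and a body field.
docs = [
    {"title": ["brca1", "mutation"],
     "body": ["breast", "cancer", "risk", "study"]},
    {"title": ["cancer", "study"],
     "body": ["brca1", "gene", "expression", "data"]},
]
field_weights = {"title": 3.0, "body": 1.0}  # illustrative boosts
avg_field_len = {
    field: sum(len(d[field]) for d in docs) / len(docs)
    for field in field_weights
}
doc_freq = {"brca1": 2}  # "brca1" occurs in both documents

scores = [
    bm25f_score(["brca1"], d, field_weights, avg_field_len,
                doc_freq, len(docs))
    for d in docs
]
```

With these illustrative weights, the document that mentions the query term in its title scores higher than the one that mentions it only in the body, which is the effect field boosting is meant to produce.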