Optimal Information Retrieval Model for Molecular Biology Information
Search engines for biological information are not a new technology: computers have been an important tool for biologists since the 1960s. Online Mendelian Inheritance in Man (OMIM) is a comprehensive catalogue containing approximately 14 000 records with information about human genes and genetic disorders. Latent Semantic Indexing (LSI), an approach based on Singular Value Decomposition (SVD), was introduced in 1990. It improved information retrieval and reduced storage requirements. This thesis applies LSI to the collection of OMIM records. To further improve retrieval effectiveness and efficiency, the author proposes a clustering method based on the standard k-means algorithm, called Two step k-means. Both the standard k-means and the Two step k-means algorithms are tested and compared with each other.
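As a rough illustration of the two building blocks named above, the following is a minimal sketch of LSI via a truncated SVD, followed by standard k-means (Lloyd's algorithm) applied to the resulting document coordinates. It is not the thesis implementation: the term list, the count matrix, the choice of k = 2, and all function names are assumptions made up for this example, and clustering documents in the reduced space is only one plausible way to combine the two steps. The Two step k-means variant is not described in this abstract, so it is not sketched here.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Rank-k truncated SVD of a term-by-document matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term basis, singular values, and one k-dim coordinate row per document.
    return U[:, :k], s[:k], Vt[:k, :].T

def lsi_fold_in(query_vec, U_k, s_k):
    """Project a query term vector into the latent space: q^T U_k S_k^{-1}."""
    return (query_vec @ U_k) / s_k

def cosine_rank(q_hat, doc_coords):
    """Return document indices sorted by decreasing cosine similarity."""
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_hat)
    sims = (doc_coords @ q_hat) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Standard k-means (Lloyd's algorithm) with random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: rows are terms, columns are four made-up records.
terms = ["gene", "mutation", "syndrome", "enzyme", "deficiency", "metabolism"]
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

U_k, s_k, doc_coords = lsi_fit(term_doc, k=2)

# Query containing only the term "mutation".
query = np.zeros(len(terms))
query[terms.index("mutation")] = 1.0
order, sims = cosine_rank(lsi_fold_in(query, U_k, s_k), doc_coords)

labels, centroids = kmeans(doc_coords, n_clusters=2)
print("ranking:", order, "cluster labels:", labels)
```

Storing only the k leading singular triplets rather than the full term-by-document matrix is where the reduced storage requirement comes from. Grouping documents into clusters could allow a query to be compared against cluster centroids first instead of against every record, which is one way clustering can help retrieval efficiency.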