• norsk
    • English
  • English 
    • norsk
    • English
  • Login
View Item 
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for datateknologi og informatikk
  • View Item
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for datateknologi og informatikk
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Making substitution matrices metric

Anfinsen, Jarle
Master thesis
Thumbnail
View/Open
348104_FULLTEXT01.pdf (1.178Mb)
URI
http://hdl.handle.net/11250/250978
Date
2005
Metadata
Show full item record
Collections
  • Institutt for datateknologi og informatikk [3870]
Abstract
With the emergence and growth of large databases of information, efficient methods for storage and processing are becoming increasingly important. The existence of a metric distance measure between data entities enables efficient index structures to be applied when storing the data. Unfortunately, this is often not the case. Amino acid substitution matrices, which are used to estimate similarities between proteins, do not yield metric distance measures. Finding efficient methods for converting a non-metric matrix into a metric one is therefore highly desirable. In this work, the problem of finding such conversions is approached by embedding the data contained in the non-metric matrix into a metric space. The embedding is optimized according to a quality measure which takes the original data into account, and a distance matrix is then derived using the metric distance function of the space. More specifically, an evolutionary scheme is proposed for constructing such an embedding. The work shows how a coevolutionary algorithm can be used to find a spatial embedding and a metric distance function which try to preserve as much of the proximity structure of the non-metrix matrix as possible. The evolutionary scheme is compared to three existing embedding algorithms. Some modifications to the existing algorithms are proposed, with the purpose of handling the data in the non-metric matrix more efficiently. At a higher level, the strategy of deriving a metric distance function from a spatial embedding is compared to an existing algorithm which enforces metricity by manipulating the data in the non-metric matrix directly (the triangle fixing algorithm). The methods presented and compared are general in the sense that they can be applied in any case where a non-metric matrix must be converted into a metric one, regardless of how the data in the non-metric matrix was originally derived. The proposed methods are tested empirically on amino acid substitution matrices, and the derived metric matrices are used to search for similarity in a database of proteins. The results show that the embedding approach outperforms the triangle fixing approach when applied to matrices from the PAM family. Moreover, the evolutionary embedding algorithms perform best among the embedding algorithms. In the case of the PAM250 scoring matrix, a metric distance matrix is found which is more sensitive than the mPAM250 matrix presented in a recent paper. Possible advantages of choosing one method over another are shown to be unclear in the case of matrices from the BLOSUM family.
Publisher
Institutt for datateknikk og informasjonsvitenskap

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 

Browse

ArchiveCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsDocument TypesJournalsThis CollectionBy Issue DateAuthorsTitlesSubjectsDocument TypesJournals

My Account

Login

Statistics

View Usage Statistics

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit