dc.contributor.author: Porter, Kyle
dc.contributor.author: Petrovic, Slobodan
dc.date.accessioned: 2019-08-14T09:47:46Z
dc.date.available: 2019-08-14T09:47:46Z
dc.date.created: 2018-12-05T14:31:27Z
dc.date.issued: 2018
dc.identifier.citation: IFIP Advances in Information and Communication Technology. 2018, 532 67-85. [nb_NO]
dc.identifier.issn: 1868-4238
dc.identifier.uri: http://hdl.handle.net/11250/2608254
dc.description.abstract: Fuzzy search is often used in digital forensic investigations to find words that are stringologically similar to a chosen keyword. However, a common complaint is the high rate of false positives in big data environments. This chapter describes the design and implementation of cedas, a novel constrained edit distance approximate string matching algorithm that provides complete control over the types and numbers of elementary edit operations considered in approximate matches. The unique flexibility of cedas facilitates fine-tuned control of precision-recall trade-offs. Specifically, searches can be constrained to the union of matches resulting from any exact edit combination of insertion, deletion and substitution operations performed on the search term. The flexibility is leveraged in experiments involving fuzzy searches of an inverted index of the Enron corpus, a large English email dataset, which reveal the specific edit operation constraints that should be applied to achieve valuable precision-recall trade-offs. The constraints that produce relatively high combinations of precision and recall are identified, along with the combinations of edit operations that cause precision to drop sharply and the combination of edit operation constraints that maximize recall without sacrificing precision substantially. These edit operation constraints are potentially valuable during the middle stages of a digital forensic investigation because precision has greater value in the early stages of an investigation while recall becomes more valuable in the later stages. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: Springer Verlag [nb_NO]
dc.title: Obtaining precision-recall trade-offs in fuzzy searches of large email corpora [nb_NO]
dc.type: Journal article [nb_NO]
dc.type: Peer reviewed [nb_NO]
dc.description.version: acceptedVersion [nb_NO]
dc.source.pagenumber: 67-85 [nb_NO]
dc.source.volume: 532 [nb_NO]
dc.source.journal: IFIP Advances in Information and Communication Technology [nb_NO]
dc.identifier.doi: 10.1007/978-3-319-99277-8_5
dc.identifier.cristin: 1639504
dc.description.localcode: This is a post-peer-review, pre-copyedit version of an article published in [IFIP Advances in Information and Communication Technology]. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-99277-8_5 [nb_NO]
cristin.unitcode: 194,63,30,0
cristin.unitname: Institutt for informasjonssikkerhet og kommunikasjonsteknologi
cristin.ispublished: true
cristin.fulltext: preprint
cristin.qualitycode: 1

