Vis enkel innførsel

dc.contributor.advisorBratsberg, Svein Erik
dc.contributor.authorFrøseth, Erik
dc.date.accessioned2019-09-11T10:56:13Z
dc.date.created2015-05-15
dc.date.issued2015
dc.identifierntnudaim:11724
dc.identifier.urihttp://hdl.handle.net/11250/2615847
dc.description.abstractHistograms have a long history in the field of statistics, and have been used in the past few decades to capture the distribution of data in database management systems. In a database system, they can give the query optimizer information about whether the data is skewed, or if it follows a more uniform distribution. This will in turn help the query optimizer in selecting the best execution plan for a given query. This thesis looks at seven different histogram types, and looks at the accuracy and complexity of each of them when used for selectivity estimations. We have also implemented four of these in MySQL, and tested them on two different test suites. The results show that the optimizer benefits from better knowledge of the underlying data, as several of the queries tested runs many times faster. However, histograms are very often a trade-off between simplicity, accuracy and storage requirements. The more exotic histogram types are very often the ones with the highest accuracy, but are also the most difficult types to construct and maintain. The V-optimal histogram is a good example of this; it has overall one of the best accuracies, but the construction complexity makes it very difficult to create for large data sets. One histogram type does not work optimally for all data distributions. At a low number of buckets, the Serial histogram is often the one with the highest accuracy. But at a high number of buckets, it varies which histogram is the most accurate. It is thus important to find the correct histogram type for the specific use case, and pointing out a single histogram as a "winner" is very difficult.en
dc.languageeng
dc.publisherNTNU
dc.subjectInformatikk, Informasjonsforvaltningen
dc.titleEvaluating Histograms for Selectivity Estimationsen
dc.typeMaster thesisen
dc.source.pagenumber131
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk,Institutt for datateknologi og informatikknb_NO
dc.date.embargoenddate10000-01-01


Tilhørende fil(er)

Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel