Selectivity Estimation for JSON Data in MySQL

Engstad, Jonatan Sissener

dc.contributor.advisor	Ryeng, Norvald H.
dc.contributor.author	Engstad, Jonatan Sissener
dc.date.accessioned	2023-09-07T17:24:22Z
dc.date.available	2023-09-07T17:24:22Z
dc.date.issued	2023
dc.identifier	no.ntnu:inspera:142737689:30013471
dc.identifier.uri	https://hdl.handle.net/11250/3088036
dc.description.abstract	Databases are one of the cornerstones of our modern digitalized reality. Their task is not only to store data, but also to serve data and answer queries, and to do so as efficiently as possible. Over the last two decades, a new format for data communication and storage, JSON ("JavaScript Object Notation"), has made its entrance into the database world, and many of the major relational database vendors have added support for this format in recent years. One of these databases is MySQL. To answer large, complex queries efficiently, it is vital to have thorough and good database statistics, that is, an overview of the values found in the columns and tables of the database. But for JSON, which is more complex and dynamic than the data types usually used in relational databases, there is no general agreement on what these statistics should look like. In MySQL's case, this means that the database has no access to statistics for JSON at all, which leads to a loss of performance in cases where good statistics would have made a difference. This thesis examines existing approaches to collecting statistics for JSON data, in order to find something that could be implemented in MySQL. It then describes an implementation of such a system and presents the results of measurements of the resulting changes in performance and estimate accuracy.
dc.description.abstract	Databases are one of the cornerstones of our digital society. Their task is not only to store data but also to serve it by responding to user queries, and they must do so as efficiently as possible. A rise in the use of the JSON ("JavaScript Object Notation") data format over the last decades has led many of the major relational database vendors to add support for this format. One of these databases is MySQL. To respond efficiently to large, complex queries, it is vital to have accurate and up-to-date database statistics -- that is, an overview of the values that exist within the columns and tables of the database. For JSON, however, a format that is more complex and dynamic than the data types typically used in relational databases, there is no broad consensus on how these statistics should be gathered. In the case of MySQL, the database has no access to statistics for JSON data at all, resulting in a theoretical performance loss in cases where good statistics would have made a difference. This master's thesis aims to investigate existing approaches to collecting statistics for JSON data in order to find one that could be implemented in MySQL. It then attempts an implementation of such a system. Finally, it measures the implementation's impact on accuracy and performance, to assess whether further investigation into the topic is warranted.
dc.language	eng
dc.publisher	NTNU
dc.title	Selectivity Estimation for JSON Data in MySQL
dc.type	Master thesis

Associated file(s)

File name: no.ntnu:inspera:142737689:3001 ...
Size: 8.262 MB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6559]