The Next Generation Index Structures - Learned Indexes

Ryfetten, Jon

dc.contributor.advisor	Bratsberg, Svein Erik
dc.contributor.author	Ryfetten, Jon
dc.date.accessioned	2021-09-15T16:51:20Z
dc.date.available	2021-09-15T16:51:20Z
dc.date.issued	2021
dc.identifier	no.ntnu:inspera:74730513:31757117
dc.identifier.uri	https://hdl.handle.net/11250/2778090
dc.description.abstract	Indekser er i dag svaret når det kommer til effektiv data tilgang. Nylig kom det en ny type indekser, kalt learned indexes. Disse bruker maskin læring som en sentral del. Med nylige oppfinninger kan indekstypen ha muligheten til å bli den nye standarden. I 2017 kom den første learned index som startet et nytt forskingsfelt innenfor databaser. I flere tiår har vi sett mindre oppdatering, nye indekser, og andre mindre forbedringer, men alle har en ting felles. De bruker ikke distribusjonen av data. I motsetning til tradisjonelle indekser gjør learned indexes en radikal paradigmeendring, og fokuserer på distribusjonen av data. Dette har vist seg å være svært effektivt, og man har sett store ytelses forbedringer. Nyere learned indexes forbedrer ytelses med opptil 71% over B-trær, og flytter minneforbruket fra gigabyte til megabyte. Google målte nylig en forbedring i Bigtable på opptil 50% i lesemengde. I denne master oppgaven går vi i dybden på fagfeltet. Kan indeksene levere så bra ytelse som påstår? Vi ser på de forskjellige måtene å oppnå learned indexes, og sammenligner de med tradisjonelle indekser. Vi tar også en titt på hva fremtiden innenfor learned indexes kan være.
dc.description.abstract	Index structures are today the answer when it comes to efficient data access. Recently, a new type of indexes called learned indexes surfaced, employing machine learning at the core. With very recent innovations, the indexes can have the potential to become the "de facto" standard and then be the answer to efficient data access. In 2017, the first learned index paper surfaced, which marked the beginning of the new research field within databases. For decades, we have seen indexes been optimized, tweaked, and updated for new hardware. Improvements such as LSM, bloom filters, and other new indexes have been developed over the years to improve indexes and optimize performance. All of the improvements and optimizations had one thing in common; They assumed nothing about the stored data and only slightly improved the performance over their competitors. Learned indexes are now about to change the situation completely. It makes a radical paradigm change in the way indexes are made by focusing on the data being stored instead of assuming a general distribution of data. It also makes extensive performance claims. Recent publications of learned indexes, such as the ALEX and PGM index, improve the space occupancy by orders of magnitude (from gigabyte to megabyte). It can also improve query and update time by up to 71% over a B-tree. Google recently measured over 50% increase in throughput and a significant increase in lookup time using a learned index in Bigtable. In this thesis, we will take deep dive into the current landscape of learned indexes. Can learned indexes be the new "de facto" standard? We will look at the different approaches, the performance of learned indexes versus traditional indexes, and peek at the future direction for learned indexes.
dc.language	eng
dc.publisher	NTNU
dc.title	The Next Generation Index Structures - Learned Indexes
dc.type	Master thesis

Tilhørende fil(er)

Filnavn:: no.ntnu:inspera:74730513:31757 ...
Størrelse:: 9.454Mb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Institutt for datateknologi og informatikk [6559]

Vis enkel innførsel