Where Database Technology Meets Model-Driven Engineering - Rethinking Internal Data Representation in Genus App Platform
Abstract
Recent research has shown keen interest in database technology for analytical workloads that enables high-performance analysis and on-the-fly aggregation, motivated mainly by Business Intelligence and Business Discovery products. Such products store data in main memory as compressed columns to maximize memory utilization and CPU throughput. Model-Driven Engineering, a discipline that aims to increase developer productivity through the use of models at a higher level of abstraction, automates many complex programming tasks, such as persistence and interoperability. One Model-Driven Engineering product, Genus App Platform, has evolved over time into a powerful and expressive tool for rapid application development. However, operations that process and analyze large amounts of data are slow, and the platform has a high memory footprint, mainly because no particular attention has been paid to storage format and data structures in the source code. Based on the observation that Genus App Platform has many similarities with an in-memory database, we investigate whether the challenges in Genus App Platform can be overcome by applying techniques used in read-optimized databases.
In this research, we enhance the internal data representation of Genus App Platform by implementing column storage with dictionary encoding and bitpacking, in order to reduce the memory footprint and increase the platform's ability to handle and analyze large datasets. We identify core operations that can exploit the new storage format, such as join and filter operations. We test our implementation using a benchmark for analytical workloads, while monitoring that transactional performance is not negatively affected.
In Genus App Platform, column storage with dictionary encoding, bitpacking, and null pointer compression leads to a memory reduction of 67% and a load time reduction of 36% for the TPC-H inspired Data Mart Load Benchmark. In addition, operations that are adjusted to utilize the column storage format see performance improvements of one, two, and even three orders of magnitude compared to the original implementation. The new internal data representation in Genus App Platform does not significantly reduce transactional performance. Thus, using Genus App Platform as a proof of concept, we have shown how techniques used by read-optimized databases increase the versatility of Model-Driven Engineering tools by enabling them to handle and analyze large datasets.
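To make the storage techniques named above concrete, the following is a minimal sketch of dictionary encoding and bitpacking applied to a single column, with a filter that compares integer codes instead of the original values. It is an illustration under stated assumptions, not the Genus App Platform implementation: all function names are hypothetical, the sample column is invented, and a Python integer stands in for a packed bit buffer.

```python
from typing import List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Return a sorted dictionary of distinct values and one integer code per row."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def bits_needed(cardinality: int) -> int:
    """Smallest bit width that can hold the codes 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())


def bitpack(codes: Sequence[int], width: int) -> int:
    """Pack fixed-width codes back to back; a Python int is used as the bit buffer
    (a simplifying assumption, a real system would use machine words)."""
    packed = 0
    for position, code in enumerate(codes):
        packed |= code << (position * width)
    return packed


def unpack_at(packed: int, width: int, position: int) -> int:
    """Read the code stored at the given row position."""
    mask = (1 << width) - 1
    return (packed >> (position * width)) & mask


def filter_equals(packed: int, width: int, n_rows: int,
                  dictionary: Sequence[str], value: str) -> List[int]:
    """Return row positions where the column equals value.

    The value is translated to its code once, so the scan compares small
    integers rather than the original strings.
    """
    if value not in dictionary:
        return []
    target = dictionary.index(value)
    return [row for row in range(n_rows)
            if unpack_at(packed, width, row) == target]


# Hypothetical sample column with three distinct values.
column = ["NO", "SE", "NO", "DK", "NO", "SE"]
dictionary, codes = dictionary_encode(column)
width = bits_needed(len(dictionary))
packed = bitpack(codes, width)
matches = filter_equals(packed, width, len(column), dictionary, "NO")
```

In this sketch, three distinct values need only two bits per row, so the six-row column fits in twelve bits plus the dictionary. The same idea, applied to columns with few distinct values relative to their row count, is what makes the memory savings and code-based filtering described above plausible.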