An Event-Based Pipeline for Geospatial Vector Data Management
Ever since we started exploring the world, information about where things are and how to get there has been valuable and sought-after. The hand-drawn, and later printed, map provided an efficient mechanism for storing and communicating this information. While the digital revolution did not render maps outdated, it changed the landscape. At the core of this revolution lie two important changes to how we think about maps. First, the digital revolution established a clear boundary between the physical map and map data. While a printed map traditionally was the only representation of map data, it is now one of many representations. Digital map data, or geospatial data, is a core component in search engines, navigational services, and recommendation engines, and is extensively used in planning processes, urban development, retail, and real estate. In addition, geospatial data plays a major role in handling climate challenges, and current events have demonstrated its importance in handling a pandemic. Second, the digital revolution democratized the map. Surveying and cartography used to be complex and labour-intensive tasks, and the state usually took the role of provider and maintainer of maps. While the state still values, produces, and maintains maps, this monopoly is a relic of the past. Private corporations provide a plethora of maps and location-based services, and numerous businesses provide value-added services on top of governmental, private, and even personal map data. The rise of crowdsourced encyclopaedias paved the way for the crowdsourced map, where volunteers contribute their time and skills to map the world. Thus, geospatial data, which used to be scarce, is now ubiquitous and plentiful in most parts of the developed world. A common denominator for much of this geospatial data is an open license. The creators provide the data for everyone to use, explore, enhance, and monetize. Why?
The reasons are as varied as the actors, but a common theme is a combination of civic duty, personal convictions, moral sense, and cold calculation. Against this backdrop of abundant and freely available geospatial data, a series of interesting research challenges can be outlined. Namely, how do we process, store, and manage such vast amounts of data? And how do we deal with issues of privacy, accuracy, and accountability? These are not just interesting research questions. They are also highly relevant issues. The geospatial industry is currently looking for solutions to handle geospatial data in a more efficient manner. This in turn drives innovation and digitization, which opens new possibilities. While spatial may not be that special, geospatial data is extremely relevant in a wide range of solutions. Enabling efficient use, re-use, and enhancement of existing data repositories is key for rapid product development. The winners in this race will be the ones who successfully consume, process, store, and manage the flow of heterogeneous geospatial data from disparate sources, so that they can add value to the data. These challenges are the starting point of this thesis. We describe how an event-based pipeline for geospatial vector data management can be created and present a solid foundation for implementation. This pipeline will enable efficient updating and versioning of open geospatial datasets and allow access to both current and historical data, while enabling a storage layout that is able to scale horizontally. The individual components of the pipeline are either based on novel research or described using work from both the geospatial industry and academia. This combination of research and re-use both ensures a running start and avoids re-inventing the wheel.
Has parts
Paper 1: Sveen, Atle Frenvik. The Open Geospatial Data Ecosystem. Kart og plan 2017; Volume 77, year 110 (2), pp. 108-120. Not included due to copyright restrictions.
Paper 2: Sveen, Atle Frenvik; Erichsen, Anne Sofie S.; Midtbø, Terje. Micro-tasking as a method for human assessment and quality control in a geospatial data import. Cartography and Geographic Information Science 2020; Volume 47 (2), pp. 141-152. Not included due to copyright restrictions. Available at: http://dx.doi.org/10.1080/15230406.2019.1659187
Paper 3: Sveen, Atle Frenvik. Efficient storage of heterogeneous geospatial data in spatial databases. Journal of Big Data 2019; Volume 6.
Paper 4: Sveen, Atle Frenvik. GeomDiff - An Algorithm for Differential Geospatial Vector Data Comparison. Open Geospatial Data, Software and Standards 2020; Volume 5 (3).