Reconstruction of 3D Building Models with Semantic Information Using Crowdsourcing Approaches
Abstract
Over the last decade, an increasing number of municipalities have decided to accelerate the construction of smart cities. The transformation to a smart city requires digitalizing the whole city as faithfully as possible. To accomplish this goal, the modelling of the city should consider two major aspects: 1) the levels of detail (LoDs) of 3D building models and 2) the inclusion of semantic information. The LoD concept defined in the City Geography Markup Language (CityGML) 2.0 standard specifies the content of 3D building models; LoD3 building models are the research focus of this thesis. 3D building models at LoD3, along with semantic information, play an essential role in smart city-related applications. They are not only suitable for visualization tasks but also beneficial to advanced analyses, e.g., solar energy potential estimation, flood simulation, urban planning, post-disaster assessment, and telecommunication applications such as the simulation of 5G signal propagation. However, LoD3 building models are only available for a few small regions, and they often lack semantic information. Moreover, generating LoD3 models is costly in both time and labor, since existing methods cannot detect and reconstruct LoD3 buildings in a fully automated manner. Data acquisition is a further challenge, as it requires expensive equipment and sensors. Hence, this PhD thesis aims to propose solutions to the above challenges.
This thesis concentrates on the use of crowdsourcing approaches for the reconstruction of LoD3 building models with semantic information. Given the difficulty of automatically detecting and reconstructing LoD3 building models and the challenges of data collection, an interactive approach is proposed to reconstruct semantically enriched LoD3 building models from Volunteered Geographic Information (VGI) data. Specifically, a web-based interactive 3D building modelling platform, VGI3D, is developed to reconstruct 3D building models from street-level images, which contain rich information on façade structures, thus ensuring that the reconstructed models carry semantic information. VGI3D is designed for simple operation in order to attract and encourage more volunteers to contribute 3D models, with the ambition of becoming a VGI platform that collects semantically enriched LoD3 building models through the power of crowdsourcing. Moreover, a limited usability test is conducted with expert and non-expert participants, which demonstrates the usefulness of VGI3D and its promising value for the 3D modelling community.
Usually, complete roof structures are not visible in street-level images. In VGI3D, roof models are automatically generated by selecting a predefined roof type, but their geometric accuracy is not guaranteed. In addition, some buildings have complex roof structures that are not covered by the predefined simple roof types in VGI3D. To address these issues, an improved multi-task pointwise network is proposed. This network simultaneously segments instances (i.e., individual roof planes) and semantics (i.e., groups of roof planes with similar geometric shapes) in standard airborne laser scanning (ALS) point clouds. The segmented roof planes can then be reconstructed into polygon meshes and combined with the façade structures generated by VGI3D, so that more accurate and photorealistic 3D building models can eventually be created. Furthermore, to train the proposed network, a new roof dataset (RoofNTNU) covering seven roof types typical of Western Europe is established, using ALS point clouds with standard point density as training data for automatic and more general segmentation. Experiments on the RoofNTNU dataset demonstrate the effectiveness of the proposed method, achieving promising segmentation results: a mean precision (mPrec) of 96.2% for the instance segmentation task and a mean accuracy (mAcc) of 94.4% for the semantic segmentation task.
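As an illustration of how such metrics are commonly defined, the following minimal Python sketch computes a mean precision over predicted roof-plane instances and a mean per-class accuracy for the semantic labels. The array names (pred_inst, gt_inst, pred_sem, gt_sem) and the IoU threshold are assumptions made for illustration only; the exact evaluation protocol in Paper 4 may differ.

import numpy as np

def mean_precision(pred_inst, gt_inst, iou_thresh=0.5):
    # A prediction counts as correct if it overlaps some ground-truth
    # roof plane with point-wise IoU above the threshold (illustrative only).
    pred_ids, gt_ids = np.unique(pred_inst), np.unique(gt_inst)
    tp = 0
    for p in pred_ids:
        p_mask = pred_inst == p
        best_iou = 0.0
        for g in gt_ids:
            g_mask = gt_inst == g
            inter = np.logical_and(p_mask, g_mask).sum()
            union = np.logical_or(p_mask, g_mask).sum()
            best_iou = max(best_iou, inter / union)
        if best_iou >= iou_thresh:
            tp += 1
    return tp / len(pred_ids)

def mean_accuracy(pred_sem, gt_sem):
    # Mean per-class point accuracy for the semantic (roof-type) task.
    accs = []
    for c in np.unique(gt_sem):
        mask = gt_sem == c
        accs.append((pred_sem[mask] == c).mean())
    return float(np.mean(accs))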
Simple operation alone, however, may not be sufficient to motivate more volunteers to contribute to VGI3D. After the 3D building models are reconstructed, it is necessary to integrate them into a virtual 3D city environment, which allows users to view and interact with their models in a 3D scene and presents a natural way to perceive 3D objects. This not only increases their sense of fulfillment in contributing to the VGI community, but can also attract valuable users from the 3D modelling domain and even raise the interest of non-experts. To accomplish this goal, a 3D visualization platform is developed that digitizes the real city environment, including 3D terrain, 3D building models, 3D road networks, and other road-related 3D objects (e.g., traffic lights/signs and trees).
For the implementation of the 3D visualization platform, 3D terrain covering the whole of Norway is generated from Digital Terrain Model (DTM) data and optimized using a triangulated irregular network (TIN) and LoDs for faster rendering. 3D building models are obtained in three ways: hybrid generation from ALS point clouds and OpenStreetMap (OSM) footprints (LoD2); VGI3D reconstruction (LoD3); and crowdsourced SketchUp models (LoD3). All of them are georeferenced, stored in CityGML format, and visualized on the platform using the 3D Tiles technique. For the 3D road networks, collision detection is used to project the original 2D road polylines onto the 3D terrain. Then, the Catmull-Rom spline algorithm is employed to expand the 3D road polylines into 3D polygons (a geometric sketch follows this paragraph). To collect road-related objects (traffic signs/lights), an automatic method is proposed to detect road objects in VGI street-level images and place them at approximately correct positions. Two convolutional neural networks (CNNs) are applied to detect and classify the road objects. Additionally, to locate the detected objects, an attributed topological binary tree (ATBT) is first established based on urban rules for image sequences to capture the coherent relations among the topologies, attributes, and semantics of the road objects. The ATBT is then matched with OSM map features to determine the correct placement positions. A case study achieves near-precise localization results in terms of completeness and positional accuracy.
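As an illustration of the road-widening step, the following minimal Python sketch densifies a road centreline with a uniform Catmull-Rom spline and offsets it sideways into a simple road polygon. The function names and the half_width parameter are assumptions made for illustration; the thesis implementation may treat elevation and road width differently.

import numpy as np

def catmull_rom(p0, p1, p2, p3, n=10):
    # Interpolate n points between control points p1 and p2 (uniform spline).
    t = np.linspace(0.0, 1.0, n, endpoint=False)[:, None]
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)

def densify_polyline(points, n=10):
    # Apply Catmull-Rom segment by segment, duplicating the end points.
    pts = [points[0]] + list(points) + [points[-1]]
    dense = [catmull_rom(pts[i], pts[i + 1], pts[i + 2], pts[i + 3], n)
             for i in range(len(pts) - 3)]
    return np.vstack(dense + [np.asarray(points[-1])[None, :]])

def expand_to_polygon(centreline, half_width=3.5):
    # Offset the densified centreline left/right in the XY plane and
    # stitch the two sides into a closed polygon ring (z is kept as-is).
    c = np.asarray(centreline, dtype=float)
    d = np.gradient(c[:, :2], axis=0)
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    normal = np.stack([-d[:, 1], d[:, 0]], axis=1)
    left = c.copy();  left[:, :2] += half_width * normal
    right = c.copy(); right[:, :2] -= half_width * normal
    return np.vstack([left, right[::-1]])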
To summarize, crowdsourcing is a powerful tool for enhancing the reconstruction of 3D building models with semantic information, thus accelerating the digitalization of smart cities. VGI3D is a user-friendly and efficient system that requires little input and only simple user interaction, making it well suited for quick yet relatively detailed (i.e., LoD3) modelling. The proposed multi-task roof segmentation network is a significant enhancement to VGI3D for roof reconstruction. The combination of VGI3D and the 3D visualization platform helps attract more users and, at the same time, motivates them to contribute 3D building models.
The thesis also presents research directions for future work. For example, it is important to assess the quality of the LoD3 building models generated by VGI3D, which has not yet been done; hence, quality evaluation should be a priority in the future. Additionally, VGI3D is still at an early stage of development and has plenty of room for further improvement and optimization. On the one hand, it is necessary to further strengthen the CNN model so that it detects façade elements as accurately as possible, thereby reducing the user interaction cost of updating 3D models. On the other hand, the suggestions and comments from the usability testing make it apparent that supporting the reconstruction of complex buildings will facilitate applications that place high demands on the 3D models.
Has parts
Paper 1: Zhang, Chaoquan; Fan, Hongchao; Kong, Gefei. VGI3D: an Interactive and Low-Cost Solution for 3D Building Modelling from Street-Level VGI Images. Journal of Geovisualization and Spatial Analysis (JGSA) 2021; Volume 5(2). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (CC-BY). Available at: http://dx.doi.org/10.1007/s41651-021-00086-7
Paper 2: Fan, Hongchao; Kong, Gefei; Zhang, Chaoquan. An Interactive platform for low-cost 3D building modeling from VGI data using convolutional neural network. Big Earth Data 2021; Volume 5(1), pp. 49-65. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC-BY). Available at: https://doi.org/10.1080/20964471.2021.1886391
Paper 3: Zhang, Chaoquan; Fan, Hongchao; Li, Wanzhi. Automated detecting and placing road objects from street-level images. Computational Urban Science 2021; Volume 1(1). Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (CC-BY). Available at: http://dx.doi.org/10.1007/s43762-021-00019-6
Paper 4: Zhang, Chaoquan; Fan, Hongchao. An Improved Multi-Task Pointwise Network for Segmentation of Building Roofs in Airborne Laser Scanning Point Clouds. Photogrammetric Record 2022; Volume 37(179), pp. 260-284. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY). Available at: http://dx.doi.org/10.1111/phor.12420