A Big Data Approach to Generate Training Data for Automatic Ship Detection - An Integration of AIS and Sentinel-2 MSI
Abstract
A fast and scalable approach to combine global satellite images with ship navigational messages
In this project, we have developed a data processing pipeline that combines multiple data sources in order to automate ship detection in satellite images. The main concept is to integrate Sentinel-2 optical multi-spectral imagery with ship navigational messages provided by the Automatic Identification System (AIS). This integration made it possible to automatically generate a training dataset consisting of images labelled with ship positions. A future goal is to use this training dataset in supervised machine learning, training a neural network to automatically recognise ship features in the images. Our vision is for such a detector to complement AIS in applications that ensure safety in the marine sector.
Our data processing pipeline covers all aspects of data analytics: collecting, preprocessing, cleansing, storing, filtering, combining, analysing, and visualising the data. We have designed the system to be modular and highly scalable, so that it can be developed further to support real-time analysis of any aerial imagery.
The solution we have developed can be divided into three main parts:
The Image Selection Optimisation is a proposed approach for selecting, from anywhere on the globe, satellite images that have a high probability of containing ships. This is done by processing global AIS data within an arbitrary time interval. A density analysis of 400 million global ship navigational messages completed with a total execution time below 3 minutes. The optimisation enabled by such analyses will save a considerable amount of time when generating the training dataset, since images unlikely to contain ships can be skipped.
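As an illustration of the kind of density analysis described above, the following minimal sketch counts AIS positions per latitude/longitude grid cell and ranks the busiest cells. The function names, the one-degree cell size, and the in-memory Counter are assumptions made for this example; they are not the pipeline's actual implementation, which operates on hundreds of millions of messages.

```python
import math
from collections import Counter


def density_grid(positions, cell_deg=1.0):
    """Count AIS messages per grid cell.

    positions: iterable of (lat, lon) pairs in decimal degrees.
    cell_deg:  cell size in degrees (hypothetical default).
    Returns a Counter keyed by (row, col) cell index, where row 0
    starts at latitude -90 and col 0 starts at longitude -180.
    """
    counts = Counter()
    for lat, lon in positions:
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        counts[(row, col)] += 1
    return counts


def densest_cells(counts, k=3):
    """Return the k cells with the most messages, busiest first."""
    return counts.most_common(k)


# Made-up sample positions, for illustration only.
sample = [(59.9, 10.7), (59.5, 10.2), (1.3, 103.8)]
grid = density_grid(sample)
top = densest_cells(grid, k=1)
```

Cells with high counts would then indicate where to look for satellite images that are likely to contain ships.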
The Ship Position Estimation computes the coordinates of every ship within an arbitrary satellite image. This involves complex data integration, both spatial and temporal, and relies on access to a complete AIS dataset around the time the image was sensed. It also applies substantial timestamp corrections, as the provided timestamp was found to be highly inaccurate. The execution time for an arbitrary image is below two seconds.
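One simple way to picture the temporal part of this step is linear interpolation between the two AIS messages that bracket the image sensing time. The sketch below assumes exactly that; the abstract does not state which estimation method the pipeline uses, and the function name and tuple layout are hypothetical. It interpolates directly in latitude and longitude, which is only reasonable for short time gaps away from the antimeridian and the poles.

```python
def interpolate_position(before, after, t_image):
    """Estimate a ship's position at time t_image.

    before, after: (timestamp_s, lat, lon) tuples from two AIS
                   messages of the same ship, with
                   before[0] <= t_image <= after[0].
    t_image:       (corrected) image sensing time in seconds.
    Returns the interpolated (lat, lon) in decimal degrees.
    """
    t0, lat0, lon0 = before
    t1, lat1, lon1 = after
    if not t0 <= t_image <= t1:
        raise ValueError("t_image must lie between the two messages")
    if t1 == t0:
        # Both messages share a timestamp; nothing to interpolate.
        return lat0, lon0
    w = (t_image - t0) / (t1 - t0)  # fraction of the interval elapsed
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)


# Made-up messages 10 s apart, image sensed halfway between them.
lat, lon = interpolate_position((0, 60.0, 5.0), (10, 60.2, 5.4), 5)
```

Because the result depends directly on `t_image`, an inaccurate image timestamp shifts every estimated position, which is why correcting it matters.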
The Training Data Set Generation extracts a region of the image around each estimated ship position, resulting in smaller training images. A planned extension is to augment the training dataset by varying the region extraction with appropriate transformations.
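The region extraction can be sketched as a fixed-size crop centred on the ship's pixel position and shifted where necessary to stay inside the image. The example below assumes the ship position has already been converted to pixel coordinates and represents the image as a plain list of rows; the function names and window size are hypothetical, not the pipeline's own.

```python
def crop_window(height, width, row, col, size):
    """Return (top, left, bottom, right) of a size x size window.

    The window is centred on (row, col) where possible and shifted
    inwards when the ship lies close to the image border.
    """
    if size > height or size > width:
        raise ValueError("window is larger than the image")
    half = size // 2
    top = min(max(row - half, 0), height - size)
    left = min(max(col - half, 0), width - size)
    return top, left, top + size, left + size


def extract_chip(image, row, col, size):
    """Cut a size x size training image around pixel (row, col).

    image: list of equally long rows (any pixel type).
    """
    top, left, bottom, right = crop_window(
        len(image), len(image[0]), row, col, size
    )
    return [r[left:right] for r in image[top:bottom]]


# Made-up 10 x 10 image whose pixel value encodes its position.
image = [[r * 10 + c for c in range(10)] for r in range(10)]
chip = extract_chip(image, 5, 5, 4)
```

Varying the centre offset passed to `extract_chip` is one simple way in which the region extraction could be varied for augmentation.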