With the spread of new digital devices, the amount of generated data is continuously increasing. Through the World Wide Web, every type of data may be stored in databases. The birth of Industry 4.0 marks the involvement of the industrial sector in this process. The huge amount of data produced by industrial activities represents a valuable resource. For instance, the Seveso III directive, whose purpose is to prevent major accidents in establishments handling dangerous substances, underlined the need to monitor and analyse data in order to improve the safety performance of plants. Using past data on events that occurred in these plants to learn from them and make predictions is a new approach that deserves consideration. In this context, machine learning techniques are suggested to retrieve knowledge from data in order to support or take decisions. Even though these tools have been known for years, their application is not yet widespread. This work aims to suggest a new approach to managing and analysing data from different sources in the process industry. The knowledge retrieved should help build and support preventive safety barriers, appreciably improving overall risk management. The machine learning tools used to analyse the data come from the open-source library TensorFlow. In particular, the study considers three prediction models that use three different approaches: a linear model, a deep neural network model, and a hybrid of the two. Based on the input, they predict the studied event. Two different levels of data source have been considered: the MHIDAS (Major Hazards Incident Data Service) database and an alarm database from a specific chemical establishment. A data mining activity was carried out to prepare the databases. Specific keywords were considered to guarantee appropriate data exploitation. Each database was divided into two parts: one for model training and the other for model evaluation.
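The division of a database into a training part and an evaluation part can be illustrated with a short sketch. It is not the procedure used in this study: the 80/20 ratio, the fixed random seed, the function name `split_records` and the toy records are all assumptions made for illustration.

```python
import random


def split_records(records, train_fraction=0.8, seed=0):
    """Shuffle records reproducibly and split them into (train, eval) lists.

    train_fraction and seed are illustrative assumptions, not values
    taken from the study.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records: (keyword features, binary label) pairs.
records = [({"keyword": f"k{i}"}, i % 2) for i in range(10)]
train_set, eval_set = split_records(records)
print(len(train_set), len(eval_set))
```

Shuffling before the cut avoids a split that simply follows the order in which the records were stored, and fixing the seed makes the split reproducible between runs.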
Fundamental differences between the developed models have been found depending on their target. If a conservative prediction is important, e.g. when predicting accident events, the “recall” metric should be prioritized. On the other hand, if the goal is to improve risk awareness and promote response, the “precision” metric should be the priority. Another metric that captures the overall performance of a model is the area under the precision-recall curve. Once the most suitable models have been determined through such metrics, PR (precision-recall) curves can be built from the prediction values returned by the simulation, in order to further refine decision thresholds and consequently increase recall or precision. As a general conclusion, a comparison of the results obtained for the two databases (MHIDAS and the alarm database) showed that, for the MHIDAS database, the deep neural network and hybrid models are the most suitable for achieving good recall in accident prediction. For the alarm database, on the other hand, the linear model performs best when high precision is needed to promote response, while the hybrid model performs best when high recall of critical alarm predictions is needed.
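The trade-off between recall and precision, and the refinement of a decision threshold on a PR curve, can be illustrated with a minimal sketch. It assumes binary labels (1 for the studied event) and model scores between 0 and 1. The function names, the toy data and the threshold rule are hypothetical and are not taken from the study.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Convention (assumption): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pr_curve(labels, scores):
    """(threshold, precision, recall) per distinct score, highest threshold first."""
    return [(t, *precision_recall(labels, scores, t))
            for t in sorted(set(scores), reverse=True)]


def highest_threshold_with_recall(labels, scores, min_recall):
    """Highest threshold whose recall reaches min_recall, or None if none does."""
    for threshold, precision, recall in pr_curve(labels, scores):
        if recall >= min_recall:
            return threshold, precision, recall
    return None


# Hypothetical evaluation data: true labels and predicted scores.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.1]

print(precision_recall(labels, scores, 0.5))
print(highest_threshold_with_recall(labels, scores, 0.75))
print(highest_threshold_with_recall(labels, scores, 1.0))
```

Lowering the threshold can only keep recall equal or raise it, usually at the cost of precision, which is why a recall-oriented target such as accident prediction tends to call for a lower threshold than a precision-oriented one.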