Miniature Machine Learning System

This system provides a configurable neural network class and a simulator for a CSTR with a van de Vusse reaction. The simulator can generate data for training and testing the neural network.

The neural network can be applied to a different process by configuring its input features and outputs. Data must then be fetched from a different source, or the van de Vusse simulator can be used as a template for developing your own simulator for a different process.

Info

The system is built with Python, uses Conda for package management, and uses the PyTorch machine learning framework. The simulator is developed with the CasADi framework.

As it stands, the system provides basic functionality for testing various ML techniques and technologies. It would benefit greatly from further development to provide more extensive functionality.

Contents

The system consists of:

Configurability

The system is designed so that all necessary configuration can be done in config.json, without requiring much knowledge of the code base. The options are as follows:

Neural net

Vdv Model - The process in the simulator

Input generation

Data extraction

Predictions

Pipelines

Training pipeline

Invoke the modules in the following order:
generate_input_vandevusse.py -> simulate_vandevusse.py -> steady_state_extraction.py -> train_model.py -> test_model.py

Prediction pipeline

Invoke the modules in the following order:
generate_input_vandevusse.py -> simulate_vandevusse.py -> make_predictions_batch.py
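Both pipelines can be chained from a single script. The following is a minimal sketch, not part of the system itself: it assumes the modules sit in the current working directory and need no command-line arguments, and `run_pipeline` is a hypothetical helper name.

```python
import subprocess
import sys

# Module order as listed in this README.
TRAINING_PIPELINE = [
    "generate_input_vandevusse.py",
    "simulate_vandevusse.py",
    "steady_state_extraction.py",
    "train_model.py",
    "test_model.py",
]
PREDICTION_PIPELINE = [
    "generate_input_vandevusse.py",
    "simulate_vandevusse.py",
    "make_predictions_batch.py",
]


def run_pipeline(modules, runner=subprocess.run):
    """Run each module in order with the current Python interpreter.

    Assumption: every module is a standalone script that reads its
    settings from config.json and takes no required arguments.
    """
    for module in modules:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages do not run after a failed stage.
        runner([sys.executable, module], check=True)
```

Calling `run_pipeline(TRAINING_PIPELINE)` or `run_pipeline(PREDICTION_PIPELINE)` from the repository root would then run the corresponding stages in sequence. The `runner` parameter exists only so the call order can be checked without launching real processes.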

Further work

The system consists of several components, which form ML pipelines. More components could be added, such as hyperparameter optimization or more extensive data pre-processing. The system is modular, so integrating extra components should pose few problems.

The steady_state_extraction.py module could also benefit from more extensive functionality, e.g. analyzing the dynamics of the simulated process to make steady-state extraction more useful. This could include estimating the time constant, which would allow data to be extracted based on the steady state of the inputs, the features we know are always measurable.
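One way to picture input-based extraction is sketched below. It is an illustration under stated assumptions, not the current behavior of steady_state_extraction.py: the input is a single scalar signal sampled at a fixed rate, and `settling_steps` is a hypothetical parameter that would be chosen from the estimated time constant (for a first-order response, roughly four to five time constants).

```python
def steady_state_indices(inputs, settling_steps, tol=1e-9):
    """Return indices of samples where the input has been held constant
    for at least `settling_steps` consecutive samples.

    Assumption: once the input has been constant for that long, the
    process output is treated as having reached steady state.
    """
    indices = []
    held = 0  # consecutive samples since the last input change
    for i, u in enumerate(inputs):
        if i > 0 and abs(u - inputs[i - 1]) <= tol:
            held += 1
        else:
            held = 0  # first sample or an input step: restart the count
        if held >= settling_steps:
            indices.append(i)
    return indices


# Input steps from 1.0 to 2.0 at sample 4; wait 3 samples after each change.
u = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(steady_state_indices(u, settling_steps=3))  # [3, 7, 8]
```

The returned indices could then be used to select the matching rows of the simulated outputs as training samples.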

It would be interesting to integrate other technologies into this system, such as Docker, to create a containerized system that could be deployed elsewhere. Kafka is another option, which would enable an event-driven system.

Automating the pipelines would also be beneficial, and a step toward a system that could be managed with proper MLOps practices.