Deep learning algorithms in environmental software – a practical application of neural networks in air quality forecasting

Project Description

Since ancient times, humans have been fascinated with predicting the future. Trying to control unknown variables has shaped our mental evolution in a number of ways, and can be considered one of the reasons for our “successful” course through history.

We, as a species, have come a long way from the days of reading tea leaves. Our modern-day oracles consist of silicon and wires, and they base their predictions on math rather than on any metaphysical source of information. Machine learning has become one of the most useful and rapidly advancing fields of research, and is now utilised in a wide array of applications on the web, as well as in desktop and mobile apps and devices, to provide a wealth of information to their users. The applications of machine learning are not limited to prediction: its various algorithms can also be used to infer system states from input data that a human would consider limited.

Air quality forecasting is one of the problems that machine learning can be applied to in order to provide information on future states of the environment based on specific data, such as the weather and the current concentrations of air pollutants.

But what exactly is machine learning…

Machine learning is a subset of artificial intelligence that uses mathematical and statistical models to extract patterns from the data it receives, and leverages those patterns to predict future states of the modelled system. It can also be used to categorise specific data instances by recognising data patterns. This allows machine learning algorithms to be utilised in fraud detection, spam filtering, malware threat detection, and other similar applications [1].

Figure 1: Machine learning categories [3]

Machine learning can be split into four categories [1], [2]:

Supervised learning, where the data sets provided to the algorithms are labelled, so the algorithm builds a general idea of what similarities exist between instances of the same class;
Unsupervised learning, where the algorithm is not provided with any information on which class an instance belongs to, and discovers hidden connections between data points;
Semi-supervised learning, which is a mix of the two paradigms mentioned above. Some data are labelled, but the algorithm can discover other connections between data points;
Reinforcement learning, where an algorithm learns how to perform actions based on positive or negative cues, as it works on its own to discover what steps need to be taken.
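
To make the difference between the first two categories concrete, here is a minimal sketch in plain Python. It is an illustration only: the function names, the two-feature samples and the "clean"/"polluted" labels are made-up assumptions, and the two techniques shown (a nearest-centroid classifier and k-means clustering) are simply common textbook examples of supervised and unsupervised learning. Semi-supervised and reinforcement learning are not covered here.

```python
from statistics import mean

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Supervised: labels are given, so we learn one centroid per label.
def fit_nearest_centroid(samples, labels):
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return {label: centroid(members) for label, members in groups.items()}

def predict(centroids, sample):
    """Return the label whose centroid lies closest to the sample."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], sample))

# Unsupervised: no labels; k-means groups points by proximity alone.
def k_means(samples, k, iterations=10):
    centres = list(samples[:k])  # naive initialisation: the first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for sample in samples:
            nearest = min(range(k), key=lambda i: sq_dist(centres[i], sample))
            clusters[nearest].append(sample)
        # Keep the old centre if a cluster ends up empty.
        centres = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

# Made-up toy samples, purely for illustration.
samples = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (4.8, 5.0)]
labels = ["clean", "polluted", "clean", "polluted"]  # hypothetical labels

centroids = fit_nearest_centroid(samples, labels)
print(predict(centroids, (4.9, 5.1)))

centres, clusters = k_means(samples, k=2)
print(clusters)
```

The supervised model can name the class of a new sample because it was told the names during training. The unsupervised one recovers the same two groups but has no names for them.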

Machine learning is ubiquitous: if you use the Internet on a regular basis, chances are good that you have come across one of its applications (chatbots, recommendation engines, personalised content, etc.).

…and what is deep learning?

Deep learning is a subset of machine learning [3] that is based on large-scale neural networks. A neural network is loosely modelled on the neurons of the human brain, replicating them as nodes with connections between them.

Figure 2: Neural network and deep neural network [5]

The schematic above shows how a neural network is structured [4]. The nodes are organised in layers, with each layer receiving information from the previous layer and forwarding its computations to the next layer. In the fully connected layout shown, each node is connected to every node in the next layer. This web of nodes and connections is what enables a neural network to model complex relationships in its inputs and, given enough training data, to provide predictions with a high degree of accuracy.

A deep neural network follows the same architecture as a simple neural network, but it consists of many more layers, which allows it to reach more sophisticated conclusions about the data being fed to it.
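
The layered structure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the helper names (`dense`, `make_layer`, `forward`), the layer sizes, the `tanh` activation and the random, untrained weights are all assumptions chosen for the example. The only difference between the "shallow" and the "deep" network below is the number of layers in the list.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer: every input feeds every output node."""
    return [
        math.tanh(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def make_layer(n_in, n_out, rng):
    """Create random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, inputs):
    """Pass the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical sizes: 3 inputs, hidden layers of 4 nodes, 1 output.
shallow = [make_layer(3, 4, rng), make_layer(4, 1, rng)]
deep = [
    make_layer(3, 4, rng),
    make_layer(4, 4, rng),
    make_layer(4, 4, rng),
    make_layer(4, 1, rng),
]

x = [0.5, -0.2, 0.1]  # made-up input values
shallow_out = forward(shallow, x)
deep_out = forward(deep, x)
print(shallow_out, deep_out)
```

Because the weights are random, the outputs carry no meaning. Training is the process of adjusting those weights until the outputs match known examples.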

Benefits of using deep learning for forecasting

Forecasting of any type can be a very complicated task, depending on the specific application. Weather forecasting, for instance, is very tricky because of the chaotic nature of weather phenomena. Air quality forecasting is difficult as well, because many factors can affect the composition of the atmosphere, such as traffic, wind speed and humidity. A deep learning model has specific advantages that make it well suited to tackling forecasting problems [8].

A deep learning model can receive and handle multiple inputs efficiently, which is ideal for problems where the output depends on many factors. In addition, because of its multilayered structure, it can derive complex relationships between data points and discover connections in the data, often with a degree of accuracy beyond what a human operator could achieve.

The CALLISTO air quality forecasting algorithm

The CALLISTO Deep Learning algorithm utilises a Long Short-Term Memory (LSTM) [6], [7] neural network model, which works by combining weather data with data on air pollutant concentrations. The model has been trained in a supervised manner on labelled data sets of weather conditions and their accompanying air pollutant concentrations. It receives these data for the past 14 days from AQHub, DRAXIS’ proprietary environmental data repository, and outputs air pollutant concentrations for the next two days. These concentrations can then be fed directly to the AQHub air quality calculators, which provide users with information and recommendations depending on how clean the air is.
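
A sequence model of this kind is typically trained on pairs of "past window" and "future values" cut from a longer time series. The sketch below shows one way such pairs could be built. It is not the CALLISTO code: the function name `make_windows`, the daily time step, the column layout and the sample values are all assumptions made for illustration. Only the 14-day history and two-day horizon come from the description above.

```python
def make_windows(series, history, horizon, target_index):
    """Split a multivariate time series into (past, future) training pairs.

    series: list of rows, one row per time step, each row a list of features
    history: number of past steps fed to the model
    horizon: number of future steps to predict
    target_index: column holding the pollutant concentration to forecast
    """
    pairs = []
    for start in range(len(series) - history - horizon + 1):
        split = start + history
        past = series[start:split]
        future = [row[target_index] for row in series[split:split + horizon]]
        pairs.append((past, future))
    return pairs

# Made-up daily rows: [temperature, wind speed, pollutant concentration].
series = [[20.0 + day, 3.0, 10.0 + day] for day in range(20)]

# Assumed daily resolution: 14 past days in, 2 future days out.
pairs = make_windows(series, history=14, horizon=2, target_index=2)

past, future = pairs[0]
print(len(pairs), len(past), future)
```

Each pair is one labelled training example: the past window is the input, and the future concentrations are the labels the model learns to reproduce.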