The neural network class
Roberto Lopez.
Artelnics - Making intelligent use of data

The class of neural network implemented in OpenNN is based on the multilayer perceptron. That model is extended here to contain scaling, unscaling, bounding, probabilistic and conditions layers. A set of independent parameters associated with the neural network is also included for convenience.

Contents:
  1. Basic theory.
  2. Software model.
  3. Main classes.

1. Basic theory

The neural network implemented in OpenNN is based on the multilayer perceptron. That classical model is extended here with scaling, unscaling, bounding, probabilistic and conditions layers, as well as a set of independent parameters.

Perceptron

A neuron model is the basic information processing unit of a neural network. Neuron models are inspired by nerve cells, and somehow mimic their behaviour. The perceptron is the characteristic neuron model of the multilayer perceptron. Following current practice, the term perceptron is applied here in a more general way than by Rosenblatt, and covers the types of units that were later derived from the original perceptron. The following figure is a graphical representation of a perceptron.
Perceptron neuron model.
Here we identify three basic elements, which transform a vector of inputs into a single output:
  - A set of free parameters: the bias and the vector of synaptic weights.
  - A combination function: it merges the inputs and the free parameters into a single net input value, typically the bias plus the inner product of the synaptic weights and the inputs.
  - An activation function: it takes the net input value to produce the output of the neuron.
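
To make this concrete, the following minimal C++ sketch computes a perceptron output with a hyperbolic tangent activation. It is an illustration only, not OpenNN's actual Perceptron class; the function name and signature are hypothetical.

#include <vector>
#include <numeric>
#include <cmath>

// Illustrative sketch only, not the OpenNN Perceptron interface.
// The combination function adds the bias to the inner product of the
// synaptic weights and the inputs; the activation function (here the
// hyperbolic tangent) maps the combination value to the neuron output.
double perceptron_output(double bias,
                         const std::vector<double>& weights,
                         const std::vector<double>& inputs)
{
    const double combination = bias
        + std::inner_product(weights.begin(), weights.end(), inputs.begin(), 0.0);

    return std::tanh(combination);
}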

Perceptron layer

Most neural networks, even biological neural networks, exhibit a layered structure. In this work, layers are the basis for determining the architecture of a neural network. A layer of perceptrons is composed of a set of perceptrons sharing the same inputs. The architecture of a layer is characterized by its number of inputs and its number of perceptrons. The next figure shows a general layer of perceptrons.
Perceptron layer.
Here we identify three basic elements, which transform a vector of inputs into a vector of outputs:
  - A set of free parameters: the vector of biases and the matrix of synaptic weights.
  - A combination function: it merges the inputs and the free parameters into a vector of net input values, one for each perceptron in the layer.
  - An activation function: it takes the net input values to produce the vector of outputs.
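
Continuing the sketch above, a layer can be represented as a set of perceptrons sharing the same input vector. This hypothetical helper reuses perceptron_output() from the previous sketch.

// Illustrative sketch: one bias and one weight vector per perceptron in the layer.
std::vector<double> layer_outputs(const std::vector<double>& biases,
                                  const std::vector<std::vector<double>>& weights,
                                  const std::vector<double>& inputs)
{
    std::vector<double> outputs(biases.size());

    for (std::size_t i = 0; i < biases.size(); ++i)
    {
        outputs[i] = perceptron_output(biases[i], weights[i], inputs);
    }

    return outputs;
}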

Multilayer perceptron

Layers of perceptrons can be composed to form a multilayer perceptron. Most neural networks, even biological ones, exhibit a layered structure. Here layers and forward propagation are the basis for determining the architecture of a multilayer perceptron. This neural network represents an explicit function which can be used for a variety of purposes.

The architecture of a multilayer perceptron refers to the number of neurons, their arrangement and their connectivity. Any architecture can be symbolized as a directed and labelled graph, where nodes represent neurons and edges represent connections among them. An edge label represents the parameter of the neuron into which the connection flows. Thus, a neural network typically consists of a set of sensory nodes which constitute the input layer, one or more hidden layers of neurons, and a set of neurons which constitute the output layer.

There are two main categories of network architectures: acyclic or feed-forward networks and cyclic or recurrent networks. A feed-forward network represents a function of its current input; a recurrent neural network, by contrast, feeds outputs back into its own inputs. As noted above, the characteristic neuron model of the multilayer perceptron is the perceptron, and the multilayer perceptron has a feed-forward network architecture.

Hence, neurons in a feed-forward neural network are grouped into a sequence of layers, so that neurons in any layer are connected only to neurons in the next layer. The input layer consists of the external inputs and is not a layer of neurons; the hidden layers and the output layer are composed of neurons. The following figure shows the network architecture of a multilayer perceptron.
Multilayer perceptron.
A multilayer perceptron is characterized by:
  - Its network architecture: the number of inputs, the number of layers and the number of perceptrons in each layer.
  - Its parameters: the biases and synaptic weights of all its perceptrons.
  - The activation functions of its layers.

Communication proceeds layer by layer from the input layer via the hidden layers up to the output layer. The states of the output neurons represent the result of the computation.

In this way, in a feed-forward neural network, the output of each neuron is a function of the inputs. Thus, given an input to such a neural network, the activations of all neurons in the output layer can be computed in a deterministic pass.
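
That deterministic pass can be sketched by chaining the hypothetical layer_outputs() helper from above, so that the outputs of each layer become the inputs of the next. The Layer struct and function name below are illustrative, not the OpenNN API.

// Illustrative sketch of forward propagation through a feed-forward network.
struct Layer
{
    std::vector<double> biases;               // one bias per perceptron
    std::vector<std::vector<double>> weights; // one weight vector per perceptron
};

std::vector<double> calculate_forward(const std::vector<Layer>& layers,
                                      std::vector<double> inputs)
{
    for (const Layer& layer : layers)
    {
        inputs = layer_outputs(layer.biases, layer.weights, inputs);
    }

    return inputs; // activations of the output layer
}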

Scaling layer

In practice it is always convenient to scale the inputs so that all of them are of order zero. In this way, if all the neural parameters are also of order zero, the outputs will be of order zero as well. Scaled outputs, in turn, need to be unscaled to produce values in the original units.

In the context of neural networks, the scaling function can be thought of as an additional layer connected to the input layer of the multilayer perceptron. The number of scaling neurons is the number of inputs, and the connectivity of that layer is not total, but one-to-one. The following figure illustrates a scaling layer.
Scaling layer.

The scaling layer contains some basic statistics on the inputs, including the mean, standard deviation, minimum and maximum values. Two scaling methods widely used in practice are the minimum-maximum and the mean-standard deviation methods.
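
As a sketch, and assuming the common convention that the minimum-maximum method maps the inputs to [-1, 1], the two methods can be written as follows:

// Minimum-maximum method: maps [minimum, maximum] to [-1, 1] (assumed convention).
double scale_minimum_maximum(double x, double minimum, double maximum)
{
    return 2.0*(x - minimum)/(maximum - minimum) - 1.0;
}

// Mean-standard deviation method: produces values with zero mean and unit deviation.
double scale_mean_standard_deviation(double x, double mean, double standard_deviation)
{
    return (x - mean)/standard_deviation;
}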

Unscaling layer

Likewise, scaled outputs from a multilayer perceptron need to be unscaled to produce values in the original units. In the context of neural networks, the unscaling function can be interpreted as an unscaling layer connected to the outputs of the multilayer perceptron. The next figure illustrates an unscaling layer.
Unscaling layer.

The unscaling layer contains some basic statistics on the outputs, including the mean, standard deviation, minimum and maximum values. Two unscaling methods widely used in practice are the minimum-maximum and the mean-standard deviation methods.
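
The unscaling methods are simply the inverses of the scaling maps sketched above; here is a sketch under the same assumed [-1, 1] convention:

// Inverse of the minimum-maximum scaling: maps [-1, 1] back to [minimum, maximum].
double unscale_minimum_maximum(double x, double minimum, double maximum)
{
    return 0.5*(x + 1.0)*(maximum - minimum) + minimum;
}

// Inverse of the mean-standard deviation scaling.
double unscale_mean_standard_deviation(double x, double mean, double standard_deviation)
{
    return x*standard_deviation + mean;
}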

Bounding layer

Lower and upper bounds are an essential issue for those problems in which some variables are restricted to fall within an interval. Such problems could be intractable if bounds were not applied.

An easy way to treat lower and upper bounds is to post-process the outputs from the neural network with a bounding function. That function can also be interpreted as an additional layer connected to the outputs. The following figure represents a bounding layer.

Bounding layer.
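
The bounding function simply clamps each output to its interval; a minimal sketch:

#include <algorithm>

// Values below the lower bound or above the upper bound are set to that bound;
// values inside the interval pass through unchanged.
double bound(double x, double lower_bound, double upper_bound)
{
    return std::min(std::max(x, lower_bound), upper_bound);
}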

Probabilistic layer

A probabilistic function takes the outputs to produce new outputs whose elements can be interpreted as probabilities. In this way, the probabilistic outputs will always fall in the range [0,1], and their sum will always be 1. This form of post-processing is often used in pattern recognition problems.

The probabilistic function can be interpreted as an additional layer connected to the output layer of the network architecture. The next figure shows a probabilistic layer.
Probabilistic layer.

Note that the probabilistic layer has total connectivity, and that it does not contain any parameters. Two well-known probabilistic methods are the competitive and the softmax methods.
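
Minimal sketches of both methods follow: the softmax method exponentiates and normalizes the outputs, while the competitive method assigns probability 1 to the largest output and 0 to the rest. These are illustrations, not OpenNN's ProbabilisticLayer implementation.

#include <vector>
#include <cmath>
#include <algorithm>

// Softmax method: all outputs are positive and sum to one.
std::vector<double> softmax(const std::vector<double>& x)
{
    std::vector<double> y(x.size());
    double sum = 0.0;

    for (std::size_t i = 0; i < x.size(); ++i)
    {
        y[i] = std::exp(x[i]);
        sum += y[i];
    }

    for (double& value : y)
    {
        value /= sum;
    }

    return y;
}

// Competitive method: probability one for the maximal output, zero elsewhere.
std::vector<double> competitive(const std::vector<double>& x)
{
    std::vector<double> y(x.size(), 0.0);

    y[std::max_element(x.begin(), x.end()) - x.begin()] = 1.0;

    return y;
}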

Neural network

A neural network defines a function which is of the following form:

outputs = function(inputs). 

The most important element of an OpenNN neural network is the multilayer perceptron. That composition of layers of perceptrons is a very good function approximator.

Many practical applications require, however, extensions to the multilayer perceptron. OpenNN provides a neural network with some of the most standard extensions: the scaling, unscaling, bounding, probabilistic and conditions layers.

For instance, a function regression problem might require a multilayer perceptron with scaling and unscaling layers. On the other hand, an optimal control problem may need a multilayer perceptron with a conditions layer.

Finally, some problems might require adjustable parameters other than those belonging to the multilayer perceptron. Such parameters are called independent parameters.

Some basic information related to the input and output variables of a neural network includes the names, descriptions and units of those variables. That information is used to avoid errors such as interchanging the roles of the variables, misunderstanding the significance of a variable or using the wrong system of units.

2. Software model

As we have seen, the OpenNN neural network is composed of a multilayer perceptron plus some other kinds of layers. In this section we study the software model of the NeuralNetwork class.

Composition

The characterization in classes of the concepts studied in the previous section is as follows:

Perceptron:
The class which represents the concept of perceptron neuron model is called Perceptron.
Perceptron layer:
The class representing a layer of perceptrons is called PerceptronLayer.
Multilayer perceptron:
The class which represents a feed-forward architecture of perceptron layers is called MultilayerPerceptron.
Scaling layer:
The class which represents a layer for scaling variables is called ScalingLayer.
Unscaling layer:
The class which represents an unscaling layer is called UnscalingLayer.
Bounding layer:
The class representing a layer of bounding neurons is called BoundingLayer.
Probabilistic layer:
The class which represents a probabilistic layer is called ProbabilisticLayer.
Conditions layer:
The class which applies input-output conditions is called ConditionsLayer.
Inputs:
The class which stores information about the input variables is called Inputs.
Outputs:
The class which stores information about the output variables is called Outputs.
Independent parameters:
The class containing parameters not belonging to the multilayer perceptron is called IndependentParameters.
Neural network:
The class which aggregates all the different neural network concepts is called NeuralNetwork.
The next figure depicts a composition diagram for the NeuralNetwork class.
Composition diagram for the NeuralNetwork class.

Derived classes

The next task is then to establish which classes are abstract and to derive the necessary concrete classes to be added to the system.

The neural network class in OpenNN will be intensively used by any application. Therefore, for performance reasons, all the composing classes have been designed to be concrete.

Let us then examine the classes we have so far:
Perceptron:
The class Perceptron is concrete, and can implement different activation functions.
Perceptron layer:
The class PerceptronLayer is also concrete, since it is defined as a vector of perceptrons.
Multilayer perceptron:
The class MultilayerPerceptron is a concrete class and is itself suitable for instantiation. This class is implemented as a vector of layers of perceptrons.
Scaling layer:
The class ScalingLayer is concrete, and implements the minimum-maximum and mean-standard deviation scaling methods.
Unscaling layer:
The class UnscalingLayer is also concrete, and implements the minimum-maximum and mean-standard deviation unscaling methods.
Bounding layer:
The class BoundingLayer is concrete. It sets those inputs which fall below or above their bounds to the corresponding bound values.
Probabilistic layer:
The class ProbabilisticLayer is concrete, and implements the competitive and softmax methods.
Conditions layer:
The class ConditionsLayer has also been designed to be concrete. It implements methods to hold one or two conditions. For more difficult situations, further classes must be derived.
Inputs:
The class Inputs is concrete. It mainly stores a few strings with the names, units and descriptions of the neural network inputs.
Outputs:
The class Outputs is concrete. It mainly stores a few strings with the names, units and descriptions of the neural network outputs.
Independent parameters:
The class IndependentParameters is concrete. It contains adjustable parameters other than those belonging to the multilayer perceptron.

Attributes and operations

An attribute is a named value or relationship that exists for all or some instances of a class. An operation is a procedure associated with a class.

In UML class diagrams, classes are depicted as boxes with three sections: the top one indicates the name of the class, the one in the middle lists the attributes of the class, and the bottom one lists the operations.

Perceptron:
A perceptron neuron model has the following attributes:
It performs the following main operations:
Perceptron layer:
The perceptron layer has the following members:
It implements the following methods:
Multilayer perceptron:
A multilayer perceptron has the following attributes:
It performs the following main operations:
Scaling layer:
The scaling layer has the following members:
It implements the following main methods:
Unscaling layer:
The unscaling layer is similar to the scaling layer, with the following members:
It implements the following main methods:
Bounding layer:
The bounding layer contains the following attributes:
It performs the following main operations:
Probabilistic layer:
The probabilistic layer contains:
It computes the following functions:
Conditions layer:
The conditions layer contains the following:
It performs the following:
Inputs:
This class stores the following data:
It performs the following:
Outputs:
This class stores the following data:
It performs the following:
Independent parameters:
The class representing independent parameters contains the following main members:
The independent parameters class can perform the following operations:

Neural network:

The NeuralNetwork class contains the following main members:
It performs the following main operations:
The following figure illustrates the principal attributes and operations in the NeuralNetwork class.
Members and methods in the NeuralNetwork class.

3. Main classes

As noted above, OpenNN implements quite a general neural network in the class NeuralNetwork. It contains a multilayer perceptron with an arbitrary number of layers of perceptrons. It also includes additional layers for scaling the inputs and for unscaling, bounding, probabilizing or imposing conditions on the outputs. This neural network can deal with a wide range of problems. Finally, this class includes independent parameters, which can be useful for some problems.

The NeuralNetwork class is one of the most important in OpenNN, having many different members, constructors and methods.

Members

The NeuralNetwork class contains:

All these members are declared private, and they can only be accessed through their corresponding get and set methods.

Constructors

There are several constructors for the NeuralNetwork class, with different arguments.

The default activation function for the hidden layers is the hyperbolic tangent, and for the output layer it is the linear function. No default information, statistics, scaling, boundary conditions or bounds are set.

The easiest way of creating a neural network object is by means of the default constructor, which creates an empty neural network.

NeuralNetwork nn;

To construct a neural network having a multilayer perceptron with, for example, 3 inputs and 2 output neurons, we use the one-layer constructor

NeuralNetwork nn(3, 2);

All the parameters in the multilayer perceptron object that we have constructed so far are initialized with random values chosen from a normal distribution with mean 0 and standard deviation 1. By default, this one-layer perceptron will have a linear activation function.

To construct a neural network containing a multilayer perceptron with, for example, 1 input, a single hidden layer of 3 neurons and an output layer with 2 neurons, we use the two-layer constructor

NeuralNetwork nn(1, 3, 2);

All the parameters here are also initialized at random. By default, the hidden layer will have a hyperbolic tangent activation function and the output layer will have a linear activation function.

In order to construct a neural network with a more complex multilayer perceptron, its architecture must be specified with a vector of unsigned integers. For instance, to construct a multilayer perceptron with 1 input, 3 hidden layers with 2, 4 and 3 neurons, and an output layer with 1 neuron, we can write

Vector<unsigned> architecture(5);
architecture[0] = 1;
architecture[1] = 2;
architecture[2] = 4;
architecture[3] = 3;
architecture[4] = 1;

NeuralNetwork nn(architecture);

The network parameters here are also initialized at random.

The independent parameters constructor creates a neural network object without a multilayer perceptron but with a given number of independent parameters,

NeuralNetwork nn(3);

The above object can be used, for instance, as the basis for solving a function optimization problem not related to neural networks.

It is possible to construct a neural network by loading its members from an XML file. That is done in the following way,

NeuralNetwork nn("neural_network.xml");

Please follow the format of the neural network file strictly.

Finally, the copy constructor can be used to create an object by copying the members from another object,

NeuralNetwork nn1(2, 4, 3);
NeuralNetwork nn2(nn1);

Methods

This class implements get and set methods for each member. The following sentences show the use of some of them,

NeuralNetwork nn(3, 2);

MultilayerPerceptron* mlpp = nn.get_multilayer_perceptron_pointer();

unsigned inputs_number = mlpp->count_inputs_number();
unsigned outputs_number = mlpp->count_outputs_number();

The number of parameters of the neural network above can be accessed as follows

unsigned parameters_number = nn.count_parameters_number();

The network parameters can be initialized with a given value by using the initialize() method,

NeuralNetwork nn(4, 3, 2);

nn.initialize(0.0);

To calculate the output vector of the network in response to an input vector, we use the method calculate_outputs(). For instance, for a neural network with a single input, the following sentences return the output for a given input value.

Vector<double> inputs(1); 
inputs[0] = 0.5;

Vector<double> outputs = nn.calculate_outputs(inputs);

To calculate the Jacobian matrix of the network in response to an input vector we use the method calculate_Jacobian(). For instance, the following sentence returns the partial derivatives of the outputs with respect to the inputs.

Matrix<double> Jacobian = nn.calculate_Jacobian(inputs);

We can save a neural network object to a data file by using the method save(). For instance, the next code saves the neural network object to the file neural_network.xml.

NeuralNetwork nn;

nn.save("neural_network.xml");

We can also load a neural network object from a data file by using the method load(). Indeed, the following sentence loads the neural network object from the file neural_network.xml.

nn.load("neural_network.xml");

OpenNN Copyright © 2014 Roberto Lopez (Artelnics)