Solving function regression problems
Roberto Lopez.
[email protected]
Artelnics - Making intelligent use of data

Function regression is the most popular learning task for neural networks. It is also called modelling. In this tutorial we show how to solve function regression problems with OpenNN, and illustrate the whole process with a simple example.

Contents:
  1. Introduction.
  2. Data set.
  3. Neural network.
  4. Performance functional.
  5. Training strategy.
  6. Testing analysis.

1. Introduction

In a function regression problem, the neural network learns from knowledge represented by a data set consisting of input-target instances. The targets are a specification of what the response to the inputs should be, and are represented as continuous variables. The basic goal here is to model the conditional distribution of the target variables, conditioned on the input variables. This function is called the regression function.

The formulation of a function regression problem requires a data set, a neural network, a performance functional and a training strategy. These are the subjects of the following sections.

A common feature of most data sets is that the data exhibits an underlying systematic aspect, represented by some function, but is corrupted with random noise. The central goal is to produce a model which exhibits good generalization, or in other words, one which makes good predictions for new data. The best generalization to new data is obtained when the mapping represents the underlying systematic aspects of the data, rather than capturing the specific details (i.e. the noise contribution) of the particular data set.

2. Data set

The following table shows the format of a data set for function regression. It consists of q instances, each containing n input variables and m target variables. All inputs and targets are real values.

input_1_1 ··· input_1_n   target_1_1 ··· target_1_m
input_2_1 ··· input_2_n   target_2_1 ··· target_2_m

···

input_q_1 ··· input_q_n   target_q_1 ··· target_q_m

In the example considered here we have a data set with 101 instances, one input variable (x) and one target variable (y). A sample of the data is listed next.

0.0  0.454
0.1  0.723
0.2  0.908
0.3  0.854
0.4  0.587
0.5  0.545
0.6  0.306
0.7  0.129
0.8  0.185
0.9  0.263
1.0  0.503

The first step is to set up the data set. In order to do that we construct a DataSet object and load the data from a file.

DataSet data_set;

data_set.load_data("simplefunctionregression.dat");

A simple statistical analysis should always be performed in order to check for data consistency. Basic statistics of a data set include the mean, standard deviation, minimum and maximum values of the input and target variables, both for the whole data set and for the training, generalization and testing subsets. A histogram of each input and target variable should also be plotted in order to check the distribution of the available data.
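As a minimal sketch, these basic statistics can be queried directly from the DataSet object. The calculate_data_statistics method and the members of the Statistics structure are assumptions here, based on the OpenNN API of this period:

// Basic statistics of all variables (assumed API)
Vector< Statistics<double> > data_statistics = data_set.calculate_data_statistics();

// Each Statistics object is assumed to hold the minimum, maximum,
// mean and standard deviation of the corresponding variable
std::cout << "x: mean " << data_statistics[0].mean
          << ", standard deviation " << data_statistics[0].standard_deviation << std::endl;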

Then we add the information about the inputs and the targets in the Variables object.

Variables* variables_pointer = data_set.get_variables_pointer();

variables_pointer->set_use(0, Variables::Input);
variables_pointer->set_use(1, Variables::Target);

variables_pointer->set_name(0, "x");
variables_pointer->set_name(1, "y");

Matrix<std::string> inputs_information = variables_pointer->arrange_inputs_information();
Matrix<std::string> targets_information = variables_pointer->arrange_targets_information();

When solving function regression problems it is always convenient to split the data set into training, generalization and testing subsets. The size of each subset is up to the designer; a common default is to use 60%, 20% and 20% of the instances for training, generalization and testing, respectively. There are several data splitting methods. Two common approaches are to generate random indices or to specify the required indices for the training, generalization and testing instances, as sketched below.
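A minimal sketch of a random 60/20/20 split; the split_random_indices method of the Instances class is an assumption here:

// Randomly assign 60% of the instances for training, 20% for generalization and 20% for testing (assumed API)
data_set.get_instances_pointer()->split_random_indices(0.6, 0.2, 0.2);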

For simplicity, in our example we will use all the instances for training.

Instances* instances_pointer = data_set.get_instances_pointer();

instances_pointer->set_training();

Also, the data must be scaled using the data statistics. There are two main data scaling methods: the mean and standard deviation method, and the minimum and maximum method. The mean and standard deviation method scales the data to mean 0 and standard deviation 1. The minimum and maximum method scales the data to minimum -1 and maximum 1.

Here we scale all the input and target variables so that they fall in the range [-1,1].

Vector< Statistics<double> > inputs_statistics = data_set.scale_inputs_minimum_maximum();
Vector< Statistics<double> > targets_statistics = data_set.scale_targets_minimum_maximum();

3. Neural network

A neural network is used to represent the regression function. The number of inputs must be equal to the number of inputs in the data set, n, and the number of outputs must be the number of targets, m. This neural network will contain a scaling layer, a multilayer perceptron and an unscaling layer. It might optionally contain a bounding layer. The next figure shows a general neural network for solving function regression problems.
Figure: Neural network for function regression.

In general, a multilayer perceptron with one hidden layer will be enough. A default value to start with for the size of that layer could be

hidden neurons number = inputs number + outputs number

In our example this rule gives 1 + 1 = 2 hidden neurons.

Please note that the required complexity depends very much on the problem at hand, and the above equation is just a rule of thumb. However, there are standard methods to find the correct complexity of a neural network for function regression problems. The most common is called model selection.

The activation functions for the hidden layers and the output layer are also design variables. However, a hyperbolic tangent activation function for the hidden layers and a linear activation function for the output layer are widely used when solving function regression problems.

Scaling of inputs and unscaling of outputs are not needed during the design phase, since the data set has already been scaled. When moving to a production phase, the input scaling and output unscaling methods should be coherent with the scaling method used for the data.

The neural network in the above figure spans a parameterized function space. That parameterized space of functions will be the basis to approximate the regression function.

The first step in solving the problem formulated in this section is to choose a network architecture to represent the regression function. Here a multilayer perceptron with a hyperbolic tangent hidden layer and a linear output layer is used. The multilayer perceptron must have one input, since there is one input variable, and one output neuron, since there is one target variable. Following the rule of thumb above, the size of the hidden layer is set to 2. This neural network can be denoted as 1:2:1. It defines a family V of parameterized functions y(x) of dimension s = 7, which is the number of neural parameters in the multilayer perceptron: 2 · (1 + 1) weights and biases in the hidden layer plus 1 · (2 + 1) in the output layer.

The following figure is a graphical representation of that network architecture (Figure: Network architecture for the simple function regression example). The neural parameters are initialized at random with a normal distribution of mean 0 and standard deviation 1.

// Neural network

NeuralNetwork neural_network(1, 2, 1);   // 1 input, 2 hidden neurons, 1 output
Inputs* inputs_pointer = neural_network.get_inputs_pointer();
inputs_pointer->set_information(inputs_information);

Outputs* outputs_pointer = neural_network.get_outputs_pointer();
outputs_pointer->set_information(targets_information);
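The random initialization mentioned above can also be performed explicitly. A minimal sketch; the randomize_parameters_normal and count_parameters_number methods are assumptions here:

// Initialize all weights and biases from a normal distribution,
// with mean 0 and standard deviation 1 by default (assumed API)
neural_network.randomize_parameters_normal();

// Should print 7 for the 1:2:1 architecture used here (assumed API)
std::cout << "Parameters number: " << neural_network.count_parameters_number() << std::endl;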


neural_network.construct_scaling_layer();
ScalingLayer* scaling_layer_pointer = neural_network.get_scaling_layer_pointer();
scaling_layer_pointer->set_statistics(inputs_statistics);
scaling_layer_pointer->set_scaling_method(ScalingLayer::NoScaling);

neural_network.construct_unscaling_layer();
UnscalingLayer* unscaling_layer_pointer = neural_network.get_unscaling_layer_pointer();
unscaling_layer_pointer->set_statistics(targets_statistics);
unscaling_layer_pointer->set_unscaling_method(UnscalingLayer::NoUnscaling);
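When the model is later moved to production, the scaling and unscaling layers should apply the same minimum and maximum method that was used on the data. A minimal sketch; the MinimumMaximum members of the two enumerations are assumptions here:

// In production, scale inputs and unscale outputs with the stored statistics (assumed enum values)
scaling_layer_pointer->set_scaling_method(ScalingLayer::MinimumMaximum);
unscaling_layer_pointer->set_unscaling_method(UnscalingLayer::MinimumMaximum);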

4. Performance functional

The regression function can be evaluated quantitatively by means of a performance functional of the form

Performance functional = objective term + regularization term. 

For function regression problems, the objective term measures the error between the outputs from the neural network and the targets in the data set. On the other hand, regularization is normally used when the number of instances in the data set is small or when the data is noisy. In other situations, regularization might not be necessary.

The solution approach to a function regression problem is to obtain a neural network which minimizes the performance functional. Note that neural networks represent functions; in that way, the function regression problem is formulated as a variational problem.

The second step is to assign the neural network an objective functional. Here it is the normalized squared error. The variational statement of the function regression problem considered here is then to find a function y(x) ∈ V for which the objective functional, defined on V, takes on a minimum value. Evaluation of the objective functional only requires an explicit expression for the function represented by the multilayer perceptron. The objective function gradient vector is obtained with the back-propagation algorithm, which gives the greatest accuracy and numerical efficiency.
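For reference, the normalized squared error over the Q training instances can be written as

E = \frac{\sum_{q=1}^{Q} \left\| y(x^{(q)}) - t^{(q)} \right\|^{2}}{\sum_{q=1}^{Q} \left\| t^{(q)} - \bar{t} \right\|^{2}},

where t̄ is the mean of the target data. A value E = 1 means the network is doing no better than always predicting the mean of the targets, while E = 0 means a perfect fit. This follows the usual definition of the normalized squared error; consult the OpenNN reference for the exact normalization used by the library.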

PerformanceFunctional performance_functional(&neural_network, &data_set);

5. Training strategy

The training strategy is entrusted with solving the reduced function optimization problem by minimizing the performance functional.

In general, the evaluation, gradient and Hessian of the error function can be computed analytically. Zero order training algorithms, such as the evolutionary algorithm, converge extremely slowly and are not a good choice. On the other hand, second order training algorithms, such as Newton's method, need evaluation of the Hessian and are not a good choice either. In practice, first order algorithms are recommended for solving function regression problems. The Levenberg-Marquardt algorithm is a good choice for small and medium sized problems. Due to its storage requirements, that algorithm is not recommended for large problems, for which a quasi-Newton method with BFGS training direction and Brent training rate is preferable.

In order to study the convergence of the optimization process, it is useful to plot the behaviour of some variables related to the multilayer perceptron, the performance functional or the training algorithm as a function of the iteration step. Common training history variables include the performance history, the generalization performance history and the gradient norm history. Of all of these, maybe the most important one is the performance history. It is also important to analyze the final values of some variables; the most important training results are the final parameters, the final performance and the number of iterations performed.

The third step in solving this problem is to assign the performance functional a training algorithm. We use the quasi-Newton method for training. In this example, we set the training algorithm to stop when the performance increase between two successive epochs falls below 1.0e-3.

TrainingStrategy training_strategy(&performance_functional);

QuasiNewtonMethod* quasi_Newton_method_pointer = training_strategy.get_quasi_Newton_method_pointer();

quasi_Newton_method_pointer->set_minimum_performance_increase(1.0e-3);
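Other stopping criteria can be combined with the minimum performance increase, for instance a maximum number of epochs. A minimal sketch; the set_maximum_iterations_number and set_display_period setters are assumptions here:

quasi_Newton_method_pointer->set_maximum_iterations_number(1000);   // stop after at most 1000 epochs (assumed API)
quasi_Newton_method_pointer->set_display_period(100);   // report progress every 100 epochs (assumed API)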

TrainingStrategy::Results training_strategy_results = training_strategy.perform_training();

The presence of noise in the data set causes the objective function to have local minima. This means that, when solving function regression problems, we should always repeat the learning process from several different starting positions, as sketched below. During the training process the objective function decreases until the stopping criterion is satisfied.
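A minimal sketch of such restarts. The randomize_parameters_normal, calculate_performance, arrange_parameters and set_parameters methods are assumptions here, and we assume perform_training may be called repeatedly:

// Repeat training from several random starting positions and keep the best result (assumed API)
double best_performance = 1.0e99;
Vector<double> best_parameters;

for(unsigned i = 0; i < 10; i++)
{
   // Start from a new random position
   neural_network.randomize_parameters_normal();

   training_strategy.perform_training();

   // Performance of this trained network
   const double performance = performance_functional.calculate_performance();

   if(performance < best_performance)
   {
      best_performance = performance;
      best_parameters = neural_network.arrange_parameters();
   }
}

// Restore the best parameters found
neural_network.set_parameters(best_parameters);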

6. Testing analysis

The performance of a neural network can be measured to some extent by the performance evaluation on the testing set, but it is useful to investigate the response in more detail. One option is to perform a regression analysis between the network response and the corresponding targets for an independent testing subset.

This analysis leads to three parameters for each output variable. The first two parameters, a and b, correspond to the y-intercept and the slope of the best linear regression relating outputs to targets. The third parameter, R^{2}, is the correlation coefficient between the outputs and the targets.
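In other words, for each output variable the outputs o are fitted against the targets t by a straight line of the form

o = a + b t.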

If we had a perfect fit (outputs exactly equal to targets), the slope would be 1, and the y-intercept would be 0. If the correlation coefficient is equal to 1, then there is perfect correlation between the outputs from the neural network and the targets in the testing subset.

The last step is to test the generalization performance of the trained neural network, comparing the values provided by the network against the actually observed values. A minimal sketch using the TestingAnalysis class; the perform_linear_regression_analysis method is an assumption here, and note that it requires testing instances (recall that for simplicity all instances were assigned to training in this example):

TestingAnalysis testing_analysis(&neural_network, &data_set);

TestingAnalysis::LinearRegressionResults linear_regression_results = testing_analysis.perform_linear_regression_analysis();


OpenNN Copyright © 2014 Roberto Lopez (Artelnics)