Neural networks basics

In this tutorial, we formulate the learning problem for neural networks and describe some learning tasks they can solve.

Contents:
  1. Learning tasks
  2. Learning problem

1. Learning tasks

Learning tasks for neural networks can be classified according to their source of information. Among the tasks that learn from data sets are function regression, pattern recognition and time series prediction.

The following figure shows the learning tasks for neural networks described in this section. As we can see, they cover a great range of applications. Each of these learning tasks is formulated as a variational problem, and all of them are solved using the three-step approach described in the next section. Modelling and classification are the most traditional ones.

Function regression

Function regression is the most popular learning task for neural networks. It is also called modelling. The function regression problem can be regarded as the problem of approximating a function from a data set consisting of input-target instances. The targets are a specification of what the response to the inputs should be. While input variables might be quantitative or qualitative, in function regression, target variables are quantitative.

Loss indices for function regression are usually based on a sum of errors between the outputs from the neural network and the targets in the training data. As the training data is usually deficient, a regularization term might be required to solve the problem correctly.
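
As a concrete sketch, the following generic C++ functions compute a loss of this form: a sum squared error plus a weighted regularization term on the parameters. The function names are illustrative only and do not correspond to the OpenNN implementation.

    #include <cstddef>
    #include <vector>

    // Sum squared error between the network outputs and the targets,
    // accumulated over all training instances.
    double sum_squared_error(const std::vector<std::vector<double>>& outputs,
                             const std::vector<std::vector<double>>& targets)
    {
        double error = 0.0;

        for (std::size_t i = 0; i < outputs.size(); ++i)
            for (std::size_t j = 0; j < outputs[i].size(); ++j)
            {
                const double difference = outputs[i][j] - targets[i][j];
                error += difference * difference;
            }

        return error;
    }

    // Loss index = error term + weight * regularization term.
    double loss_index(const std::vector<std::vector<double>>& outputs,
                      const std::vector<std::vector<double>>& targets,
                      const std::vector<double>& parameters,
                      double regularization_weight)
    {
        double regularization = 0.0;

        for (double parameter : parameters)
            regularization += parameter * parameter;   // L2-type penalty

        return sum_squared_error(outputs, targets)
             + regularization_weight * regularization;
    }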

An example is to design an instrument that can determine serum cholesterol levels from measurements of the spectral content of a blood sample. There are a number of patients for which there are measurements of several wavelengths of the spectrum. For the same patients, there are also measurements of several cholesterol levels based on serum separation.

Pattern recognition

The learning task of pattern recognition lies at the root of artificial intelligence. The problem can be stated as the process whereby a received pattern, characterized by a distinct set of features, is assigned to one of a prescribed number of classes. Pattern recognition is also known as classification. Here the neural network learns from knowledge represented by a training data set consisting of input-target instances. The inputs comprise the set of features that characterize a pattern, and they can be quantitative or qualitative. The targets specify the class that each pattern belongs to, and are therefore qualitative.

Classification problems can be formulated as modelling problems. As a consequence, the loss index used here is also based on the sum squared error. Nevertheless, the learning task of pattern recognition is more challenging to solve than that of function regression, so a good knowledge of the state of the art is recommended for success.

A typical example is to distinguish hand-written versions of characters. Images of the characters are captured and fed to a computer, and an algorithm is then sought which can distinguish as reliably as possible between the characters.
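
Since class targets are qualitative, they must be encoded numerically before training. A common scheme is 1-of-N (one-hot) encoding, sketched below in generic C++; this encoding choice is an illustration, not a statement about the OpenNN internals.

    #include <cstddef>
    #include <vector>

    // 1-of-N encoding: class index c among n classes becomes a target
    // vector with a 1 in position c and 0 elsewhere.
    std::vector<double> one_hot(std::size_t class_index, std::size_t classes_number)
    {
        std::vector<double> target(classes_number, 0.0);
        target[class_index] = 1.0;
        return target;
    }

For example, the character 'b' among the classes {'a', 'b', 'c'} would be represented by the target vector (0, 1, 0).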

2. Learning problem

Any application for neural networks involves a neural network itself, a data set, and a training strategy. The learning problem is then formulated to find a neural network that fits a data set through a training strategy.

The following figure depicts an activity diagram for the learning problem.
The solving approach here consists of three steps.
  1. The first step is to gather a data set with relevant information about the problem at hand.
  2. In the second step we choose a suitable neural network which will approximate the solution to the problem.
  3. The third step is to train the neural network to fit the data set by finding an optimal set of parameters.
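
The sketch below shows how these three ingredients might be assembled in a C++ program. The class names DataSet, NeuralNetwork and TrainingStrategy follow OpenNN's naming, but the exact constructor signatures vary between OpenNN versions, so the arguments here should be treated as assumptions rather than a definitive usage example.

    #include "opennn.h"

    using namespace OpenNN;

    int main()
    {
        // Step 1: gather the data set (file name and separator are placeholders).
        DataSet data_set("data.csv", ';', true);

        // Step 2: choose a suitable neural network architecture
        // (here 4 inputs, 10 hidden neurons and 1 output, as an example).
        NeuralNetwork neural_network(NeuralNetwork::ProjectType::Approximation, {4, 10, 1});

        // Step 3: train the neural network to fit the data set.
        TrainingStrategy training_strategy(&neural_network, &data_set);
        training_strategy.perform_training();

        return 0;
    }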

As we will see, the learning problem for neural networks is formulated from a variational point of view. Indeed, learning tasks consist of finding a function that causes some functional to assume an extreme value. Neural networks provide a general framework for solving variational problems.

Data set

The data set contains the information for creating the model. It comprises a data matrix in which columns represent variables and rows represent instances. The data is contained in a file with the following format:

				d_1_1   d_1_2   ...   d_1_q
				...     ...     ...   ...
				d_p_1   d_p_2   ...   d_p_q  
				

Here the number of instances is denoted p, while the number of variables is denoted q.
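
As an illustration, a generic reader for this format might look as follows; this helper is not part of OpenNN.

    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    // Read a whitespace-separated data file into a matrix with
    // p rows (instances) and q columns (variables).
    std::vector<std::vector<double>> read_data(const std::string& file_name)
    {
        std::vector<std::vector<double>> data;
        std::ifstream file(file_name);
        std::string line;

        while (std::getline(file, line))
        {
            std::istringstream stream(line);
            std::vector<double> row;
            double value;

            while (stream >> value) row.push_back(value);

            if (!row.empty()) data.push_back(row);
        }

        return data;
    }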

Variables in a data set can be of three types:
  • The inputs will be the independent variables in the model.
  • The targets will be the dependent variables in the model.
  • The unused variables will neither be used as inputs nor as targets.
It is rarely useful to have a neural network memorize a set of data. Typically, you want the neural network to be able to perform accurately on new data, that is, to generalize. In this way, instances can be:
  • Training instances, which are used to construct the model.
  • Selection instances, which are used for selecting the optimal order of the model.
  • Testing instances, which are used to validate the functioning of the model.
  • Unused instances, which are not used at all.

The crucial point is that testing instances are never used to choose among two or more neural networks. Instances used to choose the best of two or more neural networks are, by definition, selection instances.
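
A simple way to perform such a split is to shuffle the instance indices and cut them into subsets, as in the generic sketch below; the 60/20/20 proportions are a common but arbitrary choice, not an OpenNN default.

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <random>
    #include <vector>

    struct InstanceIndices
    {
        std::vector<std::size_t> training;
        std::vector<std::size_t> selection;
        std::vector<std::size_t> testing;
    };

    // Shuffle the p instance indices and split them into training,
    // selection and testing subsets (60%, 20% and 20% here).
    InstanceIndices split_instances(std::size_t instances_number, unsigned seed)
    {
        std::vector<std::size_t> indices(instances_number);
        std::iota(indices.begin(), indices.end(), 0);
        std::shuffle(indices.begin(), indices.end(), std::mt19937(seed));

        const std::size_t training_number = (6 * instances_number) / 10;
        const std::size_t selection_number = (2 * instances_number) / 10;

        InstanceIndices split;
        split.training.assign(indices.begin(),
                              indices.begin() + training_number);
        split.selection.assign(indices.begin() + training_number,
                               indices.begin() + training_number + selection_number);
        split.testing.assign(indices.begin() + training_number + selection_number,
                             indices.end());

        return split;
    }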

Neural network

A neuron model is a mathematical model of the behaviour of a single neuron in a biological nervous system. The most important neuron model is the so-called perceptron. The perceptron neuron model receives information in the form of numerical inputs. This information is then combined with a set of parameters to produce a message in the form of a single numerical output.
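
In generic C++, the output of a single perceptron can be sketched as follows; the hyperbolic tangent is just one common choice of activation function.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Perceptron neuron model: combine the inputs with the bias and
    // synaptic weights, then apply an activation function.
    double perceptron_output(const std::vector<double>& inputs,
                             const std::vector<double>& weights,
                             double bias)
    {
        double combination = bias;

        for (std::size_t i = 0; i < inputs.size(); ++i)
            combination += weights[i] * inputs[i];

        return std::tanh(combination);   // hyperbolic tangent activation
    }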

Most neural networks, even biological neural networks, exhibit a layered structure. In this work, layers are the basis for determining the architecture of a neural network. A layer of perceptrons takes a set of inputs to produce a set of outputs.

A deep neural network is built up by organizing layers of perceptrons in a network architecture. In this way, the architecture of a network refers to the number of layers, their arrangement and connectivity. The characteristic network architecture in OpenNN is the so-called feed-forward architecture. The multilayer perceptron can then be defined as a network architecture of perceptron layers. This neural network represents a parameterized function of several variables with excellent approximation properties.
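
The feed-forward computation can be sketched as below: the outputs of each layer of perceptrons become the inputs of the next one. This is a minimal generic illustration, not the OpenNN implementation.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // A layer of perceptrons: weights[j] and biases[j] define neuron j.
    struct Layer
    {
        std::vector<std::vector<double>> weights;
        std::vector<double> biases;
    };

    // Feed-forward pass through a multilayer perceptron.
    std::vector<double> calculate_outputs(const std::vector<Layer>& layers,
                                          std::vector<double> inputs)
    {
        for (const Layer& layer : layers)
        {
            std::vector<double> outputs(layer.biases.size());

            for (std::size_t j = 0; j < outputs.size(); ++j)
            {
                double combination = layer.biases[j];

                for (std::size_t i = 0; i < inputs.size(); ++i)
                    combination += layer.weights[j][i] * inputs[i];

                outputs[j] = std::tanh(combination);
            }

            inputs = outputs;   // feed the outputs to the next layer
        }

        return inputs;
    }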

To solve practical applications, different extensions must be added to the multilayer perceptron. These include scaling, unscaling, bounding, probabilistic and conditions layers. Therefore, the neural network in OpenNN is composed of a multilayer perceptron plus some additional layers.

Training strategy

The procedure used to carry out the learning process is called the training (or learning) strategy. The training strategy is applied to the neural network in order to obtain the best possible performance. The type of training is determined by how the adjustment of the parameters in the neural network takes place.

Loss index

The loss index plays an important role in the use of a deep neural network. It defines the task the deep neural network is required to do and measures the quality of the representation that the deep neural network is required to learn. The choice of a suitable loss index depends on the particular application.

A loss index in OpenNN is composed of two different terms: an error term and a regularization term. Sometimes a single error term will be enough, but some applications will require regularized solutions.

  • Sum Squared Error
  • L1 regularization
  • L2 regularization
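
The two regularization terms listed above can be computed from the vector of network parameters as sketched below; this is generic code, not the OpenNN implementation.

    #include <cmath>
    #include <vector>

    // L1 regularization: sum of the absolute values of the parameters.
    double l1_regularization(const std::vector<double>& parameters)
    {
        double sum = 0.0;

        for (double parameter : parameters)
            sum += std::abs(parameter);

        return sum;
    }

    // L2 regularization: sum of the squared parameters.
    double l2_regularization(const std::vector<double>& parameters)
    {
        double sum = 0.0;

        for (double parameter : parameters)
            sum += parameter * parameter;

        return sum;
    }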

Training algorithm

The most general training strategy in OpenNN will include three different training algorithms: initialization, main and refinement. Most applications will only need one training algorithm, but some complex problems might require the combination of two or three of them.

  • Gradient descent

A generally good training strategy includes the normalized squared error and the quasi-Newton method.
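
As an illustration of the simplest of these algorithms, the sketch below performs plain gradient descent updates. The gradient of the loss index is assumed to be supplied by the caller, and the learning rate and iteration count are arbitrary placeholders.

    #include <cstddef>
    #include <functional>
    #include <vector>

    // Plain gradient descent: repeatedly move the parameters in the
    // direction opposite to the gradient of the loss index.
    void gradient_descent(
        std::vector<double>& parameters,
        const std::function<std::vector<double>(const std::vector<double>&)>& gradient,
        double learning_rate,
        int iterations)
    {
        for (int k = 0; k < iterations; ++k)
        {
            const std::vector<double> g = gradient(parameters);

            for (std::size_t i = 0; i < parameters.size(); ++i)
                parameters[i] -= learning_rate * g[i];
        }
    }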
