6 steps to build a neural network in OpenNN

This tutorial shows the main steps for building a neural network model using OpenNN. You can find the script for the example we will use on GitHub.

The central goal here is to design a model that makes reasonable classifications for new data or, in other words, one that exhibits good generalization.

If you want to know more about the concepts in this tutorial, you can read this neural network tutorial or the machine learning blog created by Neural Designer.

Contents:

  1. Data set.
  2. Neural network.
  3. Training strategy.
  4. Model selection.
  5. Testing analysis.
  6. Model deployment.

1. Data set

The first step is to prepare the data set, which is the source of information for the classification problem. To do so, we configure three things:

  • Data source.
  • Variables.
  • Instances.

The data source is the file iris_plant_original.csv. It contains the data for this example as delimited text, with a semicolon as the separator instead of the comma usual in CSV files, and can be loaded as:

// Dataset
Dataset dataset("path_to_source/iris_plant_original.csv", ";", true, false);
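To make the role of the ";" separator concrete, here is a minimal sketch of how one semicolon-separated row breaks into fields. The helper split_line is hypothetical and written only for illustration; it is not an OpenNN function and does not describe how Dataset parses files internally.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not an OpenNN function: splits one line of a
// delimited text file into its fields. A trailing empty field is dropped,
// and an empty line yields no fields.
std::vector<std::string> split_line(const std::string& line, char separator)
{
    assert(separator != '\n' && "the separator must not be the line terminator");

    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;

    // std::getline with a custom delimiter reads up to the next separator.
    while (std::getline(stream, field, separator))
        fields.push_back(field);

    return fields;
}
```

Calling split_line on an illustrative row such as 5.1;3.5;1.4;0.2;iris_setosa with ';' as the separator yields five fields, one per column.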

There are five columns and 150 rows. The variables in this problem are:

  • sepal_length: Sepal length, used as input.
  • sepal_width: Sepal width, used as input.
  • petal_length: Petal length, used as input.
  • petal_width: Petal width, used as input.
  • class: Iris Setosa, Versicolor, or Virginica, used as target.

OpenNN recognizes categorical variables and converts them to numerical values. In this example, the class variable is one-hot encoded as follows:

  • iris_setosa: 1 0 0.
  • iris_versicolor: 0 1 0.
  • iris_virginica: 0 0 1.
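The transformation listed above is a one-hot encoding. As a rough sketch of the idea, assuming the categories are kept in the order shown, it could be written as follows. The function one_hot is a hypothetical helper for illustration, not part of OpenNN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of OpenNN: one-hot encodes a label
// against an ordered list of known categories.
std::vector<int> one_hot(const std::vector<std::string>& categories,
                         const std::string& label)
{
    const auto it = std::find(categories.begin(), categories.end(), label);
    assert(it != categories.end() && "label must be a known category");

    // All zeros, except a single 1 at the position of the label.
    std::vector<int> encoded(categories.size(), 0);
    encoded[static_cast<std::size_t>(it - categories.begin())] = 1;
    return encoded;
}
```

With the categories iris_setosa, iris_versicolor, and iris_virginica in that order, one_hot returns 0 1 0 for iris_versicolor.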

The data set therefore holds 7 numerical variables: 4 inputs and 3 binary target columns. With the data loaded, we retrieve the names of the input and target variables:

const vector<string> inputs_names = dataset.get_feature_names("Input");
const vector<string> targets_names = dataset.get_feature_names("Target");

The instances are divided into training, selection, and testing subsets. They represent 60% (90), 20% (30), and 20% (30) of the original instances, respectively, and are split randomly using the following command:

// Split data set into training, selection and testing samples
dataset.split_samples_random();
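For intuition, a random 60/20/20 split can be sketched as a shuffle of the sample indices followed by a cut into three contiguous ranges. The code below is an illustration under that assumption only. The SampleSplit struct, the function name, and the seed parameter are all hypothetical, and OpenNN's own implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical container for the three index subsets (not an OpenNN type).
struct SampleSplit
{
    std::vector<std::size_t> training;
    std::vector<std::size_t> selection;
    std::vector<std::size_t> testing;
};

// Sketch: shuffle the indices 0..samples_number-1, then cut them into
// training, selection, and testing ranges. Testing receives the remainder.
SampleSplit split_samples_random_sketch(std::size_t samples_number,
                                        double training_ratio,
                                        double selection_ratio,
                                        unsigned seed)
{
    assert(training_ratio >= 0.0 && selection_ratio >= 0.0);
    assert(training_ratio + selection_ratio <= 1.0);

    std::vector<std::size_t> indices(samples_number);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    std::mt19937 generator(seed);
    std::shuffle(indices.begin(), indices.end(), generator);

    // Round to the nearest whole sample and never exceed what is available.
    const double total = static_cast<double>(samples_number);
    const std::size_t training_count = std::min(
        samples_number,
        static_cast<std::size_t>(std::llround(training_ratio * total)));
    const std::size_t selection_count = std::min(
        samples_number - training_count,
        static_cast<std::size_t>(std::llround(selection_ratio * total)));

    const auto first = indices.begin();
    const auto training_end = first + static_cast<std::ptrdiff_t>(training_count);
    const auto selection_end = training_end + static_cast<std::ptrdiff_t>(selection_count);

    SampleSplit split;
    split.training.assign(first, training_end);
    split.selection.assign(training_end, selection_end);
    split.testing.assign(selection_end, indices.end());
    return split;
}
```

For 150 samples with ratios 0.6 and 0.2, this gives subsets of 90, 30, and 30 indices, matching the proportions described above.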

To get the number of input and target features, we use:

const Index input_features_number = dataset.get_features_number("Input");
const Index target_features_number = dataset.get_features_number("Target");

We scale the dataset to ensure the neural network operates in the best possible conditions. The scaling method for each variable is read from the dataset (defaults to MinimumMaximum). The call returns the descriptives (min, max, mean, standard deviation) computed during scaling, which we will use later to set up the network’s scaling layer:

// Scale input features
const vector<Descriptives> inputs_descriptives = dataset.scale_features("Input");

We do not scale the targets: they are already either 0 or 1, which works well for the neural network.
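As a rough illustration of minimum-maximum scaling and its inverse (the operation reversed later in step 5), the sketch below maps a value linearly using the column's minimum and maximum. It assumes a target interval of [0, 1]; the interval OpenNN uses for MinimumMaximum may differ. The Range struct and the function names are hypothetical and stand in for OpenNN's Descriptives.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the part of the descriptives that
// minimum-maximum scaling needs (not an OpenNN type).
struct Range
{
    double minimum;
    double maximum;
};

// Computes the minimum and maximum of one column of values.
Range compute_range(const std::vector<double>& values)
{
    assert(!values.empty());
    const auto extremes = std::minmax_element(values.begin(), values.end());
    return Range{*extremes.first, *extremes.second};
}

// Maps value linearly so that minimum -> 0 and maximum -> 1
// (assumed interval). A constant column maps to 0.
double scale_minimum_maximum(double value, const Range& range)
{
    const double width = range.maximum - range.minimum;
    if (width == 0.0) return 0.0;
    return (value - range.minimum) / width;
}

// Inverse mapping: recovers the original units from a scaled value.
double unscale_minimum_maximum(double scaled, const Range& range)
{
    return range.minimum + scaled * (range.maximum - range.minimum);
}
```

Keeping the range returned at scaling time is what makes the inverse possible, which is why the descriptives are stored in a variable rather than discarded.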

For more information about the data set methods, see the Dataset class.

2. Neural network

The second step is to choose a suitable neural network architecture. For classification problems, it is usually composed of:

  • A scaling layer.
  • One or more dense (fully-connected) layers with a non-linear activation.
  • A final dense layer with Softmax (multi-class) or Sigmoid (binary) activation.

This architecture is already defined in OpenNN as ClassificationNetwork and can be created as follows:

// Neural network architecture
const Index neurons_number = 3;
ClassificationNetwork neural_network(
    {input_features_number},
    {neurons_number},
    {target_features_number}
);

The NeuralNetwork base class is responsible for building the network and organizing its layers. If you need more complex architectures, see the NeuralNetwork class.
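The Softmax activation of the final layer is what turns the layer's raw values into class probabilities that sum to 1. A minimal standalone sketch of the standard softmax formula follows; it is written for illustration and is not taken from OpenNN's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard softmax: exp(x_i) / sum_j exp(x_j).
// Subtracting the maximum first leaves the result unchanged but
// avoids overflow in std::exp for large inputs.
std::vector<double> softmax(const std::vector<double>& logits)
{
    assert(!logits.empty());

    const double maximum = *std::max_element(logits.begin(), logits.end());

    std::vector<double> probabilities(logits.size());
    double sum = 0.0;

    for (std::size_t i = 0; i < logits.size(); ++i)
    {
        probabilities[i] = std::exp(logits[i] - maximum);
        sum += probabilities[i];
    }

    for (double& probability : probabilities)
        probability /= sum;

    return probabilities;
}
```

With three target features, as in this example, the output has three entries, one per iris class, and the largest entry marks the predicted class.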

Once the neural network has been created, we can attach the input and output names so the model expression and exported code use meaningful identifiers:

// Set input and output names
neural_network.set_input_names(inputs_names);
neural_network.set_output_names(targets_names);

For the scaling layer, we feed it the descriptives we computed in step 1 and confirm the scaler:

// Configure scaling layer
Scaling* scaling_layer = static_cast<Scaling*>(neural_network.get_first("Scaling"));

scaling_layer->set_descriptives(inputs_descriptives);
scaling_layer->set_scalers("MinimumMaximum");

With that, the model is ready. We proceed to the learning process with TrainingStrategy.

3. Training strategy

The third step is to set the training strategy, which is composed of:

  • Loss function.
  • Optimization algorithm.

First, we construct the training strategy object:

// Training strategy
TrainingStrategy training_strategy(&neural_network, &dataset);

Then, we set the loss function:

// Loss function
training_strategy.set_loss("NormalizedSquaredError");

And finally, the optimization algorithm:

// Optimization algorithm
training_strategy.set_optimization_algorithm("AdaptiveMomentEstimation");

Note that setting the loss function and the optimization algorithm explicitly is optional: TrainingStrategy picks sensible defaults per network type (for example, cross-entropy loss and Adam for classification). We can now start training:

// Train the model
training_strategy.train();

If we need finer control, OpenNN exposes the optimizer’s hyperparameters. For example, with Adam:

// Configure Adam optimizer
// dynamic_cast yields nullptr if the active optimizer is not Adam, so check before use
AdaptiveMomentEstimation* adam =
    dynamic_cast<AdaptiveMomentEstimation*>(
        training_strategy.get_optimization_algorithm()
    );

if(adam)
{
    adam->set_loss_goal(1.0e-3f);
    adam->set_maximum_epochs(10000);
    adam->set_display_period(1000);
}

// Train the model
training_strategy.train();
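For readers curious about what the optimizer does at each iteration, here is a minimal sketch of a single Adam update as it is commonly published, with the usual default hyperparameters. It is an illustration only and not OpenNN's implementation; the AdamState struct, the function adam_step, and the default values are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical optimizer state (not an OpenNN type): running averages
// of the gradient and of the squared gradient, plus a step counter.
struct AdamState
{
    std::vector<double> first_moment;
    std::vector<double> second_moment;
    long step = 0;
};

// One Adam update, applied in place to the parameters.
void adam_step(std::vector<double>& parameters,
               const std::vector<double>& gradient,
               AdamState& state,
               double learning_rate = 0.001,
               double beta_1 = 0.9,
               double beta_2 = 0.999,
               double epsilon = 1.0e-8)
{
    assert(parameters.size() == gradient.size());

    if (state.first_moment.empty())
    {
        state.first_moment.assign(parameters.size(), 0.0);
        state.second_moment.assign(parameters.size(), 0.0);
    }
    assert(state.first_moment.size() == parameters.size());

    ++state.step;

    // Bias corrections compensate for the zero-initialized averages.
    const double correction_1 = 1.0 - std::pow(beta_1, static_cast<double>(state.step));
    const double correction_2 = 1.0 - std::pow(beta_2, static_cast<double>(state.step));

    for (std::size_t i = 0; i < parameters.size(); ++i)
    {
        state.first_moment[i] = beta_1 * state.first_moment[i] + (1.0 - beta_1) * gradient[i];
        state.second_moment[i] = beta_2 * state.second_moment[i] + (1.0 - beta_2) * gradient[i] * gradient[i];

        const double first_hat = state.first_moment[i] / correction_1;
        const double second_hat = state.second_moment[i] / correction_2;

        parameters[i] -= learning_rate * first_hat / (std::sqrt(second_hat) + epsilon);
    }
}
```

Stopping criteria such as a loss goal or a maximum number of epochs then simply bound how many times an update like this is repeated.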

For more information about the training strategy methods, see the TrainingStrategy class.

4. Model selection

The fourth step is to set the model selection, which is composed of:

  • Inputs selection algorithm.
  • Neurons selection algorithm.

If you are unsure about your architecture choice, the model selection class helps identify the architecture that generalizes best, that is, the one with the lowest error on the selection subset.

The first step is to construct the model selection object:

// Model selection
ModelSelection model_selection(&training_strategy);

In this example, we want to optimize the number of neurons in the hidden layer using the neurons selection algorithm:

// Perform neurons selection
model_selection.perform_neurons_selection();

Once the algorithm finishes, the model has the number of hidden neurons that gave the lowest selection error among the candidates tried.
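Conceptually, neurons selection can be pictured as a search over candidate hidden-layer sizes that keeps the one with the lowest selection error. The sketch below shows that idea with a plain exhaustive loop. The function, the result struct, and the callback that stands in for "train with n neurons and report the selection error" are hypothetical, and OpenNN's actual algorithm and stopping rules may differ.

```cpp
#include <cassert>
#include <functional>
#include <limits>

// Hypothetical result type (not an OpenNN type).
struct NeuronsSelectionResult
{
    int neurons_number;
    double selection_error;
};

// Sketch: try every hidden-layer size in [minimum_neurons, maximum_neurons]
// and keep the one with the lowest selection error. The callback is assumed
// to train a network with the given size and return its selection error.
NeuronsSelectionResult select_neurons_sketch(
    int minimum_neurons,
    int maximum_neurons,
    const std::function<double(int)>& selection_error_for)
{
    assert(minimum_neurons >= 1 && minimum_neurons <= maximum_neurons);
    assert(static_cast<bool>(selection_error_for));

    NeuronsSelectionResult best{minimum_neurons,
                                std::numeric_limits<double>::infinity()};

    for (int neurons = minimum_neurons; neurons <= maximum_neurons; ++neurons)
    {
        const double error = selection_error_for(neurons);

        // Strict comparison keeps the smallest network on ties.
        if (error < best.selection_error)
            best = NeuronsSelectionResult{neurons, error};
    }

    return best;
}
```

Judging candidates on the selection subset rather than the training subset is what guards against picking a network that merely memorizes the training data.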

For more information about the model selection methods, see the ModelSelection class.

5. Testing analysis

The fifth step is to evaluate our model. For that purpose, we use the testing analysis class, which validates the model’s generalization performance. Here, we compare the neural network outputs to the corresponding targets in the testing instances of the data set.

First, we reverse the scaling we applied in step 1, so the testing data is back in its original units:

// Unscale input features
dataset.unscale_features("Input", inputs_descriptives);

We are ready to test our model. As in the previous cases, we start by building the testing analysis object:

// Testing analysis
TestingAnalysis testing_analysis(&neural_network, &dataset);

Then we run the test. In our case, we compute a confusion matrix:

// Confusion matrix
const MatrixI confusion = testing_analysis.calculate_confusion();

In a confusion matrix, rows represent targets (real values) and columns represent outputs (predicted values). The diagonal cells show correctly classified cases; the off-diagonal cells show misclassified cases.
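To illustrate the layout just described, the following sketch builds a confusion matrix from class indices, with rows for targets and columns for outputs, and derives the accuracy from its diagonal. The type alias and function names are hypothetical and independent of TestingAnalysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias (not OpenNN's MatrixI): confusion[target][output].
using ConfusionMatrix = std::vector<std::vector<int>>;

// Counts how often each target class (row) was predicted as each
// output class (column). Classes are given as indices 0..classes_number-1.
ConfusionMatrix build_confusion(const std::vector<int>& targets,
                                const std::vector<int>& outputs,
                                int classes_number)
{
    assert(targets.size() == outputs.size());
    assert(classes_number > 0);

    const std::size_t size = static_cast<std::size_t>(classes_number);
    ConfusionMatrix confusion(size, std::vector<int>(size, 0));

    for (std::size_t i = 0; i < targets.size(); ++i)
    {
        assert(targets[i] >= 0 && targets[i] < classes_number);
        assert(outputs[i] >= 0 && outputs[i] < classes_number);
        ++confusion[static_cast<std::size_t>(targets[i])]
                   [static_cast<std::size_t>(outputs[i])];
    }

    return confusion;
}

// Accuracy is the sum of the diagonal divided by the number of samples.
double accuracy(const ConfusionMatrix& confusion)
{
    int correct = 0;
    int total = 0;

    for (std::size_t row = 0; row < confusion.size(); ++row)
        for (std::size_t column = 0; column < confusion[row].size(); ++column)
        {
            total += confusion[row][column];
            if (row == column) correct += confusion[row][column];
        }

    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
```

For three classes the matrix is 3 by 3, and a perfect classifier would leave every off-diagonal cell at zero.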

For more information about the testing analysis methods, see the TestingAnalysis class.

6. Model deployment

Once our model is completed, the neural network can predict outputs for inputs it has never seen. This process is called model deployment.

For instance, suppose the new inputs are:

  • Sepal length: 5.10 cm.
  • Sepal width: 3.50 cm.
  • Petal length: 1.40 cm.
  • Petal width: 0.20 cm.

In OpenNN, we feed them to the network as a row matrix and read back the class probabilities:

// Model inference
MatrixR inputs(1, 4);
inputs << 5.1, 3.5, 1.4, 0.2;

const MatrixR outputs = neural_network.calculate_outputs(inputs);

The scaling layer is part of the network, so the input is scaled internally; there is no need to pre-scale the row before calling calculate_outputs.
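The outputs are class probabilities, so turning them into a label amounts to picking the largest one. A minimal sketch follows, assuming the probabilities have been copied into a std::vector in the same order as the target names. The function predicted_class is a hypothetical helper, not an OpenNN call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not an OpenNN call: returns the name of the class
// with the highest probability. On ties, the first such class wins.
std::string predicted_class(const std::vector<double>& probabilities,
                            const std::vector<std::string>& class_names)
{
    assert(!probabilities.empty());
    assert(probabilities.size() == class_names.size());

    const auto largest = std::max_element(probabilities.begin(), probabilities.end());
    return class_names[static_cast<std::size_t>(largest - probabilities.begin())];
}
```

For example, with made-up probabilities 0.97, 0.02, and 0.01 (illustrative values, not an actual model output) and the three iris class names in order, the helper returns iris_setosa.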

We can also save the trained model as a standalone source file for embedding in C, Python, JavaScript or PHP:

// Model export
ModelExpression model_expression(&neural_network);

model_expression.save("../data/expression.c",  ModelExpression::ProgrammingLanguage::C);
model_expression.save("../data/expression.py", ModelExpression::ProgrammingLanguage::Python);