Diagnose breast cancer from fine-needle aspirate images using OpenNN

This example aims to assess whether a lump in a breast could be malignant (cancerous) or benign (non-cancerous) from digitized images of a fine-needle aspiration biopsy.

The breast cancer database was obtained from Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Testing analysis.
  6. Model deployment.
  7. Full Code.

1. Application type

The variable to be predicted can take two values (malignant or benign tumour). Therefore, this is a binary classification project.

The goal here is to model the probability of a malignant tumour, conditioned on the fine needle aspiration test features.

2. Data set

The first step is to prepare the data set, which is the source of information for the classification problem. To do so, we need to configure the data source, the variables, and the instances.

The data source is the file breast_cancer.csv. It contains the data for this example in comma-separated values (CSV) format and can be loaded as follows:

DataSet data_set("path_to_source/breast_cancer.csv",',',true);

The number of columns is 10, and the number of rows is 683. The variables in this problem are:

Once the data is ready, we can retrieve information about the variables, such as their names and descriptive statistics.

const Tensor<string, 1> inputs_names = data_set.get_input_variables_names();
const Tensor<string, 1> targets_names = data_set.get_target_variables_names();

The instances are divided into training, selection, and testing subsets. They represent 60% (409), 20% (137), and 20% (137) of the original instances, respectively, and are split at random using the following command:

data_set.split_samples_random();
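To make the split concrete, the following is a minimal sketch in plain C++ of a random 60/20/20 partition of sample indices. It does not use OpenNN: `SampleSplit` and `split_samples_random_sketch` are hypothetical names, and the rounding rule (selection and testing sizes rounded to the nearest integer, training taking the remainder) is an assumption that happens to reproduce the 409/137/137 counts quoted above. OpenNN's own implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical container for the three index subsets (not an OpenNN type).
struct SampleSplit
{
    std::vector<std::size_t> training;
    std::vector<std::size_t> selection;
    std::vector<std::size_t> testing;
};

// Shuffle the sample indices and cut them into training, selection and testing subsets.
// Assumption: selection and testing sizes are rounded to the nearest integer,
// and the training subset takes whatever remains.
SampleSplit split_samples_random_sketch(std::size_t samples_number,
                                        double selection_ratio,
                                        double testing_ratio,
                                        unsigned seed)
{
    assert(selection_ratio >= 0.0 && testing_ratio >= 0.0);
    assert(selection_ratio + testing_ratio <= 1.0);

    std::vector<std::size_t> indices(samples_number);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    std::mt19937 generator(seed);
    std::shuffle(indices.begin(), indices.end(), generator);

    const auto selection_number = static_cast<std::size_t>(
        std::llround(selection_ratio * static_cast<double>(samples_number)));
    const auto testing_number = static_cast<std::size_t>(
        std::llround(testing_ratio * static_cast<double>(samples_number)));
    assert(selection_number + testing_number <= samples_number);
    const std::size_t training_number = samples_number - selection_number - testing_number;

    const auto training_end = indices.begin() + static_cast<std::ptrdiff_t>(training_number);
    const auto selection_end = training_end + static_cast<std::ptrdiff_t>(selection_number);

    SampleSplit split;
    split.training.assign(indices.begin(), training_end);
    split.selection.assign(training_end, selection_end);
    split.testing.assign(selection_end, indices.end());
    return split;
}
```

Calling `split_samples_random_sketch(683, 0.2, 0.2, 42u)` yields subsets of 409, 137, and 137 indices. The seed only changes which samples land in each subset, not the subset sizes.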

To get the number of input variables and the number of target variables, we use the following commands:

const Index input_variables_number = data_set.get_input_variables_number();
const Index target_variables_number = data_set.get_target_variables_number();

For more information about the data set methods, see DataSet class.

3. Neural network

The second step is to choose a suitable neural network architecture. For classification problems, it is usually composed of:

The NeuralNetwork class builds the neural network and organizes its layers of neurons. We create it with the following constructor. For more complex architectures, see the NeuralNetwork class.

const Index hidden_neurons_number = 6;
NeuralNetwork neural_network(NeuralNetwork::Classification,
{input_variables_number, hidden_neurons_number, target_variables_number});
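As a rough illustration of what this 9-6-1 layout computes, here is a plain C++ sketch of a forward pass through one hidden layer and one output neuron. It does not use OpenNN. The `DenseLayer` type, the function names, and the choice of activations (hyperbolic tangent in the hidden layer, logistic in the output) are assumptions made for the example, and any input scaling that the library may apply is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense layer (not an OpenNN type):
// weights[j][i] connects input i to neuron j; biases[j] is the bias of neuron j.
struct DenseLayer
{
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// Number of trainable parameters in a fully connected layer (weights plus biases).
std::size_t parameters_number(std::size_t inputs_number, std::size_t neurons_number)
{
    return inputs_number * neurons_number + neurons_number;
}

// Weighted sum of the inputs plus the bias, for every neuron in the layer.
std::vector<double> combinations(const DenseLayer& layer, const std::vector<double>& inputs)
{
    assert(layer.weights.size() == layer.biases.size());

    std::vector<double> result(layer.biases);
    for (std::size_t j = 0; j < layer.weights.size(); ++j)
    {
        assert(layer.weights[j].size() == inputs.size());
        for (std::size_t i = 0; i < inputs.size(); ++i)
            result[j] += layer.weights[j][i] * inputs[i];
    }
    return result;
}

// Forward pass. Assumed activations: tanh in the hidden layer, logistic in the output,
// so the returned value lies in (0, 1) and can be read as a probability.
double malignant_probability(const DenseLayer& hidden,
                             const DenseLayer& output,
                             const std::vector<double>& inputs)
{
    std::vector<double> activations = combinations(hidden, inputs);
    for (double& activation : activations)
        activation = std::tanh(activation);

    const double combination = combinations(output, activations).at(0);
    return 1.0 / (1.0 + std::exp(-combination));
}
```

With 9 inputs, 6 hidden neurons, and 1 output, these two dense layers hold 9·6 + 6 = 60 and 6·1 + 1 = 7 parameters, 67 in total. With all weights and biases set to zero, the sketch returns 0.5 for any input.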

Once the neural network has been created, we can assign the variable names to its inputs and outputs, which makes the results easier to interpret:

neural_network.set_inputs_names(inputs_names);
neural_network.set_outputs_names(targets_names);

The model is now defined, so we can proceed to the learning process with TrainingStrategy.

4. Training strategy

The third step is to set the training strategy, which is composed of an error term and an optimization algorithm.

Firstly, we construct the training strategy object

TrainingStrategy training_strategy(&neural_network, &data_set);

then, set the error term

training_strategy.set_loss_method(TrainingStrategy::NORMALIZED_SQUARED_ERROR);

and finally the optimization algorithm

training_strategy.set_optimization_method(TrainingStrategy::ADAPTIVE_MOMENT_ESTIMATION);
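For readers unfamiliar with the error term chosen above, the sketch below computes a normalized squared error in plain C++. It assumes the usual definition: the sum of squared differences between outputs and targets, divided by the sum of squared deviations of the targets from their mean. The function name is hypothetical and the code is independent of OpenNN, whose implementation may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized squared error, under the assumed definition
//   sum_k (output_k - target_k)^2 / sum_k (target_k - mean(targets))^2.
// A value of 0 means a perfect fit; a value of 1 matches always predicting the target mean.
double normalized_squared_error(const std::vector<double>& outputs,
                                const std::vector<double>& targets)
{
    assert(outputs.size() == targets.size());
    assert(!targets.empty());

    double targets_mean = 0.0;
    for (const double target : targets)
        targets_mean += target;
    targets_mean /= static_cast<double>(targets.size());

    double squared_error = 0.0;
    double normalization = 0.0;
    for (std::size_t k = 0; k < targets.size(); ++k)
    {
        const double error = outputs[k] - targets[k];
        const double deviation = targets[k] - targets_mean;
        squared_error += error * error;
        normalization += deviation * deviation;
    }

    // The normalization is zero only when all targets are identical.
    assert(normalization > 0.0);
    return squared_error / normalization;
}
```

The normalization makes the error independent of the scale and size of the data set, which is convenient when comparing the training and selection errors.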

We can now start the training process by using the command

training_strategy.perform_training();

For more information about the training strategy methods, see TrainingStrategy class.

5. Testing analysis

The fourth step is to evaluate our model. For that purpose, we need to use the testing analysis class, whose goal is to validate the model's generalization performance. Here, we compare the neural network outputs to the corresponding targets in the testing instances of the data set.

As in the previous cases, we start by building the testing analysis object

TestingAnalysis testing_analysis(&neural_network, &data_set);

and perform the testing. In our case, we use binary classification tests:

testing_analysis.print_binary_classification_tests();
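The kind of quantities such tests report can be sketched in plain C++ as follows. This is not OpenNN's implementation: `BinaryTestsSketch` and `binary_classification_tests_sketch` are hypothetical names, the decision threshold of 0.5 is an assumption, and only a few common measures (accuracy, sensitivity, specificity) are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type (not an OpenNN class): confusion-matrix counts and derived rates.
struct BinaryTestsSketch
{
    std::size_t true_positives = 0;
    std::size_t false_positives = 0;
    std::size_t false_negatives = 0;
    std::size_t true_negatives = 0;

    std::size_t total() const
    {
        return true_positives + false_positives + false_negatives + true_negatives;
    }

    // Fraction of samples classified correctly.
    double accuracy() const
    {
        return static_cast<double>(true_positives + true_negatives) / static_cast<double>(total());
    }

    // Fraction of malignant samples detected (NaN if there are no malignant samples).
    double sensitivity() const
    {
        return static_cast<double>(true_positives)
             / static_cast<double>(true_positives + false_negatives);
    }

    // Fraction of benign samples recognized as benign (NaN if there are no benign samples).
    double specificity() const
    {
        return static_cast<double>(true_negatives)
             / static_cast<double>(true_negatives + false_positives);
    }
};

// Compare thresholded outputs with targets (1 = malignant, 0 = benign).
// Assumption: an output at or above the threshold is classified as malignant.
BinaryTestsSketch binary_classification_tests_sketch(const std::vector<double>& outputs,
                                                     const std::vector<int>& targets,
                                                     double threshold = 0.5)
{
    assert(outputs.size() == targets.size());

    BinaryTestsSketch tests;
    for (std::size_t k = 0; k < outputs.size(); ++k)
    {
        const bool predicted_positive = outputs[k] >= threshold;
        const bool actual_positive = targets[k] == 1;

        if (predicted_positive && actual_positive) ++tests.true_positives;
        else if (predicted_positive && !actual_positive) ++tests.false_positives;
        else if (!predicted_positive && actual_positive) ++tests.false_negatives;
        else ++tests.true_negatives;
    }
    return tests;
}
```

In a diagnostic setting, sensitivity is usually the measure to watch, since a false negative corresponds to a malignant tumour classified as benign.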

For more information about the testing analysis methods, see TestingAnalysis class.

6. Model deployment

Once our model is completed, the neural network is ready to predict outputs for inputs that it has never seen. This process is called model deployment.

To generate predictions with new data, you can use

neural_network.calculate_outputs();

For instance, the new inputs are:

and in OpenNN we can write it as

Tensor<type, 2> inputs(1,9);
inputs.setValues({{type(4),type(3),type(3),type(2),type(3),type(4),type(3),type(2),type(1)}});
neural_network.calculate_outputs(inputs);

Alternatively, we can save the mathematical expression of the model to a file:

neural_network.save_expression_c("../data/expression.txt");
neural_network.save_expression_python("../data/expression.py");

The exported expression can then be implemented in other languages, such as Python or PHP.

7. Full Code

Joining all steps, we obtain the following code:

// DataSet (the separator argument must match the delimiter actually used in the file)
DataSet data_set("../data/breast_cancer.csv", ';', true);
const Index input_variables_number = data_set.get_input_variables_number();
const Index target_variables_number = data_set.get_target_variables_number();
// Neural Network
const Index hidden_neurons_number = 6;
NeuralNetwork neural_network(NeuralNetwork::Classification,
{input_variables_number,hidden_neurons_number,target_variables_number});
// Training Strategy
TrainingStrategy training_strategy(&neural_network, &data_set);
training_strategy.set_loss_method(TrainingStrategy::NORMALIZED_SQUARED_ERROR);
training_strategy.set_optimization_method(TrainingStrategy::ADAPTIVE_MOMENT_ESTIMATION);
training_strategy.perform_training();
// Testing Analysis
TestingAnalysis testing_analysis(&neural_network, &data_set);
testing_analysis.print_binary_classification_tests();
// Model deployment
Tensor<type, 2> inputs(1,9);
inputs.setValues({{type(4),type(3),type(3),type(2),type(3),type(4),type(3),type(2),type(1)}});
neural_network.calculate_outputs(inputs);
// Save results
neural_network.save_expression_c("../data/breast_cancer.txt");
neural_network.save_expression_python("../data/breast_cancer.py");

This code can be exported to your C++ project.

References: