Classify Amazon reviews into positive and negative using OpenNN

Customer feedback on a product can generally be categorized as positive or negative. Interpreting customer feedback through product reviews helps companies evaluate how satisfied customers are with their products and services. This example aims to assess whether a review of an Amazon product is positive or negative based on its content.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Testing analysis.
  6. Model deployment.
  7. Full code.

1. Application type

The variable to be predicted can have two values (positive or negative). Therefore, this is a binary text classification project.

The goal here is to model the probability that a review is positive or negative, conditioned on the words the customer used.
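As an illustrative sketch in plain C++ (not OpenNN code), the probabilistic output of such a binary classifier can be thought of as a logistic (sigmoid) function that squashes a real-valued activation into a probability, with the review labelled positive when that probability exceeds 0.5:

```cpp
#include <cmath>

// Hypothetical illustration: map an activation z to a probability in (0, 1).
double sigmoid(double z)
{
    return 1.0 / (1.0 + std::exp(-z));
}

// Label the review positive when the probability exceeds 0.5.
bool is_positive(double z)
{
    return sigmoid(z) > 0.5;
}
```

The actual probability is produced by the network's probabilistic layer, described in the neural network section below.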

2. Data set

The first step is to prepare the data set, the source of information for the classification problem. The data set object can be created by

// Dataset
DataSet data_set;

Next, we need to configure the following concepts:

  • Data source.
  • Separator.

The data source is the file amazon_cells_labelled.csv. It contains the data for this example and can be loaded as:

// Set dataset source path
data_set.set_data_path("path_to_source/amazon_cells_labelled.csv");

Our data contains two columns separated by a tab character and 10000 rows. This separator can be set by

// Set text field separator
data_set.set_text_separator(DataSet::Separator::Tab);

We are ready to load our document once we have set the data file name and the separator. This can be done using:

// Dataset from CSV file
data_set.read_csv();

The reviews are divided into training, selection, and testing subsets. They represent 60% (6000), 20% (2000), and 20% (2000) of the original reviews, respectively, and are split randomly using the following command:

// Randomly split dataset into subsets
data_set.split_samples_random();
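As a rough illustration of what this split does (a standalone sketch, not the OpenNN implementation), one can shuffle the sample indices and cut them at the 60% and 80% marks:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Index sets for the three subsets.
struct Split
{
    std::vector<int> training;
    std::vector<int> selection;
    std::vector<int> testing;
};

// Shuffle the sample indices, then take 60% for training,
// 20% for selection, and the remaining 20% for testing.
Split split_samples(int samples_number, unsigned seed = 0)
{
    std::vector<int> indices(samples_number);
    std::iota(indices.begin(), indices.end(), 0);
    std::shuffle(indices.begin(), indices.end(), std::mt19937(seed));

    const int training_number = samples_number * 6 / 10;
    const int selection_number = samples_number * 2 / 10;

    Split split;
    split.training.assign(indices.begin(),
                          indices.begin() + training_number);
    split.selection.assign(indices.begin() + training_number,
                           indices.begin() + training_number + selection_number);
    split.testing.assign(indices.begin() + training_number + selection_number,
                         indices.end());
    return split;
}
```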

Now that the data are ready, we can retrieve information such as the input words (the model's vocabulary), the number of input variables, and the number of target variables. For this, we use the following commands:

// Retrieve input and target variable information

const vector<string> input_words =
    data_set.get_variable_names(DataSet::VariableUse::Input);

const vector<string> targets_names =
    data_set.get_variable_names(DataSet::VariableUse::Target);

const Index words_number =
    data_set.get_variables_number(DataSet::VariableUse::Input);

const Index target_variables_number =
    data_set.get_variables_number(DataSet::VariableUse::Target);

For more information on the data set methods, see the DataSet class.

3. Neural network

The second step is to choose the correct neural network architecture. For text classification problems, the network is usually composed of:

  • A scaling layer.
  • Two perceptron layers.
  • A probabilistic layer.

The NeuralNetwork class builds the neural network and organizes its layers of neurons using the following constructor. If you need more complex architectures, see the NeuralNetwork class.

// Initialize text classification neural network

const Index hidden_neurons_number = 6;

NeuralNetwork neural_network(
    NeuralNetwork::ModelType::TextClassification,
    {words_number},
    {hidden_neurons_number},
    {target_variables_number}
);

Once the neural network has been created, we can attach descriptive information to it, such as the input and output names:

// Assign input and output names to the neural network

neural_network.set_input_names(input_words);
neural_network.set_output_names(targets_names);

The model is now defined, so we can proceed to the learning process with the TrainingStrategy class.

4. Training strategy

The third step is to set the training strategy, which is composed of:

  • Loss index.
  • Optimization algorithm.

Firstly, we construct the training strategy object

// Initialize training strategy
TrainingStrategy training_strategy(&neural_network, &data_set);

Then, we set the error term

// Set cross-entropy loss function
training_strategy.set_loss_method(
    TrainingStrategy::LossMethod::CROSS_ENTROPY_ERROR
);

and finally, the optimization algorithm

// Set optimization method to Adam
training_strategy.set_optimization_method(
    TrainingStrategy::OptimizationMethod::ADAPTIVE_MOMENT_ESTIMATION
);
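To make the chosen loss concrete, here is a minimal standalone sketch (not OpenNN's implementation) of the binary cross-entropy error averaged over samples, where each target t is 0 or 1 and each output p is the predicted probability of the positive class:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Average binary cross-entropy: -[t*log(p) + (1-t)*log(1-p)] over all samples.
double cross_entropy_error(const std::vector<double>& targets,
                           const std::vector<double>& outputs)
{
    double sum = 0.0;

    for (std::size_t i = 0; i < targets.size(); i++)
        sum -= targets[i] * std::log(outputs[i])
             + (1.0 - targets[i]) * std::log(1.0 - outputs[i]);

    return sum / targets.size();
}
```

Confident, correct predictions yield a small error, while uncertain or wrong ones are penalized more heavily, which is what drives the optimization.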

We can now start the training process by using the command

// Execute training process
training_strategy.perform_training();

For more information on the training strategy methods, see the TrainingStrategy class.

5. Testing analysis

The fourth step is to evaluate our model. For that purpose, we need to use the TestingAnalysis class, whose goal is to validate the model’s generalization performance. Here, we compare the neural network outputs to the corresponding targets in the testing instances of the data set.

We start by building the testing analysis object

// Initialize testing analysis
TestingAnalysis testing_analysis(&neural_network, &data_set);

and perform the testing. In our case, we use binary classification tests

// Print binary classification test results
testing_analysis.print_binary_classification_tests();
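The kind of figures this method reports can be illustrated with a standalone sketch (a hypothetical helper, not OpenNN code) that derives accuracy, precision, and recall from a confusion matrix of predicted versus true labels:

```cpp
#include <cstddef>
#include <vector>

// Common binary classification metrics.
struct BinaryMetrics
{
    double accuracy;
    double precision;
    double recall;
};

// Count true/false positives and negatives, then derive the metrics.
BinaryMetrics binary_metrics(const std::vector<bool>& predicted,
                             const std::vector<bool>& actual)
{
    double tp = 0, tn = 0, fp = 0, fn = 0;

    for (std::size_t i = 0; i < predicted.size(); i++)
    {
        if (predicted[i] && actual[i]) tp++;
        else if (!predicted[i] && !actual[i]) tn++;
        else if (predicted[i] && !actual[i]) fp++;
        else fn++;
    }

    return {(tp + tn) / predicted.size(),  // accuracy
            tp / (tp + fp),                // precision
            tp / (tp + fn)};               // recall
}
```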

For more information on the testing analysis methods, see the TestingAnalysis class.

6. Model deployment

Once the model is trained, the neural network is ready to predict outputs for inputs it has never seen. This process is called model deployment.

To generate predictions on new data, you can use

// Compute network outputs for new inputs
Tensor<type, 2> outputs =
    neural_network.calculate_outputs(input_data);

where input_data is a tensor containing the processed reviews, which we build below.

For instance, consider the following new input reviews:

  • «Highly recommend for anyone who has a Bluetooth phone.»
  • «You have to hold the phone at a particular angle for the other party to hear you clearly.»

and, in OpenNN, we can write them as

// Convert text reviews into input tensors

string review_1 =
    "Highly recommend for anyone who has a Bluetooth phone.";
Tensor<type, 1> processed_review_1 =
    data_set.sentence_to_data(review_1);

string review_2 =
    "You have to hold the phone at a particular angle for the other party to hear you clearly.";
Tensor<type, 1> processed_review_2 =
    data_set.sentence_to_data(review_2);
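Conceptually, sentence_to_data maps a review onto the vocabulary learned from the data set. A minimal bag-of-words sketch (an assumption about its behavior, not OpenNN source) would be:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// For each vocabulary word, count how often it appears in the sentence,
// producing one numeric input per vocabulary word.
std::vector<double> sentence_to_counts(const std::string& sentence,
                                       const std::vector<std::string>& vocabulary)
{
    std::vector<double> counts(vocabulary.size(), 0.0);

    std::istringstream stream(sentence);
    std::string token;

    while (stream >> token)
        for (std::size_t i = 0; i < vocabulary.size(); i++)
            if (token == vocabulary[i]) counts[i] += 1.0;

    return counts;
}
```

The real method also applies the same text preprocessing (case folding, punctuation removal, and so on) used when the data set was read.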

Once the reviews are transformed into numeric tensors, their predictions can be calculated using:

// Build input batch and compute network outputs

Tensor<type, 2> input_data(2, words_number);

for (Index i = 0; i < words_number; i++)
{
    input_data(0, i) = processed_review_1(i);
    input_data(1, i) = processed_review_2(i);
}

Tensor<type, 2> outputs =
    neural_network.calculate_outputs(input_data);

You can also save the model using:

// Export network expressions

neural_network.save_expression(
    NeuralNetwork::ProgrammingLanguage::C, "../data/expression.txt");
neural_network.save_expression(
    NeuralNetwork::ProgrammingLanguage::Python, "../data/expression.py");

You can implement the model in Python, PHP, and so on.

7. Full code

Joining all steps, we obtain the following code:

// --------------------
// DataSet
// --------------------

DataSet data_set;
data_set.set_data_path("path_to_source/amazon_cells_labelled.csv");
data_set.set_text_separator(DataSet::Separator::Tab);
data_set.read_csv();
data_set.split_samples_random();

const vector<string> input_words =
    data_set.get_variable_names(DataSet::VariableUse::Input);

const vector<string> targets_names =
    data_set.get_variable_names(DataSet::VariableUse::Target);

const Index words_number =
    data_set.get_variables_number(DataSet::VariableUse::Input);

const Index target_variables_number =
    data_set.get_variables_number(DataSet::VariableUse::Target);

// --------------------
// Neural Network
// --------------------

const Index hidden_neurons_number = 6;

NeuralNetwork neural_network(
    NeuralNetwork::ModelType::TextClassification,
    {words_number},
    {hidden_neurons_number},
    {target_variables_number}
);

// --------------------
// Training Strategy
// --------------------

TrainingStrategy training_strategy(&neural_network, &data_set);

training_strategy.set_loss_method(
    TrainingStrategy::LossMethod::CROSS_ENTROPY_ERROR
);

training_strategy.set_optimization_method(
    TrainingStrategy::OptimizationMethod::ADAPTIVE_MOMENT_ESTIMATION
);

training_strategy.perform_training();

// --------------------
// Testing Analysis
// --------------------

TestingAnalysis testing_analysis(&neural_network, &data_set);
testing_analysis.print_binary_classification_tests();

// --------------------
// Model deployment
// --------------------

string review_1 =
    "Highly recommend for anyone who has a Bluetooth phone.";
Tensor<type, 1> processed_review_1 =
    data_set.sentence_to_data(review_1);

string review_2 =
    "You have to hold the phone at a particular angle for the other party to hear you clearly.";
Tensor<type, 1> processed_review_2 =
    data_set.sentence_to_data(review_2);

Tensor<type, 2> input_data(2, words_number);

for (Index i = 0; i < words_number; i++)
{
    input_data(0, i) = processed_review_1(i);
    input_data(1, i) = processed_review_2(i);
}

Tensor<type, 2> outputs =
    neural_network.calculate_outputs(input_data);

// --------------------
// Save results
// --------------------

neural_network.save_expression(
    NeuralNetwork::ProgrammingLanguage::C, "../data/amazon_reviews.txt");
neural_network.save_expression(
    NeuralNetwork::ProgrammingLanguage::Python, "../data/amazon_reviews.py");

This code can be copied into your own C++ project.
