DataSet class

This is the documentation for the Python DataSet class and its methods in the OpenNN Python module.

This class represents the concept of a data set for data modelling problems, such as function regression, classification and time series prediction. It consists of a data matrix plus Variables and Instances objects.

Initialization methods

DataSet()

Default initialization method. It creates a data set object with zero instances and zero input and target variables. It also initializes the rest of the class members to their default values.

DataSet(data)

Data initialization method. It creates a data set object from a data matrix. It also initializes the rest of the class members to their default values.


DataSet(new_instances_number, new_variables_number)

Instances and variables number initialization method. It creates a data set object with the given numbers of instances and variables. All the variables are set as inputs. It also initializes the rest of the class members to their default values.


DataSet(instances_number, inputs_number, targets_number)

Instances number, input variables number and target variables number initialization method. It creates a data set object with the given numbers of instances, input variables and target variables. It also initializes the rest of the class members to their default values.


DataSet(data_file_name)

File initialization method. It creates a data set object by loading the object members from a data file. The file must follow the format specified in the User's Guide.


DataSet(data_file_name, separator)

File and separator initialization method. It creates a data set object by loading the object members from a data file. It also sets the separator used in that file. The file must follow the format specified in the User's Guide.
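The file format itself is specified in the User's Guide and is not reproduced here. As a rough illustration of what the separator does, the sketch below parses separator-delimited numeric text into a matrix with one row per instance. It is plain Python, not OpenNN code: the function name parse_data and the assumption of a purely numeric file without a header line are made up for this example.

```python
def parse_data(text, separator=","):
    """Parse separator-delimited numeric text into a list of rows."""
    matrix = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        matrix.append([float(token) for token in line.split(separator)])
    # Every instance (row) must have the same number of variables (columns).
    if any(len(row) != len(matrix[0]) for row in matrix):
        raise ValueError("rows have different numbers of columns")
    return matrix


# Hypothetical file contents using ';' as the separator.
sample = "1.0;2.0;3.0\n4.0;5.0;6.0\n"
matrix = parse_data(sample, separator=";")
print(matrix)
```

Using the wrong separator on this sample would make float() fail on the first row, which is why the separator has to match the file.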


DataSet(other_data_set)

Copy initialization method. It creates a copy of an existing data set object.

General methods

variables()

Returns a constant reference to the Variables object of this data set.

set_data_file_name(new_data_file_name)

Sets the name of the data file. It also loads the data from that file. Moreover, it sets the variables and instances objects.


set_separator(new_separator)

Sets a new separator from a string.


load_data()

This method loads the data file.

print_data()

Prints the values of the data matrix to the screen.

scale_inputs_minimum_maximum()

Scales the input variables with the calculated minimum and maximum values from the data matrix. It updates the input variables of the data matrix. It also returns a vector of vectors with the minimum and maximum values of the input variables.
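The idea behind minimum-maximum scaling can be sketched in plain Python as follows. This is not the OpenNN implementation. The target range of [-1, 1], the handling of constant columns and the layout of the returned statistics (a list of minimums followed by a list of maximums) are all assumptions, since this page does not state them.

```python
def scale_minimum_maximum(matrix, columns, low=-1.0, high=1.0):
    """Scale the given columns of matrix in place to [low, high].

    Returns [minimums, maximums] computed before scaling.
    """
    minimums = [min(row[c] for row in matrix) for c in columns]
    maximums = [max(row[c] for row in matrix) for c in columns]
    for row in matrix:
        for k, c in enumerate(columns):
            span = maximums[k] - minimums[k]
            if span == 0.0:
                continue  # constant column: left unchanged (assumption)
            row[c] = low + (high - low) * (row[c] - minimums[k]) / span
    return [minimums, maximums]


# Two input columns (0 and 1) and one target column (2).
data = [[0.0, 10.0, 1.0],
        [5.0, 20.0, 0.0],
        [10.0, 30.0, 1.0]]
statistics = scale_minimum_maximum(data, columns=[0, 1])
print(data)
print(statistics)
```

Keeping the returned minimums and maximums matters in practice: the same values are needed later to scale new inputs consistently with the training data.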

scale_targets_minimum_maximum()

Scales the target variables with the calculated minimum and maximum values from the data matrix. It updates the target variables of the data matrix. It also returns a vector of vectors with the minimum and maximum values of the target variables.

instances()

Returns a constant reference to the Instances object of this data set.

data()

Returns a reference to the data matrix in the data set. The number of rows is equal to the number of instances. The number of columns is equal to the number of variables.

training_data()

Returns a matrix with the training instances in the data set. The number of rows is the number of training instances. The number of columns is the number of variables.

selection_data()

Returns a matrix with the selection instances in the data set. The number of rows is the number of selection instances. The number of columns is the number of variables.

testing_data()

Returns a matrix with the testing instances in the data set. The number of rows is the number of testing instances. The number of columns is the number of variables.
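The three methods above differ only in which rows they keep. A minimal plain-Python sketch of that row selection is shown below. It is not OpenNN code: the helper rows_with_use and the use labels "Training", "Selection" and "Testing" are hypothetical stand-ins for whatever the Instances object stores.

```python
def rows_with_use(matrix, uses, wanted):
    """Return the rows of matrix whose instance use equals wanted."""
    return [row for row, use in zip(matrix, uses) if use == wanted]


# Five instances, three variables each.
data = [[0.1, 0.2, 1.0],
        [0.3, 0.4, 0.0],
        [0.5, 0.6, 1.0],
        [0.7, 0.8, 0.0],
        [0.9, 1.0, 1.0]]
# One use label per instance (hypothetical labels).
instance_uses = ["Training", "Training", "Selection", "Testing", "Training"]

training = rows_with_use(data, instance_uses, "Training")
selection = rows_with_use(data, instance_uses, "Selection")
testing = rows_with_use(data, instance_uses, "Testing")
print(len(training), len(selection), len(testing))
```

Each result keeps all three columns, matching the description that the number of columns equals the number of variables.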

inputs_eigen()

Returns a matrix with the input variables in the data set. The number of rows is the number of instances. The number of columns is the number of input variables.

targets_eigen()

Returns a matrix with the target variables in the data set. The number of rows is the number of instances. The number of columns is the number of target variables.
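Selecting inputs or targets is the column-wise counterpart of selecting instances: all rows are kept and only the columns with the requested use remain. The sketch below illustrates this in plain Python. It is not OpenNN code, and the helper columns_with_use and the labels "Input", "Target" and "Unused" are assumptions made for the example.

```python
def columns_with_use(matrix, uses, wanted):
    """Return a matrix with only the columns whose variable use equals wanted."""
    indices = [i for i, use in enumerate(uses) if use == wanted]
    return [[row[i] for i in indices] for row in matrix]


# Three instances, four variables each.
data = [[1.0, 2.0, 3.0, 0.0],
        [4.0, 5.0, 6.0, 1.0],
        [7.0, 8.0, 9.0, 0.0]]
# One use label per variable (hypothetical labels).
variable_uses = ["Input", "Unused", "Input", "Target"]

inputs = columns_with_use(data, variable_uses, "Input")
targets = columns_with_use(data, variable_uses, "Target")
print(inputs)
print(targets)
```

Combining this column selection with a row selection by instance use gives the kind of submatrix that the training, selection and testing input and target methods describe.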

training_inputs()

Returns a matrix with training instances and input variables. The number of rows is the number of training instances. The number of columns is the number of input variables.

training_targets()

Returns a matrix with training instances and target variables. The number of rows is the number of training instances. The number of columns is the number of target variables.

selection_inputs()

Returns a matrix with selection instances and input variables. The number of rows is the number of selection instances. The number of columns is the number of input variables.

selection_targets()

Returns a matrix with selection instances and target variables. The number of rows is the number of selection instances. The number of columns is the number of target variables.

testing_inputs()

Returns a matrix with testing instances and input variables. The number of rows is the number of testing instances. The number of columns is the number of input variables.

testing_target()

Returns a matrix with testing instances and target variables. The number of rows is the number of testing instances. The number of columns is the number of target variables.

set_variable_use(i, new_use)

Sets the use of a variable in the data set. Here i is the index of the variable and new_use is the use to assign to it.


input_target_correlations()

Calculates the linear correlations between all targets and all inputs. It returns a matrix whose number of rows is the number of targets and whose number of columns is the number of inputs. Each element contains the linear correlation between a single target and a single input.
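The shape of the result can be illustrated with a small plain-Python sketch. It assumes that "linear correlation" means the Pearson correlation coefficient, and that a constant column yields a correlation of 0. Neither assumption is confirmed on this page. The function names are hypothetical and this is not the OpenNN implementation.

```python
import math


def linear_correlation(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    denominator = math.sqrt(variance_x * variance_y)
    # A constant column has no defined correlation; return 0.0 (assumption).
    return covariance / denominator if denominator > 0.0 else 0.0


def correlation_matrix(inputs, targets):
    """Rows correspond to targets, columns correspond to inputs."""
    input_columns = list(zip(*inputs))
    target_columns = list(zip(*targets))
    return [[linear_correlation(t, x) for x in input_columns]
            for t in target_columns]


# Four instances, two inputs and one target.
inputs = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
targets = [[2.0], [4.0], [6.0], [8.0]]
correlations = correlation_matrix(inputs, targets)
print(correlations)
```

In this example the target grows with the first input and falls as the second input grows, so the single row holds one strongly positive and one strongly negative value.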