DataSet class

This is the documentation for the Python DataSet class and its methods in the OpenNN Python module.

This class represents the concept of a data set for data modelling problems such as function regression, classification and time series prediction. It consists of a data matrix plus a Variables object and an Instances object.

Initialization methods

  • DataSet()

    Default initialization method. It creates a data set object with zero instances and zero input and target variables. It also initializes the rest of the class members to their default values.

  • DataSet(data)

    Data initialization method. It creates a data set object from a data matrix. It also initializes the rest of the class members to their default values.

    • data Data matrix.

  • DataSet(new_instances_number, new_variables_number)

    Instances and variables number initialization method. It creates a data set object with the given numbers of instances and variables. All the variables are set as inputs. It also initializes the rest of the class members to their default values.

    • new_instances_number Number of instances in the data set.
    • new_variables_number Number of variables.

  • DataSet(new_instances_number, new_inputs_number, new_targets_number)

    Instances, input variables and target variables number initialization method. It creates a data set object with the given numbers of instances, input variables and target variables. It also initializes the rest of the class members to their default values.

    • new_instances_number Number of instances in the data set.
    • new_inputs_number Number of input variables.
    • new_targets_number Number of target variables.

  • DataSet(data_file_name)

    File initialization method. It creates a data set object by loading the object members from a data file. The file must follow the format specified in the User's Guide.

    • data_file_name Name of the data file.

  • DataSet(data_file_name, separator)

    File and separator initialization method. It creates a data set object by loading the object members from a data file, using the given separator. The file must follow the format specified in the User's Guide.

    • data_file_name Name of the data file.
    • separator String with the separator used in the data file.

  • DataSet(other_data_set)

    Copy initialization method. It creates a copy of an existing data set object.

    • other_data_set Data set object to be copied.
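The file-based initialization methods read the data matrix from a delimited text file. The real file format is defined in the User's Guide and is not reproduced here, so the following pure-Python sketch assumes the simplest case: a headerless file with one instance per row, one variable per column, and only numeric values separated by `separator`. The function `read_data_matrix` is hypothetical and is not part of the OpenNN module; it only illustrates how a file and a separator map to an instances-by-variables matrix.

```python
import csv
import io
from typing import List, TextIO


def read_data_matrix(stream: TextIO, separator: str = ",") -> List[List[float]]:
    """Hypothetical reader: one instance per row, one variable per column.

    Assumes a headerless file containing only numeric values; the actual
    OpenNN file format (see the User's Guide) may differ.
    """
    reader = csv.reader(stream, delimiter=separator)
    # Skip blank lines (csv.reader yields [] for them).
    matrix = [[float(value) for value in row] for row in reader if row]
    # Every instance must have the same number of variables.
    if matrix and any(len(row) != len(matrix[0]) for row in matrix):
        raise ValueError("rows have different numbers of columns")
    return matrix


# Usage: an in-memory "file" with 3 instances and 2 variables.
sample = io.StringIO("1.0;2.0\n3.0;4.0\n5.0;6.0\n")
data = read_data_matrix(sample, separator=";")
print(len(data), len(data[0]))  # 3 2
```

The number of rows read becomes the number of instances and the number of columns the number of variables, matching the shape described for `data()`.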

General methods

  • variables()

    Returns a constant reference to the variables object composing this data set object.

  • set_data_file_name(new_data_file_name)

    Sets the name of the data file, loads the data from that file, and sets up the variables and instances objects accordingly.

    • new_data_file_name Name of the file containing the data.

  • set_separator(new_separator)

    Sets a new separator from a string.

    • new_separator String with the separator value.

  • load_data()

    Loads the data from the data file.

  • print_data()

    Prints the values of the data matrix to the screen.

  • scale_inputs_minimum_maximum()

    Scales the input variables using the minimum and maximum values calculated from the data matrix, and updates the input variables of the data matrix with the scaled values. It also returns a vector of vectors with the minimum and maximum values of the input variables.

  • scale_targets_minimum_maximum()

    Scales the target variables using the minimum and maximum values calculated from the data matrix, and updates the target variables of the data matrix with the scaled values. It also returns a vector of vectors with the statistics of the target variables.

  • instances()

    Returns a constant reference to the instances object composing this data set object.

  • data()

    Returns a reference to the data matrix in the data set. The number of rows is equal to the number of instances. The number of columns is equal to the number of variables.

  • training_data()

    Returns a matrix with the training instances in the data set. The number of rows is the number of training instances. The number of columns is the number of variables.

  • selection_data()

    Returns a matrix with the selection instances in the data set. The number of rows is the number of selection instances. The number of columns is the number of variables.

  • testing_data()

    Returns a matrix with the testing instances in the data set. The number of rows is the number of testing instances. The number of columns is the number of variables.

  • inputs_eigen()

    Returns a matrix with the input variables in the data set. The number of rows is the number of instances. The number of columns is the number of input variables.

  • targets_eigen()

    Returns a matrix with the target variables in the data set. The number of rows is the number of instances. The number of columns is the number of target variables.

  • training_inputs()

    Returns a matrix with training instances and input variables. The number of rows is the number of training instances. The number of columns is the number of input variables.

  • training_targets()

    Returns a matrix with training instances and target variables. The number of rows is the number of training instances. The number of columns is the number of target variables.

  • selection_inputs()

    Returns a matrix with selection instances and input variables. The number of rows is the number of selection instances. The number of columns is the number of input variables.

  • selection_targets()

    Returns a matrix with selection instances and target variables. The number of rows is the number of selection instances. The number of columns is the number of target variables.

  • testing_inputs()

    Returns a matrix with testing instances and input variables. The number of rows is the number of testing instances. The number of columns is the number of input variables.

  • testing_target()

    Returns a matrix with testing instances and target variables. The number of rows is the number of testing instances. The number of columns is the number of target variables.

  • set_variable_use(i, new_use)

    Sets the use of a variable in the data set.

    • i Variable index.
    • new_use String with the use value (Input, Target, Unused, Time).

  • input_target_correlations()

    Calculates the linear correlations between all targets and all inputs. It returns a matrix whose number of rows is the number of targets and whose number of columns is the number of inputs. Each element contains the linear correlation between a single target and a single input.
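`scale_inputs_minimum_maximum()` and `scale_targets_minimum_maximum()` rescale selected columns of the data matrix using each column's minimum and maximum. The plain-Python sketch below shows the idea on a list-of-lists matrix. Two details are assumptions, not taken from this page: the target range [-1, 1], and leaving a constant column unchanged to avoid dividing by zero. `scale_columns_minimum_maximum` is a hypothetical helper, not an OpenNN function.

```python
from typing import List, Sequence, Tuple


def scale_columns_minimum_maximum(
    data: List[List[float]], columns: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Scale the given columns of `data` in place to the range [-1, 1].

    Returns (minimums, maximums), one entry per scaled column, loosely
    mirroring the "vector of vectors" returned by the documented methods.
    The [-1, 1] target range is an assumption of this sketch.
    """
    minimums, maximums = [], []
    for j in columns:
        column = [row[j] for row in data]
        low, high = min(column), max(column)
        minimums.append(low)
        maximums.append(high)
        if high == low:
            continue  # constant column: leave unchanged (assumption)
        for row in data:
            row[j] = 2.0 * (row[j] - low) / (high - low) - 1.0
    return minimums, maximums


# Usage: columns 0 and 1 play the role of inputs, column 2 of a target.
data = [[0.0, 10.0, 5.0],
        [5.0, 20.0, 6.0],
        [10.0, 30.0, 7.0]]
minimums, maximums = scale_columns_minimum_maximum(data, columns=[0, 1])
print(minimums, maximums)  # [0.0, 10.0] [10.0, 30.0]
print(data[0])             # [-1.0, -1.0, 5.0]
```

Returning the minimums and maximums matters because the same values are needed later to scale new inputs consistently or to unscale predicted targets.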
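Methods such as `training_data()`, `training_inputs()` or `testing_target()` return submatrices of the data matrix: the rows are restricted to the instances with a given use and, for the input and target variants, the columns to the variables with a given use. The following plain-Python sketch illustrates that selection. The variable use labels follow `set_variable_use()` (Input, Target, Unused, Time); the instance use labels "Training", "Selection" and "Testing" and the helper `submatrix` are assumptions for illustration only.

```python
from typing import List, Optional, Sequence


def submatrix(
    data: List[List[float]],
    instance_uses: Sequence[str],
    variable_uses: Sequence[str],
    instance_use: str,
    variable_use: Optional[str] = None,
) -> List[List[float]]:
    """Keep the rows whose instance use equals `instance_use`; if
    `variable_use` is given, keep only the columns with that variable use."""
    rows = [i for i, use in enumerate(instance_uses) if use == instance_use]
    if variable_use is None:
        cols = list(range(len(variable_uses)))  # all variables
    else:
        cols = [j for j, use in enumerate(variable_uses) if use == variable_use]
    return [[data[i][j] for j in cols] for i in rows]


data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
        [10.0, 11.0, 12.0]]
instance_uses = ["Training", "Training", "Selection", "Testing"]
variable_uses = ["Input", "Unused", "Target"]

# Analogue of training_data(): training rows, all variables.
print(submatrix(data, instance_uses, variable_uses, "Training"))
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# Analogue of training_inputs(): training rows, input columns only.
print(submatrix(data, instance_uses, variable_uses, "Training", "Input"))
# [[1.0], [4.0]]
# Analogue of testing_target(): testing rows, target columns only.
print(submatrix(data, instance_uses, variable_uses, "Testing", "Target"))
# [[12.0]]
```

The shapes match the descriptions above: the number of rows equals the number of instances with the requested use, and the number of columns equals the number of variables (or of input or target variables) selected.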
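`input_target_correlations()` returns a matrix with one row per target and one column per input. Assuming that "linear correlation" here means the Pearson correlation coefficient, the computation can be sketched in plain Python as below. Both functions are hypothetical helpers, not part of the OpenNN module, and returning 0 for a constant column is a choice of this sketch.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson (linear) correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant column: correlation undefined, 0 by assumption
    return sxy / math.sqrt(sxx * syy)


def input_target_correlations(
    inputs: List[List[float]], targets: List[List[float]]
) -> List[List[float]]:
    """Both arguments are instances x variables; result is targets x inputs."""
    input_columns = list(zip(*inputs))
    target_columns = list(zip(*targets))
    return [[pearson(t, x) for x in input_columns] for t in target_columns]


inputs = [[1.0, 4.0], [2.0, 3.0], [3.0, 2.0], [4.0, 1.0]]
targets = [[2.0], [4.0], [6.0], [8.0]]
print(input_target_correlations(inputs, targets))  # [[1.0, -1.0]]
```

In this example the single target rises exactly with the first input and falls exactly as the second input rises, which gives correlations of 1 and -1 respectively.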