Data set class

The DataSet class contains all the information to build our model. Every column represents a particular variable, and each row corresponds to one sample.

Throughout this tutorial, we will use the Iris data set to illustrate how to use the most important methods of the DataSet class.


Sepal length   Sepal width   Petal length   Petal width   Iris flower
5.1            3.5           1.4            0.2           Setosa
4.9            3.0           1.4            0.2           Setosa
4.7            3.2           1.3            0.2           Setosa
4.6            3.1           1.5            0.2           Setosa
5.0            3.6           1.4            0.2           Setosa
5.4            3.9           1.7            0.4           Setosa

Iris data set

You can download the iris data set here.

The DataSet class offers a variety of constructors. The easiest and most common way to create a DataSet object is the default constructor, which creates an empty data set.

DataSet data_set;

Once we have created the dataset object, the next step is to fill it with data:

data_set.set_data_file_name("../data/iris_plant.csv");
data_set.set_separator("Space");
data_set.read_csv();
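To make the expected file layout concrete, the following is a minimal standalone sketch of how one space-separated line of the Iris file could be parsed. It does not use OpenNN and is not how read_csv is implemented; the ParsedRow type and the parse_row function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one row of the file (not an OpenNN type).
struct ParsedRow
{
    std::vector<double> values;  // numeric columns, e.g. the four measurements
    std::string label;           // last column, e.g. "Setosa"
};

// Parses a space-separated line made of `numeric_count` numbers followed by
// one label, such as "5.1 3.5 1.4 0.2 Setosa" with numeric_count == 4.
// Returns false if the line does not match that layout.
bool parse_row(const std::string& line, std::size_t numeric_count, ParsedRow& row)
{
    assert(numeric_count > 0);

    std::istringstream stream(line);
    row.values.assign(numeric_count, 0.0);

    for (double& value : row.values)
    {
        if (!(stream >> value)) return false;  // missing or non-numeric field
    }

    // The remaining field is read as the category label.
    return static_cast<bool>(stream >> row.label);
}
```

Calling parse_row on each line of the file with numeric_count set to 4 would yield one ParsedRow per sample.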

By default, the last column is set as the target and the remaining columns as inputs. Note that, in this case, the last column contains three different categories. The name of each column can be set with the set_column_name method as follows:

data_set.set_column_name(0, "sepal_length");
data_set.set_column_name(1, "sepal_width");
data_set.set_column_name(2, "petal_length");
data_set.set_column_name(3, "petal_width");
data_set.set_column_name(4, "iris_type");

The names of the categories in the last column are set automatically during the loading process.
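The tutorial does not say how the categories are represented internally. As a rough illustration only, the sketch below collects the distinct category names of a column in order of first appearance and encodes a label as a one-hot vector, which is a common choice for categorical targets. Both functions are hypothetical helpers, and one-hot encoding is an assumption here, not a statement about what the DataSet class does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the distinct category names in order of first appearance.
std::vector<std::string> collect_categories(const std::vector<std::string>& labels)
{
    std::vector<std::string> categories;

    for (const std::string& label : labels)
    {
        if (std::find(categories.begin(), categories.end(), label) == categories.end())
        {
            categories.push_back(label);
        }
    }

    return categories;
}

// Encodes one label as a vector of zeros with a single 1 at its category index.
std::vector<double> one_hot(const std::string& label,
                            const std::vector<std::string>& categories)
{
    const auto position = std::find(categories.begin(), categories.end(), label);
    assert(position != categories.end());  // the label must be a known category

    std::vector<double> encoded(categories.size(), 0.0);
    encoded[static_cast<std::size_t>(position - categories.begin())] = 1.0;

    return encoded;
}
```

For the Iris data set, a column holding Setosa, Versicolor and Virginica labels would produce three categories and target vectors of length three.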

It is also possible to set the use of each instance of the data set. For example, the instances can be split randomly or sequentially as follows:

// Random split
data_set.split_instances_random(0.7, 0.1, 0.2);
// Sequential split (an alternative to the random split)
data_set.split_instances_sequential(0.7, 0.1, 0.2);

The first number is the ratio of training instances, the second is the ratio of selection instances, and the third is the ratio of testing instances. By default, these values are set to 0.6, 0.2 and 0.2.
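The effect of the three ratios can be sketched in plain C++ without OpenNN. The InstanceUse enum and the split_instances function below are hypothetical names, and the rounding rule (round the training and selection counts, give the remainder to testing) is an assumption; the library's own methods may distribute instances differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical labels for the use of each instance (not OpenNN's own type).
enum class InstanceUse { Training, Selection, Testing };

// Assigns a use to each of `count` instances according to the three ratios.
// With shuffle == false the split is sequential; otherwise it is random.
std::vector<InstanceUse> split_instances(std::size_t count,
                                         double training_ratio,
                                         double selection_ratio,
                                         double testing_ratio,
                                         bool shuffle,
                                         unsigned seed = 0)
{
    assert(training_ratio >= 0.0 && selection_ratio >= 0.0 && testing_ratio >= 0.0);
    assert(std::abs(training_ratio + selection_ratio + testing_ratio - 1.0) < 1e-9);

    // Round the first two group sizes; the testing group takes the remainder.
    const std::size_t training_count =
        static_cast<std::size_t>(std::llround(training_ratio * count));
    const std::size_t selection_count =
        std::min(count - training_count,
                 static_cast<std::size_t>(std::llround(selection_ratio * count)));

    // Instance indices in the order in which they are assigned to the groups.
    std::vector<std::size_t> order(count);
    std::iota(order.begin(), order.end(), std::size_t{0});

    if (shuffle)
    {
        std::mt19937 generator(seed);
        std::shuffle(order.begin(), order.end(), generator);
    }

    std::vector<InstanceUse> uses(count, InstanceUse::Testing);

    for (std::size_t i = 0; i < training_count; ++i)
    {
        uses[order[i]] = InstanceUse::Training;
    }
    for (std::size_t i = training_count; i < training_count + selection_count; ++i)
    {
        uses[order[i]] = InstanceUse::Selection;
    }

    return uses;
}
```

Under these assumptions, splitting 150 instances with ratios 0.7, 0.1 and 0.2 gives 105 training, 15 selection and 30 testing instances. In the sequential case the groups are contiguous blocks, and in the random case the same counts are scattered over the data set.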

Finally, the DataSet class implements some useful preprocessing methods. Below we present some of them:

For more information about the DataSet class, see the DataSet Class Reference.
