OpenNN
Open-source neural networks library
Base data container with samples, variables and per-variable metadata.
#include <dataset.h>
Public Types | |
| enum class | Codification { UTF8 , SHIFT_JIS } |
| Source-file character encoding. | |
| enum class | Separator { Space , Tab , Comma , Semicolon } |
| Field-separator type for tabular files. | |
| enum class | MissingValuesMethod { Unuse , Mean , Median , Interpolation } |
| Strategy for replacing missing values. | |
Public Member Functions | |
| Dataset (const Index samples_number=0, const Shape &input_shape={0}, const Shape &target_shape={0}) | |
| Constructs an empty dataset of given dimensions. | |
| Dataset (const filesystem::path &data_path, const string &separator, bool has_header=true, bool has_ids=false, const Codification &codification=Codification::UTF8) | |
| Constructs a dataset by loading a delimited text file. | |
| Index | get_samples_number () const |
| Returns the total number of samples (rows of data). | |
| Index | get_samples_number (const string &role_name) const |
| Returns the number of samples assigned to a given role. | |
| Index | get_used_samples_number () const |
| Returns the number of samples whose role is not "None". | |
| vector< Index > | get_sample_indices (const string &role_name) const |
| Returns the indices of the samples assigned to a given role. | |
| vector< Index > | get_used_sample_indices () const |
| Returns the indices of all samples whose role is not "None". | |
| const vector< SampleRole > & | get_sample_roles () const |
| Returns the per-sample role assignments. | |
| vector< Index > | get_sample_roles_vector () const |
| Returns the per-sample role indices as plain integers. | |
| VectorI | get_sample_role_numbers () const |
| Counts the samples assigned to each role. | |
| Index | get_variables_number () const |
| Returns the total number of variables (columns of data). | |
| Index | get_variables_number (const string &role_name) const |
| Returns the number of variables assigned to a given role. | |
| Index | get_used_variables_number () const |
| Returns the number of variables whose role is not "Unused". | |
| const vector< Variable > & | get_variables () const |
| Returns the per-variable metadata. | |
| vector< Variable > | get_variables (const string &role_name) const |
| Returns the variables assigned to a given role. | |
| Index | get_variable_index (const string &name) const |
| Returns the column index of the variable with a given name. | |
| Index | get_variable_index (const Index id) const |
| Returns the column index of the variable with a given numeric id. | |
| vector< Index > | get_variable_indices (const string &role_name) const |
| Returns the column indices of the variables assigned to a given role. | |
| vector< Index > | get_used_variables_indices () const |
| Returns the column indices of all variables whose role is not "Unused". | |
| vector< string > | get_variable_names () const |
| Returns the names of every variable. | |
| vector< string > | get_variable_names (const string &role_name) const |
| Returns the names of the variables assigned to a given role. | |
| VariableType | get_variable_type (const Index index) const |
| Returns the type of a variable (Numeric, Binary, Categorical, ...). | |
| vector< VariableType > | get_variable_types (const vector< Index > indices) const |
| Returns the types of a list of variables. | |
| Index | get_features_number () const |
| Returns the total number of features. | |
| Index | get_features_number (const string &role_name) const |
| Returns the number of features assigned to a given role. | |
| Index | get_used_features_number () const |
| Returns the number of features belonging to variables whose role is not "Unused". | |
| vector< string > | get_feature_names () const |
| Returns the names of every feature. | |
| vector< string > | get_feature_names (const string &role_name) const |
| Returns the names of the features assigned to a given role. | |
| vector< vector< Index > > | get_feature_indices () const |
| Returns the per-variable feature indices. | |
| vector< Index > | get_feature_indices (const Index variable_index) const |
| Returns the feature indices for a single variable. | |
| vector< Index > | get_feature_indices (const string &role_name) const |
| Returns the feature indices for variables of a given role. | |
| vector< Index > | get_used_feature_indices () const |
| Returns the feature indices of all variables whose role is not "Unused". | |
| vector< Index > | get_feature_dimensions () const |
| Returns the per-variable feature dimension (1 for Numeric, N for Categorical). | |
| Shape | get_shape (const string &role_name) const |
| Returns the input or target shape used by the network. | |
| vector< string > | get_feature_scalers (const string &role_name) const |
| Returns the scaler chosen for each variable of a given role. | |
| virtual void | get_batches (const vector< Index > &sample_indices, Index batch_size, bool shuffle, vector< vector< Index > > &batches) const |
| Splits a list of sample indices into batches. | |
| const MatrixR & | get_data () const |
| Returns the raw data matrix. | |
| MatrixR | get_feature_data (const string &role_name) const |
| Returns the data matrix restricted to the features of a given role. | |
| MatrixR | get_data (const string &sample_role, const string &variable_role) const |
| Returns the data restricted to a sample-role and variable-role intersection. | |
| MatrixR | get_data_from_indices (const vector< Index > &sample_indices, const vector< Index > &variable_indices) const |
| Returns the data restricted to specific samples and variables. | |
| VectorR | get_sample_data (const Index sample_index) const |
| Returns a single sample as a row vector. | |
| MatrixR | get_variable_data (const Index variable_index) const |
| Returns the data for a single variable across all samples. | |
| MatrixR | get_variable_data (const Index variable_index, const vector< Index > &sample_indices) const |
| Returns the data for a single variable on a subset of samples. | |
| MatrixR | get_variable_data (const string &variable_name) const |
| Returns the data for a single variable identified by name. | |
| const vector< vector< string > > & | get_data_file_preview () const |
| Returns the cached preview of the source file (first rows). | |
| MissingValuesMethod | get_missing_values_method () const |
| Returns the configured missing-value strategy. | |
| string | get_missing_values_method_string () const |
| Returns the missing-value strategy as a string. | |
| const filesystem::path & | get_data_path () const |
| Returns the path to the source data file. | |
| const Separator & | get_separator () const |
| Returns the configured field separator. | |
| string | get_separator_string () const |
| Returns the field separator as the actual delimiter character(s). | |
| string | get_separator_name () const |
| Returns the field separator as a human-readable name. | |
| const Codification & | get_codification () const |
| Returns the configured source-file codification. | |
| const string | get_codification_string () const |
| Returns the codification as a string. | |
| const string & | get_missing_values_label () const |
| Returns the label that marks missing values in the source file. | |
| bool | get_display () const |
| Reports whether progress messages are printed. | |
| bool | is_empty () const |
| Reports whether the data matrix is empty. | |
| Shape | get_input_shape () const |
| Returns the input shape. | |
| Shape | get_target_shape () const |
| Returns the target shape. | |
| void | set (const Index samples_number=0, const Shape &input_shape={}, const Shape &target_shape={}) |
| Resets the dataset to a synthetic shape. | |
| void | set (const filesystem::path &data_path, const string &separator, bool has_header=true, bool has_ids=false, const Dataset::Codification &codification=Codification::UTF8) |
| Resets the dataset by loading a delimited text file. | |
| void | set (const filesystem::path &file_name) |
| Resets the dataset by loading a previously serialized JSON state. | |
| void | set_default () |
| Resets configuration members to defaults. | |
| void | set_sample_roles (const string &role_name) |
| Assigns the same role to every sample. | |
| void | set_sample_role (const Index sample_index, const string &role_name) |
| Assigns a role to a single sample. | |
| void | set_sample_roles (const vector< string > &role_names) |
| Assigns roles to all samples from a parallel string vector. | |
| void | set_sample_roles (const vector< Index > &sample_indices, const string &role_name) |
| Assigns the same role to a list of samples. | |
| void | set_variables (const vector< Variable > &new_variables) |
| Replaces the per-variable metadata. | |
| void | set_default_variable_names () |
| Sets default names ("variable_1", "variable_2", ...) for every variable. | |
| virtual void | set_variable_roles (const vector< string > &role_names) |
| Assigns roles to all variables from a parallel string vector. | |
| void | set_variables (const string &description) |
| Re-creates the variables vector from an input/target shape descriptor. | |
| void | set_variable_indices (const vector< Index > &input_indices, const vector< Index > &target_indices) |
| Marks the variables in input_indices as Input and those in target_indices as Target. | |
| void | set_input_variables_unused () |
| Marks all input variables as Unused. | |
| void | set_variable_role (const Index variable_index, const string &role_name) |
| Sets the role of a single variable by index. | |
| void | set_variable_role (const string &variable_name, const string &role_name) |
| Sets the role of a single variable by name. | |
| void | set_variable_type (const Index variable_index, const VariableType &type) |
| Sets the type of a single variable by index. | |
| void | set_variable_type (const string &variable_name, const VariableType &type) |
| Sets the type of a single variable by name. | |
| void | set_variable_types (const VariableType &type) |
| Sets every variable to a given type. | |
| void | set_variable_names (const vector< string > &new_variable_names) |
| Replaces the names of every variable. | |
| void | set_variables_number (const Index new_size) |
| Resizes the variables vector. | |
| void | set_variable_scalers (const string &scaler_name) |
| Sets the same scaler on every variable. | |
| void | set_variable_scalers (const vector< string > &scaler_names) |
| Sets one scaler per variable. | |
| void | set_binary_variables () |
| Detects binary variables (two distinct values) and tags them accordingly. | |
| void | set_feature_names (const vector< string > &new_feature_names) |
| Names every feature. | |
| void | set_variable_roles (const string &role_name) |
| Assigns the same role to every variable. | |
| void | set_shape (const string &role_name, const Shape &new_shape) |
| Sets the input or target shape. | |
| void | set_data (const MatrixR &new_data) |
| Replaces the data matrix. | |
| void | set_data_path (const filesystem::path &new_data_path) |
| Sets the path to the source data file. | |
| void | set_has_header (bool new_has_header) |
| Sets whether the source file has a header row. | |
| void | set_has_ids (bool new_has_ids) |
| Sets whether the source file has a sample-id column. | |
| void | set_separator (const Separator &new_separator) |
| Sets the field separator. | |
| void | set_separator_string (const string &new_separator_string) |
| Sets the field separator from its delimiter character(s). | |
| void | set_separator_name (const string &new_separator_name) |
| Sets the field separator from its human-readable name. | |
| void | set_codification (const Codification &new_codification) |
| Sets the source-file codification. | |
| void | set_codification (const string &new_codification) |
| Sets the source-file codification from its name. | |
| void | set_missing_values_label (string label) |
| Sets the label used for missing values in the source file. | |
| void | set_missing_values_method (const MissingValuesMethod &method) |
| Sets the missing-value handling strategy. | |
| void | set_missing_values_method (const string &method_name) |
| Sets the missing-value handling strategy from its name. | |
| void | set_gmt (const Index new_gmt) |
| Sets the GMT offset for time variables. | |
| void | set_display (bool new_display) |
| Toggles progress messages. | |
| bool | is_sample_used (const Index i) const |
| Reports whether a sample is used (any role other than None). | |
| bool | has_binary_variables () const |
| Reports whether at least one variable is binary. | |
| bool | has_categorical_variables () const |
| Reports whether at least one variable is categorical. | |
| bool | has_binary_or_categorical_variables () const |
| Reports whether the dataset has any binary or categorical variable. | |
| bool | has_time_variable () const |
| Reports whether at least one variable plays the Time role. | |
| bool | has_validation () const |
| Reports whether at least one sample is assigned to Validation. | |
| bool | has_missing_values (const vector< string > &labels) const |
| Reports whether the dataset has missing values matching any of the supplied labels. | |
| void | split_samples (const float training_ratio=0.6f, float selection_ratio=0.2f, float testing_ratio=0.2f, bool shuffle=true) |
| Splits samples into Training/Validation/Testing partitions (selection_ratio sets the Validation fraction). | |
| void | split_samples_sequential (const float training_ratio=0.6f, float selection_ratio=0.2f, float testing_ratio=0.2f) |
| Splits samples into partitions in their original order. | |
| void | split_samples_random (const float training_ratio=0.6f, float selection_ratio=0.2f, float testing_ratio=0.2f) |
| Splits samples into partitions after random shuffling. | |
| vector< string > | unuse_uncorrelated_variables (const float minimum_correlation=0.25f) |
| Marks variables with low correlation against the target as Unused. | |
| vector< string > | unuse_collinear_variables (const float maximum_correlation=0.95f) |
| Marks input variables that are strongly correlated with another input as Unused. | |
| void | set_data_constant (const float value) |
| Fills the data matrix with a constant value. | |
| vector< Descriptives > | calculate_feature_descriptives () const |
| Computes descriptive statistics for every feature. | |
| vector< Descriptives > | calculate_variable_descriptives_positive_samples () const |
| Computes descriptives for inputs restricted to positive-target samples. | |
| vector< Descriptives > | calculate_variable_descriptives_negative_samples () const |
| Computes descriptives for inputs restricted to negative-target samples. | |
| vector< Descriptives > | calculate_variable_descriptives_categories (const Index variable_index) const |
| Computes descriptives per category of a categorical variable. | |
| vector< Descriptives > | calculate_feature_descriptives (const string &role_name) const |
| Computes feature descriptives for a single role. | |
| vector< Histogram > | calculate_variable_distributions (const Index bins_number=10) const |
| Builds histograms of every variable. | |
| vector< BoxPlot > | calculate_variables_box_plots () const |
| Computes box-plot statistics for every variable. | |
| Tensor< Correlation, 2 > | calculate_input_variable_correlations (Correlation(*correlation_function)(const MatrixR &, const MatrixR &), Correlation::Method method, const string &samples_role) const |
| Computes a custom correlation between every pair of input variables. | |
| Tensor< Correlation, 2 > | calculate_input_variable_pearson_correlations () const |
| Computes Pearson correlations between every pair of input variables. | |
| Tensor< Correlation, 2 > | calculate_input_variable_spearman_correlations () const |
| Computes Spearman rank correlations between every pair of input variables. | |
| Tensor< Correlation, 2 > | calculate_input_target_variable_correlations (Correlation(*correlation_function)(const MatrixR &, const MatrixR &), const string &samples_role) const |
| Computes a custom correlation between inputs and targets. | |
| Tensor< Correlation, 2 > | calculate_input_target_variable_pearson_correlations () const |
| Computes Pearson correlations between inputs and targets. | |
| Tensor< Correlation, 2 > | calculate_input_target_variable_spearman_correlations () const |
| Computes Spearman rank correlations between inputs and targets. | |
| VectorI | calculate_correlations_rank () const |
| Returns the rank of every input variable by absolute Pearson correlation against targets. | |
| void | set_default_variable_scalers () |
| Picks default scalers for every variable based on its type. | |
| vector< Descriptives > | scale_data () |
| Scales the entire data matrix. | |
| virtual vector< Descriptives > | scale_features (const string &role_name) |
| Scales the features of a given role. | |
| void | unscale_features (const string &role_name, const vector< Descriptives > &feature_descriptives) |
| Inverse-scales the features of a given role. | |
| VectorI | calculate_target_distribution () const |
| Counts the samples of every target class. | |
| vector< vector< Index > > | calculate_Tukey_outliers (const float tukey_factor=1.5f, bool replace=false) |
| Detects Tukey outliers per variable. | |
| vector< vector< Index > > | replace_Tukey_outliers_with_NaN (const float tukey_factor=1.5f) |
| Detects Tukey outliers and replaces them with NaN. | |
| void | unuse_Tukey_outliers (const float tukey_factor=1.5f) |
| Marks Tukey-outlier samples as Unused. | |
| virtual void | set_data_random () |
| Fills the data matrix with uniform random values. | |
| virtual void | set_data_integer (const Index vocabulary_size) |
| Fills the data matrix with random integers in [0, vocabulary_size). | |
| void | set_data_rosenbrock () |
| Fills the data matrix with samples from the Rosenbrock function. | |
| void | set_data_binary_classification () |
| Fills the data matrix with synthetic binary-classification data. | |
| virtual void | from_JSON (const JsonDocument &document) |
| Restores the dataset state from a JSON document. | |
| virtual void | to_JSON (JsonWriter &writer) const |
| Serializes the dataset state to JSON. | |
| void | save (const filesystem::path &file_name) const |
| Saves the dataset state to a JSON file on disk. | |
| void | load (const filesystem::path &file_name) |
| Loads the dataset state from a JSON file on disk. | |
| void | save_data () const |
| Saves the data matrix back to the configured source file. | |
| void | save_data_binary (const filesystem::path &file_name) const |
| Saves the data matrix as a binary file. | |
| void | load_data_binary () |
| Loads the data matrix from a binary file produced by save_data_binary(). | |
| Index | get_missing_values_number () const |
| Returns the total number of cells flagged as missing. | |
| bool | has_nan () const |
| Reports whether the data matrix contains any NaN. | |
| bool | has_nan_row (const Index row_index) const |
| Reports whether a row contains any NaN. | |
| virtual void | impute_missing_values_unuse () |
| Marks samples with missing values as Unused. | |
| void | impute_missing_values_statistic (const MissingValuesMethod &method) |
| Replaces missing values with a per-variable statistic. | |
| virtual void | impute_missing_values_interpolate () |
| Replaces missing values via linear interpolation along each variable. | |
| void | scrub_missing_values () |
| Removes samples that contain missing values. | |
| void | calculate_missing_values_statistics () |
| Updates the cached missing-value statistics (counts, indices). | |
| VectorI | count_nans_per_variable () const |
| Counts NaN cells per variable. | |
| Index | count_variables_with_nan () const |
| Counts variables that contain at least one NaN. | |
| Index | count_rows_with_nan () const |
| Counts samples that contain at least one NaN. | |
| Index | count_nan () const |
| Counts the total number of NaN cells. | |
| vector< vector< Index > > | split_samples (const vector< Index > &indices, Index parts_number) const |
| Splits a list of sample indices into chunks of (roughly) equal size. | |
| virtual void | read_csv () |
| Reads the configured CSV file into the data matrix. | |
| DateFormat | infer_dataset_date_format (const vector< Variable > &variables, const vector< vector< string > > &data_file_preview, bool has_header, const string &missing_values_label) |
| Infers the date format used by date-typed variables in the source file. | |
| virtual void | fill_inputs (const vector< Index > &sample_indices, const vector< Index > &feature_indices, float *buffer, bool transpose=true, int contiguous=-1) const |
| Fills a contiguous float buffer with input data for a batch of samples. | |
| virtual void | augment_inputs (float *buffer, Index batch_size) const |
| Optionally augments inputs in place after fill_inputs() (e.g. random crops). | |
| virtual void | fill_decoder (const vector< Index > &sample_indices, const vector< Index > &feature_indices, float *buffer, bool transpose=true, int contiguous=-1) const |
| Fills a contiguous buffer with decoder-side inputs (transformer-style models). | |
| virtual void | fill_targets (const vector< Index > &sample_indices, const vector< Index > &feature_indices, float *buffer, bool transpose=true, int contiguous=-1) const |
| Fills a contiguous buffer with target data for a batch of samples. | |
Protected Member Functions | |
| void | set_default_variable_roles () |
| Sets default Input/Target roles based on column position (last column = target). | |
| void | set_default_variable_roles_forecasting () |
| Sets default roles for forecasting (typical pattern: lagged inputs + future target). | |
| void | variables_to_JSON (JsonWriter &) const |
| Serializes the variables vector to JSON. | |
| void | samples_to_JSON (JsonWriter &) const |
| Serializes the per-sample roles to JSON. | |
| void | missing_values_to_JSON (JsonWriter &) const |
| Serializes the missing-value statistics to JSON. | |
| void | preview_data_to_JSON (JsonWriter &) const |
| Serializes the source-file preview to JSON. | |
| void | variables_from_JSON (const Json *) |
| Restores the variables vector from JSON. | |
| void | samples_from_JSON (const Json *) |
| Restores the per-sample roles from JSON. | |
| void | missing_values_from_JSON (const Json *) |
| Restores the missing-value statistics from JSON. | |
| void | preview_data_from_JSON (const Json *) |
| Restores the source-file preview from JSON. | |
Protected Attributes | |
| MatrixR | data |
| Dense data matrix [samples x variables]. | |
| Shape | input_shape |
| Shape of the input portion (rank may exceed 1 for image/sequence data). | |
| Shape | target_shape |
| Shape of the target portion. | |
| Shape | decoder_shape |
| Shape of the decoder portion (transformer-style models). | |
| vector< SampleRole > | sample_roles |
| Per-sample role (Training/Validation/Testing/None). | |
| vector< string > | sample_ids |
| Optional per-sample identifiers (when the source file has an id column). | |
| vector< Variable > | variables |
| Per-variable metadata (name, role, type, scaler). | |
| filesystem::path | data_path |
| Path to the source data file. | |
| Separator | separator = Separator::Comma |
| Field separator used in the source file. | |
| string | missing_values_label = "NA" |
| Label that marks missing values in the source file. | |
| bool | has_header = false |
| Whether the source file's first row contains column names. | |
| bool | has_sample_ids = false |
| Whether the source file's first column contains sample identifiers. | |
| Codification | codification = Codification::UTF8 |
| Source-file character encoding. | |
| vector< vector< string > > | data_file_preview |
| Cached preview of the first rows of the source file. | |
| Index | gmt = 0 |
| GMT offset for time variables, in hours. | |
| MissingValuesMethod | missing_values_method = MissingValuesMethod::Mean |
| Strategy used to handle missing values. | |
| Index | missing_values_number = 0 |
| Total number of missing cells. | |
| VectorI | variables_missing_values_number |
| Per-variable missing-value count. | |
| Index | rows_missing_values_number = 0 |
| Number of rows that contain at least one missing value. | |
| bool | display = true |
| Whether to print progress messages. | |
| const vector< string > | positive_words = {"1", "yes", "positive", "+", "true", "good", "si", "sí", "Sí"} |
| Strings interpreted as positive when parsing binary variables. | |
| const vector< string > | negative_words = {"0", "no", "negative", "-", "false", "bad", "not", "No"} |
| Strings interpreted as negative when parsing binary variables. | |
Base data container with samples, variables and per-variable metadata.
Owns a dense matrix of values (rows are samples, columns are variables) plus parallel metadata: per-sample roles, a per-variable Variable description (role, type, scaler), input/target shapes, missing-value handling and source-file metadata.
Provides utilities for loading from CSV, splitting into Training / Validation / Testing partitions, scaling and unscaling features, descriptive statistics, correlation analysis and Tukey-based outlier handling.
Specialized data formats are implemented by deriving from this class: TabularDataset, ImageDataset, LanguageDataset, TimeSeriesDataset.
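The Training / Validation / Testing split mentioned above can be illustrated with a small standalone sketch. It is not OpenNN's implementation: `Role` and `split_indices` are hypothetical names, the ratios are normalized by their sum and rounded, and the Testing partition simply takes whatever is left. Here validation_ratio plays the part of the selection_ratio argument of split_samples() (an assumption based on the parameter order). With shuffle set to false the samples keep their original order, roughly as split_samples_sequential() describes; with shuffle set to true they are permuted first, as in split_samples_random().

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical role codes for the three partitions.
enum class Role { Training, Validation, Testing };

// Assigns a role to each of n samples from three ratios.
// Assumptions: ratios are normalized by their sum, counts are rounded,
// and Testing receives every sample not assigned to the other two partitions.
std::vector<Role> split_indices(std::size_t n, double training_ratio, double validation_ratio,
                                double testing_ratio, bool shuffle, unsigned seed = 0)
{
    const double total = training_ratio + validation_ratio + testing_ratio;
    const std::size_t training_n = std::min<std::size_t>(
        n, static_cast<std::size_t>(std::lround(n * training_ratio / total)));
    const std::size_t validation_n = std::min<std::size_t>(
        n - training_n, static_cast<std::size_t>(std::lround(n * validation_ratio / total)));

    // Visit samples in their original order, or in a reproducible random order.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    if (shuffle)
    {
        std::mt19937 rng(seed);
        std::shuffle(order.begin(), order.end(), rng);
    }

    std::vector<Role> roles(n, Role::Testing);
    for (std::size_t k = 0; k < training_n; ++k)
        roles[order[k]] = Role::Training;
    for (std::size_t k = training_n; k < training_n + validation_n; ++k)
        roles[order[k]] = Role::Validation;
    return roles;
}
```

With 10 samples and the default 0.6 / 0.2 / 0.2 ratios this yields 6 Training, 2 Validation and 2 Testing samples; in sequential mode they are samples 0-5, 6-7 and 8-9.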
| opennn::Dataset::Dataset | ( | const filesystem::path & | data_path, |
| const string & | separator, | ||
| bool | has_header = true, | ||
| bool | has_ids = false, | ||
| const Codification & | codification = Codification::UTF8 ) |
Constructs a dataset by loading a delimited text file.
| data_path | Path to the source file. |
| separator | Field separator string: ",", ";", "\t" or " ". |
| has_header | Whether the first row contains column names. |
| has_ids | Whether the first column contains sample identifiers. |
| codification | Source-file character encoding. |
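To illustrate how the separator, has_header and has_ids arguments shape the parsed table, here is a minimal sketch in standard C++. `ParsedTable`, `split_line` and `parse_delimited` are hypothetical helpers, not OpenNN names; the sketch handles a single-character separator only and ignores quoting, character encodings and missing-value labels. Without a header, names default to "variable_1", "variable_2", ..., matching what set_default_variable_names() describes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: column names plus raw string cells.
struct ParsedTable
{
    std::vector<std::string> names;
    std::vector<std::vector<std::string>> rows;
};

// Splits one line on a single-character separator, keeping empty fields.
std::vector<std::string> split_line(const std::string& line, char separator)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true)
    {
        const std::string::size_type end = line.find(separator, start);
        fields.push_back(line.substr(start, end == std::string::npos ? std::string::npos : end - start));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return fields;
}

// Parses the lines of a delimited file, mimicking the has_header / has_ids options.
ParsedTable parse_delimited(const std::vector<std::string>& lines, char separator,
                            bool has_header, bool has_ids)
{
    ParsedTable table;
    const std::size_t first_column = has_ids ? 1 : 0;   // skip the id column if present

    for (std::size_t i = 0; i < lines.size(); ++i)
    {
        const std::vector<std::string> fields = split_line(lines[i], separator);
        const auto skip = static_cast<std::ptrdiff_t>(std::min(first_column, fields.size()));
        std::vector<std::string> values(fields.begin() + skip, fields.end());

        if (i == 0 && has_header)
            table.names = values;            // first row holds column names
        else
            table.rows.push_back(values);    // every other row is a sample
    }

    // Without a header, fall back to default names.
    if (!has_header && !table.rows.empty())
        for (std::size_t j = 0; j < table.rows.front().size(); ++j)
            table.names.push_back("variable_" + std::to_string(j + 1));

    return table;
}
```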
| void opennn::Dataset::augment_inputs | ( | float * | buffer, |
| Index | batch_size ) const |
inline virtual |
Optionally augments inputs in place after fill_inputs() (e.g. random crops).
The default implementation is a no-op; subclasses may override it.
| buffer | Buffer produced by fill_inputs(). |
| batch_size | Number of samples in the buffer. |
Reimplemented in opennn::ImageDataset.
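The override pattern can be sketched with stand-in classes. `DatasetSketch` and `MirroringDatasetSketch` are hypothetical, the sample-major buffer layout is an assumption, and reversing each sample's features merely stands in for whatever augmentation ImageDataset actually performs.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the base class: augmentation is a no-op by default.
class DatasetSketch
{
public:
    virtual ~DatasetSketch() = default;
    virtual void augment_inputs(float* /*buffer*/, std::size_t /*batch_size*/) const {}
};

// Hypothetical subclass that mirrors every sample's features in place.
// Assumption: the buffer is sample-major, with features_per_sample floats per sample.
class MirroringDatasetSketch : public DatasetSketch
{
public:
    explicit MirroringDatasetSketch(std::size_t features_per_sample)
        : features_per_sample_(features_per_sample) {}

    void augment_inputs(float* buffer, std::size_t batch_size) const override
    {
        for (std::size_t s = 0; s < batch_size; ++s)
        {
            float* sample = buffer + s * features_per_sample_;
            std::reverse(sample, sample + features_per_sample_);
        }
    }

private:
    std::size_t features_per_sample_;
};
```

Calling through a base-class reference dispatches to the override, so training code can invoke augment_inputs() unconditionally after filling a batch.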
| VectorI opennn::Dataset::calculate_correlations_rank | ( | ) | const |
Returns the rank of every input variable by absolute Pearson correlation against targets.
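As a hedged sketch of what ranking by absolute Pearson correlation involves, the following computes r for each input column against a single target and orders the inputs from strongest to weakest. `pearson` and `rank_by_abs_pearson`, the one-vector-per-column storage and the single-target restriction are assumptions; the actual method returns a VectorI whose exact encoding (an ordering or per-variable rank positions) is not specified here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Pearson correlation coefficient of two equally sized samples.
// Returns 0 when either sample has zero variance (r is undefined there).
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    const double n = static_cast<double>(x.size());
    const double mean_x = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double mean_y = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
        syy += (y[i] - mean_y) * (y[i] - mean_y);
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}

// Returns input indices ordered by decreasing |r| against the target.
std::vector<std::size_t> rank_by_abs_pearson(const std::vector<std::vector<double>>& inputs,
                                             const std::vector<double>& target)
{
    std::vector<double> strength(inputs.size());
    for (std::size_t j = 0; j < inputs.size(); ++j)
        strength[j] = std::abs(pearson(inputs[j], target));

    std::vector<std::size_t> order(inputs.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return strength[a] > strength[b]; });
    return order;
}
```

Taking the absolute value means a perfectly anti-correlated input ranks as high as a perfectly correlated one.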
| vector< Descriptives > opennn::Dataset::calculate_feature_descriptives | ( | ) | const |
Computes descriptive statistics for every feature.
| vector< Descriptives > opennn::Dataset::calculate_feature_descriptives | ( | const string & | role_name | ) | const |
Computes feature descriptives for a single role.
| role_name | Variable role. |
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_target_variable_correlations | ( | Correlation(* | correlation_function )(const MatrixR &, const MatrixR &), |
| const string & | samples_role ) const |
Computes a custom correlation between inputs and targets.
| correlation_function | Function used to compute the pair correlation. |
| samples_role | Sample-role filter. |
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_target_variable_pearson_correlations | ( | ) | const |
Computes Pearson correlations between inputs and targets.
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_target_variable_spearman_correlations | ( | ) | const |
Computes Spearman rank correlations between inputs and targets.
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_variable_correlations | ( | Correlation(* | correlation_function )(const MatrixR &, const MatrixR &), |
| Correlation::Method | method, | ||
| const string & | samples_role ) const |
Computes a custom correlation between every pair of input variables.
| correlation_function | Function used to compute the pair correlation. |
| method | Correlation method (Pearson, Spearman, ...). |
| samples_role | Sample-role filter. |
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_variable_pearson_correlations | ( | ) | const |
Computes Pearson correlations between every pair of input variables.
| Tensor< Correlation, 2 > opennn::Dataset::calculate_input_variable_spearman_correlations | ( | ) | const |
Computes Spearman rank correlations between every pair of input variables.
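Spearman's rank correlation is Pearson's r computed on ranks rather than on raw values, so it captures any monotonic relationship. The sketch below assumes tied values receive their average rank, a common convention that OpenNN may or may not follow; `average_ranks`, `pearson` and `spearman` are illustrative helpers, not library functions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Converts values to 1-based ranks; tied values share their average rank.
std::vector<double> average_ranks(const std::vector<double>& values)
{
    const std::size_t n = values.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return values[a] < values[b]; });

    std::vector<double> ranks(n);
    std::size_t i = 0;
    while (i < n)
    {
        std::size_t j = i;
        while (j + 1 < n && values[order[j + 1]] == values[order[i]]) ++j;   // extend the tie group
        const double rank = (static_cast<double>(i) + static_cast<double>(j)) / 2.0 + 1.0;
        for (std::size_t k = i; k <= j; ++k) ranks[order[k]] = rank;
        i = j + 1;
    }
    return ranks;
}

// Pearson correlation; returns 0 when either sample has zero variance.
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    const double n = static_cast<double>(x.size());
    const double mean_x = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double mean_y = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
        syy += (y[i] - mean_y) * (y[i] - mean_y);
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}

// Spearman's rho: Pearson correlation of the two rank vectors.
double spearman(const std::vector<double>& x, const std::vector<double>& y)
{
    return pearson(average_ranks(x), average_ranks(y));
}
```

For example, y = x³ is nonlinear but strictly increasing, so its Spearman correlation with x is exactly 1 while its Pearson correlation is below 1.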
| void opennn::Dataset::calculate_missing_values_statistics | ( | ) |
Updates the cached missing-value statistics (counts, indices).
| VectorI opennn::Dataset::calculate_target_distribution | ( | ) | const |
Counts the samples of every target class.
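A minimal sketch of counting samples per target class, assuming a row-major target matrix in which a single column is read as binary (values of at least 0.5 count as class 1) and several columns as one-hot (the largest entry wins). The function `target_distribution` and both conventions are assumptions for illustration, not a description of the library's code.

```cpp
#include <cstddef>
#include <vector>

// Counts samples per target class from a row-major [samples x columns] target matrix.
std::vector<std::size_t> target_distribution(const std::vector<double>& targets,
                                             std::size_t samples, std::size_t columns)
{
    if (columns == 1)
    {
        // Binary target: index 0 counts negatives, index 1 counts positives.
        std::vector<std::size_t> counts(2, 0);
        for (std::size_t i = 0; i < samples; ++i)
            ++counts[targets[i] >= 0.5 ? 1 : 0];
        return counts;
    }

    // One-hot targets: the class is the column holding the largest value.
    std::vector<std::size_t> counts(columns, 0);
    for (std::size_t i = 0; i < samples; ++i)
    {
        std::size_t best = 0;
        for (std::size_t j = 1; j < columns; ++j)
            if (targets[i * columns + j] > targets[i * columns + best]) best = j;
        ++counts[best];
    }
    return counts;
}
```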
| vector< vector< Index > > opennn::Dataset::calculate_Tukey_outliers | ( | const float | tukey_factor = 1.5f, |
| bool | replace = false ) |
Detects Tukey outliers per variable.
| tukey_factor | Tukey-fence multiplier (1.5 = mild, 3.0 = extreme). |
| replace | Whether to also replace the outliers with NaN. |
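Tukey's rule flags a value as an outlier when it lies outside [Q1 - k * IQR, Q3 + k * IQR], where IQR = Q3 - Q1 and k is tukey_factor. The sketch below handles a single variable and computes quartiles by linear interpolation between order statistics, which is one common convention and an assumption here; `quantile` and `tukey_outliers` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile by linear interpolation between order statistics (one common convention;
// OpenNN's quartile definition may differ). Assumes a non-empty input.
double quantile(std::vector<double> values, double p)
{
    std::sort(values.begin(), values.end());
    const double position = p * static_cast<double>(values.size() - 1);
    const auto lower = static_cast<std::size_t>(std::floor(position));
    const std::size_t upper = std::min(lower + 1, values.size() - 1);
    const double fraction = position - static_cast<double>(lower);
    return values[lower] + fraction * (values[upper] - values[lower]);
}

// Returns the indices of values outside the Tukey fences.
std::vector<std::size_t> tukey_outliers(const std::vector<double>& values, double tukey_factor = 1.5)
{
    const double q1 = quantile(values, 0.25);
    const double q3 = quantile(values, 0.75);
    const double iqr = q3 - q1;
    const double low_fence = q1 - tukey_factor * iqr;
    const double high_fence = q3 + tukey_factor * iqr;

    std::vector<std::size_t> outliers;
    for (std::size_t i = 0; i < values.size(); ++i)
        if (values[i] < low_fence || values[i] > high_fence)
            outliers.push_back(i);
    return outliers;
}
```

For the values 1, 2, ..., 8, 100 this gives Q1 = 3, Q3 = 7 and fences at -3 and 13, so only the last value (index 8) is flagged.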
| vector< Descriptives > opennn::Dataset::calculate_variable_descriptives_categories | ( | const Index | variable_index | ) | const |
Computes descriptives per category of a categorical variable.
| variable_index | Column index of the categorical variable. |
| vector< Descriptives > opennn::Dataset::calculate_variable_descriptives_negative_samples | ( | ) | const |
Computes descriptives for inputs restricted to negative-target samples.
| vector< Descriptives > opennn::Dataset::calculate_variable_descriptives_positive_samples | ( | ) | const |
Computes descriptives for inputs restricted to positive-target samples.
| vector< Histogram > opennn::Dataset::calculate_variable_distributions | ( | const Index | bins_number = 10 | ) | const |
Builds histograms of every variable.
| bins_number | Number of bins per histogram. |
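For reference, an equal-width histogram with bins_number bins can be sketched as follows. The binning rule (bins spanning [min, max], the maximum clamped into the last bin, a constant variable counted entirely in the first bin) is an assumption, and the hypothetical helper `histogram_counts` returns only the counts, not a Histogram object.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Equal-width histogram over [min, max]; the maximum falls in the last bin.
// Assumes a non-empty input and bins_number > 0.
std::vector<std::size_t> histogram_counts(const std::vector<double>& values, std::size_t bins_number = 10)
{
    std::vector<std::size_t> counts(bins_number, 0);
    const auto range = std::minmax_element(values.begin(), values.end());
    const double minimum = *range.first;
    const double width = (*range.second - minimum) / static_cast<double>(bins_number);

    for (double v : values)
    {
        // A zero width means every value is identical: put them all in bin 0.
        const std::size_t bin = width > 0.0 ? static_cast<std::size_t>((v - minimum) / width) : 0;
        counts[std::min(bin, bins_number - 1)] += 1;   // clamp the maximum into the last bin
    }
    return counts;
}
```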
| vector< BoxPlot > opennn::Dataset::calculate_variables_box_plots | ( | ) | const |
Computes box-plot statistics for every variable.
| Index opennn::Dataset::count_nan | ( | ) | const |
Counts the total number of NaN cells.
| VectorI opennn::Dataset::count_nans_per_variable | ( | ) | const |
Counts NaN cells per variable.
| Index opennn::Dataset::count_rows_with_nan | ( | ) | const |
Counts samples that contain at least one NaN.
| Index opennn::Dataset::count_variables_with_nan | ( | ) | const |
Counts variables that contain at least one NaN.
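The four NaN counters above can be computed in a single pass. This sketch works on a plain row-major matrix of doubles rather than MatrixR; `NanSummary` and `summarize_nans` are hypothetical names, and each field is annotated with the member function it corresponds to.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Summary of NaN cells in a row-major [rows x columns] matrix.
struct NanSummary
{
    std::size_t total = 0;                    // cf. count_nan()
    std::vector<std::size_t> per_variable;    // cf. count_nans_per_variable()
    std::size_t rows_with_nan = 0;            // cf. count_rows_with_nan()
    std::size_t variables_with_nan = 0;       // cf. count_variables_with_nan()
};

NanSummary summarize_nans(const std::vector<double>& data, std::size_t rows, std::size_t columns)
{
    NanSummary summary;
    summary.per_variable.assign(columns, 0);

    for (std::size_t i = 0; i < rows; ++i)
    {
        bool row_has_nan = false;
        for (std::size_t j = 0; j < columns; ++j)
        {
            if (std::isnan(data[i * columns + j]))
            {
                ++summary.total;
                ++summary.per_variable[j];
                row_has_nan = true;
            }
        }
        if (row_has_nan) ++summary.rows_with_nan;
    }

    // A variable counts once no matter how many of its cells are NaN.
    for (std::size_t count : summary.per_variable)
        if (count > 0) ++summary.variables_with_nan;

    return summary;
}
```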
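| void opennn::Dataset::fill_decoder | ( | const vector< Index > & | sample_indices, |
| const vector< Index > & | feature_indices, | ||
| float * | buffer, | ||
| bool | transpose = true, | ||
| int | contiguous = -1 ) const |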
virtual |
Fills a contiguous buffer with decoder-side inputs (transformer-style models).
| sample_indices | Row indices for the batch. |
| feature_indices | Column indices for the decoder features. |
| buffer | Destination buffer. |
| transpose | Whether to write feature-major (true) or sample-major (false). |
| contiguous | Stride hint when sample_indices is a contiguous range; -1 disables. |
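| void opennn::Dataset::fill_inputs | ( | const vector< Index > & | sample_indices, |
| const vector< Index > & | feature_indices, | ||
| float * | buffer, | ||
| bool | transpose = true, | ||
| int | contiguous = -1 ) const |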
virtual |
Fills a contiguous float buffer with input data for a batch of samples.
Used by Loss/Optimizer to build forward-pass inputs without reallocating.
| sample_indices | Row indices for the batch. |
| feature_indices | Column indices for the input features. |
| buffer | Destination buffer; must hold sample_indices.size() * feature_indices.size() floats. |
| transpose | Whether to write feature-major (true) or sample-major (false). |
| contiguous | Stride hint when sample_indices is a contiguous range; -1 disables. |
Reimplemented in opennn::TimeSeriesDataset.
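The effect of the transpose flag on the buffer layout can be shown with a small gather routine. It assumes a row-major source matrix and reads "feature-major" as all samples of the first feature, then all samples of the second (buffer[f * batch + s]); `gather_batch` is hypothetical and ignores the contiguous stride hint. As documented above, the buffer must hold sample_indices.size() * feature_indices.size() floats.

```cpp
#include <cstddef>
#include <vector>

// Gathers data[sample, feature] for a batch into a flat buffer.
// data is row-major [samples x columns]. Assumed layouts:
//   transpose == true : feature-major, buffer[f * batch + s]
//   transpose == false: sample-major,  buffer[s * features + f]
void gather_batch(const std::vector<float>& data, std::size_t columns,
                  const std::vector<std::size_t>& sample_indices,
                  const std::vector<std::size_t>& feature_indices,
                  float* buffer, bool transpose = true)
{
    const std::size_t batch = sample_indices.size();
    const std::size_t features = feature_indices.size();

    for (std::size_t s = 0; s < batch; ++s)
        for (std::size_t f = 0; f < features; ++f)
        {
            const float value = data[sample_indices[s] * columns + feature_indices[f]];
            if (transpose)
                buffer[f * batch + s] = value;
            else
                buffer[s * features + f] = value;
        }
}
```

Writing into a caller-owned buffer, rather than returning a new matrix, is what lets a training loop reuse the same allocation for every batch.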
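| void opennn::Dataset::fill_targets | ( | const vector< Index > & | sample_indices, |
| const vector< Index > & | feature_indices, | ||
| float * | buffer, | ||
| bool | transpose = true, | ||
| int | contiguous = -1 ) const |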
virtual |
Fills a contiguous buffer with target data for a batch of samples.
| sample_indices | Row indices for the batch. |
| feature_indices | Column indices for the target features. |
| buffer | Destination buffer. |
| transpose | Whether to write feature-major (true) or sample-major (false). |
| contiguous | Stride hint when sample_indices is a contiguous range; -1 disables. |
Reimplemented in opennn::TimeSeriesDataset.
| void opennn::Dataset::from_JSON | ( | const JsonDocument & | document | ) |
virtual |
Restores the dataset state from a JSON document.
| document | Parsed JSON produced by to_JSON(). |
Reimplemented in opennn::ImageDataset, opennn::LanguageDataset, and opennn::TimeSeriesDataset.
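| void opennn::Dataset::get_batches | ( | const vector< Index > & | sample_indices, |
| Index | batch_size, | ||
| bool | shuffle, | ||
| vector< vector< Index > > & | batches ) const |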
virtual |
Splits a list of sample indices into batches.
Optionally shuffles the indices before batching.
| sample_indices | Sample indices to batch. |
| batch_size | Target batch size. |
| shuffle | Whether to shuffle indices. |
| batches | Output vector of batches; populated in place. |
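A sketch of the batching step, assuming the final batch simply keeps the remainder (whether OpenNN drops, pads or keeps a short last batch is not stated here). `make_batches` is a hypothetical stand-in; it takes the sample indices by value so that shuffling does not modify the caller's vector, and a fixed seed makes the shuffle reproducible.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Splits sample indices into consecutive batches of at most batch_size elements,
// optionally shuffling first. The output vector is cleared and refilled in place.
void make_batches(std::vector<std::size_t> sample_indices, std::size_t batch_size, bool shuffle,
                  std::vector<std::vector<std::size_t>>& batches, unsigned seed = 0)
{
    batches.clear();
    if (batch_size == 0) return;   // avoid an infinite loop on a zero batch size

    if (shuffle)
    {
        std::mt19937 rng(seed);
        std::shuffle(sample_indices.begin(), sample_indices.end(), rng);
    }

    for (std::size_t start = 0; start < sample_indices.size(); start += batch_size)
    {
        const std::size_t end = std::min(start + batch_size, sample_indices.size());
        batches.emplace_back(sample_indices.begin() + static_cast<std::ptrdiff_t>(start),
                             sample_indices.begin() + static_cast<std::ptrdiff_t>(end));
    }
}
```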
|
inline |
Returns the configured source-file codification.
| const string opennn::Dataset::get_codification_string | ( | ) | const |
Returns the codification as a string.
|
inline |
Returns the raw data matrix.
| MatrixR opennn::Dataset::get_data | ( | const string & | sample_role, |
| const string & | variable_role ) const |
Returns the data restricted to a sample-role and variable-role intersection.
| sample_role | Sample role. |
| variable_role | Variable role. |
|
inline |
Returns the cached preview of the source file (first rows).
| MatrixR opennn::Dataset::get_data_from_indices | ( | const vector< Index > & | sample_indices, |
| const vector< Index > & | variable_indices ) const |
Returns the data restricted to specific samples and variables.
| sample_indices | Row indices. |
| variable_indices | Column indices. |
|
inline |
Returns the path to the source data file.
|
inline |
Reports whether progress messages are printed.
| MatrixR opennn::Dataset::get_feature_data | ( | const string & | role_name | ) | const |
Returns the data matrix restricted to the features of a given role.
| role_name | Variable role. |
| vector< Index > opennn::Dataset::get_feature_dimensions | ( | ) | const |
Returns the per-variable feature dimension (1 for Numeric, N for a Categorical variable with N classes).
| vector< vector< Index > > opennn::Dataset::get_feature_indices | ( | ) | const |
Returns the per-variable feature indices.
| vector< Index > opennn::Dataset::get_feature_indices | ( | const Index | variable_index | ) | const |
Returns the feature indices for a single variable.
| variable_index | Column index of the variable. |
| vector< Index > opennn::Dataset::get_feature_indices | ( | const string & | role_name | ) | const |
Returns the feature indices for variables of a given role.
| role_name | Variable role. |
| vector< string > opennn::Dataset::get_feature_names | ( | ) | const |
Returns the names of every feature.
| vector< string > opennn::Dataset::get_feature_names | ( | const string & | role_name | ) | const |
Returns the names of the features assigned to a given role.
| role_name | Variable role. |
| vector< string > opennn::Dataset::get_feature_scalers | ( | const string & | role_name | ) | const |
Returns the scaler chosen for each variable of a given role.
| role_name | Variable role. |
| Index opennn::Dataset::get_features_number | ( | ) | const |
Returns the total number of features.
Each categorical variable expands into N features (one per class), so features_number is generally greater than or equal to variables_number.
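The variable-to-feature expansion can be sketched as follows. `SketchVariable` and `SketchType` are simplified, hypothetical stand-ins for the per-variable metadata. The sketch assumes that Binary variables contribute a single feature and that each variable's features occupy a consecutive range. The text above only states the Numeric and Categorical cases.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for the per-variable metadata.
enum class SketchType { Numeric, Binary, Categorical };

struct SketchVariable
{
    SketchType type;
    std::size_t categories_number; // only meaningful for Categorical
};

// N features for a categorical variable with N classes, 1 otherwise (assumed for Binary).
std::vector<std::size_t> feature_dimensions(const std::vector<SketchVariable>& variables)
{
    std::vector<std::size_t> dimensions;
    for (const SketchVariable& v : variables)
    {
        assert(v.type != SketchType::Categorical || v.categories_number > 0);
        dimensions.push_back(v.type == SketchType::Categorical ? v.categories_number : 1);
    }
    return dimensions;
}

// Per-variable feature indices, laid out as consecutive ranges from left to right.
std::vector<std::vector<std::size_t>> feature_indices(const std::vector<SketchVariable>& variables)
{
    std::vector<std::vector<std::size_t>> indices;
    std::size_t next = 0;
    for (std::size_t dimension : feature_dimensions(variables))
    {
        std::vector<std::size_t> range(dimension);
        std::iota(range.begin(), range.end(), next);
        next += dimension;
        indices.push_back(range);
    }
    return indices;
}
```

With one numeric, one three-class categorical and another numeric variable, there are 3 variables but 5 features, and the categorical variable owns features 1 to 3.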
| Index opennn::Dataset::get_features_number | ( | const string & | role_name | ) | const |
Returns the number of features assigned to a given role.
| role_name | Variable role. |
|
inline |
Returns the input shape.
|
inline |
Returns the label that marks missing values in the source file.
|
inline |
Returns the configured missing-value strategy.
| string opennn::Dataset::get_missing_values_method_string | ( | ) | const |
Returns the missing-value strategy as a string.
|
inline |
Returns the total number of cells flagged as missing.
| VectorR opennn::Dataset::get_sample_data | ( | const Index | sample_index | ) | const |
Returns a single sample as a row vector.
| sample_index | Row index. |
| vector< Index > opennn::Dataset::get_sample_indices | ( | const string & | role_name | ) | const |
Returns the indices of the samples assigned to a given role.
| role_name | "Training", "Validation", "Testing" or "None". |
| VectorI opennn::Dataset::get_sample_role_numbers | ( | ) | const |
Counts the samples assigned to each role.
|
inline |
Returns the per-sample role assignments.
| vector< Index > opennn::Dataset::get_sample_roles_vector | ( | ) | const |
Returns the per-sample role indices as plain integers.
|
inline |
Returns the total number of samples (rows of data).
| Index opennn::Dataset::get_samples_number | ( | const string & | role_name | ) | const |
Returns the number of samples assigned to a given role.
| role_name | "Training", "Validation", "Testing" or "None". |
|
inline |
Returns the configured field separator.
| string opennn::Dataset::get_separator_name | ( | ) | const |
Returns the field separator as a human-readable name.
| string opennn::Dataset::get_separator_string | ( | ) | const |
Returns the field separator as the actual delimiter character(s).
| Shape opennn::Dataset::get_shape | ( | const string & | role_name | ) | const |
Returns the input or target shape used by the network.
| role_name | "Input" or "Target". |
|
inline |
Returns the target shape.
| vector< Index > opennn::Dataset::get_used_feature_indices | ( | ) | const |
Returns the feature indices for all variables that are not "Unused".
| Index opennn::Dataset::get_used_features_number | ( | ) | const |
Returns the number of features that are not "Unused".
| vector< Index > opennn::Dataset::get_used_sample_indices | ( | ) | const |
Returns the indices of all samples that are not "None".
| Index opennn::Dataset::get_used_samples_number | ( | ) | const |
Returns the number of samples that are not "None".
| vector< Index > opennn::Dataset::get_used_variables_indices | ( | ) | const |
Returns the column indices of all variables that are not "Unused".
| Index opennn::Dataset::get_used_variables_number | ( | ) | const |
Returns the number of variables that are not "Unused".
| MatrixR opennn::Dataset::get_variable_data | ( | const Index | variable_index | ) | const |
Returns the data for a single variable across all samples.
| variable_index | Column index. |
| MatrixR opennn::Dataset::get_variable_data | ( | const Index | variable_index, |
| const vector< Index > & | sample_indices ) const |
Returns the data for a single variable on a subset of samples.
| variable_index | Column index. |
| sample_indices | Row indices. |
| MatrixR opennn::Dataset::get_variable_data | ( | const string & | variable_name | ) | const |
Returns the data for a single variable identified by name.
| variable_name | Variable name. |
| Index opennn::Dataset::get_variable_index | ( | const Index | id | ) | const |
Returns the column index of the variable with a given numeric id.
| id | Variable id. |
| Index opennn::Dataset::get_variable_index | ( | const string & | name | ) | const |
Returns the column index of the variable with a given name.
| name | Variable name. |
| vector< Index > opennn::Dataset::get_variable_indices | ( | const string & | role_name | ) | const |
Returns the column indices of the variables assigned to a given role.
| role_name | Variable role. |
| vector< string > opennn::Dataset::get_variable_names | ( | ) | const |
Returns the names of every variable.
| vector< string > opennn::Dataset::get_variable_names | ( | const string & | role_name | ) | const |
Returns the names of the variables assigned to a given role.
| role_name | Variable role. |
|
inline |
Returns the type of a variable (Numeric, Binary, Categorical, ...).
| index | Column index. |
| vector< VariableType > opennn::Dataset::get_variable_types | ( | const vector< Index > | indices | ) | const |
Returns the types of a list of variables.
| indices | Column indices. |
|
inline |
Returns the per-variable metadata.
| vector< Variable > opennn::Dataset::get_variables | ( | const string & | role_name | ) | const |
Returns the metadata of the variables assigned to a given role.
| role_name | Variable role. |
|
inline |
Returns the total number of variables (columns of data).
| Index opennn::Dataset::get_variables_number | ( | const string & | role_name | ) | const |
Returns the number of variables assigned to a given role.
| role_name | Variable role. |
| bool opennn::Dataset::has_binary_or_categorical_variables | ( | ) | const |
Reports whether the dataset has any binary or categorical variable.
| bool opennn::Dataset::has_binary_variables | ( | ) | const |
Reports whether at least one variable is binary.
| bool opennn::Dataset::has_categorical_variables | ( | ) | const |
Reports whether at least one variable is categorical.
| bool opennn::Dataset::has_missing_values | ( | const vector< string > & | labels | ) | const |
Reports whether the dataset has missing values matching any of the supplied labels.
| labels | Candidate missing-value labels. |
| bool opennn::Dataset::has_nan | ( | ) | const |
Reports whether the data matrix contains any NaN.
| bool opennn::Dataset::has_nan_row | ( | const Index | row_index | ) | const |
Reports whether a row contains any NaN.
| row_index | Row to inspect. |
| bool opennn::Dataset::has_time_variable | ( | ) | const |
Reports whether at least one variable plays the Time role.
| bool opennn::Dataset::has_validation | ( | ) | const |
Reports whether at least one sample is assigned to Validation.
|
virtual |
Replaces missing values via linear interpolation along each variable.
Reimplemented in opennn::TimeSeriesDataset.
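Linear interpolation of gaps in a single column can be sketched as below. `interpolate_column` is hypothetical. The handling of leading and trailing NaNs (copying the nearest valid value) is an assumption, since the description above does not say how OpenNN treats the edges.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: fills NaN gaps by linear interpolation between the
// nearest non-NaN neighbours in the same column.
void interpolate_column(std::vector<float>& column)
{
    const std::size_t n = column.size();
    std::size_t previous = n; // index of the last valid value; n means "none yet"

    for (std::size_t i = 0; i < n; ++i)
    {
        if (std::isnan(column[i])) continue;

        if (previous == n)
        {
            // Leading gap: back-fill with the first valid value (assumption).
            for (std::size_t k = 0; k < i; ++k) column[k] = column[i];
        }
        else if (i > previous + 1)
        {
            // Interior gap: evenly spaced values between the two neighbours.
            const float step = (column[i] - column[previous]) / static_cast<float>(i - previous);
            for (std::size_t k = previous + 1; k < i; ++k)
                column[k] = column[previous] + step * static_cast<float>(k - previous);
        }
        previous = i;
    }

    // Trailing gap: forward-fill with the last valid value (assumption).
    if (previous != n)
        for (std::size_t k = previous + 1; k < n; ++k) column[k] = column[previous];
}
```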
| void opennn::Dataset::impute_missing_values_statistic | ( | const MissingValuesMethod & | method | ) |
Replaces missing values with a per-variable statistic.
| method | MissingValuesMethod (Mean, Median). |
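Per-variable statistic imputation amounts to computing the mean or median of the non-missing cells and writing it into the missing ones. The sketch below does this for one column. `Statistic` is a hypothetical stand-in for the Mean and Median values of MissingValuesMethod, and `impute_column` is not an OpenNN function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

enum class Statistic { Mean, Median }; // stand-in for MissingValuesMethod::Mean / Median

// Hypothetical sketch: replaces NaNs with the mean or median of the column's valid values.
void impute_column(std::vector<float>& column, Statistic statistic)
{
    std::vector<float> valid;
    for (float v : column)
        if (!std::isnan(v)) valid.push_back(v);

    if (valid.empty()) return; // no statistic can be computed

    float fill = 0.0f;
    if (statistic == Statistic::Mean)
    {
        fill = std::accumulate(valid.begin(), valid.end(), 0.0f) / static_cast<float>(valid.size());
    }
    else
    {
        std::sort(valid.begin(), valid.end());
        const std::size_t mid = valid.size() / 2;
        fill = valid.size() % 2 == 1 ? valid[mid] : 0.5f * (valid[mid - 1] + valid[mid]);
    }

    for (float& v : column)
        if (std::isnan(v)) v = fill;
}
```

The median is the more robust choice when a column contains outliers. In `{1, NaN, 2, 9}`, the mean fills the gap with 4, while the median fills it with 2.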
|
virtual |
Marks samples that contain missing values as unused (sample role "None").
Reimplemented in opennn::TimeSeriesDataset.
| DateFormat opennn::Dataset::infer_dataset_date_format | ( | const vector< Variable > & | variables, |
| const vector< vector< string > > & | data_file_preview, | ||
| bool | has_header, | ||
| const string & | missing_values_label ) |
Infers the date format used by date-typed variables in the source file.
| variables | Variables to inspect. |
| data_file_preview | Preview rows from the source file. |
| has_header | Whether the preview includes a header row. |
| missing_values_label | Missing-value label to ignore. |
|
inline |
Reports whether the data matrix is empty.
|
inline |
Reports whether a sample is used (any role other than None).
| i | Sample index. |
| void opennn::Dataset::load | ( | const filesystem::path & | file_name | ) |
Loads the dataset state from a JSON file on disk.
| file_name | Source path. |
| void opennn::Dataset::load_data_binary | ( | ) |
Loads the data matrix from a binary file produced by save_data_binary().
|
protected |
Restores the missing-value statistics from JSON.
|
protected |
Serializes the missing-value statistics to JSON.
|
protected |
Restores the source-file preview from JSON.
|
protected |
Serializes the source-file preview to JSON.
|
virtual |
Reads the configured CSV file into the data matrix.
Reimplemented in opennn::LanguageDataset, and opennn::TimeSeriesDataset.
| vector< vector< Index > > opennn::Dataset::replace_Tukey_outliers_with_NaN | ( | const float | tukey_factor = 1.5f | ) |
Detects Tukey outliers and replaces them with NaN.
| tukey_factor | Tukey-fence multiplier. |
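Tukey's rule flags values outside the fences [Q1 - k·IQR, Q3 + k·IQR], where IQR = Q3 - Q1 and k is `tukey_factor`. The sketch below applies this rule to one column. It computes quartiles by linear interpolation between order statistics, which is only one of several common definitions and may differ from OpenNN's. It also returns a flat list of replaced row indices, whereas the documented function returns nested vectors. Both helpers are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quantile of an already sorted, non-empty vector (linear interpolation).
float quantile(const std::vector<float>& sorted_values, float p)
{
    assert(!sorted_values.empty());
    const float position = p * static_cast<float>(sorted_values.size() - 1);
    const std::size_t lower = static_cast<std::size_t>(std::floor(position));
    const std::size_t upper = std::min(lower + 1, sorted_values.size() - 1);
    const float fraction = position - static_cast<float>(lower);
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower]);
}

// Hypothetical sketch: replaces values outside the Tukey fences with NaN and
// returns the indices that were replaced.
std::vector<std::size_t> replace_tukey_outliers_with_nan(std::vector<float>& column,
                                                         float tukey_factor = 1.5f)
{
    std::vector<std::size_t> replaced;
    std::vector<float> sorted;
    for (float v : column)
        if (!std::isnan(v)) sorted.push_back(v);
    if (sorted.empty()) return replaced;
    std::sort(sorted.begin(), sorted.end());

    const float q1 = quantile(sorted, 0.25f);
    const float q3 = quantile(sorted, 0.75f);
    const float iqr = q3 - q1;
    const float lower_fence = q1 - tukey_factor * iqr;
    const float upper_fence = q3 + tukey_factor * iqr;

    for (std::size_t i = 0; i < column.size(); ++i)
    {
        // Comparisons with NaN are false, so existing NaNs are left alone.
        if (column[i] < lower_fence || column[i] > upper_fence)
        {
            column[i] = std::nanf("");
            replaced.push_back(i);
        }
    }
    return replaced;
}
```

For `{1, 2, 3, 4, 100}`, Q1 = 2 and Q3 = 4, so the fences with k = 1.5 are [-1, 7] and only 100 is replaced.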
|
protected |
Restores the per-sample roles from JSON.
|
protected |
Serializes the per-sample roles to JSON.
| void opennn::Dataset::save | ( | const filesystem::path & | file_name | ) | const |
Saves the dataset state to a JSON file on disk.
| file_name | Destination path. |
| void opennn::Dataset::save_data | ( | ) | const |
Saves the data matrix back to the configured source file.
| void opennn::Dataset::save_data_binary | ( | const filesystem::path & | file_name | ) | const |
Saves the data matrix as a binary file.
| file_name | Destination path. |
| vector< Descriptives > opennn::Dataset::scale_data | ( | ) |
Scales the entire data matrix.
|
virtual |
Scales the features of a given role.
Reads the scalers from the per-variable metadata, computes the descriptives of the unscaled data, scales the data in place and returns the descriptives so they can be reused (for inverse-scaling outputs or configuring the network's Scaling layer).
| role_name | Variable role to scale (typically "Input"). |
Reimplemented in opennn::ImageDataset.
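The "compute descriptives, scale in place, return the descriptives" pattern can be sketched for a single column as follows. The sketch uses mean and standard-deviation standardization as one possible scaler. `SketchDescriptives` is a minimal hypothetical stand-in for OpenNN's Descriptives. The choice of the population standard deviation and the guard for constant columns are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for Descriptives (fields chosen for this sketch only).
struct SketchDescriptives
{
    float mean;
    float standard_deviation;
};

// Hypothetical sketch: computes the descriptives of the unscaled column,
// standardizes the column in place and returns the descriptives for reuse.
SketchDescriptives standardize_column(std::vector<float>& column)
{
    assert(!column.empty());
    const float n = static_cast<float>(column.size());

    SketchDescriptives d{0.0f, 0.0f};
    for (float v : column) d.mean += v;
    d.mean /= n;

    for (float v : column) d.standard_deviation += (v - d.mean) * (v - d.mean);
    d.standard_deviation = std::sqrt(d.standard_deviation / n); // population form

    // A constant column has zero spread; only centre it.
    const float divisor = d.standard_deviation > 0.0f ? d.standard_deviation : 1.0f;
    for (float& v : column) v = (v - d.mean) / divisor;

    return d;
}
```

The returned descriptives are what you would keep around to undo the scaling later or to configure a scaling stage with the same parameters.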
| void opennn::Dataset::scrub_missing_values | ( | ) |
Removes samples that contain missing values.
| void opennn::Dataset::set | ( | const filesystem::path & | data_path, |
| const string & | separator, | ||
| bool | has_header = true, | ||
| bool | has_ids = false, | ||
| const Dataset::Codification & | codification = Codification::UTF8 ) |
Resets the dataset by loading a delimited text file.
| data_path | Path to the source file. |
| separator | Field separator string. |
| has_header | Whether the first row contains column names. |
| has_ids | Whether the first column contains sample identifiers. |
| codification | Source-file character encoding. |
| void opennn::Dataset::set | ( | const filesystem::path & | file_name | ) |
Resets the dataset by loading a previously serialized JSON state.
| file_name | Path to the JSON file. |
| void opennn::Dataset::set_binary_variables | ( | ) |
Detects binary variables (two distinct values) and tags them accordingly.
|
inline |
Sets the source-file codification.
| new_codification | Codification enum value. |
| void opennn::Dataset::set_codification | ( | const string & | new_codification | ) |
Sets the source-file codification from its name.
| new_codification | "UTF8" or "SHIFT_JIS". |
| void opennn::Dataset::set_data | ( | const MatrixR & | new_data | ) |
Replaces the data matrix.
The number of rows and columns must match the current number of samples and variables.
| new_data | Replacement matrix. |
| void opennn::Dataset::set_data_binary_classification | ( | ) |
Fills the data matrix with synthetic binary-classification data.
| void opennn::Dataset::set_data_constant | ( | const float | value | ) |
Fills the data matrix with a constant value.
| value | Value used to fill every cell. |
|
virtual |
Fills the data matrix with random integers in [0, vocabulary_size).
| vocabulary_size | Exclusive upper bound for the integers. |
|
inline |
Sets the path to the source data file.
| new_data_path | New path. |
|
virtual |
Fills the data matrix with uniform random values.
Reimplemented in opennn::ImageDataset.
| void opennn::Dataset::set_data_rosenbrock | ( | ) |
Fills the data matrix with samples from the Rosenbrock function.
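For reference, the standard n-dimensional Rosenbrock function is shown below. This assumes set_data_rosenbrock() uses this common form, with the input variables as x and the function value as the target. The description above does not state this.

```latex
f(x_1, \dots, x_n) = \sum_{i=1}^{n-1} \left[ 100\,\bigl(x_{i+1} - x_i^2\bigr)^2 + \bigl(1 - x_i\bigr)^2 \right],
\qquad f(1, \dots, 1) = 0 \ \text{(global minimum)}.
```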
| void opennn::Dataset::set_default | ( | ) |
Resets configuration members to defaults.
| void opennn::Dataset::set_default_variable_names | ( | ) |
Sets default names ("variable_1", "variable_2", ...) for every variable.
|
protected |
Sets default Input/Target roles based on column position (last column = target).
|
protected |
Sets default roles for forecasting (typically lagged inputs and a future target).
| void opennn::Dataset::set_default_variable_scalers | ( | ) |
Picks default scalers for every variable based on its type.
|
inline |
Toggles progress messages.
| new_display | true to enable. |
| void opennn::Dataset::set_feature_names | ( | const vector< string > & | new_feature_names | ) |
Names every feature.
Each categorical variable contributes one name per class.
| new_feature_names | One name per feature. |
|
inline |
Sets the GMT offset for time variables.
| new_gmt | Offset in hours. |
|
inline |
Sets whether the source file has a header row.
| new_has_header | true if the first row contains names. |
|
inline |
Sets whether the source file has a sample-id column.
| new_has_ids | true if the first column contains identifiers. |
| void opennn::Dataset::set_input_variables_unused | ( | ) |
Marks all input variables as Unused.
|
inline |
Sets the label used for missing values in the source file.
| label | New label. |
|
inline |
Sets the missing-value handling strategy.
| method | MissingValuesMethod enum value. |
| void opennn::Dataset::set_missing_values_method | ( | const string & | method_name | ) |
Sets the missing-value handling strategy from its name.
| method_name | "Unuse", "Mean", "Median" or "Interpolation". |
| void opennn::Dataset::set_sample_role | ( | const Index | sample_index, |
| const string & | role_name ) |
Assigns a role to a single sample.
| sample_index | Row index. |
| role_name | Sample role. |
| void opennn::Dataset::set_sample_roles | ( | const string & | role_name | ) |
Assigns the same role to every sample.
| role_name | Sample role. |
| void opennn::Dataset::set_sample_roles | ( | const vector< Index > & | sample_indices, |
| const string & | role_name ) |
Assigns the same role to a list of samples.
| sample_indices | Indices to update. |
| role_name | Sample role. |
| void opennn::Dataset::set_sample_roles | ( | const vector< string > & | role_names | ) |
Assigns roles to all samples from a parallel string vector.
| role_names | One name per sample. |
|
inline |
Sets the field separator.
| new_separator | Separator enum value. |
| void opennn::Dataset::set_separator_name | ( | const string & | new_separator_name | ) |
Sets the field separator from its human-readable name.
| new_separator_name | "Comma", "Semicolon", "Tab" or "Space". |
| void opennn::Dataset::set_separator_string | ( | const string & | new_separator_string | ) |
Sets the field separator from its delimiter character(s).
| new_separator_string | Delimiter ("," ";" "\t" or " "). |
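The name-to-delimiter correspondence documented above can be expressed as a small pair of mappings. The sketch mirrors the Separator enum from the class synopsis. It deliberately rejects unknown names with an exception. That is a design choice of this hypothetical sketch, not documented OpenNN behaviour.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Separator { Space, Tab, Comma, Semicolon }; // mirrors Dataset::Separator

// Hypothetical sketch: human-readable name -> enum value.
Separator separator_from_name(const std::string& name)
{
    if (name == "Space") return Separator::Space;
    if (name == "Tab") return Separator::Tab;
    if (name == "Comma") return Separator::Comma;
    if (name == "Semicolon") return Separator::Semicolon;
    throw std::invalid_argument("Unknown separator name: " + name);
}

// Hypothetical sketch: enum value -> actual delimiter character.
std::string separator_string(Separator separator)
{
    switch (separator)
    {
    case Separator::Space: return " ";
    case Separator::Tab: return "\t";
    case Separator::Comma: return ",";
    case Separator::Semicolon: return ";";
    }
    throw std::invalid_argument("Unknown separator");
}
```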
| void opennn::Dataset::set_shape | ( | const string & | role_name, |
| const Shape & | new_shape ) |
Sets the input or target shape.
| role_name | "Input" or "Target". |
| new_shape | New shape. |
| void opennn::Dataset::set_variable_indices | ( | const vector< Index > & | input_indices, |
| const vector< Index > & | target_indices ) |
Marks the variables in input_indices as Input and those in target_indices as Target.
| input_indices | Indices of input variables. |
| target_indices | Indices of target variables. |
| void opennn::Dataset::set_variable_names | ( | const vector< string > & | new_variable_names | ) |
Replaces the names of every variable.
| new_variable_names | One name per variable. |
| void opennn::Dataset::set_variable_role | ( | const Index | variable_index, |
| const string & | role_name ) |
Sets the role of a single variable by index.
| variable_index | Column index. |
| role_name | New role. |
| void opennn::Dataset::set_variable_role | ( | const string & | variable_name, |
| const string & | role_name ) |
Sets the role of a single variable by name.
| variable_name | Variable name. |
| role_name | New role. |
| void opennn::Dataset::set_variable_roles | ( | const string & | role_name | ) |
Assigns the same role to every variable.
| role_name | Variable role. |
|
virtual |
Assigns roles to all variables from a parallel string vector.
| role_names | One role per variable. |
| void opennn::Dataset::set_variable_scalers | ( | const string & | scaler_name | ) |
Sets the same scaler on every variable.
| scaler_name | Scaler name. |
| void opennn::Dataset::set_variable_scalers | ( | const vector< string > & | scaler_names | ) |
Sets one scaler per variable.
| scaler_names | One name per variable. |
| void opennn::Dataset::set_variable_type | ( | const Index | variable_index, |
| const VariableType & | type ) |
Sets the type of a single variable by index.
| variable_index | Column index. |
| type | New variable type. |
| void opennn::Dataset::set_variable_type | ( | const string & | variable_name, |
| const VariableType & | type ) |
Sets the type of a single variable by name.
| variable_name | Variable name. |
| type | New variable type. |
| void opennn::Dataset::set_variable_types | ( | const VariableType & | type | ) |
Sets every variable to a given type.
| type | Variable type to apply. |
| void opennn::Dataset::set_variables | ( | const string & | description | ) |
Re-creates the variables vector from an input/target shape descriptor.
| description | Compact descriptor parsed by the implementation. |
|
inline |
Replaces the per-variable metadata.
| new_variables | New variables vector. |
|
inline |
Resizes the variables vector.
| new_size | New variable count. |
| void opennn::Dataset::split_samples | ( | const float | training_ratio = 0.6f, |
| float | selection_ratio = 0.2f, | ||
| float | testing_ratio = 0.2f, | ||
| bool | shuffle = true ) |
Splits samples into training/validation/testing partitions.
Convenience entry point that delegates to split_samples_random() (when shuffle is true) or split_samples_sequential() (otherwise).
| training_ratio | Fraction of samples for training. |
| selection_ratio | Fraction for validation. |
| testing_ratio | Fraction for testing. |
| shuffle | Whether to shuffle the indices before splitting. |
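A ratio-based split with optional shuffling can be sketched as follows. `split_roles` is hypothetical. It normalizes the three ratios by their sum, rounds the training and validation counts, and gives any remainder to testing. These rounding rules are assumptions, not necessarily what split_samples_random() or split_samples_sequential() do.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: returns one role name per sample.
std::vector<std::string> split_roles(std::size_t samples_number,
                                     float training_ratio = 0.6f,
                                     float selection_ratio = 0.2f,
                                     float testing_ratio = 0.2f,
                                     bool shuffle = true,
                                     unsigned seed = 0)
{
    const float total = training_ratio + selection_ratio + testing_ratio;
    assert(total > 0.0f);

    const float n = static_cast<float>(samples_number);
    const std::size_t training_number = std::min(
        static_cast<std::size_t>(std::lround(n * training_ratio / total)), samples_number);
    const std::size_t selection_number = std::min(
        static_cast<std::size_t>(std::lround(n * selection_ratio / total)),
        samples_number - training_number);

    // Sequential order by default; random permutation when shuffling.
    std::vector<std::size_t> order(samples_number);
    std::iota(order.begin(), order.end(), std::size_t{0});
    if (shuffle)
        std::shuffle(order.begin(), order.end(), std::mt19937(seed));

    // Samples left after training and validation go to testing.
    std::vector<std::string> roles(samples_number, "Testing");
    for (std::size_t i = 0; i < training_number; ++i)
        roles[order[i]] = "Training";
    for (std::size_t i = training_number; i < training_number + selection_number; ++i)
        roles[order[i]] = "Validation";

    return roles;
}
```

Without shuffling, 10 samples with the default ratios give rows 0 to 5 for training, 6 and 7 for validation and 8 and 9 for testing. This sequential layout is usually what you want for time-ordered data.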
| vector< vector< Index > > opennn::Dataset::split_samples | ( | const vector< Index > & | indices, |
| Index | parts_number ) const |
Splits a list of sample indices into chunks of (roughly) equal size.
| indices | Sample indices to split. |
| parts_number | Number of chunks. |
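"Roughly equal" chunking is commonly implemented so that chunk sizes differ by at most one. The hypothetical sketch below gives the first `size % parts_number` chunks one extra element. OpenNN may distribute the remainder differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: splits indices into parts_number chunks whose sizes differ by at most one.
std::vector<std::vector<std::size_t>> split_into_parts(const std::vector<std::size_t>& indices,
                                                       std::size_t parts_number)
{
    assert(parts_number > 0);
    const std::size_t base = indices.size() / parts_number;
    const std::size_t remainder = indices.size() % parts_number;

    std::vector<std::vector<std::size_t>> parts(parts_number);
    std::size_t position = 0;
    for (std::size_t p = 0; p < parts_number; ++p)
    {
        const std::size_t size = base + (p < remainder ? 1 : 0);
        parts[p].assign(indices.begin() + static_cast<std::ptrdiff_t>(position),
                        indices.begin() + static_cast<std::ptrdiff_t>(position + size));
        position += size;
    }
    return parts;
}
```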
| void opennn::Dataset::split_samples_random | ( | const float | training_ratio = 0.6f, |
| float | selection_ratio = 0.2f, | ||
| float | testing_ratio = 0.2f ) |
Splits samples into partitions after random shuffling.
| training_ratio | Fraction of samples for training. |
| selection_ratio | Fraction for validation. |
| testing_ratio | Fraction for testing. |
| void opennn::Dataset::split_samples_sequential | ( | const float | training_ratio = 0.6f, |
| float | selection_ratio = 0.2f, | ||
| float | testing_ratio = 0.2f ) |
Splits samples into partitions in their original order.
| training_ratio | Fraction of samples for training. |
| selection_ratio | Fraction for validation. |
| testing_ratio | Fraction for testing. |
|
virtual |
Serializes the dataset state to JSON.
| writer | JSON writer that receives the dataset tree. |
Reimplemented in opennn::ImageDataset, opennn::LanguageDataset, and opennn::TimeSeriesDataset.
| void opennn::Dataset::unscale_features | ( | const string & | role_name, |
| const vector< Descriptives > & | feature_descriptives ) |
Inverse-scales the features of a given role.
| role_name | Variable role to unscale. |
| feature_descriptives | Descriptives produced by scale_features(). |
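Inverse scaling applies the saved descriptives in reverse. For a mean and standard-deviation scaler, the inverse is x = z·σ + μ. The sketch below assumes that scaler and uses a hypothetical minimal `SketchDescriptives` struct. Treating a zero standard deviation as 1 is an assumption that must match whatever the forward scaling did.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for Descriptives (fields chosen for this sketch only).
struct SketchDescriptives
{
    float mean;
    float standard_deviation;
};

// Hypothetical inverse of mean/standard-deviation scaling: x = z * sigma + mu.
void unstandardize_column(std::vector<float>& column, const SketchDescriptives& d)
{
    const float sigma = d.standard_deviation > 0.0f ? d.standard_deviation : 1.0f;
    for (float& v : column) v = v * sigma + d.mean;
}
```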
| vector< string > opennn::Dataset::unuse_collinear_variables | ( | const float | maximum_correlation = 0.95f | ) |
Marks input variables that are strongly correlated with another input as Unused.
| maximum_correlation | Correlation threshold; an input whose correlation with another input exceeds it is marked Unused. |
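One way to prune collinear inputs is sketched below with Pearson correlation. The sketch is greedy: it scans inputs left to right and drops an input whose absolute correlation with an earlier, still-used input exceeds the threshold. Which member of a correlated pair OpenNN keeps, and whether it uses absolute correlation, are assumptions here. `pearson` and `collinear_inputs` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equally long, non-empty columns (0 if either is constant).
float pearson(const std::vector<float>& x, const std::vector<float>& y)
{
    assert(x.size() == y.size() && !x.empty());
    const float n = static_cast<float>(x.size());
    float mean_x = 0.0f, mean_y = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) { mean_x += x[i]; mean_y += y[i]; }
    mean_x /= n;
    mean_y /= n;

    float sxy = 0.0f, sxx = 0.0f, syy = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const float dx = x[i] - mean_x, dy = y[i] - mean_y;
        sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
    }
    const float denominator = std::sqrt(sxx * syy);
    return denominator > 0.0f ? sxy / denominator : 0.0f;
}

// Hypothetical greedy sketch: returns the indices of inputs to mark Unused.
std::vector<std::size_t> collinear_inputs(const std::vector<std::vector<float>>& inputs,
                                          float maximum_correlation = 0.95f)
{
    std::vector<bool> used(inputs.size(), true);
    std::vector<std::size_t> unused;
    for (std::size_t j = 0; j < inputs.size(); ++j)
        for (std::size_t i = 0; i < j; ++i)
            if (used[i] && std::fabs(pearson(inputs[i], inputs[j])) > maximum_correlation)
            {
                used[j] = false;
                unused.push_back(j);
                break;
            }
    return unused;
}
```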
| void opennn::Dataset::unuse_Tukey_outliers | ( | const float | tukey_factor = 1.5f | ) |
Marks samples containing Tukey outliers as unused (sample role "None").
| tukey_factor | Tukey-fence multiplier. |
| vector< string > opennn::Dataset::unuse_uncorrelated_variables | ( | const float | minimum_correlation = 0.25f | ) |
Marks variables with low correlation with the target as Unused.
| minimum_correlation | Correlation threshold; variables whose correlation with the target falls below it are marked Unused. |
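Filtering inputs by their correlation with the target can be sketched as follows. The sketch assumes the absolute Pearson correlation is compared against the threshold, which the description above does not confirm. `pearson` and `uncorrelated_inputs` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equally long, non-empty columns (0 if either is constant).
float pearson(const std::vector<float>& x, const std::vector<float>& y)
{
    assert(x.size() == y.size() && !x.empty());
    const float n = static_cast<float>(x.size());
    float mean_x = 0.0f, mean_y = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) { mean_x += x[i]; mean_y += y[i]; }
    mean_x /= n;
    mean_y /= n;

    float sxy = 0.0f, sxx = 0.0f, syy = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const float dx = x[i] - mean_x, dy = y[i] - mean_y;
        sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
    }
    const float denominator = std::sqrt(sxx * syy);
    return denominator > 0.0f ? sxy / denominator : 0.0f;
}

// Hypothetical sketch: indices of inputs whose |r| with the target is below the threshold.
std::vector<std::size_t> uncorrelated_inputs(const std::vector<std::vector<float>>& inputs,
                                             const std::vector<float>& target,
                                             float minimum_correlation = 0.25f)
{
    std::vector<std::size_t> unused;
    for (std::size_t j = 0; j < inputs.size(); ++j)
        if (std::fabs(pearson(inputs[j], target)) < minimum_correlation)
            unused.push_back(j);
    return unused;
}
```

Pearson correlation only measures linear association. An input with a strong nonlinear relationship to the target can still fall below the threshold.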
|
protected |
Restores the variables vector from JSON.
|
protected |
Serializes the variables vector to JSON.
|
protected |
Source-file character encoding.
|
protected |
Cached preview of the first rows of the source file.
|
protected |
Path to the source data file.
|
protected |
Shape of the decoder portion (transformer-style models).
|
protected |
Whether to print progress messages.
|
protected |
GMT offset for time variables, in hours.
|
protected |
Whether the source file's first row contains column names.
|
protected |
Whether the source file's first column contains sample identifiers.
|
protected |
Shape of the input portion (rank may exceed 1 for image/sequence data).
|
protected |
Label that marks missing values in the source file.
|
protected |
Strategy used to handle missing values.
|
protected |
Total number of missing cells.
|
protected |
Strings interpreted as negative when parsing binary variables.
|
protected |
Strings interpreted as positive when parsing binary variables.
|
protected |
Number of rows that contain at least one missing value.
|
protected |
Optional per-sample identifiers (when the source file has an id column).
|
protected |
Per-sample role (Training/Validation/Testing/None).
|
protected |
Field separator used in the source file.
|
protected |
Per-variable metadata (name, role, type, scaler).
|
protected |
Per-variable missing-value count.