OpenNN
Open-source neural networks library
Encoder-decoder Transformer (Vaswani et al., 2017) for sequence-to-sequence modeling.
#include <standard_networks.h>
Public Member Functions | |
| Transformer (const Index input_sequence_length=0, Index decoder_sequence_length=0, Index input_vocabulary_size=0, Index output_vocabulary_size=0, Index embedding_dimension=0, Index heads_number=0, Index feedforward_dimension=0, Index layers_number=0) | |
| Constructs an untrained Transformer. | |
| void | set (const Index input_sequence_length=0, Index decoder_sequence_length=0, Index input_vocabulary_size=0, Index output_vocabulary_size=0, Index embedding_dimension=0, Index heads_number=0, Index feedforward_dimension=0, Index layers_number=0) |
| Re-initializes the Transformer with new dimensions. | |
| Index | get_input_sequence_length () const |
| Length of the encoder input sequence. | |
| Index | get_decoder_sequence_length () const |
| Length of the decoder input sequence. | |
| Index | get_embedding_dimension () const |
| Embedding (model) dimension. | |
| Index | get_heads_number () const |
| Number of attention heads per layer. | |
| void | set_dropout_rate (const float) |
| Sets the dropout rate applied to the residual streams. | |
| void | set_input_vocabulary (const vector< string > &) |
| Replaces the input-side vocabulary. | |
| void | set_output_vocabulary (const vector< string > &) |
| Replaces the output-side vocabulary. | |
| string | calculate_outputs (const string &input) |
| Decodes a raw input string greedily and returns the generated output text. | |
Public Member Functions inherited from opennn::NeuralNetwork | |
| NeuralNetwork () | |
| Default-constructs an empty network. | |
| virtual | ~NeuralNetwork ()=default |
| Defaulted virtual destructor for safe polymorphic deletion. | |
| NeuralNetwork (const filesystem::path &file_name) | |
| Constructs a network by loading a serialized model from disk. | |
| void | add_layer (unique_ptr< Layer > new_layer, const vector< Index > &input_indices=vector< Index >()) |
| Appends a layer to the stack. | |
| const Configuration::Resolved & | get_config () const |
| Returns the resolved device/precision configuration. | |
| bool | is_gpu () const |
| Reports whether the network is configured to run on CUDA. | |
| bool | is_cpu () const |
| Reports whether the network is configured to run on CPU. | |
| Type | get_training_type () const |
| Returns the precision used for training. | |
| Type | get_inference_type () const |
| Returns the precision used for inference. | |
| vector< vector< Shape > > | get_parameter_shapes () const |
| Returns the parameter shapes of every layer. | |
| vector< vector< Shape > > | get_state_shapes () const |
| Returns the persistent state shapes of every layer. | |
| vector< vector< Shape > > | get_forward_shapes (Index batch_size) const |
| Returns the forward-propagation buffer shapes for a given batch size. | |
| vector< vector< Shape > > | get_backward_shapes (Index batch_size) const |
| Returns the backward-propagation buffer shapes for a given batch size. | |
| Index | get_states_size () const |
| Returns the total size in floats of all persistent state buffers. | |
| Index | get_forward_size (Index batch_size) const |
| Returns the total size in floats of the forward-propagation workspace. | |
| Index | get_backward_size (Index batch_size) const |
| Returns the total size in floats of the backward-propagation workspace. | |
| void | compile () |
| Allocates parameters and resolves layer-input indices. | |
| bool | has (const string &layer_name) const |
| Reports whether the network contains a layer of the given name. | |
| bool | has (LayerType type) const |
| Reports whether the network contains a layer of the given type. | |
| bool | is_empty () const |
| Reports whether the network has zero layers. | |
| float * | get_parameters_data () |
| Returns the parameter buffer as a raw float pointer. | |
| const float * | get_parameters_data () const |
| Returns the parameter buffer as a raw float pointer (const overload). | |
| Index | get_parameters_size () const |
| Returns the parameter count in floats. | |
| const vector< Variable > & | get_input_variables () const |
| Returns the input variables describing each input feature. | |
| const vector< string > | get_input_feature_names () const |
| Returns the names of the input features. | |
| const vector< Variable > & | get_output_variables () const |
| Returns the output variables describing each output target. | |
| const vector< string > | get_output_feature_names () const |
| Returns the names of the output features. | |
| const vector< unique_ptr< Layer > > & | get_layers () const |
| Returns the layer stack. | |
| const unique_ptr< Layer > & | get_layer (const Index i) const |
| Returns the layer at a given index. | |
| const unique_ptr< Layer > & | get_layer (const string &layer_name) const |
| Returns the layer with a given name. | |
| Index | get_layer_index (const string &layer_name) const |
| Returns the index of the layer with a given name. | |
| const vector< vector< Index > > & | get_layer_input_indices () const |
| Returns the per-layer input-layer indices. | |
| vector< vector< Index > > | get_layer_output_indices () const |
| Returns the per-layer output-layer indices. | |
| Layer * | get_first (const string &layer_name) |
| Returns the first layer of a given type by name. | |
| Layer * | get_first (LayerType type) |
| Returns the first layer of a given type. | |
| const Layer * | get_first (const string &layer_name) const |
| Returns the first layer of a given type by name (const overload). | |
| const Layer * | get_first (LayerType type) const |
| Returns the first layer of a given type (const overload). | |
| void | set_layers_number (const Index new_layers_number) |
| Resizes the layer stack. | |
| void | set_layer_input_indices (const vector< vector< Index > > &new_layer_input_indices) |
| Replaces the entire layer-input wiring. | |
| void | set_layer_input_indices (const Index layer_index, const vector< Index > &new_input_indices) |
| Replaces the input wiring of a single layer by index. | |
| void | set_layer_input_indices (const string &layer_name, const vector< string > &input_layer_names) |
| Replaces the input wiring of a single layer by name. | |
| void | set_layer_input_indices (const string &layer_name, initializer_list< string > input_layer_names) |
| Replaces the input wiring of a single layer by name (initializer-list overload). | |
| void | set_layer_input_indices (const string &layer_name, const string &input_layer_name) |
| Sets a single source layer as input to a destination layer. | |
| void | set_input_variables (const vector< Variable > &new_input_variables) |
| Replaces the input variables. | |
| void | set_output_variables (const vector< Variable > &new_output_variables) |
| Replaces the output variables. | |
| void | set_input_names (const vector< string > &new_input_names) |
| Sets the names of the input features. | |
| void | set_output_names (const vector< string > &new_output_names) |
| Sets the names of the output features. | |
| void | set_input_shape (const Shape &new_input_shape) |
| Sets the input shape of the network. | |
| void | set_default () |
| Resets non-architectural state to defaults. | |
| Index | get_layers_number () const |
| Returns the number of layers. | |
| Index | get_layers_number (const string &layer_name) const |
| Returns the number of layers of a given type by name. | |
| Index | get_layers_number (LayerType type) const |
| Returns the number of layers of a given type. | |
| Index | get_first_trainable_layer_index () const |
| Returns the index of the first trainable layer. | |
| Index | get_last_trainable_layer_index () const |
| Returns the index of the last trainable layer. | |
| Index | get_inputs_number () const |
| Returns the total number of input features. | |
| Index | get_outputs_number () const |
| Returns the total number of output features. | |
| Shape | get_input_shape () const |
| Returns the network's input shape. | |
| Shape | get_output_shape () const |
| Returns the network's output shape. | |
| Activation::Function | get_output_activation () const |
| Returns the activation function of the last layer. | |
| Index | get_parameters_number () const |
| Returns the total number of trainable parameters. | |
| vector< Index > | get_layer_parameter_numbers () const |
| Returns the parameter count of every layer. | |
| void | set_parameters (const VectorR &new_parameters) |
| Replaces all trainable parameters. | |
| void | set_parameters_random () |
| Initializes parameters with uniform random values. | |
| void | set_parameters_glorot () |
| Initializes parameters with Glorot (Xavier) uniform values. | |
| MatrixR | calculate_outputs (const vector< TensorView > &inputs) |
| Computes outputs from a list of input tensor views. | |
| MatrixR | calculate_outputs (const MatrixR &inputs) |
| Computes outputs for tabular inputs. | |
| MatrixR | calculate_outputs (const Tensor3 &inputs) |
| Computes outputs for rank-3 inputs (e.g. sequence data). | |
| MatrixR | calculate_outputs (const Tensor4 &inputs) |
| Computes outputs for rank-4 inputs (e.g. images). | |
| MatrixR | calculate_directional_inputs (const Index direction, const VectorR &point, float minimum, float maximum, Index points_number=101) const |
| Computes outputs along a directional sweep of one input. | |
| Tensor3 | calculate_outputs (const Tensor3 &inputs, const Tensor3 &context) |
| Computes outputs from two rank-3 inputs (encoder/decoder pair). | |
| Index | calculate_image_output (const filesystem::path &image_path) |
| Runs the network on a single image and returns the predicted class. | |
| MatrixR | calculate_text_outputs (const Tensor< string, 1 > &texts) |
| Runs the network on a batch of strings and returns class outputs. | |
| void | from_JSON (const JsonDocument &document) |
| Restores the network from a JSON document. | |
| void | to_JSON (JsonWriter &writer) const |
| Serializes the network to JSON. | |
| void | save (const filesystem::path &file_name) const |
| Saves the full network (architecture + parameters) to a JSON file. | |
| void | save_parameters (const filesystem::path &file_name) const |
| Saves only the parameters in binary form. | |
| void | load (const filesystem::path &file_name) |
| Loads the full network (architecture + parameters) from a JSON file. | |
| void | load_parameters_binary (const filesystem::path &file_name) |
| Loads parameters from a binary file produced by save_parameters(). | |
| vector< string > | get_names_string () const |
| Returns the names of every input and output feature. | |
| void | save_outputs (MatrixR &outputs, const filesystem::path &file_name) |
| Saves a tabular outputs tensor to a CSV file. | |
| void | save_outputs (Tensor3 &outputs, const filesystem::path &file_name) |
| Saves a rank-3 outputs tensor to a CSV file. | |
| void | forward_propagate (const vector< TensorView > &inputs, ForwardPropagation &forward, bool is_training=false) const |
| Runs the forward pass and writes intermediate activations into forward. | |
| void | forward_propagate (const vector< TensorView > &inputs, const VectorR &parameters, ForwardPropagation &forward) |
| Runs the forward pass with explicitly supplied parameters. | |
| vector< string > | get_layer_labels () const |
| Returns a label for every layer (name + key hyperparameters). | |
Additional Inherited Members | |
Protected Attributes inherited from opennn::NeuralNetwork | |
| string | name = "neural_network" |
| Network identifier; used as a JSON tag. | |
| vector< Variable > | input_variables |
| Description of every input feature (role, scaler, type). | |
| vector< Variable > | output_variables |
| Description of every output feature (role, scaler, type). | |
| vector< unique_ptr< Layer > > | layers |
| Owned layers in execution order. | |
| vector< vector< Index > > | layer_input_indices |
| Per-destination-layer list of source-layer indices. | |
| Buffer | parameters |
| Flat parameter buffer shared by all layers. | |
| Buffer | parameters_bf16 {Device::CUDA} |
| BF16 mirror of parameters for mixed-precision CUDA training. | |
| vector< vector< vector< TensorView > > > | parameter_views |
| Per-layer, per-parameter-group views into parameters. | |
| Buffer | states |
| Flat persistent-state buffer shared by stateful layers (e.g. Recurrent). | |
| Configuration::Resolved | config |
| Resolved device/precision configuration applied at compile time. | |
Encoder-decoder Transformer (Vaswani et al., 2017) for sequence-to-sequence modeling.
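The architecture of Vaswani et al. (2017) is built around scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The standalone sketch below illustrates that formula for a single head, with no masking and no learned projections. It is not OpenNN's implementation; the Matrix alias and scaled_dot_product_attention are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix: one row per sequence position.
using Matrix = std::vector<std::vector<double>>;

// Scaled dot-product attention (Vaswani et al., 2017):
//   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
// Q is [queries x d_k], K is [keys x d_k], V is [keys x d_v].
Matrix scaled_dot_product_attention(const Matrix& Q, const Matrix& K, const Matrix& V)
{
    assert(!Q.empty() && !K.empty() && K.size() == V.size());
    const std::size_t d_k = K[0].size();
    const double scale = 1.0 / std::sqrt(static_cast<double>(d_k));

    Matrix output(Q.size(), std::vector<double>(V[0].size(), 0.0));

    for (std::size_t i = 0; i < Q.size(); ++i)
    {
        // Scaled similarity between query i and every key.
        std::vector<double> weights(K.size());
        for (std::size_t j = 0; j < K.size(); ++j)
        {
            double dot = 0.0;
            for (std::size_t c = 0; c < d_k; ++c) dot += Q[i][c] * K[j][c];
            weights[j] = dot * scale;
        }

        // Numerically stable softmax over the keys.
        const double max_score = *std::max_element(weights.begin(), weights.end());
        double sum = 0.0;
        for (double& w : weights) { w = std::exp(w - max_score); sum += w; }
        for (double& w : weights) w /= sum;

        // Output row i is the attention-weighted average of the value rows.
        for (std::size_t j = 0; j < V.size(); ++j)
            for (std::size_t c = 0; c < V[j].size(); ++c)
                output[i][c] += weights[j] * V[j][c];
    }
    return output;
}
```

When all keys are identical, the softmax weights are uniform, so each output row is the plain average of the value rows.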
| opennn::Transformer::Transformer | ( | const Index | input_sequence_length = 0, |
| Index | decoder_sequence_length = 0, | ||
| Index | input_vocabulary_size = 0, | ||
| Index | output_vocabulary_size = 0, | ||
| Index | embedding_dimension = 0, | ||
| Index | heads_number = 0, | ||
| Index | feedforward_dimension = 0, | ||
| Index | layers_number = 0 ) |
Constructs an untrained Transformer.
| input_sequence_length | Length of the encoder input sequence. |
| decoder_sequence_length | Length of the decoder input sequence. |
| input_vocabulary_size | Size of the input-side vocabulary. |
| output_vocabulary_size | Size of the output-side vocabulary. |
| embedding_dimension | Embedding (model) dimension. |
| heads_number | Number of attention heads per layer. |
| feedforward_dimension | Hidden size of the position-wise feed-forward sublayers. |
| layers_number | Number of layers in each of the encoder and decoder stacks. |
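In Vaswani et al. (2017), multi-head attention splits the model dimension evenly across heads, so 512 dimensions and 8 heads give 64 dimensions per head. Assuming OpenNN follows the same convention (this page does not confirm it), embedding_dimension should be divisible by heads_number. The hypothetical helper below performs that check before the constructor is called; the Index alias is a stand-in for OpenNN's own type.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for OpenNN's Index type (assumed here to be a signed 64-bit integer).
using Index = std::int64_t;

// Width of each attention head when embedding_dimension is split evenly
// across heads_number heads; std::nullopt if the split is not exact.
std::optional<Index> head_dimension(Index embedding_dimension, Index heads_number)
{
    assert(embedding_dimension > 0 && heads_number > 0);
    if (embedding_dimension % heads_number != 0) return std::nullopt;
    return embedding_dimension / heads_number;
}
```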
| string opennn::Transformer::calculate_outputs | ( | const string & | input | ) |
Decodes a raw input string greedily and returns the generated output text.
| input | Raw input text to encode. |
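Greedy decoding commits to the single highest-scoring output token at every step and feeds it back as decoder input, stopping when an end token appears or a length limit is reached. The sketch below shows the technique on its own, without OpenNN: NextTokenScorer, greedy_decode, the start/end token ids and max_length are all hypothetical. In the Transformer the limit plausibly relates to decoder_sequence_length, but that is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical scorer: given the encoded source and the tokens decoded so far,
// returns one score per output-vocabulary id.
using NextTokenScorer =
    std::function<std::vector<double>(const std::vector<int>& source,
                                      const std::vector<int>& prefix)>;

// Greedy decoding: append the argmax token at each step, stopping at end_id
// or after max_length generated tokens. The start token is not returned.
std::vector<int> greedy_decode(const std::vector<int>& source,
                               const NextTokenScorer& score_next,
                               int start_id, int end_id, int max_length)
{
    assert(max_length >= 0);
    std::vector<int> prefix{start_id};
    std::vector<int> generated;

    while (static_cast<int>(generated.size()) < max_length)
    {
        const std::vector<double> scores = score_next(source, prefix);
        assert(!scores.empty());
        const int best = static_cast<int>(std::distance(
            scores.begin(), std::max_element(scores.begin(), scores.end())));
        if (best == end_id) break;
        generated.push_back(best);
        prefix.push_back(best);  // the chosen token becomes decoder input
    }
    return generated;
}
```

Because each step keeps only the argmax, greedy decoding is cheap but can miss higher-scoring sequences that beam search would find.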
| Index opennn::Transformer::get_decoder_sequence_length | ( | ) | const |
Length of the decoder input sequence.
| Index opennn::Transformer::get_embedding_dimension | ( | ) | const |
Embedding (model) dimension.
| Index opennn::Transformer::get_heads_number | ( | ) | const |
Number of attention heads per layer.
| Index opennn::Transformer::get_input_sequence_length | ( | ) | const |
Length of the encoder input sequence.
| void opennn::Transformer::set | ( | const Index | input_sequence_length = 0, |
| Index | decoder_sequence_length = 0, | ||
| Index | input_vocabulary_size = 0, | ||
| Index | output_vocabulary_size = 0, | ||
| Index | embedding_dimension = 0, | ||
| Index | heads_number = 0, | ||
| Index | feedforward_dimension = 0, | ||
| Index | layers_number = 0 ) |
Re-initializes the Transformer with new dimensions.
Arguments mirror the constructor.
| void opennn::Transformer::set_dropout_rate | ( | const float | ) |
Sets the dropout rate applied to the residual streams.
Receives the dropout probability (0 disables dropout).
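Vaswani et al. (2017) apply dropout to each sublayer's output before adding it to the residual stream and normalizing. The sketch below shows that pattern with inverted dropout, in which kept values are scaled by 1 / (1 - rate) so that inference needs no rescaling. Whether set_dropout_rate() uses exactly this scheme, and at which points, is an assumption; residual_with_dropout is a hypothetical name, and layer normalization is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Adds a sublayer's output back onto the residual stream, applying inverted
// dropout to the sublayer output first. With rate 0, or outside training,
// dropout is the identity.
std::vector<float> residual_with_dropout(const std::vector<float>& residual,
                                         const std::vector<float>& sublayer_output,
                                         float dropout_rate, bool is_training,
                                         std::mt19937& rng)
{
    assert(residual.size() == sublayer_output.size());
    assert(dropout_rate >= 0.0f && dropout_rate < 1.0f);

    std::vector<float> out(residual);
    const bool apply = is_training && dropout_rate > 0.0f;
    std::bernoulli_distribution keep(1.0 - dropout_rate);
    const float scale = 1.0f / (1.0f - dropout_rate);

    for (std::size_t i = 0; i < out.size(); ++i)
    {
        if (!apply) out[i] += sublayer_output[i];
        else if (keep(rng)) out[i] += sublayer_output[i] * scale;  // kept and rescaled
        // Dropped elements add nothing to the residual stream.
    }
    return out;
}
```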
| void opennn::Transformer::set_input_vocabulary | ( | const vector< string > & | ) |
Replaces the input-side vocabulary.
Receives the new token list (order defines token ids).
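Because a token's position in the list is its id, the vocabulary can be turned into a lookup table directly. The sketch below builds such a table and encodes text with it. Whitespace tokenization and the unknown-token id are simplifying assumptions, not documented OpenNN behavior, and build_token_ids and encode are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Builds a token -> id lookup where a token's id is its position in the list.
std::unordered_map<std::string, int> build_token_ids(const std::vector<std::string>& vocabulary)
{
    std::unordered_map<std::string, int> ids;
    for (int i = 0; i < static_cast<int>(vocabulary.size()); ++i)
        ids.emplace(vocabulary[i], i);  // emplace keeps the first id if a token repeats
    return ids;
}

// Splits text on whitespace and maps each token to its id, using unknown_id
// for tokens that are not in the vocabulary.
std::vector<int> encode(const std::string& text,
                        const std::unordered_map<std::string, int>& ids,
                        int unknown_id)
{
    assert(unknown_id >= 0);  // ids are non-negative list positions
    std::vector<int> encoded;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token)
    {
        const auto it = ids.find(token);
        encoded.push_back(it != ids.end() ? it->second : unknown_id);
    }
    return encoded;
}
```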
| void opennn::Transformer::set_output_vocabulary | ( | const vector< string > & | ) |
Replaces the output-side vocabulary.
Receives the new token list (order defines token ids).
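On the output side the same convention runs in reverse: a predicted id indexes directly into the output vocabulary. The hypothetical decode helper below maps ids back to tokens and joins them with spaces; how OpenNN itself assembles the string returned by calculate_outputs() is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Maps output-token ids back to their strings (id == position in the output
// vocabulary) and joins them with single spaces.
std::string decode(const std::vector<std::size_t>& ids,
                   const std::vector<std::string>& vocabulary)
{
    std::string text;
    for (std::size_t i = 0; i < ids.size(); ++i)
    {
        assert(ids[i] < vocabulary.size());
        if (i > 0) text += ' ';
        text += vocabulary[ids[i]];
    }
    return text;
}
```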