Encoder-decoder Transformer (Vaswani et al., 2017) for sequence-to-sequence modeling.

#include <standard_networks.h>

Inheritance: opennn::Transformer derives from opennn::NeuralNetwork.

Public Member Functions
	Transformer (const Index input_sequence_length=0, Index decoder_sequence_length=0, Index input_vocabulary_size=0, Index output_vocabulary_size=0, Index embedding_dimension=0, Index heads_number=0, Index feedforward_dimension=0, Index layers_number=0)
	Constructs an untrained Transformer.

void	set (const Index input_sequence_length=0, Index decoder_sequence_length=0, Index input_vocabulary_size=0, Index output_vocabulary_size=0, Index embedding_dimension=0, Index heads_number=0, Index feedforward_dimension=0, Index layers_number=0)
	Re-initializes the Transformer with new dimensions.

Index	get_input_sequence_length () const
	Length of the encoder input sequence.

Index	get_decoder_sequence_length () const
	Length of the decoder input sequence.

Index	get_embedding_dimension () const
	Embedding (model) dimension.

Index	get_heads_number () const
	Number of attention heads per layer.

void	set_dropout_rate (const float)
	Sets the dropout rate applied to the residual streams.

void	set_input_vocabulary (const vector< string > &)
	Replaces the input-side vocabulary.

void	set_output_vocabulary (const vector< string > &)
	Replaces the output-side vocabulary.

string	calculate_outputs (const string &input)
	Greedy decoding given a raw input string.

Public Member Functions inherited from opennn::NeuralNetwork
	NeuralNetwork ()
	Default-constructs an empty network.

virtual	~NeuralNetwork ()=default
	Defaulted virtual destructor for safe polymorphic deletion.

	NeuralNetwork (const filesystem::path &file_name)
	Constructs a network by loading a serialized model from disk.

void	add_layer (unique_ptr< Layer > new_layer, const vector< Index > &input_indices=vector< Index >())
	Appends a layer to the stack.

const Configuration::Resolved &	get_config () const
	Returns the resolved device/precision configuration.

bool	is_gpu () const
	Reports whether the network is configured to run on CUDA.

bool	is_cpu () const
	Reports whether the network is configured to run on CPU.

Type	get_training_type () const
	Returns the precision used for training.

Type	get_inference_type () const
	Returns the precision used for inference.

vector< vector< Shape > >	get_parameter_shapes () const
	Returns the parameter shapes of every layer.

vector< vector< Shape > >	get_state_shapes () const
	Returns the persistent state shapes of every layer.

vector< vector< Shape > >	get_forward_shapes (Index batch_size) const
	Returns the forward-propagation buffer shapes for a given batch size.

vector< vector< Shape > >	get_backward_shapes (Index batch_size) const
	Returns the backward-propagation buffer shapes for a given batch size.

Index	get_states_size () const
	Returns the total size in floats of all persistent state buffers.

Index	get_forward_size (Index batch_size) const
	Returns the total size in floats of the forward-propagation workspace.

Index	get_backward_size (Index batch_size) const
	Returns the total size in floats of the backward-propagation workspace.

void	compile ()
	Allocates parameters and resolves layer-input indices.

bool	has (const string &layer_name) const
	Reports whether the network contains a layer of the given name.

bool	has (LayerType type) const
	Reports whether the network contains a layer of the given type.

bool	is_empty () const
	Reports whether the network has zero layers.

float *	get_parameters_data ()
	Returns the parameter buffer as a raw float pointer.

const float *	get_parameters_data () const
	Returns the parameter buffer as a raw float pointer (const overload).

Index	get_parameters_size () const
	Returns the parameter count in floats.

const vector< Variable > &	get_input_variables () const
	Returns the input variables describing each input feature.

const vector< string >	get_input_feature_names () const
	Returns the names of the input features.

const vector< Variable > &	get_output_variables () const
	Returns the output variables describing each output target.

const vector< string >	get_output_feature_names () const
	Returns the names of the output features.

const vector< unique_ptr< Layer > > &	get_layers () const
	Returns the layer stack.

const unique_ptr< Layer > &	get_layer (const Index i) const
	Returns the layer at a given index.

const unique_ptr< Layer > &	get_layer (const string &layer_name) const
	Returns the layer with a given name.

Index	get_layer_index (const string &layer_name) const
	Returns the index of the layer with a given name.

const vector< vector< Index > > &	get_layer_input_indices () const
	Returns the per-layer input-layer indices.

vector< vector< Index > >	get_layer_output_indices () const
	Returns the per-layer output-layer indices.

Layer *	get_first (const string &layer_name)
	Returns the first layer of the type identified by the given name.

Layer *	get_first (LayerType type)
	Returns the first layer of a given type.

const Layer *	get_first (const string &layer_name) const
	Returns the first layer of the type identified by the given name (const overload).

const Layer *	get_first (LayerType type) const
	Returns the first layer of a given type (const overload).

void	set_layers_number (const Index new_layers_number)
	Resizes the layer stack.

void	set_layer_input_indices (const vector< vector< Index > > &new_layer_input_indices)
	Replaces the entire layer-input wiring.

void	set_layer_input_indices (const Index layer_index, const vector< Index > &new_input_indices)
	Replaces the input wiring of a single layer by index.

void	set_layer_input_indices (const string &layer_name, const vector< string > &input_layer_names)
	Replaces the input wiring of a single layer by name.

void	set_layer_input_indices (const string &layer_name, initializer_list< string > input_layer_names)
	Replaces the input wiring of a single layer by name (initializer-list overload).

void	set_layer_input_indices (const string &layer_name, const string &input_layer_name)
	Sets a single source layer as input to a destination layer.

void	set_input_variables (const vector< Variable > &new_input_variables)
	Replaces the input variables.

void	set_output_variables (const vector< Variable > &new_output_variables)
	Replaces the output variables.

void	set_input_names (const vector< string > &new_input_names)
	Sets the names of the input features.

void	set_output_names (const vector< string > &new_output_names)
	Sets the names of the output features.

void	set_input_shape (const Shape &new_input_shape)
	Sets the input shape of the network.

void	set_default ()
	Resets non-architectural state to defaults.

Index	get_layers_number () const
	Returns the number of layers.

Index	get_layers_number (const string &layer_name) const
	Returns the number of layers of the type identified by the given name.

Index	get_layers_number (LayerType type) const
	Returns the number of layers of a given type.

Index	get_first_trainable_layer_index () const
	Returns the index of the first trainable layer.

Index	get_last_trainable_layer_index () const
	Returns the index of the last trainable layer.

Index	get_inputs_number () const
	Returns the total number of input features.

Index	get_outputs_number () const
	Returns the total number of output features.

Shape	get_input_shape () const
	Returns the network's input shape.

Shape	get_output_shape () const
	Returns the network's output shape.

Activation::Function	get_output_activation () const
	Returns the activation function of the last layer.

Index	get_parameters_number () const
	Returns the total number of trainable parameters.

vector< Index >	get_layer_parameter_numbers () const
	Returns the parameter count of every layer.

void	set_parameters (const VectorR &new_parameters)
	Replaces all trainable parameters.

void	set_parameters_random ()
	Initializes parameters with uniform random values.

void	set_parameters_glorot ()
	Initializes parameters with Glorot (Xavier) uniform values.

MatrixR	calculate_outputs (const vector< TensorView > &inputs)
	Computes outputs from a list of input tensor views.

MatrixR	calculate_outputs (const MatrixR &inputs)
	Computes outputs for tabular inputs.

MatrixR	calculate_outputs (const Tensor3 &inputs)
	Computes outputs for rank-3 inputs (e.g. sequence data).

MatrixR	calculate_outputs (const Tensor4 &inputs)
	Computes outputs for rank-4 inputs (e.g. images).

MatrixR	calculate_directional_inputs (const Index direction, const VectorR &point, float minimum, float maximum, Index points_number=101) const
	Computes outputs along a directional sweep of one input.

Tensor3	calculate_outputs (const Tensor3 &inputs, const Tensor3 &context)
	Computes outputs from two rank-3 inputs (encoder/decoder pair).

Index	calculate_image_output (const filesystem::path &image_path)
	Runs the network on a single image and returns the predicted class.

MatrixR	calculate_text_outputs (const Tensor< string, 1 > &texts)
	Runs the network on a batch of strings and returns class outputs.

void	from_JSON (const JsonDocument &document)
	Restores the network from a JSON document.

void	to_JSON (JsonWriter &writer) const
	Serializes the network to JSON.

void	save (const filesystem::path &file_name) const
	Saves the full network (architecture + parameters) to a JSON file.

void	save_parameters (const filesystem::path &file_name) const
	Saves only the parameters in binary form.

void	load (const filesystem::path &file_name)
	Loads the full network (architecture + parameters) from a JSON file.

void	load_parameters_binary (const filesystem::path &file_name)
	Loads parameters from a binary file produced by save_parameters().

vector< string >	get_names_string () const
	Returns the names of every input and output feature.

void	save_outputs (MatrixR &outputs, const filesystem::path &file_name)
	Saves a tabular outputs tensor to a CSV file.

void	save_outputs (Tensor3 &outputs, const filesystem::path &file_name)
	Saves a rank-3 outputs tensor to a CSV file.

void	forward_propagate (const vector< TensorView > &inputs, ForwardPropagation &forward, bool is_training=false) const
	Runs the forward pass and writes intermediate activations into `forward`.

void	forward_propagate (const vector< TensorView > &inputs, const VectorR &parameters, ForwardPropagation &forward)
	Runs the forward pass with explicitly supplied parameters.

vector< string >	get_layer_labels () const
	Returns a label for every layer (name + key hyperparameters).

Additional Inherited Members
Protected Attributes inherited from opennn::NeuralNetwork
string	name = "neural_network"
	Network identifier; used as a JSON tag.

vector< Variable >	input_variables
	Description of every input feature (role, scaler, type).

vector< Variable >	output_variables
	Description of every output feature (role, scaler, type).

vector< unique_ptr< Layer > >	layers
	Owned layers in execution order.

vector< vector< Index > >	layer_input_indices
	Per-destination-layer list of source-layer indices.

Buffer	parameters
	Flat parameter buffer shared by all layers.

Buffer	parameters_bf16 {Device::CUDA}
	BF16 mirror of `parameters` for mixed-precision CUDA training.

vector< vector< vector< TensorView > > >	parameter_views
	Per-layer per-parameter-group views into `parameters`.

Buffer	states
	Flat persistent-state buffer shared by stateful layers (e.g. Recurrent).

Configuration::Resolved	config
	Resolved device/precision configuration applied at compile time.

Detailed Description

Encoder-decoder Transformer (Vaswani et al., 2017) for sequence-to-sequence modeling.
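
The core operation of the architecture in Vaswani et al. (2017) is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The following is a minimal standalone sketch of that formula for a single head. It does not use or reproduce the OpenNN implementation; the `Matrix` alias and the function name are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper type: a row-major matrix stored as a vector of rows.
using Matrix = std::vector<std::vector<float>>;

// Single-head scaled dot-product attention: softmax(Q * K^T / sqrt(d_k)) * V.
// Each row of `queries` produces one output row.
Matrix scaled_dot_product_attention(const Matrix& queries,
                                    const Matrix& keys,
                                    const Matrix& values)
{
    assert(!keys.empty() && keys.size() == values.size());

    const std::size_t key_dimension = keys[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(key_dimension));

    Matrix outputs(queries.size(), std::vector<float>(values[0].size(), 0.0f));

    for (std::size_t i = 0; i < queries.size(); ++i)
    {
        assert(queries[i].size() == key_dimension);

        // Scaled similarity between query i and every key.
        std::vector<float> scores(keys.size(), 0.0f);
        float maximum = -std::numeric_limits<float>::infinity();

        for (std::size_t j = 0; j < keys.size(); ++j)
        {
            for (std::size_t d = 0; d < key_dimension; ++d)
                scores[j] += queries[i][d] * keys[j][d];

            scores[j] *= scale;
            maximum = std::max(maximum, scores[j]);
        }

        // Numerically stable softmax: subtract the maximum before exponentiating.
        float sum = 0.0f;
        for (float& score : scores)
        {
            score = std::exp(score - maximum);
            sum += score;
        }

        // Output row i is the attention-weighted average of the value rows.
        for (std::size_t j = 0; j < keys.size(); ++j)
            for (std::size_t d = 0; d < values[j].size(); ++d)
                outputs[i][d] += (scores[j] / sum) * values[j][d];
    }

    return outputs;
}
```

When a query is equally similar to all keys, the weights are uniform and the output row is the plain mean of the value rows.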

Constructor & Destructor Documentation

◆ Transformer()

opennn::Transformer::Transformer	(	const Index	input_sequence_length = 0,
		Index	decoder_sequence_length = 0,
		Index	input_vocabulary_size = 0,
		Index	output_vocabulary_size = 0,
		Index	embedding_dimension = 0,
		Index	heads_number = 0,
		Index	feedforward_dimension = 0,
		Index	layers_number = 0 )

Constructs an untrained Transformer.

Parameters

input_sequence_length	Length of the encoder input sequence.
decoder_sequence_length	Length of the decoder input sequence.
input_vocabulary_size	Size of the input-side vocabulary.
output_vocabulary_size	Size of the output-side vocabulary.
embedding_dimension	Embedding (model) dimension.
heads_number	Number of attention heads per layer.
feedforward_dimension	Hidden size of the position-wise feed-forward sublayers.
layers_number	Number of encoder (and decoder) layers.
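
In the architecture of Vaswani et al. (2017), the embedding dimension is split evenly across the attention heads, so `embedding_dimension` is normally chosen as a multiple of `heads_number`. The sketch below checks that relationship before a model is built. Whether the OpenNN constructor itself enforces it is not stated here, so treat this as an assumption; the `Index` alias is a stand-in for the library type, and both helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for opennn::Index (assumed here to be a signed integer type).
using Index = std::int64_t;

// True when the embedding dimension can be split evenly across the heads.
// Zero or negative values are rejected.
bool heads_divide_embedding(Index embedding_dimension, Index heads_number)
{
    return embedding_dimension > 0
        && heads_number > 0
        && embedding_dimension % heads_number == 0;
}

// Per-head dimension for a valid pair of hyperparameters.
Index head_dimension(Index embedding_dimension, Index heads_number)
{
    assert(heads_divide_embedding(embedding_dimension, heads_number));
    return embedding_dimension / heads_number;
}
```

For example, an embedding dimension of 512 with 8 heads gives 64 dimensions per head, while 100 with 8 heads fails the check.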

Member Function Documentation

◆ calculate_outputs()

string opennn::Transformer::calculate_outputs ( const string & input )

Greedy decoding given a raw input string.

Parameters

input Raw input text to encode.

Returns: Decoded output string.
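
Greedy decoding appends, at every step, the single highest-scoring next token and stops at an end token or a length limit. The following standalone sketch illustrates that loop over integer token ids. It is not the OpenNN implementation: the `NextTokenScores` callback, the start and end token ids, and the function name are all hypothetical, and tokenizing the input string and mapping ids back to text are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical scoring callback: given the tokens decoded so far,
// returns one score per entry of the output vocabulary.
using NextTokenScores = std::function<std::vector<float>(const std::vector<int>&)>;

// Greedy decoding: start from `start_token` and repeatedly append the
// highest-scoring token until `end_token` is produced or the sequence
// reaches `maximum_length` tokens.
std::vector<int> greedy_decode(const NextTokenScores& next_token_scores,
                               int start_token,
                               int end_token,
                               std::size_t maximum_length)
{
    std::vector<int> tokens{start_token};

    while (tokens.size() < maximum_length)
    {
        const std::vector<float> scores = next_token_scores(tokens);
        assert(!scores.empty());

        // Greedy choice: keep only the arg-max, with no beam or sampling.
        const auto best = std::max_element(scores.begin(), scores.end());
        const int token = static_cast<int>(std::distance(scores.begin(), best));

        tokens.push_back(token);

        if (token == end_token)
            break;
    }

    return tokens;
}
```

In a real model the callback would run the decoder on the tokens produced so far; a natural value for `maximum_length` is the decoder sequence length, although that pairing is an assumption.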

◆ get_decoder_sequence_length()

Index opennn::Transformer::get_decoder_sequence_length ( ) const

Length of the decoder input sequence.

◆ get_embedding_dimension()

Index opennn::Transformer::get_embedding_dimension ( ) const

Embedding (model) dimension.

◆ get_heads_number()

Index opennn::Transformer::get_heads_number ( ) const

Number of attention heads per layer.

◆ get_input_sequence_length()

Index opennn::Transformer::get_input_sequence_length ( ) const

Length of the encoder input sequence.

◆ set()

void opennn::Transformer::set	(	const Index	input_sequence_length = 0,
		Index	decoder_sequence_length = 0,
		Index	input_vocabulary_size = 0,
		Index	output_vocabulary_size = 0,
		Index	embedding_dimension = 0,
		Index	heads_number = 0,
		Index	feedforward_dimension = 0,
		Index	layers_number = 0 )

Re-initializes the Transformer with new dimensions.

The arguments have the same meaning as in the constructor Transformer().

◆ set_dropout_rate()

void opennn::Transformer::set_dropout_rate ( const float )

Sets the dropout rate applied to the residual streams.

The unnamed argument is the dropout probability; a value of 0 disables dropout.
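
To make the meaning of the dropout probability concrete, here is a minimal standalone sketch of inverted dropout, a common formulation in which each element is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate). Whether OpenNN uses this exact scaling is an assumption, and `apply_dropout` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Inverted dropout: zero each element with probability `rate` and scale
// the surviving elements by 1 / (1 - rate) so the expected value is unchanged.
std::vector<float> apply_dropout(const std::vector<float>& values,
                                 float rate,
                                 std::mt19937& generator)
{
    assert(rate >= 0.0f && rate < 1.0f);

    // A rate of 0 disables dropout: the input passes through unchanged.
    if (rate == 0.0f)
        return values;

    std::bernoulli_distribution keep(1.0 - rate);
    const float scale = 1.0f / (1.0f - rate);

    std::vector<float> result(values.size());

    for (std::size_t i = 0; i < values.size(); ++i)
        result[i] = keep(generator) ? values[i] * scale : 0.0f;

    return result;
}
```

Dropout of this kind is active only during training; at inference time the values are passed through unchanged.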

◆ set_input_vocabulary()

void opennn::Transformer::set_input_vocabulary ( const vector< string > & )

Replaces the input-side vocabulary.

The unnamed argument is the new token list; the position of each token in the list defines its token id.
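
Because list order defines token ids, reordering the vocabulary changes the id of every affected token. The sketch below builds a token-to-id lookup from an ordered token list to make that dependence explicit. It is a standalone illustration: `build_token_ids` is a hypothetical helper, and the rejection of duplicate tokens is an assumption made here, not documented OpenNN behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Builds a token -> id lookup in which the id of a token is its position
// in the vocabulary list.
std::unordered_map<std::string, int> build_token_ids(const std::vector<std::string>& vocabulary)
{
    std::unordered_map<std::string, int> ids;

    for (std::size_t i = 0; i < vocabulary.size(); ++i)
    {
        const bool inserted = ids.emplace(vocabulary[i], static_cast<int>(i)).second;

        // Assumption of this sketch: a duplicate token would make ids ambiguous.
        assert(inserted && "duplicate token in vocabulary");
        (void)inserted;
    }

    return ids;
}
```

With the list {"the", "cat", "sat"} the token "cat" has id 1; with {"cat", "the", "sat"} it has id 0. A vocabulary should therefore keep the order it had when the model was trained.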

◆ set_output_vocabulary()

void opennn::Transformer::set_output_vocabulary ( const vector< string > & )

Replaces the output-side vocabulary.