Scaled dot-product attention with multiple heads and learned linear projections.
| Return | Member | Description |
|---|---|---|
| | MultiHeadAttention (const Shape &input_shape=Shape({0, 0}), Index heads_number=0, const string &label=string()) | Constructs a self-attention layer. |
| | MultiHeadAttention (const Shape &new_query_dimensions, const Shape &new_source_dimensions, Index heads_number=0, const string &label=string()) | Constructs a cross-attention layer. |
| Shape | get_input_shape () const override | Returns the per-sample input shape. |
| Shape | get_output_shape () const override | Returns the per-sample output shape. |
| Index | get_query_sequence_length () const | Length of the query-side sequence. |
| Index | get_source_sequence_length () const | Length of the source-side sequence (equal to the query length for self-attention). |
| Index | get_embedding_dimension () const | Width of the embedding (model) dimension. |
| Index | get_heads_number () const | Number of attention heads. |
| Index | get_head_dimension () const | Per-head feature dimension. |
| Shape | get_heads_shape (Index batch_size) const | Shape of the per-head attention scratch buffer. |
| Shape | get_concat_shape (Index batch_size) const | Shape of the concatenated attention output before projection. |
| vector< Operator * > | get_operators () override | Returns the active operators in pipeline order. |
| vector< pair< Shape, Type > > | get_forward_specs (Index batch_size) const override | Specifications of the forward intermediate buffers. |
| vector< pair< Shape, Type > > | get_backward_specs (Index batch_size) const override | Specifications of the backward intermediate buffers. |
| void | set (Index query_sequence_length=0, Index source_sequence_length=0, Index embedding_dimension=0, Index heads_number=0, bool use_causal_mask=false, const string &label="multihead_attention_layer") | Re-initializes the layer. |
| void | set_input_shape (const Shape &new_input_shape) override | Updates the input shape; rejects shapes whose rank is not 2. |
| void | on_compute_dtype_changed () override | Propagates a compute-dtype change to all sub-operators. |
| void | set_dropout_rate (float new_dropout_rate) | Sets the dropout rate applied to the attention weights. |
| void | forward_propagate (ForwardPropagation &, size_t, bool) noexcept override | Forward pass: Q/K/V projections, scaled dot-product attention, head concatenation, output projection. |
| void | back_propagate (ForwardPropagation &, BackPropagation &, size_t) const noexcept override | Backward pass through every operator in reverse order. |
| void | read_JSON_body (const Json *) override | Reads the layer-specific JSON body (heads, sequences, dimension, causal flag, dropout). |
| void | write_JSON_body (JsonWriter &) const override | Writes the layer-specific JSON body (heads, sequences, dimension, causal flag, dropout). |
| virtual | ~Layer ()=default | Virtual destructor; subclasses are owned via unique_ptr<Layer>. |
| const string & | get_label () const | Returns the user-assigned label of this layer. |
| const string & | get_name () const | Returns the canonical type name of this layer. |
| LayerType | get_type () const | Returns the LayerType enumerator for this layer. |
| virtual void | set_output_shape (const Shape &) | Sets the per-sample output shape of this layer. |
| void | set_label (string new_label) | Sets the human-readable label of this layer. |
| Index | get_parameters_number () const | Total number of trainable parameters in this layer. |
| virtual vector< pair< Shape, Type > > | get_parameter_specs () const | Specifications of the trainable parameter tensors owned by this layer. |
| virtual vector< pair< Shape, Type > > | get_state_specs () const | Specifications of the persistent state tensors of this layer. |
| vector< Shape > | get_parameter_shapes () const | Shape-only view of get_parameter_specs(). |
| vector< Shape > | get_state_shapes () const | Shape-only view of get_state_specs(). |
| vector< Shape > | get_forward_shapes (Index b) const | Shape-only view of get_forward_specs() for batch size b. |
| vector< Shape > | get_backward_shapes (Index b) const | Shape-only view of get_backward_specs() for batch size b. |
| vector< Type > | get_parameter_dtypes () const | Dtype-only view of get_parameter_specs(). |
| vector< Type > | get_forward_dtypes (Index b) const | Dtype-only view of get_forward_specs() for batch size b. |
| vector< Type > | get_backward_dtypes (Index b) const | Dtype-only view of get_backward_specs() for batch size b. |
| virtual Activation::Function | get_output_activation () const | Activation function fused at the end of this layer, if any. |
| Index | get_inputs_number () const | Total number of scalar inputs per sample (product of input dimensions). |
| Index | get_outputs_number () const | Total number of scalar outputs per sample (product of output dimensions). |
| virtual void | from_JSON (const JsonDocument &document) | Loads the layer configuration (hyperparameters) from JSON. |
| virtual void | load_state_from_JSON (const JsonDocument &document) | Loads parameter and state tensors from a JSON document. |
| virtual void | to_JSON (JsonWriter &writer) const | Writes the layer configuration to JSON. |
| virtual void | print () const | Prints a human-readable summary of the layer to stdout. |
| bool | get_is_trainable () const | Whether this layer has trainable parameters. |
| Type | get_compute_dtype () const | Numerical type used for forward/backward computation. |
| void | set_compute_dtype (Type new_compute_dtype) | Sets the compute dtype and triggers on_compute_dtype_changed(). |
| virtual float * | link_parameters (float *pointer) | Wires this layer's parameter TensorViews onto an external buffer. |
| virtual float * | link_states (float *pointer) | Wires this layer's state TensorViews onto an external buffer. |
| vector< TensorView > & | get_parameter_views () | Mutable access to this layer's parameter TensorViews. |
| const vector< TensorView > & | get_parameter_views () const | Read-only access to this layer's parameter TensorViews. |
| vector< TensorView > & | get_state_views () | Mutable access to this layer's state TensorViews. |
| const vector< TensorView > & | get_state_views () const | Read-only access to this layer's state TensorViews. |
| void | redistribute_parameters_to_operators () | Forwards the current parameter views down to each composing Operator. |
| void | redistribute_parameter_gradients_to_operators (vector< TensorView > &gradient_views) | Forwards externally provided gradient views down to each Operator. |
| void | redistribute_states_to_operators () | Forwards the current state views down to each composing Operator. |
Wraps four Combination/MultiHeadProjection operators (the query, key, value, and output projections) and one Attention operator that performs the scaled dot-product attention with optional dropout.
Two input modes are supported:
- Self-attention: a single rank-2 input serves as query, key, and value.
- Cross-attention: two rank-2 inputs (a query side and a source side), as used in the decoder's encoder-decoder attention.
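The Attention operator at the core of this layer computes, per head, softmax(Q Kᵀ / √d) V, where d is the per-head feature dimension returned by get_head_dimension(). As a minimal, self-contained sketch of that computation for a single head over already-projected Q, K, and V (the `Matrix` alias and `attention` function below are illustrative only and not part of this API):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative only: a row-major matrix as nested vectors (not this library's tensor types).
using Matrix = std::vector<std::vector<float>>;

// Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
// Q is (t_q x d), K is (t_k x d), V is (t_k x d_v); the result is (t_q x d_v).
Matrix attention(const Matrix& Q, const Matrix& K, const Matrix& V)
{
    const std::size_t t_q = Q.size(), t_k = K.size();
    const std::size_t d = Q[0].size(), d_v = V[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));

    Matrix output(t_q, std::vector<float>(d_v, 0.0f));

    for (std::size_t i = 0; i < t_q; ++i)
    {
        // Scaled dot-product scores of query row i against every key row.
        std::vector<float> scores(t_k, 0.0f);
        float max_score = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < t_k; ++j)
        {
            for (std::size_t c = 0; c < d; ++c) scores[j] += Q[i][c] * K[j][c];
            scores[j] *= scale;
            max_score = std::max(max_score, scores[j]);
        }

        // Numerically stable softmax over the key axis.
        float sum = 0.0f;
        for (float& s : scores) { s = std::exp(s - max_score); sum += s; }
        for (float& s : scores) s /= sum;

        // Attention-weighted sum of the value rows.
        for (std::size_t j = 0; j < t_k; ++j)
            for (std::size_t c = 0; c < d_v; ++c)
                output[i][c] += scores[j] * V[j][c];
    }
    return output;
}
```

The layer itself does more than this sketch: it splits the embedding into heads_number slices of head_dimension, runs this computation per head (with the optional causal mask and dropout applied to the softmax weights), concatenates the heads, and projects the result back to embedding_dimension.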