OpenNN
Open-source neural networks library
Projects (input_features) into (heads * head_dim) and reshapes for multi-head attention.
#include <operators.h>
Public Member Functions | |
| void | set (Index input_features, Index heads_number, Index head_dimension, Type compute_dtype) |
| Configures the projection geometry. | |
| vector< TensorSpec > | parameter_specs () const override |
| Returns the tensor specs of trainable parameters owned by this operator. | |
| void | link_parameters (span< const TensorView > views) override |
| Binds parameter views provided by the hosting layer. | |
| void | link_gradients (span< const TensorView > views) override |
| Binds gradient views provided by the hosting layer. | |
| void | set_parameters_random () override |
| Initializes parameters with random values. | |
| void | set_parameters_glorot () override |
| Initializes parameters using Glorot (Xavier) initialization. | |
| void | forward_propagate (ForwardPropagation &fp, size_t layer, bool is_training) noexcept override |
| Runs the operator's forward computation. | |
| void | back_propagate (ForwardPropagation &fp, BackPropagation &bp, size_t layer) const noexcept override |
| Runs the operator's backward computation, accumulating into gradient/delta buffers. | |
| void | apply (const TensorView &input, TensorView &head_output, float *scratch) |
| Projects input and reshapes the result into per-head form in head_output. | |
| void | apply_delta (const TensorView &head_delta, const TensorView &input, TensorView &input_delta, bool accumulate, float *scratch) const |
| Computes input_delta from per-head gradients and updates the projection weight gradient. | |
Public Member Functions inherited from opennn::Operator | |
| virtual | ~Operator ()=default |
| virtual vector< TensorSpec > | state_specs () const |
| Returns the tensor specs of persistent state owned by this operator. | |
| virtual void | link_states (span< const TensorView >) |
| Binds state views provided by the hosting layer. | |
| virtual void | to_JSON (JsonWriter &) const |
| Serializes the operator configuration to a JSON writer. | |
| virtual void | from_JSON (const Json *) |
| Restores the operator configuration from a JSON node. | |
| virtual void | load_state_from_JSON (const Json *) |
| Restores persistent state (e.g. running statistics) from a JSON node. | |
| virtual void | destroy_cuda () |
| Releases CUDA resources owned by the operator; called from destructors. | |
| TensorView & | get_input (ForwardPropagation &fp, size_t layer, size_t i=0) const noexcept |
| vector< TensorView > & | get_inputs (ForwardPropagation &fp, size_t layer, size_t i=0) const noexcept |
| TensorView & | get_output (ForwardPropagation &fp, size_t layer, size_t i=0) const noexcept |
| TensorView & | get_output_delta (BackPropagation &bp, size_t layer, size_t i=0) const noexcept |
| TensorView & | get_input_delta (BackPropagation &bp, size_t layer, size_t i=0) const noexcept |
Public Attributes | |
| CombinationOp | combination |
| Index | input_features = 0 |
| Index | heads_number = 0 |
| Index | head_dimension = 0 |
| Type | compute_dtype = Type::FP32 |
| size_t | input_view_index = 0 |
| vector< size_t > | scratch_slots |
| vector< size_t > | input_delta_slots_self |
| vector< size_t > | input_delta_slots_cross |
| bool | accumulate_input_delta_self = false |
| bool | accumulate_input_delta_cross = false |
Public Attributes inherited from opennn::Operator | |
| vector< size_t > | input_slots = {0} |
| vector< size_t > | output_slots = {1} |
| vector< size_t > | input_delta_slots = {1} |
| vector< size_t > | output_delta_slots = {0} |
Projects (input_features) into (heads * head_dim) and reshapes for multi-head attention.
| void opennn::MultiHeadProjectionOp::apply | ( | const TensorView & | input, |
| TensorView & | head_output, | ||
| float * | scratch ) |
Projects input and reshapes the result into per-head form in head_output.
| input | Input tokens (batch, seq, embed). |
| head_output | Output tensor (batch, heads, seq, head_dim). |
| scratch | Shared transpose-scratch buffer used during the reshape. |
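The shape change described above, from (batch, seq, embed) to (batch, heads, seq, head_dim), can be illustrated with a minimal stand-alone sketch. It is not the OpenNN implementation: `project_to_heads` is a hypothetical helper, the weight matrix is assumed to be dense, row-major, with shape (embed, heads * head_dim), any bias term is omitted, and the result is written directly in per-head order rather than through a transpose-scratch buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not OpenNN code.
// input:   (batch, seq, embed), row-major
// weights: (embed, heads * head_dim), row-major (assumed layout)
// returns: (batch, heads, seq, head_dim), row-major
std::vector<float> project_to_heads(const std::vector<float>& input,
                                    const std::vector<float>& weights,
                                    std::size_t batch, std::size_t seq,
                                    std::size_t embed, std::size_t heads,
                                    std::size_t head_dim)
{
    const std::size_t projected = heads * head_dim;
    assert(input.size() == batch * seq * embed);
    assert(weights.size() == embed * projected);

    std::vector<float> head_output(batch * heads * seq * head_dim, 0.0f);

    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t s = 0; s < seq; ++s)
            for (std::size_t h = 0; h < heads; ++h)
                for (std::size_t d = 0; d < head_dim; ++d)
                {
                    // Column of the projected row that belongs to head h, feature d.
                    const std::size_t column = h * head_dim + d;
                    float sum = 0.0f;
                    for (std::size_t e = 0; e < embed; ++e)
                        sum += input[(b * seq + s) * embed + e]
                             * weights[e * projected + column];
                    // Swap the seq and heads axes while storing the result.
                    head_output[((b * heads + h) * seq + s) * head_dim + d] = sum;
                }

    return head_output;
}
```

With an identity weight matrix, two tokens `[1, 2]` and `[3, 4]`, and two heads of dimension 1, the sketch returns `[1, 3, 2, 4]`: each head receives its own feature column for every token in the sequence.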
| void opennn::MultiHeadProjectionOp::apply_delta | ( | const TensorView & | head_delta, |
| const TensorView & | input, | ||
| TensorView & | input_delta, | ||
| bool | accumulate, | ||
| float * | scratch ) const |
Computes input_delta from per-head gradients and updates the projection weight gradient.
| head_delta | Gradient w.r.t. the per-head output. |
| input | Forward-pass input tokens. |
| input_delta | Output gradient w.r.t. the input. |
| accumulate | If true, accumulates into input_delta instead of overwriting. |
| scratch | Shared transpose-scratch buffer. |
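The backward step described above can be sketched in the same stand-alone style. This is an illustration under stated assumptions, not the OpenNN implementation: `project_delta` is a hypothetical helper, the weights are assumed dense and row-major with shape (embed, heads * head_dim), the weight gradient is assumed to be accumulated across calls, and no scratch buffer is used. Only the `accumulate` flag mirrors the documented behavior of overwriting or adding to `input_delta`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not OpenNN code.
// head_delta:      (batch, heads, seq, head_dim)
// input:           (batch, seq, embed)
// weights:         (embed, heads * head_dim)   (assumed layout)
// input_delta:     (batch, seq, embed), overwritten or accumulated
// weight_gradient: (embed, heads * head_dim), accumulated (assumption)
void project_delta(const std::vector<float>& head_delta,
                   const std::vector<float>& input,
                   const std::vector<float>& weights,
                   std::vector<float>& input_delta,
                   std::vector<float>& weight_gradient,
                   bool accumulate,
                   std::size_t batch, std::size_t seq,
                   std::size_t embed, std::size_t heads,
                   std::size_t head_dim)
{
    const std::size_t projected = heads * head_dim;
    assert(head_delta.size() == batch * heads * seq * head_dim);
    assert(input.size() == batch * seq * embed);
    assert(input_delta.size() == input.size());
    assert(weights.size() == embed * projected);
    assert(weight_gradient.size() == weights.size());

    // Overwrite semantics: clear the destination before summing into it.
    if (!accumulate)
        std::fill(input_delta.begin(), input_delta.end(), 0.0f);

    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t s = 0; s < seq; ++s)
            for (std::size_t h = 0; h < heads; ++h)
                for (std::size_t d = 0; d < head_dim; ++d)
                {
                    const std::size_t column = h * head_dim + d;
                    const float g = head_delta[((b * heads + h) * seq + s) * head_dim + d];
                    for (std::size_t e = 0; e < embed; ++e)
                    {
                        const std::size_t row = (b * seq + s) * embed + e;
                        input_delta[row] += g * weights[e * projected + column];      // dX = dY * W^T
                        weight_gradient[e * projected + column] += input[row] * g;    // dW += X^T * dY
                    }
                }
}
```

Calling the sketch twice on the same buffers, first with `accumulate = false` and then with `accumulate = true`, doubles the values in `input_delta`, which is the behavior the `accumulate` parameter is documented to control.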
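| void opennn::MultiHeadProjectionOp::back_propagate | ( | ForwardPropagation & | fp, |
| BackPropagation & | bp, | ||
| size_t | layer ) const |
override virtual noexcept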
Runs the operator's backward computation, accumulating into gradient/delta buffers.
| fp | Forward propagation workspace (read-only). |
| bp | Back propagation workspace receiving gradients and deltas. |
| layer | Index of the hosting layer in the workspace. |
Reimplemented from opennn::Operator.
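| void opennn::MultiHeadProjectionOp::forward_propagate | ( | ForwardPropagation & | fp, |
| size_t | layer, | ||
| bool | is_training ) |
override virtual noexcept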
Runs the operator's forward computation.
| fp | Forward propagation workspace. |
| layer | Index of the hosting layer in the workspace. |
| is_training | If true, enables training-only behavior (e.g. dropout sampling). |
Reimplemented from opennn::Operator.
| void opennn::MultiHeadProjectionOp::link_gradients | ( | span< const TensorView > | views | ) |
inline override virtual
Binds gradient views provided by the hosting layer.
Reimplemented from opennn::Operator.
| void opennn::MultiHeadProjectionOp::link_parameters | ( | span< const TensorView > | views | ) |
inline override virtual
Binds parameter views provided by the hosting layer.
Reimplemented from opennn::Operator.
| vector< TensorSpec > opennn::MultiHeadProjectionOp::parameter_specs | ( | ) const |
inline override virtual
Returns the tensor specs of trainable parameters owned by this operator.
Reimplemented from opennn::Operator.
| void opennn::MultiHeadProjectionOp::set | ( | Index | input_features, |
| Index | heads_number, | ||
| Index | head_dimension, | ||
| Type | compute_dtype ) |
Configures the projection geometry.
| input_features | Embedding dimension of the input tokens. |
| heads_number | Number of attention heads. |
| head_dimension | Per-head feature size. |
| compute_dtype | Dtype used for the projection matmul. |
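As a rough illustration of how these three sizes relate, the sketch below computes the derived quantities for a given geometry. `ProjectionGeometry` is a hypothetical helper and not part of OpenNN; it assumes the projection is a single dense weight matrix of shape (input_features, heads_number * head_dimension), which the documentation above implies but does not state.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not OpenNN code: derived sizes for a projection geometry.
struct ProjectionGeometry
{
    std::size_t input_features;
    std::size_t heads_number;
    std::size_t head_dimension;

    // Width of the projected row before it is split across heads.
    std::size_t projected_features() const
    {
        return heads_number * head_dimension;
    }

    // Element count of an (input_features, projected_features) weight matrix (assumed shape).
    std::size_t weight_count() const
    {
        return input_features * projected_features();
    }

    // Element count of a (batch, heads, seq, head_dim) output tensor.
    std::size_t head_output_size(std::size_t batch, std::size_t seq) const
    {
        assert(batch > 0 && seq > 0);
        return batch * heads_number * seq * head_dimension;
    }
};
```

For example, 512 input features split over 8 heads of dimension 64 give a projected width of 512 and, under the dense-matrix assumption, 262144 weights.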
| void opennn::MultiHeadProjectionOp::set_parameters_glorot | ( | ) |
inline override virtual
Initializes parameters using Glorot (Xavier) initialization.
Reimplemented from opennn::Operator.
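For readers unfamiliar with Glorot (Xavier) initialization, the following stand-alone sketch shows the uniform variant, which draws each weight from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). Whether OpenNN uses the uniform or the normal variant, and how it seeds its generator, is not stated here, so `glorot_limit`, `glorot_uniform`, and the `seed` parameter are illustrative assumptions only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch, not OpenNN code.
// Bound of the Glorot uniform distribution for a (fan_in, fan_out) weight matrix.
float glorot_limit(std::size_t fan_in, std::size_t fan_out)
{
    assert(fan_in + fan_out > 0);
    return std::sqrt(6.0f / static_cast<float>(fan_in + fan_out));
}

// Fills a fan_in x fan_out matrix with samples from U(-limit, limit).
std::vector<float> glorot_uniform(std::size_t fan_in, std::size_t fan_out, unsigned seed)
{
    const float limit = glorot_limit(fan_in, fan_out);
    std::mt19937 generator(seed);
    std::uniform_real_distribution<float> distribution(-limit, limit);

    std::vector<float> weights(fan_in * fan_out);
    for (float& w : weights)
        w = distribution(generator);

    return weights;
}
```

For this operator, a natural choice would be fan_in = input_features and fan_out = heads_number * head_dimension, though that mapping is also an assumption.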
| void opennn::MultiHeadProjectionOp::set_parameters_random | ( | ) |
inline override virtual
Initializes parameters with random values.
Reimplemented from opennn::Operator.
| bool opennn::MultiHeadProjectionOp::accumulate_input_delta_cross = false |
| bool opennn::MultiHeadProjectionOp::accumulate_input_delta_self = false |
| CombinationOp opennn::MultiHeadProjectionOp::combination |
| Type opennn::MultiHeadProjectionOp::compute_dtype = Type::FP32 |
| Index opennn::MultiHeadProjectionOp::head_dimension = 0 |
| Index opennn::MultiHeadProjectionOp::heads_number = 0 |
| vector<size_t> opennn::MultiHeadProjectionOp::input_delta_slots_cross |
| vector<size_t> opennn::MultiHeadProjectionOp::input_delta_slots_self |
| Index opennn::MultiHeadProjectionOp::input_features = 0 |
| size_t opennn::MultiHeadProjectionOp::input_view_index = 0 |
| vector<size_t> opennn::MultiHeadProjectionOp::scratch_slots |