
Configuring the Device and Precision

Every OpenNN program runs against a global configuration that controls which hardware the library uses and which numeric precision is applied to training and inference. This article explains the configuration API and the resolution rules OpenNN applies when defaults are requested.

Configuring once, at the top of main(), is enough — every dataset, network, and training strategy created afterward reads from the same singleton.
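In C++, "one shared object that everything reads from" is usually implemented as a function-local static returned by reference (the Meyers singleton). C++11 guarantees that the static is initialized exactly once, even if several threads make the first call at the same time. The sketch below illustrates that general pattern with a hypothetical `ConfigurationSketch` class and a made-up `use_gpu` setting. It is not OpenNN's source, and whether OpenNN's `Configuration` is built this way is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a process-wide configuration object (not OpenNN's class).
class ConfigurationSketch
{
public:
    // Returns the single instance. The function-local static is created on the
    // first call only, and C++11 makes that initialization thread-safe.
    static ConfigurationSketch& instance()
    {
        static ConfigurationSketch config;
        return config;
    }

    // Copying would create a second, diverging configuration, so forbid it.
    ConfigurationSketch(const ConfigurationSketch&) = delete;
    ConfigurationSketch& operator=(const ConfigurationSketch&) = delete;

    void set_use_gpu(bool value) { use_gpu = value; }
    bool get_use_gpu() const { return use_gpu; }

private:
    ConfigurationSketch() = default;  // only instance() can construct it
    bool use_gpu = false;             // illustrative setting only
};

// A component created later (dataset, network, ...) reads the same object.
bool component_sees_gpu()
{
    return ConfigurationSketch::instance().get_use_gpu();
}
```

Because every caller goes through `instance()`, a value set once at the top of `main()` is visible to every object that reads the configuration afterward.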

1. Introduction

OpenNN auto-detects the available hardware and selects a sensible default precision the first time the configuration is read. For most users this is enough — no configuration call is required and the library will run on the best available device.

When explicit control is needed — for example, to force CPU execution during debugging, to run training in BF16 on a specific GPU, or to mix FP32 training with BF16 inference for deployment — the same singleton accepts overrides on three independent axes.

2. The Three Configuration Axes

A configuration is fully described by three independent enums:

Device — Auto, CPU, or CUDA.
Training type — Auto, FP32, or BF16.
Inference type — Auto, FP32, or BF16.

The three axes can be combined freely within what the hardware supports: for example, train in FP32 on a CPU laptop, then run BF16 inference on a server GPU. The library validates the requested combination when the configuration is resolved and throws a descriptive error if the current hardware does not support it.

3. The Minimal API

All configuration is performed through the Configuration singleton.

#include "../../opennn/configuration.h"

using namespace opennn;

int main()
{
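    // Option 1: let OpenNN pick the fastest valid combination
    // (all three arguments default to Auto)
    Configuration::instance().set();

    // Option 2: choose device, training type, and inference type explicitly
    // (real code uses one option or the other; both are shown for comparison)
    Configuration::instance().set(Device::CUDA, Type::FP32, Type::FP32);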
}

set() takes three arguments: the device, the training type, and the inference type. All three default to Auto, so calling set() with no arguments delegates the choice entirely to OpenNN.
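Default arguments are what make the no-argument call possible. The sketch below mirrors the described signature with hypothetical names (`DeviceSketch`, `TypeSketch`, `ConfigSketch`), not OpenNN's headers. It only records the requested values and does no resolution or validation.

```cpp
#include <cassert>

// Hypothetical mirrors of the three configuration axes.
enum class DeviceSketch { Auto, CPU, CUDA };
enum class TypeSketch { Auto, FP32, BF16 };

struct Requested
{
    DeviceSketch device;
    TypeSketch training;
    TypeSketch inference;
};

class ConfigSketch
{
public:
    // Every parameter defaults to Auto, so set() with no arguments
    // leaves all three decisions to the resolver.
    void set(DeviceSketch device = DeviceSketch::Auto,
             TypeSketch training = TypeSketch::Auto,
             TypeSketch inference = TypeSketch::Auto)
    {
        requested = {device, training, inference};
    }

    Requested get_requested() const { return requested; }

private:
    Requested requested{DeviceSketch::Auto, TypeSketch::Auto, TypeSketch::Auto};
};
```

Since C++ default arguments are positional, a signature like this also accepts partial calls: `set(DeviceSketch::CPU)` fixes only the device and leaves both precisions on Auto.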

4. Auto Resolution

When Auto is passed (or no arguments are given at all), OpenNN inspects the runtime hardware once, computes the most efficient valid combination, and caches the result. The resolution rules are:

Device::Auto resolves to CUDA if a GPU is visible, otherwise CPU.
Type::Auto on a GPU resolves to BF16 if the GPU is Ampere or newer (compute capability of 8.0 or higher), otherwise FP32.
Type::Auto on CPU always resolves to FP32 (BF16 is GPU-only in OpenNN).
Inference Type::Auto mirrors the resolved training type. When training resolves to BF16, the BF16 working copy is therefore reused for inference without an extra cast.

This means that on a machine with an Ampere or newer GPU, a single call to Configuration::instance().set() with no arguments enables CUDA execution and BF16 mixed precision automatically.
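The Auto resolution rules can be expressed as a pure function of the detected hardware. In this sketch the enums are re-declared locally, and `HardwareInfo` and `resolve()` are hypothetical names rather than OpenNN API. The GPU's compute capability is passed in as a number instead of being queried from CUDA, so the rules can be exercised on any machine. Explicit (non-Auto) values pass through unchanged here; rejecting unsupported ones is a separate validation step.

```cpp
#include <cassert>

enum class Device { Auto, CPU, CUDA };
enum class Type { Auto, FP32, BF16 };

// Hypothetical description of the detected hardware.
struct HardwareInfo
{
    bool gpu_visible = false;
    int compute_major = 0;  // major compute capability, e.g. 8 for 8.0 or 8.6
};

struct Resolved
{
    Device device;
    Type training;
    Type inference;
};

// Ampere or newer means compute capability 8.0 or higher.
bool supports_bf16(const HardwareInfo& hw)
{
    return hw.gpu_visible && hw.compute_major >= 8;
}

Resolved resolve(Device device, Type training, Type inference, const HardwareInfo& hw)
{
    // Device::Auto: CUDA if a GPU is visible, otherwise CPU.
    if (device == Device::Auto)
        device = hw.gpu_visible ? Device::CUDA : Device::CPU;

    // Training Type::Auto: BF16 only on an Ampere-or-newer GPU, FP32 otherwise
    // (including on CPU, where BF16 is not available).
    if (training == Type::Auto)
        training = (device == Device::CUDA && supports_bf16(hw)) ? Type::BF16 : Type::FP32;

    // Inference Type::Auto mirrors the resolved training type.
    if (inference == Type::Auto)
        inference = training;

    return {device, training, inference};
}
```

For example, forcing `Device::CPU` on an Ampere machine still resolves both types to FP32, matching the rule that BF16 is GPU-only.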

5. Common Configurations

The following calls cover most real-world scenarios:

Development on a CPU laptop — Configuration::instance().set(Device::CPU, Type::FP32, Type::FP32);
Standard GPU training, full precision — Configuration::instance().set(Device::CUDA, Type::FP32, Type::FP32);
Mixed-precision GPU training on Ampere or newer — Configuration::instance().set(Device::CUDA, Type::BF16, Type::BF16);
FP32 training with BF16 inference for deployment — Configuration::instance().set(Device::CUDA, Type::FP32, Type::BF16);
Whatever is fastest on this machine — Configuration::instance().set();

The first variant is useful when iterating on model design without a GPU. The last is the recommended default for production code that may run on a mixed fleet of CPUs and GPUs.
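The FP32-training/BF16-inference variant implies that FP32 values are converted to BF16 at some point before inference; the tutorial does not say when or how OpenNN does this. BF16 keeps FP32's sign bit and 8-bit exponent but only the top 7 mantissa bits, so the conversion amounts to rounding away the low 16 bits. The sketch below shows the common round-to-nearest-even conversion on raw bits. It is a generic illustration, not OpenNN's or CUDA's routine, and its NaN handling is simplified.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Convert an FP32 value to BF16 bits using round-to-nearest-even.
std::uint16_t fp32_to_bf16_bits(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without undefined behavior

    // NaN: truncate and force a mantissa bit so the result stays a (quiet) NaN.
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Adding 0x7FFF plus the lowest kept bit rounds to nearest, ties to even.
    const std::uint32_t lowest_kept_bit = (bits >> 16) & 1u;
    bits += 0x7FFFu + lowest_kept_bit;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen BF16 bits back to FP32. This direction is exact.
float bf16_bits_to_fp32(std::uint16_t bf16)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(bf16) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Only the FP32 to BF16 direction loses information: BF16 keeps FP32's range but only 8 significant bits (7 stored plus the implicit leading 1), so 3.14159f becomes 3.140625 after a round trip.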

6. Validation and Errors

The configuration is resolved lazily the first time anything reads from it. Invalid combinations throw immediately so a long training run does not start under a misconfiguration:

Device::CUDA when no GPU is visible — throws «CUDA requested but no GPU detected.»
Type::BF16 on CPU — throws «BF16 requires CUDA.»
Type::BF16 on a pre-Ampere GPU — throws «BF16 requires compute capability >= 8.0.»
Type::INT8 on any device — throws «INT8 not yet supported (placeholder).»

The clear error messages make hardware mismatches easy to diagnose.
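The checks above amount to a validation pass over the requested values before anything is cached. This is a sketch under stated assumptions: the enums (including an `INT8` placeholder), `HardwareInfo`, `check_type()`, and `validate()` are hypothetical; the messages are copied from the list above; and `std::runtime_error` is assumed as the exception type, which the tutorial does not specify. The device is assumed to be already resolved to CPU or CUDA.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Device { Auto, CPU, CUDA };
enum class Type { Auto, FP32, BF16, INT8 };  // INT8 exists only as a placeholder

struct HardwareInfo
{
    bool gpu_visible = false;
    int compute_major = 0;  // major compute capability of the GPU
};

// Rejects a precision that the (already resolved) device cannot run.
void check_type(Device device, Type type, const HardwareInfo& hw)
{
    if (type == Type::INT8)
        throw std::runtime_error("INT8 not yet supported (placeholder).");
    if (type == Type::BF16 && device != Device::CUDA)
        throw std::runtime_error("BF16 requires CUDA.");
    if (type == Type::BF16 && hw.compute_major < 8)
        throw std::runtime_error("BF16 requires compute capability >= 8.0.");
}

// Throws on the first unsupported choice so no training run starts misconfigured.
void validate(Device device, Type training, Type inference, const HardwareInfo& hw)
{
    if (device == Device::CUDA && !hw.gpu_visible)
        throw std::runtime_error("CUDA requested but no GPU detected.");
    check_type(device, training, hw);
    check_type(device, inference, hw);
}
```

Checking inference as well as training matters for mixed setups: FP32 training with BF16 inference on a pre-Ampere GPU fails on the inference type.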

7. Querying the Current State

Code that conditionally specializes for a given device or precision can query the resolved configuration through free helper functions:

if (is_gpu())
{
    // CUDA path
}

if (is_bf16_training())
{
    // BF16 training enabled
}

These helpers read from the cached resolved configuration, so calling them has no performance cost beyond an atomic load.
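One way to combine lazy resolution with query helpers that cost a single atomic load is to pack the resolved state into one `std::atomic` word and resolve it on first use. Everything in this sketch is hypothetical: the flag names, `detect_hardware()`, and the `sketch_` helpers are not OpenNN API, and detection is stubbed to pretend a BF16-capable GPU is present.

```cpp
#include <atomic>
#include <cassert>

// Resolved settings packed into one word so a query is a single atomic load.
constexpr unsigned FLAG_RESOLVED = 1u << 0;
constexpr unsigned FLAG_GPU = 1u << 1;
constexpr unsigned FLAG_BF16_TRAINING = 1u << 2;

std::atomic<unsigned> config_state{0};
int detection_runs = 0;  // counts how often detection actually ran

// Stub for hardware detection (assumption: a BF16-capable GPU is present).
unsigned detect_hardware()
{
    ++detection_runs;
    return FLAG_GPU | FLAG_BF16_TRAINING;
}

// Slow path: the function-local static runs detection exactly once,
// even if several threads arrive here together (guaranteed since C++11).
unsigned resolve_state()
{
    static const unsigned resolved = detect_hardware() | FLAG_RESOLVED;
    config_state.store(resolved, std::memory_order_release);
    return resolved;
}

// Fast path: after the first call this is just one atomic load.
unsigned current_state()
{
    const unsigned s = config_state.load(std::memory_order_acquire);
    return (s & FLAG_RESOLVED) ? s : resolve_state();
}

bool sketch_is_gpu() { return (current_state() & FLAG_GPU) != 0; }
bool sketch_is_bf16_training() { return (current_state() & FLAG_BF16_TRAINING) != 0; }
```

Nothing is detected until the first query, and later queries never repeat detection; threads that race on the slow path all store the same value, so the extra stores are harmless.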

8. Conclusions

The Configuration singleton is the only knob that controls device and precision in OpenNN:

When configuring explicitly, call Configuration::instance().set(…) once, before creating any dataset, network, or training strategy.
Default arguments pick the fastest valid setup; explicit arguments give full control.
OpenNN validates the combination when the configuration is first resolved, so misconfigurations fail loudly before training starts instead of silently.

For most projects, calling set() with no arguments is sufficient. Explicit configuration is needed only when the desired device or precision differs from the auto-detected default.
