‹ Back to Benchmarks

Numerical accuracy: OpenNN vs PyTorch vs TensorFlow

OpenNN matches the predictive accuracy of PyTorch and TensorFlow on a controlled nonlinear regression benchmark, while keeping the smaller native footprint described in the rest of this benchmark series.

The other notes in this series compare deployment size, startup latency, dependencies, memory use,

and export friction. This one asks the natural follow-up question: does OpenNN’s lighter native

design cost anything in numerical accuracy?

Contents

The result

Final test-set accuracy on the Rosenbrock approximation benchmark, averaged over 5 random seeds:

Framework Mean MSE Mean R2
OpenNN 0.0116 0.9879
PyTorch 0.0114 0.9882
TensorFlow 0.0129 0.9867

The three results are statistically indistinguishable: all reach R2 around 0.987-0.988. OpenNN’s training is numerically on par with the major frameworks, at a fraction of their footprint.

Benchmark setup

The task is the Rosenbrock approximation benchmark: inputs x_i ~ U(-1, 1), with target y = Σ_i [(1 - x_i)^2 + 100 (x_{i+1} - x_i^2)^2]. We generate 10,000 samples with 10 input variables, z-normalize the dataset once, and split it 80/20 into train and test sets.

The exact same normalized files feed OpenNN, PyTorch, and TensorFlow. This removes per-framework preprocessing differences from the comparison.

Item Value
Architecture 10 -> 50 tanh -> 50 tanh -> 1 linear
Initialization Glorot/Xavier uniform
Loss Mean squared error
Optimizer Adam, learning rate 0.001, beta1 0.9, beta2 0.999
Epochs / batch size 200 / 64
Data One shared normalized train/test split
Runs 5 random seeds per framework

Accuracy is computed by a single neutral scorer that reads each framework’s test-set predictions and computes MSE and R2 in the same way. That prevents differences in internal loss definitions or reduction conventions from affecting the comparison.

Why this matters

A small native library is only useful if it is also correct. This benchmark shows that OpenNN’s forward pass, backpropagation, and Adam optimizer produce the same quality of fit as two independently developed, industry-standard frameworks.

That matters for deployment: the footprint and startup advantages in the other benchmark notes come with no accuracy penalty on this controlled regression task.

Caveats

  • This is an accuracy-parity check on a controlled regression task, not a claim that one framework is universally more accurate than another.
  • The benchmark was measured on Linux x86_64. OpenNN was built with g++ 13.3 on CPU. PyTorch was 2.12.0+cpu and TensorFlow was 2.21.0 on CPython 3.12.
  • The reported values are averaged over five seeds. The spread is small; OpenNN’s R2 range was approximately 0.985-0.991.
  • Different Adam epsilon defaults and tiny initialization differences can create per-seed variation, but they do not change the conclusion.

References