
Peak memory: OpenNN vs PyTorch vs TensorFlow

Disk size and startup time are two costs of a heavy runtime; RAM is a third. On a constrained target — a small edge device, a memory-capped container, a function with a tight memory limit, or simply a machine running many model processes at once — what matters is how much resident memory a model process actually holds. A framework that loads a large runtime pays for it in RAM the moment it starts, before any data is touched.

The numbers
What the numbers show
Why OpenNN uses so little
Why it matters
Caveats
References

The numbers

	OpenNN	PyTorch	TensorFlow
Baseline RSS (model built, no training)	9 MB	221 MB	485 MB
Peak RSS (during training)	9 MB	295 MB	521 MB
Peak vs OpenNN	1×	≈32×	≈56×

All three programs do the same thing on the same data: load sum.csv (1,000 rows × 100 numeric inputs + 1 target), build a 100 → 64 → 1 MLP, and train for 50 epochs (Adam, batch size 32, single-threaded). Each reports its own peak RSS via the OS (getrusage / resource).
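As a rough illustration of the measurement, the sketch below reads a process's peak RSS from Python through the standard resource module. It is a minimal sketch, not the benchmark's actual script: the helper name peak_rss_mb and the 64 MB stand-in allocation are hypothetical, and the unit handling assumes Linux (kilobytes) or macOS (bytes). Because ru_maxrss is a high-water mark, a reading taken right after the model is built gives the peak so far, which is one plausible way to obtain a baseline figure.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MiB (hypothetical helper)."""
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return ru_maxrss / divisor


# Reading taken before the "training" allocation, e.g. right after model setup.
baseline = peak_rss_mb()

# Stand-in for training buffers: 64 MiB of non-zero bytes, so the pages are
# actually written and become resident.
buffer = bytearray(b"\x01") * (64 * 1024 * 1024)

# ru_maxrss never decreases, so this reading is at least the baseline.
peak = peak_rss_mb()

print(f"baseline: {baseline:.1f} MB, peak: {peak:.1f} MB")
```

A C++ program can obtain the same counter by calling getrusage(RUSAGE_SELF, ...) and reading the ru_maxrss field.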

What the numbers show

OpenNN holds ~9 MB and barely moves. Baseline and peak are the same to within measurement noise — the dataset and training buffers are tiny next to the already-small working set. The whole process, code and data, fits in single-digit megabytes.
PyTorch starts at ~221 MB before training — the Python interpreter plus the libtorch runtime (and NumPy) resident in memory — and rises to ~295 MB during training as autograd and optimizer buffers are allocated.
TensorFlow starts at ~485 MB and rises to ~521 MB — the Keras/TF runtime carries an even larger resident footprint than PyTorch before any training.

Why OpenNN uses so little

OpenNN is a native binary with the library linked in and Eigen (header-only) for math. The resident memory is essentially the model parameters, the data, and a small amount of code — there is no interpreter and no large general-purpose tensor runtime mapped into the process. PyTorch keeps the Python runtime and the full libtorch engine resident for the life of the process, which sets a high floor independent of model size.

Why it matters

Memory-capped containers / functions: a 256 MB function can host the OpenNN process many times over; PyTorch’s 221 MB baseline takes up most of that limit before any work, and its 295 MB training peak exceeds it.
Many concurrent model processes: at ~9 MB each, you can run far more OpenNN workers per machine than PyTorch workers at ~220–300 MB each.
Small edge devices: RAM is often scarcer than disk; a single-digit-MB footprint leaves room for the rest of the application.
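The capacity argument above can be sketched as simple arithmetic on the peak RSS figures from the table. This is a back-of-the-envelope estimate only: the helper workers_that_fit and the 8192 MB machine budget are hypothetical, and the calculation ignores OS overhead and any pages shared between processes.

```python
# Peak RSS during training, in MB, taken from the table above.
PEAK_RSS_MB = {"OpenNN": 9, "PyTorch": 295, "TensorFlow": 521}


def workers_that_fit(budget_mb: int, per_process_mb: int) -> int:
    """How many processes of a given peak size fit in a memory budget."""
    return budget_mb // per_process_mb


# 256 MB is the function limit mentioned above; 8192 MB is an assumed machine.
for budget_mb in (256, 8192):
    for framework, rss_mb in PEAK_RSS_MB.items():
        count = workers_that_fit(budget_mb, rss_mb)
        print(f"{budget_mb} MB budget, {framework}: {count} process(es)")
```

Under these assumptions a 256 MB limit fits 28 OpenNN processes and no PyTorch or TensorFlow process at training peak, while 8192 MB fits 910, 27, and 15 respectively.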

Caveats

This is a memory benchmark on a small model, chosen so the numbers reflect framework overhead (the structural difference) rather than a specific large workload. A bigger model or dataset adds parameter and activation memory to every framework on top of these baselines.
Measured on Linux x86_64, single-threaded (OMP_NUM_THREADS=1) for all three programs, so that thread-pool arenas do not inflate RSS unevenly. OpenNN built with g++ 13.3 (CPU-only); PyTorch 2.12.0+cpu on CPython 3.12 with NumPy installed.
Peak RSS is the maximum resident set size reported by the OS (ru_maxrss); absolute values vary with allocator, glibc, and thread settings, but the order-of-magnitude gap is structural.
CPU-only on both sides. A CUDA build adds GPU-side memory, which this note does not cover.