
Training the HIGGS Benchmark with OpenNN and PyTorch

OpenNN trains the classic HIGGS deep-learning benchmark 1.79x faster than an optimized PyTorch CUDA Graphs implementation while reaching a comparable test AUC.

This article compares the best measured GPU training run from OpenNN with the best measured PyTorch run on the same HIGGS benchmark. Both runs use the same dataset split, the same neural network topology, and a CUDA Graph execution path.

Benchmark application

The HIGGS dataset is a binary classification benchmark from high-energy physics. Each sample represents a simulated particle-collision event, and the task is to distinguish a Higgs-boson signal process from background events.

Item	Configuration
Dataset	HIGGS, 11,000,000 simulated particle-collision events
Split	10,000,000 training / 500,000 validation / 500,000 testing
Inputs	28 real-valued features; first column is the binary target
Network	5 hidden dense layers, 300 tanh units per hidden layer, sigmoid output
Parameters	370,201 trainable parameters
Optimizer	Stochastic gradient descent with momentum
Batch size	100
Learning rate	0.05 initial learning rate, decay 0.0202, momentum 0.9
Regularization	L2 weight decay coefficient 1e-5
Metric	Area under the ROC curve (AUC) on the testing split
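
The parameter count in the table follows directly from the layer sizes. The sketch below recomputes it, assuming fully connected layers with one bias per unit; the function name is illustrative and not part of either framework.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, inputs first, outputs last.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix between two layers
        total += fan_out           # one bias per unit (assumption)
    return total


# 28 inputs, five hidden layers of 300 units, one sigmoid output
higgs_layers = [28] + [300] * 5 + [1]
print(dense_parameter_count(higgs_layers))  # 370201
```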

Reference computer

Component	Reference system
Operating system	Ubuntu Linux 22.04, kernel 6.8
CPU	12th Gen Intel Core i9-12900K, 24 logical CPUs
RAM	62 GiB
GPU	NVIDIA GeForce RTX 4080, 16 GB VRAM
NVIDIA driver	555.42.02
PyTorch	2.6.0+cu124
OpenNN	C++20 development build with CUDA FP32 backend

Methodology

The benchmark reports one measured run for each framework, the best one recorded. Training time covers the training loop only: it excludes CSV preprocessing and includes the validation pass at the end of each epoch. Testing metrics are computed after training and are not included in the training time.
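
The timing boundary described above can be sketched as follows. This is an illustration of where the clock starts and stops, not the benchmark's actual harness: `timed_training` and the two callbacks are hypothetical stand-ins. In a real GPU run the device would also need to be synchronized before each clock read, because kernel launches are asynchronous.

```python
import time


def timed_training(train_one_epoch, validate, epochs):
    """Return wall-clock seconds spent in the training loop only."""
    start = time.perf_counter()
    for epoch in range(epochs):
        train_one_epoch(epoch)
        validate(epoch)  # end-of-epoch validation is inside the timed region
    return time.perf_counter() - start


# CSV preprocessing would happen here, before the timer starts.
calls = []  # records what ran inside the timed region
seconds = timed_training(
    train_one_epoch=lambda epoch: calls.append(("train", epoch)),
    validate=lambda epoch: calls.append(("validate", epoch)),
    epochs=21,
)
# Test metrics would be computed here, after the timer has stopped.
print(f"{len(calls)} timed calls in {seconds:.6f} s")
```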

Epoch numbering starts at 0, so the run labelled epoch 20 contains 21 complete passes over the training split.
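
As a quick check of the epoch arithmetic, the sketch below counts the passes and the optimizer steps per pass. It assumes every epoch visits each training sample exactly once; the split size and batch size come from the configuration table.

```python
TRAINING_SAMPLES = 10_000_000
BATCH_SIZE = 100
FIRST_EPOCH, LAST_EPOCH = 0, 20

# Both endpoints are included, hence the + 1.
epochs = LAST_EPOCH - FIRST_EPOCH + 1
steps_per_epoch = TRAINING_SAMPLES // BATCH_SIZE

print(epochs)           # 21
print(steps_per_epoch)  # 100000
```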

Framework	Execution path
OpenNN	CUDA Graph training path with GPU-resident dataset and device-side batch gathering
PyTorch	Optimized CUDA Graphs implementation with static tensors updated in-place

The comparison below keeps only the strongest measured run from each framework: OpenNN with CUDA Graph execution and a GPU-resident dataset, and PyTorch optimized with CUDA Graphs.

Results

Figure: Training throughput on HIGGS in samples per second (higher is better), RTX 4080, batch size 100, epochs 0-20. OpenNN (CUDA Graph + GPU dataset): 1.01M. PyTorch (CUDA Graphs): 563k.

OpenNN reaches 1.79x the training throughput of the optimized PyTorch CUDA Graphs run in this benchmark.

Framework / run	Training time	Avg. epoch time	Throughput	Test AUC	Test accuracy @ 0.5
OpenNN, CUDA Graph + GPU-resident dataset, final parameters at epoch 20	208.23 s	9.92 s	1.01M samples/s	0.859371	0.77382
PyTorch optimized with CUDA Graphs, final parameters at epoch 20	372.82 s	17.75 s	563k samples/s	0.857541	0.77301
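
The average epoch time and throughput columns can be derived from the training time alone. The sketch below does so, assuming 21 passes over the 10,000,000-sample training split and counting training samples only; the helper names are illustrative.

```python
TRAINING_SAMPLES = 10_000_000
EPOCHS = 21  # epochs 0 through 20

training_time_s = {"OpenNN": 208.23, "PyTorch": 372.82}


def epoch_time(seconds):
    """Average wall-clock seconds per epoch."""
    return seconds / EPOCHS


def throughput(seconds):
    """Training samples processed per second over the whole run."""
    return TRAINING_SAMPLES * EPOCHS / seconds


for name, seconds in training_time_s.items():
    print(f"{name}: {epoch_time(seconds):.2f} s/epoch, "
          f"{throughput(seconds) / 1e6:.2f}M samples/s")
```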

Comparison	Result
OpenNN speed vs optimized PyTorch CUDA Graphs	1.79x faster
Epoch-time reduction	44.1% lower epoch time
Throughput increase	79.0% more samples per second
Predictive quality	Comparable test AUC (0.859371 vs 0.857541) and accuracy (0.77382 vs 0.77301)
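
The relative figures in this table are all functions of the two training times. The sketch below shows how each one is defined; it uses only the measured times reported above.

```python
opennn_s = 208.23
pytorch_s = 372.82

# Speedup: how many times longer the slower run takes.
speedup = pytorch_s / opennn_s

# Epoch-time reduction: both runs cover the same number of epochs,
# so the ratio of total times equals the ratio of epoch times.
epoch_time_reduction = 1 - opennn_s / pytorch_s

# Throughput increase: throughput is inversely proportional to time.
throughput_increase = speedup - 1

print(f"speedup: {speedup:.2f}x")
print(f"epoch-time reduction: {epoch_time_reduction:.1%}")
print(f"throughput increase: {throughput_increase:.1%}")
```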

Discussion

The important result is that OpenNN is faster not only than a plain Python training loop but also than the optimized PyTorch CUDA Graphs version used for this workload. The OpenNN run completes epochs 0-20 in 208.23 seconds, compared with 372.82 seconds for PyTorch CUDA Graphs.

The reason is that the optimized OpenNN path keeps the dataset resident on the GPU and uses CUDA Graph execution for the training step. This removes the per-batch host orchestration cost and avoids repeatedly staging the same tabular data through CPU-side batch workers. For a small-batch workload such as HIGGS with batch size 100, that overhead matters.
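
To see why per-batch overhead matters at this batch size, it helps to convert the training times into a time budget per optimizer step. The sketch below is a rough upper bound rather than a profile: the measured times also include end-of-epoch validation, so the true cost per training step is somewhat lower.

```python
TRAINING_SAMPLES = 10_000_000
BATCH_SIZE = 100
EPOCHS = 21

total_steps = EPOCHS * TRAINING_SAMPLES // BATCH_SIZE  # 2,100,000 steps

training_time_s = {"OpenNN": 208.23, "PyTorch": 372.82}

# Upper bound on the time per step, in microseconds.
microseconds_per_step = {
    name: seconds / total_steps * 1e6
    for name, seconds in training_time_s.items()
}

for name, value in microseconds_per_step.items():
    print(f"{name}: about {value:.1f} microseconds per training step")
```

With each step costing on the order of 100 microseconds, any fixed host-side cost per batch takes a visible share of the total.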

Predictive quality remains aligned. OpenNN reaches a test AUC of 0.859371, while the optimized PyTorch run reaches 0.857541, a difference of 0.00183. Test accuracy at the 0.5 threshold differs by less than 0.1 percentage points (0.77382 vs 0.77301). The main conclusion is therefore about training speed at equivalent model quality.
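
For readers unfamiliar with the two quality metrics, the sketch below computes them on a toy example. It is a minimal pure-Python illustration, not the evaluation code of either framework: AUC is computed with the rank-sum formulation, the labels and scores are made-up values, and treating a score of exactly 0.5 as positive is an assumption.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Tied scores share their average rank, so a tie counts as half a win.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based rank shared by the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    positives = sum(labels)
    negatives = len(labels) - positives
    rank_sum = sum(rank for rank, label in zip(ranks, labels) if label == 1)
    return (rank_sum - positives * (positives + 1) / 2) / (positives * negatives)


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical labels and sigmoid outputs, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.55, 0.7]
print(f"AUC = {roc_auc(labels, scores):.4f}")
print(f"accuracy @ 0.5 = {accuracy_at(labels, scores):.4f}")
```

AUC depends only on how the scores rank the samples, while accuracy depends on the chosen threshold, which is why the results table reports both.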

Conclusions

OpenNN trains the HIGGS 5×300 benchmark in 208.23 seconds for epochs 0-20.
The optimized PyTorch CUDA Graphs run takes 372.82 seconds for the same benchmark setup.
OpenNN is 1.79x faster, with 44.1% lower average epoch time.
Both frameworks reach essentially the same predictive quality on the testing split.

References

HIGGS dataset, UCI Machine Learning Repository.
P. Baldi, P. Sadowski and D. Whiteson, Searching for exotic particles in high-energy physics with deep learning, Nature Communications, 2014.
PyTorch.
OpenNN.
