Neural networks at native C++ speed.

No Python. No PyTorch dependency. Drop a single header and get a complete training pipeline — backed by a custom slab allocator and Apple AMX acceleration.

Zero Allocation · C++17 · Apple AMX · CMake ≥ 3.15 · Eigen 3 · Single Header

Read the Docs · Quick Install
  • 1.6M samples / sec
  • 19× faster than single-sample
  • 88% EMNIST test accuracy
  • 0 malloc in hot path
Core Principles

Engineered with intention.

Every design decision in Sandokan traces back to a single constraint: training must be fast, deterministic, and portable — without dragging in a Python runtime.

Memory

PMAD Slab Allocator

All gradient buffers are served from a pre-allocated contiguous slab. Zero malloc / free during training. No heap fragmentation over long runs.
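To make the idea concrete, here is a minimal bump-style slab sketch: one upfront allocation, then pointer-bump hand-outs for the rest of the run. This is illustrative only; the class name and internals are assumptions, not Sandokan's actual PMAD implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal slab sketch (illustrative, not Sandokan's PMAD internals):
// one heap allocation up front, then aligned pointer-bump hand-outs.
// Nothing in the training loop ever calls malloc or free.
class Slab {
public:
    explicit Slab(std::size_t bytes) : buf_(bytes), off_(0) {}

    // Hand out an aligned chunk from the pre-allocated buffer.
    void* alloc(std::size_t bytes, std::size_t align = 64) {
        std::size_t p = (off_ + align - 1) & ~(align - 1);
        if (p + bytes > buf_.size()) return nullptr; // slab exhausted
        off_ = p + bytes;
        return buf_.data() + p;
    }

    // Reuse the same memory on the next iteration — no fragmentation.
    void reset() { off_ = 0; }

private:
    std::vector<std::byte> buf_;
    std::size_t off_;
};
```

Because the topology fixes all buffer sizes up front, `reset()` hands back identical addresses every iteration, which is what keeps long runs fragmentation-free.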

Compute

Apple AMX Acceleration

Batched GEMM via Apple Accelerate and AMX co-processors. Combined with the slab allocator, this is the engine's primary performance lever.
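The win from batching is that one big GEMM over the whole batch replaces many small per-sample products, amortising per-call overhead across the batch. A naive-loop sketch (standing in for the Accelerate/AMX call, which Sandokan uses instead) shows the equivalence:

```cpp
#include <cassert>
#include <vector>

// Illustrative GEMM: C (m×n) = A (m×k) · B (k×n), row-major.
// The naive loops stand in for the Accelerate / AMX call — the point is
// the shape: one call over the whole batch instead of one per sample.
std::vector<float> gemm(const std::vector<float>& A,
                        const std::vector<float>& B,
                        int m, int k, int n) {
    std::vector<float> C(m * n, 0.f);
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}
```

A batched forward pass `Y = X · W` with `m = batch` rows produces exactly the same rows as `m` separate single-sample calls — but hands the backend one large, cache- and co-processor-friendly problem.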

API

PyTorch-style Modules

Compose typed submodules with Submodule<T>. Auto-registers with the parent on construction — you cannot forget a register call.
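A sketch of how constructor-time registration can work: the wrapper takes the parent by reference and records the child before the parent's constructor body even runs. The `Module`/`Submodule<T>` names mirror the text, but the internals here are assumptions, not Sandokan's source.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sketch of constructor-time auto-registration (internals are assumed,
// not Sandokan's actual code). Declaring a Submodule<T> member is the
// registration — there is no separate register call to forget.
struct Module {
    std::vector<Module*> children;
    virtual ~Module() = default;
};

template <typename T>
struct Submodule {
    T inner;
    template <typename... Args>
    Submodule(Module& parent, Args&&... args)
        : inner(std::forward<Args>(args)...) {
        parent.children.push_back(&inner);   // registration happens here
    }
    T* operator->() { return &inner; }
};

// Example leaf and network using the mechanism.
struct Linear : Module {
    int in, out;
    Linear(int i, int o) : in(i), out(o) {}
};

struct Net : Module {
    Submodule<Linear> fc1 { *this, 4, 8 };   // registers itself with Net
    Submodule<Linear> fc2 { *this, 8, 2 };
};
```

Because members initialise in declaration order, the parent's `children` list reflects the topology the moment `Net` finishes constructing.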

Data

mmap-backed Datasets

ImageDataset pages images on demand — RSS stays bounded regardless of dataset size. TabularDataset handles numeric CSVs with column-major storage.
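The mechanism behind bounded RSS is plain `mmap`: map the file once, and the kernel pages individual samples in (and evicts them) on demand. A POSIX sketch of the idea — class name, layout, and error handling here are illustrative, not Sandokan's `ImageDataset`:

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Sketch of the mmap idea behind ImageDataset: map the whole file
// read-only and let the kernel page samples in on demand, so resident
// memory stays bounded regardless of file size. Illustrative only.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        fd_ = ::open(path, O_RDONLY);
        struct stat st {};
        if (fd_ >= 0 && ::fstat(fd_, &st) == 0 && st.st_size > 0) {
            size_ = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd_, 0);
            if (p != MAP_FAILED)
                data_ = static_cast<const unsigned char*>(p);
        }
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
        if (fd_ >= 0) ::close(fd_);
    }
    // Fetching a sample is pointer arithmetic — no read() syscall per item.
    const unsigned char* sample(std::size_t index, std::size_t sample_bytes) const {
        return data_ + index * sample_bytes;
    }
    std::size_t size() const { return size_; }

private:
    int fd_ = -1;
    std::size_t size_ = 0;
    const unsigned char* data_ = nullptr;
};
```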

Training

Optimizers & Schedulers

SGD, Adam, and LinearLR schedulers out of the box. Training loops handle shuffling, partial-batch skipping, and scheduler stepping automatically.
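For intuition, here is what the simplest of these pieces reduces to: an SGD parameter update plus a linear learning-rate ramp. Function names and signatures are illustrative sketches, not Sandokan's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative SGD step: w ← w − lr · ∇w. (Sandokan's Adam follows the
// same parameter/gradient-buffer shape; internals here are assumptions.)
void sgd_step(std::vector<float>& w, const std::vector<float>& grad, float lr) {
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] -= lr * grad[i];
}

// LinearLR-style schedule: interpolate from lr0 at step 0 to lr_end at
// step T. The training loop would call this once per epoch.
float linear_lr(float lr0, float lr_end, int step, int T) {
    float t = static_cast<float>(step) / static_cast<float>(T);
    return lr0 + (lr_end - lr0) * t;
}
```

The `LinearLR sched(optim, 150, 1e-5f)` call shown later matches this shape: 150 steps, decaying toward `1e-5`.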

Persist

Custom .sand Format

Compact binary model files with a 4-word header, optional normalisation block, and DFS-traversal weight layout. Load in one call.
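A header-plus-payload layout like this is straightforward to round-trip. The sketch below uses a 4-word header as the text describes, but the field meanings (magic, version, tensor count, flags) are assumptions for illustration — consult the format documentation for the real `.sand` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical 4-word header in the spirit of .sand. Field meanings are
// assumptions, not the documented format.
struct Header {
    uint32_t magic;      // file identifier
    uint32_t version;    // format revision
    uint32_t n_tensors;  // number of weight tensors that follow
    uint32_t flags;      // e.g. "has normalisation block"
};

bool save(const char* path, const Header& h, const std::vector<float>& weights) {
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::fwrite(&h, sizeof h, 1, f);                              // header first
    std::fwrite(weights.data(), sizeof(float), weights.size(), f); // then weights
    std::fclose(f);
    return true;
}

bool load(const char* path, Header& h, std::vector<float>& weights, std::size_t n) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&h, sizeof h, 1, f) == 1;
    weights.resize(n);
    ok = ok && std::fread(weights.data(), sizeof(float), n, f) == n;
    std::fclose(f);
    return ok;
}
```

Writing weights in a fixed traversal order (DFS over the module tree, per the text) is what lets loading be a single sequential read.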

Module System

Define networks by composing.

Residual blocks. First class.

Networks are plain C++ structs inheriting from Module. Submodules auto-register — the topology is known at construction time, so the slab allocator can compute sizes before any data moves.

  • Auto-registration on construction
  • Residual connections via operator overloads
  • Topology-derived slab sizing via init_pmad_for()
  • Forward and backward in pure C++
struct ResBlock : Module {
    Submodule<Linear> fc1 { *this, 64, 64 };
    ReLU              relu1;
    Submodule<Linear> fc2 { *this, 64, 64 };
    ReLU              relu2;

    MatrixXf forward(const MatrixXf& x) override {
        return relu2.forward(
            fc2.forward(
                relu1.forward(fc1.forward(x))
            )
        ) + x; // residual skip
    }
};

// One call allocates the entire gradient slab.
// LetterNet is the full network (composing blocks like ResBlock above).
LetterNet net;
init_pmad_for(net);

Adam     optim(1e-3f);
LinearLR sched(optim, 150, 1e-5f);
train_module(net, sched, train, test, 150, 128);
Benchmarks

Numbers that matter.

Apple Silicon (M-series) · EIGEN_USE_BLAS · Architecture 784→64→64→26 · batch=128 · EMNIST Letters, 124 800 training samples.

Backend                     | Total (ms) | ms / epoch | ms / sample | Samples / sec
Eigen single-sample         |      9 257 |      1 851 |      0.0148 |        67 408
Sandokan single-sample      |      7 540 |      1 508 |      0.0121 |        82 757
Eigen batched               |        614 |        123 |      0.0010 |     1 015 951
Sandokan batched + parallel |        386 |         77 |      0.0006 |     1 615 666

Sandokan's batched path runs at 1.6 M samples/sec — 19.5× faster than single-sample Sandokan and 1.5× faster than plain Eigen batched. On Fashion-MNIST it reaches 1.74 M samples/sec at 34.4 ms/epoch.

Get Started

Up and running in minutes.

01

Install via Homebrew

The quickest path. Installs Sandokan and its Eigen dependency.

02

Link in CMake

Use find_package and link the sandokan::sandokan target. That's it.

03

Enable AMX (optional)

One CMake flag unlocks Apple Accelerate / AMX for a significant speed boost on Apple Silicon.

04

Drop in the header

#include <sandokan.h> and you have the full training API.

Full Documentation →
# 1. Install
brew install sandokan

# 2. Build your project
cmake -B build .
cmake --build build -j

# CMakeLists.txt
find_package(sandokan REQUIRED)
target_link_libraries(your_target
    PRIVATE sandokan::sandokan)

# 3. Enable AMX (Apple Silicon)
target_compile_definitions(sandokan
    INTERFACE EIGEN_USE_BLAS)
target_link_libraries(sandokan
    INTERFACE "-framework Accelerate")

# 4. In your code
#include <sandokan.h>