Statistical Training

Fit a cuvis-ai pipeline with StatisticalTrainer, which accumulates background moments (mean, covariance, histograms) in a single pass over the data and takes no gradient steps.
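To make "accumulating moments in a single pass" concrete, here is a minimal pure-Python sketch. It is not the cuvis-ai implementation: the class RunningMoments and its update/finalize methods are hypothetical names chosen for illustration, and it assumes each batch is a list of equal-length feature vectors. It keeps only running sums, so no batch needs to be held in memory after it has been seen.

```python
from typing import List, Sequence, Tuple


class RunningMoments:
    """Hypothetical accumulator: one-pass mean and sample covariance."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        self.sum = [0.0] * dim                          # running sum of x
        self.outer = [[0.0] * dim for _ in range(dim)]  # running sum of x x^T

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        """Fold one batch of vectors into the running sums."""
        for x in batch:
            self.count += 1
            for i in range(self.dim):
                self.sum[i] += x[i]
                for j in range(self.dim):
                    self.outer[i][j] += x[i] * x[j]

    def finalize(self) -> Tuple[List[float], List[List[float]]]:
        """Turn the sums into a mean vector and a sample covariance matrix."""
        n = self.count
        if n < 2:
            raise ValueError("need at least two samples for a covariance")
        mean = [s / n for s in self.sum]
        # cov = (sum(x x^T) - n * mean mean^T) / (n - 1)
        cov = [
            [(self.outer[i][j] - n * mean[i] * mean[j]) / (n - 1)
             for j in range(self.dim)]
            for i in range(self.dim)
        ]
        return mean, cov


# Usage: two batches, seen once each.
acc = RunningMoments(dim=2)
acc.update([[1.0, 2.0], [3.0, 4.0]])
acc.update([[5.0, 9.0]])
mean, cov = acc.finalize()
print(mean)  # [3.0, 5.0]
print(cov)   # [[4.0, 7.0], [7.0, 13.0]]
```

The sum-of-products form is the simplest to read, but it can lose precision when values are large relative to their spread; a Welford-style update is a common, more stable alternative.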

Goal

Produce a saved, ready-to-run pipeline whose statistical nodes have been initialised from data. The resulting pipeline can be replayed with restore-pipeline.

Prerequisites

A pipeline with at least one statistical node (RX, PCA, NormalizeFromStats, …).
A datamodule that produces unlabelled training data: Cu3sDataModule with cu3s_file_path=... for one cube, or data_dir=... for a folder of cubes.
Optional: the Concepts → Training page, if you want the model behind the trainer.

Recipe

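from cuvis_ai_core.training import StatisticalTrainer
from cuvis_ai_core.pipeline.pipeline import CuvisPipeline
from cuvis_ai_dataloader.data import Cu3sDataModule

# Load a pipeline definition that contains at least one statistical node.
pipeline = CuvisPipeline.load_pipeline("configs/pipeline/anomaly/rx/rx_statistical.yaml")
# Unlabelled training data from a single cube.
datamodule = Cu3sDataModule(cu3s_file_path="data/Lentils/Demo_000.cu3s")

# Fit the statistical nodes in one pass over the data; no gradient steps.
trainer = StatisticalTrainer(pipeline=pipeline, datamodule=datamodule)
trainer.fit()

# Persist the fitted pipeline so it can be replayed later.
pipeline.save_to_file("artifacts/rx_statistical_fitted.yaml")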

What happens under the hood

The trainer collects every node whose execution_stages includes STATISTICAL.
For each batch, it calls statistical_initialization(batch) on every collected node.
After the pass, each node finalises its accumulated statistics (covariance inversion, normalisation, etc.).
The fitted pipeline is then saved as a YAML file with TRAINABLE_BUFFERS populated.
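The steps above can be sketched with toy stand-ins. This is an illustration under stated assumptions, not the StatisticalTrainer source: only the names execution_stages, STATISTICAL and statistical_initialization come from this page, while Stage, MeanNode, GradientOnlyNode, finalize and fit_statistics are hypothetical, and a batch is assumed to be a plain list of floats.

```python
from enum import Enum, auto
from typing import Iterable, List, Optional, Sequence


class Stage(Enum):
    """Hypothetical stand-in for the pipeline's execution stages."""
    STATISTICAL = auto()
    GRADIENT = auto()


class MeanNode:
    """Toy statistical node: accumulates a sum, then finalises it into a mean."""
    execution_stages = {Stage.STATISTICAL}

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0
        self.mean: Optional[float] = None

    def statistical_initialization(self, batch: Sequence[float]) -> None:
        self.total += sum(batch)
        self.count += len(batch)

    def finalize(self) -> None:  # hypothetical method name
        self.mean = self.total / self.count


class GradientOnlyNode:
    """Toy node without a statistical stage; the loop should never touch it."""
    execution_stages = {Stage.GRADIENT}

    def __init__(self) -> None:
        self.calls = 0

    def statistical_initialization(self, batch: Sequence[float]) -> None:
        self.calls += 1


def fit_statistics(nodes: Iterable[object],
                   batches: Iterable[Sequence[float]]) -> List[object]:
    # Step 1: collect nodes that take part in the statistical stage.
    stat_nodes = [n for n in nodes if Stage.STATISTICAL in n.execution_stages]
    # Step 2: a single pass over the data, feeding each batch to each node.
    for batch in batches:
        for node in stat_nodes:
            node.statistical_initialization(batch)
    # Step 3: let each node turn its accumulators into final statistics.
    for node in stat_nodes:
        node.finalize()
    return stat_nodes


# Usage: one statistical node and one node that is skipped.
mean_node, other_node = MeanNode(), GradientOnlyNode()
fit_statistics([mean_node, other_node], [[1.0, 2.0], [3.0, 6.0]])
print(mean_node.mean)    # 3.0
print(other_node.calls)  # 0
```

Keeping accumulation and finalisation separate is what lets the pass stay single: expensive operations such as a covariance inversion run once, after all batches have been seen.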

Common variations

Inference only on the fitted pipeline: no fresh YAML is needed; run restore-pipeline --pipeline-path artifacts/rx_statistical_fitted.yaml --cu3s-file-path ….
Statistical phase as part of two-phase training: pair with GradientTrainer, so that the statistical phase initialises weights for the gradient phase. See Concepts → Training.
Multi-cube training: point the same Cu3sDataModule at a directory of cubes with data_dir=... instead of cu3s_file_path=....

See also

Concepts → Execution stages: which nodes run when.
Build Pipeline (YAML): author the pipeline this trainer fits.
Gradient Training: the next phase if your pipeline has trainable parameters.