Statistical Training
Fit a cuvis-ai pipeline using StatisticalTrainer: it accumulates
background statistics (mean, covariance, histograms) during a single
pass over the data and takes no gradient steps.
Goal
Produce a saved, ready-to-run pipeline whose statistical nodes have
been initialised from data. The resulting pipeline can be replayed
with restore-pipeline.
Prerequisites
- A pipeline with at least one statistical node (RX, PCA, NormalizeFromStats, …).
- A datamodule that produces unlabelled training data (typically SingleCu3sDataModule or MultiCu3sDataModule).
- The Concepts → Training page if you want the model behind the trainer.
Recipe
```python
from cuvis_ai_core.trainer import StatisticalTrainer
from cuvis_ai_core.pipeline import Pipeline
from cuvis_ai_core.data.datamodule import SingleCu3sDataModule

# Load the pipeline definition and the training data.
pipeline = Pipeline.from_yaml("configs/pipeline/anomaly/rx/rx_statistical.yaml")
datamodule = SingleCu3sDataModule(cu3s_file_path="data/Lentils/Demo_000.cu3s")

# Single pass over the data: statistical nodes accumulate and finalise their stats.
trainer = StatisticalTrainer()
trainer.fit(pipeline=pipeline, datamodule=datamodule)

# Persist the fitted pipeline so it can be replayed later.
pipeline.save("artifacts/rx_statistical_fitted.yaml")
```
What happens under the hood
- The trainer collects every node whose execution_stages includes STATISTICAL.
- For each batch, it calls statistical_initialization(batch) on every collected node.
- After the pass, each node finalises its accumulated stats (covariance inversion, normalisation, etc.).
- The fitted pipeline is saved as a YAML with TRAINABLE_BUFFERS populated.
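The accumulate-then-finalise pattern described above can be sketched in plain Python. This is a toy illustration under stated assumptions, not cuvis-ai's implementation: RunningMoments, finalize, and fit_statistics are hypothetical names, and only statistical_initialization mirrors the method named in the list; its real signature and batch format may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


class RunningMoments:
    """Hypothetical statistical node: accumulate sums in one pass, finalise afterwards."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        self.sum = [0.0] * dim
        # Running sum of outer products x * x^T, stored as a dim x dim nested list.
        self.outer = [[0.0] * dim for _ in range(dim)]
        self.mean = None
        self.cov = None

    def statistical_initialization(self, batch: List[Vector]) -> None:
        """Accumulate one batch of samples; no gradients are involved."""
        for x in batch:
            self.count += 1
            for i in range(self.dim):
                self.sum[i] += x[i]
                for j in range(self.dim):
                    self.outer[i][j] += x[i] * x[j]

    def finalize(self) -> None:
        """Turn the accumulated sums into a mean and a sample covariance."""
        n = self.count
        if n < 2:
            raise ValueError("need at least two samples")
        self.mean = [s / n for s in self.sum]
        self.cov = [
            [(self.outer[i][j] - n * self.mean[i] * self.mean[j]) / (n - 1)
             for j in range(self.dim)]
            for i in range(self.dim)
        ]


def fit_statistics(nodes, batches) -> None:
    """Single pass: feed every batch to every node, then finalise each node."""
    for batch in batches:
        for node in nodes:
            node.statistical_initialization(batch)
    for node in nodes:
        node.finalize()


# Two batches holding three 2-D samples in total.
batches = [
    [(1.0, 2.0), (3.0, 6.0)],
    [(5.0, 10.0)],
]
node = RunningMoments(dim=2)
fit_statistics([node], batches)
```

After the run, node.mean is [3.0, 6.0] and node.cov is [[4.0, 8.0], [8.0, 16.0]]. Summing outer products keeps the sketch short but can lose precision on large-valued data; a production implementation might prefer a Welford-style update.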
Common variations
- Inference only on the trained pipeline: skip authoring a fresh YAML and run restore-pipeline --pipeline-path artifacts/rx_statistical_fitted.yaml --cu3s-file-path ….
- Statistical phase as part of two-phase training: pair with GradientTrainer; the statistical phase initialises weights for the gradient phase. See Concepts → Training.
- Multi-cube training: swap SingleCu3sDataModule for MultiCu3sDataModule and point it at a directory.
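The two-phase variation can be illustrated with a toy example in plain Python. It is only a conceptual sketch: statistical_phase and gradient_phase are hypothetical helpers, the one-parameter model (predict y as x - b) is invented for illustration, and nothing here reflects the actual GradientTrainer API.

```python
from typing import List


def statistical_phase(xs: List[float]) -> float:
    """Phase 1 (hypothetical): initialise the offset b from the data mean."""
    return sum(xs) / len(xs)


def gradient_phase(xs: List[float], ys: List[float], b: float,
                   steps: int = 100, lr: float = 0.1) -> float:
    """Phase 2 (hypothetical): refine b by gradient descent on the MSE of (x - b) vs y."""
    n = len(xs)
    for _ in range(steps):
        # d/db mean((x - b - y)^2) = -2 * mean(x - b - y)
        grad = -2.0 * sum(x - b - y for x, y in zip(xs, ys)) / n
        b -= lr * grad
    return b


xs = [1.0, 2.0, 3.0, 4.0]
ys = [x - 3.0 for x in xs]  # labelled targets; the best offset is 3.0

b_init = statistical_phase(xs)             # starting point taken from the data
b_final = gradient_phase(xs, ys, b_init)   # refined by gradient steps
```

The statistical phase gives the gradient phase a data-driven starting point (2.5 here) instead of an arbitrary one, and the gradient steps then move it toward the value that minimises the loss (3.0).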
Related
- Concepts → Execution stages — which nodes run when.
- Build Pipeline (YAML) — author the pipeline this trainer fits.
- Gradient Training — the next phase if your pipeline has trainable parameters.