Configuration¶
CUVIS.AI uses Hydra for flexible, reproducible configuration management. This guide covers configuration patterns and best practices.
Configuration Structure¶
Configuration files are located in configs/ at the project root and use YAML format with Hydra's composition system. The most common pattern is to create trainrun configs that compose sample pipeline, data, training, and plugin configurations together.
Configuration Directory Structure¶
configs/
├── pipeline/ # Sample pipeline configurations
├── data/ # Sample data configurations
├── training/ # Sample training configurations
├── trainrun/ # Sample full trainruns (compose data + pipeline + training + plugins)
└── plugins/ # Plugin registry
Note: The most common pattern is to use trainrun configs (configs/trainrun/), which compose data, pipeline, training, and plugin configs together using Hydra's composition system. You can also override any composed values directly in the trainrun config.
Understanding Configuration Types¶
CUVIS.AI uses 4 distinct configuration types that work together:
| Config Type | Location | Purpose | Used For |
|---|---|---|---|
| Pipeline | `configs/pipeline/` | Graph structure - nodes & connections | Defining processing flow |
| Data | `configs/data/` | Data loading parameters | Dataset configuration |
| Training | `configs/training/` | Training hyperparameters | Optimizer, trainer settings |
| Trainrun | `configs/trainrun/` | Orchestration - composes all configs | Running experiments |
Typical workflow: Create a trainrun config that composes pipeline, data, and training configs together using Hydra's composition system.
Configuration Hierarchy¶
Trainrun configs sit at the top level and compose the other three:
Trainrun Config (Orchestrator)
├── Composes Pipeline Config → Graph structure (nodes, connections)
├── Composes Data Config → Data loading (paths, splits, batch size)
└── Composes Training Config → Training params (optimizer, epochs)
Pipeline Configurations¶
Pipeline configs define the computational graph structure: nodes and their connections.
Location: configs/pipeline/*.yaml
Complete example (configs/pipeline/rx_statistical.yaml - simplified):
```yaml
# Pipeline metadata
metadata:
  name: RX_Statistical
  description: "RX anomaly detection pipeline"
  tags: [statistical, rx]
  author: cuvis.ai

# Node definitions (THIS IS A LIST)
nodes:
  - name: LentilsAnomalyDataNode
    class: cuvis_ai.node.data.LentilsAnomalyDataNode
    params:
      normal_class_ids: [0, 1]

  - name: MinMaxNormalizer
    class: cuvis_ai.node.normalization.MinMaxNormalizer
    params:
      eps: 1.0e-06
      use_running_stats: true

  - name: RXGlobal
    class: cuvis_ai.anomaly.rx_detector.RXGlobal
    params:
      eps: 1.0e-06
      num_channels: 61

# Connection definitions (REQUIRED)
connections:
  - from: LentilsAnomalyDataNode.outputs.cube
    to: MinMaxNormalizer.inputs.data
  - from: MinMaxNormalizer.outputs.normalized
    to: RXGlobal.inputs.data
```
Key structure elements:
- `metadata`: Pipeline identification and documentation
- `nodes`: LIST of node definitions (each with `name`, `class`, `params`)
- `connections`: Data flow between nodes (port-to-port connections)
Critical Syntax
- Nodes use `class` (NOT `_target_`)
- Nodes are a list (with `-` prefix), NOT a dictionary
- Must include a `connections` section
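As a quick check, a pipeline YAML can be loaded outside of Hydra and inspected directly. A minimal sketch using OmegaConf; the assertions simply mirror the syntax rules above and are not an official validator:

```python
from omegaconf import OmegaConf

# Load the pipeline config directly (no Hydra composition involved)
cfg = OmegaConf.load("configs/pipeline/rx_statistical.yaml")

# nodes must be a list of {name, class, params} entries
assert OmegaConf.is_list(cfg.nodes), "nodes must be a list, not a dictionary"
for node in cfg.nodes:
    print(node.name, "->", node["class"])  # 'class' is a Python keyword, so use item access

# connections are required and describe port-to-port data flow
assert "connections" in cfg, "pipeline configs must include a connections section"
```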
Data Configurations¶
Data configs define data loading, dataset paths, and preprocessing parameters.
Location: configs/data/*.yaml
Complete example (configs/data/lentils.yaml):
```yaml
# Data paths
cu3s_file_path: data/Lentils/Lentils_000.cu3s
annotation_json_path: data/Lentils/Lentils_000.json

# Train/val/test splits
train_ids: [0, 2, 3]
val_ids: [1, 5]
test_ids: [1, 5]

# Data loading parameters
batch_size: 2
processing_mode: Reflectance
```
Key fields:
- `cu3s_file_path`: Path to hyperspectral data file
- `annotation_json_path`: Path to annotation file
- `train_ids`, `val_ids`, `test_ids`: Sample split assignments
- `batch_size`: Batch size for data loading
- `processing_mode`: Data processing mode (e.g., Reflectance, Radiance)
Note
Data configs use a simple key-value structure with no `_target_` directives. Values are passed directly to `SingleCu3sDataModule(**cfg.data)`.
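Because the data config is a flat key-value mapping, it can be unpacked straight into the datamodule constructor. A minimal sketch; the `SingleCu3sDataModule` import path is not shown in this guide, so it is left as a comment:

```python
from omegaconf import OmegaConf

# Load the data config on its own (outside of a trainrun composition)
data_cfg = OmegaConf.load("configs/data/lentils.yaml")

# Keys map one-to-one onto the datamodule's constructor arguments
print(data_cfg.cu3s_file_path, data_cfg.batch_size, data_cfg.train_ids)

# As used in the example scripts:
# datamodule = SingleCu3sDataModule(**data_cfg)
```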
Training Configurations¶
Training configs define training hyperparameters: trainer settings, optimizer, scheduler, and callbacks.
Location: configs/training/*.yaml
Complete example (configs/training/default.yaml):
```yaml
# Random seed for reproducibility
seed: 42

# PyTorch Lightning trainer configuration
trainer:
  max_epochs: 5
  accelerator: "auto"
  devices: 1
  precision: "32-true"
  log_every_n_steps: 10
  val_check_interval: 1.0
  enable_checkpointing: true
  gradient_clip_val: 1.0

# Optimizer configuration
optimizer:
  name: "adamw"
  lr: 0.001
  weight_decay: 0.01
  betas: [0.9, 0.999]
```
Key sections:
- `seed`: Random seed for reproducibility
- `trainer`: PyTorch Lightning Trainer parameters (epochs, accelerator, precision, etc.)
- `optimizer`: Optimizer type and hyperparameters
- `scheduler`: Optional learning rate scheduler configuration (see full examples)
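For orientation, a minimal sketch of how the `trainer` section maps onto a PyTorch Lightning Trainer (assuming the `pytorch_lightning` import name; in practice the trainrun scripts construct the trainer from `cfg.training.trainer`, as shown later in this guide):

```python
import pytorch_lightning as pl
from omegaconf import OmegaConf

# Load the training config on its own (normally it arrives composed as cfg.training)
cfg = OmegaConf.load("configs/training/default.yaml")

pl.seed_everything(cfg.seed)  # reproducibility
trainer = pl.Trainer(**OmegaConf.to_container(cfg.trainer, resolve=True))
```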
Trainrun Configurations¶
Trainrun configs compose pipeline, data, and training configs together and add orchestration settings.
Location: configs/trainrun/*.yaml
Complete example (configs/trainrun/rx_statistical.yaml):
```yaml
# @package _global_
# Hydra package directive (above) - merge this config into the global namespace

# Compose other configs using Hydra defaults
defaults:
  - /pipeline@pipeline: rx_statistical  # Inject configs/pipeline/rx_statistical.yaml into .pipeline
  - /data@data: lentils                 # Inject configs/data/lentils.yaml into .data
  - /training@training: default         # Inject configs/training/default.yaml into .training
  - _self_                              # Allow overrides from this file

# Experiment identification
name: rx_statistical
output_dir: outputs/rx_statistical

# Override composed values
training:
  trainer:
    max_epochs: 10  # Override default.yaml's max_epochs: 5

# Orchestration settings (trainrun-specific)
unfreeze_nodes: []  # Nodes to unfreeze for gradient training
freeze_nodes: []    # Nodes to freeze during training
loss_nodes: []      # Nodes that compute loss
metric_nodes:       # Nodes that compute metrics
  - metrics_anomaly
```
Key elements:
- `@package _global_`: Hydra directive to merge into the global namespace
- `defaults`: Compose other configs with package directive syntax (`/config_type@target_key: config_name`)
- Overrides: Any section from composed configs can be overridden
- Orchestration fields:
    - `unfreeze_nodes`: Nodes to enable gradients for in phase 2
    - `freeze_nodes`: Nodes to freeze during training
    - `loss_nodes`: Nodes that compute training loss
    - `metric_nodes`: Nodes that compute metrics
Package Directive Syntax
The syntax `/source@destination: config_name` tells Hydra to load `configs/source/config_name.yaml` and inject it at the destination key in the merged config. For example, `/pipeline@pipeline: rx_statistical` loads `configs/pipeline/rx_statistical.yaml` and places it at `cfg.pipeline`.
How Configuration Composition Works¶
Hydra Composition with Package Directives¶
When you run a trainrun config, Hydra merges all composed configs:
Step 1: Load trainrun config
```yaml
# trainrun/my_experiment.yaml
defaults:
  - /pipeline@pipeline: rx_statistical
  - /data@data: lentils
  - /training@training: default
  - _self_
```
Step 2: Hydra loads and injects configs
The resulting merged config structure:
```python
cfg = {
    "pipeline": {   # From /pipeline@pipeline: rx_statistical
        "metadata": {...},
        "nodes": [...],
        "connections": [...]
    },
    "data": {       # From /data@data: lentils
        "cu3s_file_path": "...",
        "batch_size": 2,
        ...
    },
    "training": {   # From /training@training: default
        "seed": 42,
        "trainer": {...},
        "optimizer": {...}
    },
    # Plus trainrun-specific fields
    "name": "my_experiment",
    "unfreeze_nodes": [],
    "metric_nodes": [...]
}
```
Step 3: Apply overrides
Trainrun config can override any composed value:
```yaml
# In trainrun config
training:
  trainer:
    max_epochs: 20  # Overrides training/default.yaml's max_epochs: 5

data:
  batch_size: 8     # Overrides data/lentils.yaml's batch_size: 2
```
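To verify what Hydra actually merged, it can help to dump the composed config at the top of the script. A minimal sketch, reusing the hypothetical `trainrun/my_experiment` config from the steps above:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="../configs/", config_name="trainrun/my_experiment", version_base=None)
def main(cfg: DictConfig) -> None:
    # Print the fully merged result: pipeline + data + training + trainrun overrides
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()
```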
Override Precedence¶
Overrides are applied in this order (later overrides earlier):
1. Base composed configs (pipeline, data, training)
2. Trainrun config overrides (because `_self_` comes after defaults)
3. CLI overrides (highest priority)
Example:
```bash
# CLI overrides beat everything
uv run python examples/channel_selector.py training.trainer.max_epochs=50
```
Overrides for non-existing fields
If you try to override a field that doesn't exist in the composed config, you'll get an error:
```bash
uv run python examples/rx_statistical.py training.trainer.max_epochs=50
# Error: Key 'training' is not in struct
# full_key: training
# object_type=dict
```
This happens because rx_statistical.py uses trainrun/default_statistical, which has no training section (it only uses statistical training, not gradient-based training). To override training parameters, use a trainrun config that includes training defaults, such as trainrun/rx_statistical.yaml.
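If you are unsure whether a composed config contains a given section, you can also check for it programmatically instead of relying on the CLI error. A minimal sketch using `OmegaConf.select`; the helper name is hypothetical:

```python
from omegaconf import DictConfig, OmegaConf

def has_gradient_training(cfg: DictConfig) -> bool:
    # OmegaConf.select returns None instead of raising when the key path is missing
    return OmegaConf.select(cfg, "training.trainer") is not None

# Per the note above, this would be False for trainrun/default_statistical.
```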
Accessing Composed Configs in Python¶
```python
@hydra.main(config_path="../configs/", config_name="trainrun/my_experiment", version_base=None)
def main(cfg: DictConfig):
    # Access composed configs
    datamodule = SingleCu3sDataModule(**cfg.data)   # From data config
    pipeline = Pipeline.from_config(cfg.pipeline)   # From pipeline config
    trainer = Trainer(**cfg.training.trainer)       # From training config

    # Access trainrun-specific fields
    metric_nodes = cfg.metric_nodes
```
Relationship to Protobuf Schemas¶
CUVIS.AI configurations are validated and transported using protocol buffers defined in the cuvis-ai-schemas repository.
Schema location: cuvis-ai-schemas/proto/cuvis_ai/grpc/v1/cuvis_ai.proto
Configuration Message Types¶
The protobuf schemas define 6 gRPC message types for config transport:
- `PipelineConfig` - Pipeline structure (nodes + connections)
- `DataConfig` - Data loading configuration
- `TrainingConfig` - Training hyperparameters
- `TrainRunConfig` - Composed experiment configuration
- `OptimizerConfig` - Optimizer parameters
- `SchedulerConfig` - Learning rate scheduler
Configuration Flow: YAML → Protobuf → gRPC¶
All config messages use a single `config_bytes` field of type `bytes` that contains JSON-serialized data (not protobuf serialization):
```mermaid
flowchart LR
    A[YAML<br/>Config] -->|Hydra<br/>compose| B[DictConfig<br/>merged]
    B -->|Pydantic<br/>validate| C[JSON<br/>bytes]
    C -->|gRPC<br/>transport| D[Server<br/>Process]

    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#e8f5e8
    style D fill:#f5e1ff
```
Protobuf ↔ YAML Mapping¶
| Protobuf Message | YAML Config Files | Validation |
|---|---|---|
| `PipelineConfig` | `configs/pipeline/*.yaml` | Node classes must be importable |
| `DataConfig` | `configs/data/*.yaml` | Paths must exist, splits valid |
| `TrainingConfig` | `configs/training/*.yaml` | Optimizer/scheduler names supported |
| `TrainRunConfig` | `configs/trainrun/*.yaml` | All nested configs must validate |
Validation Stages¶
Configuration validation happens at multiple stages:
- YAML Syntax: YAML parser validates syntax
- Hydra Composition: Checks for missing defaults, circular references
- Pydantic Validation: Type checking, field constraints, nested model validation
- Application Domain: Node existence, valid class paths, port compatibility
JSON Transport Design
While protobuf is used for the gRPC message structure, the actual configuration data is JSON-serialized, not protobuf-serialized. This provides flexibility to work with dynamic configuration structures while maintaining type safety through Pydantic validation.
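A conceptual sketch of the JSON-bytes idea. Only the serialization step that fills `config_bytes` is shown; the generated protobuf message classes and gRPC calls are omitted:

```python
import json
from omegaconf import OmegaConf

# Client side: composed config -> plain dict -> JSON bytes for the config_bytes field
cfg = OmegaConf.load("configs/training/default.yaml")
config_bytes = json.dumps(OmegaConf.to_container(cfg, resolve=True)).encode("utf-8")

# Server side: decode the JSON and re-validate (e.g. with the Pydantic models)
decoded = json.loads(config_bytes.decode("utf-8"))
```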
Configuration Files¶
Available Configs¶
| Directory | Purpose | Examples |
|---|---|---|
| `configs/pipeline/` | Sample pipeline configurations (node definitions & connections) | `rx_statistical.yaml`, `adaclip_baseline.yaml`, `deep_svdd.yaml` |
| `configs/data/` | Sample data configurations (data loading, splitting, paths) | `lentils.yaml` |
| `configs/training/` | Sample training configurations (trainer, optimizer, scheduler settings) | `default.yaml` |
| `configs/trainrun/` | Sample full trainruns (compose data + pipeline + training + plugins) | `rx_statistical.yaml`, `concrete_adaclip.yaml` |
| `configs/plugins/` | Plugin registry and manifests | `registry.yaml` |
See Also
For detailed examples of each config type, see the sections above on Pipeline Configurations, Data Configurations, Training Configurations, and Trainrun Configurations.
Using Hydra in Scripts¶
Scripts use the @hydra.main decorator to load trainrun configs:
```python
from omegaconf import DictConfig
import hydra

@hydra.main(config_path="../configs/", config_name="trainrun/default_gradient", version_base=None)
def main(cfg: DictConfig) -> None:
    """Access composed configs via cfg.data, cfg.pipeline, cfg.training."""
    # Access data config (from configs/data/lentils.yaml)
    datamodule = SingleCu3sDataModule(**cfg.data)

    # Access training config (from configs/training/default.yaml)
    trainer_config = cfg.training.trainer

    # Access pipeline config if defined
    # pipeline_config = cfg.pipeline

if __name__ == "__main__":
    main()
```
Example from examples/rx_statistical.py:
```python
@hydra.main(config_path="../configs/", config_name="trainrun/default_statistical", version_base=None)
def main(cfg: DictConfig) -> None:
    # cfg.data comes from the composed data config
    datamodule = SingleCu3sDataModule(**cfg.data)  # Line 107

    # cfg.output_dir comes from the trainrun config
    output_dir = Path(cfg.output_dir)  # Line 104
```
CLI Overrides¶
Override any configuration parameter from the command line using Hydra's override syntax:
```bash
# Override data parameters (works with statistical-only configs)
uv run python examples/rx_statistical.py data.batch_size=8

# Override training parameters (requires gradient training config)
uv run python examples/channel_selector.py training.trainer.max_epochs=20

# Override multiple parameters
uv run python examples/channel_selector.py \
    training.trainer.max_epochs=20 \
    training.optimizer.lr=0.0001 \
    data.batch_size=8
```
Pattern applies to any Hydra script
Replace examples/rx_statistical.py with your actual script path. Any Python script using @hydra.main supports CLI overrides.
Environment Variables¶
Use environment variables in configs:
```yaml
# Data configuration
data:
  cu3s_file_path: ${oc.env:DATA_ROOT,./data/Lentils}/Lentils_000.cu3s  # $DATA_ROOT or default

# Training configuration (if using WandB logger)
training:
  wandb_api_key: ${oc.env:WANDB_API_KEY}  # Required env var
```
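The `${oc.env:VAR,default}` interpolation is resolved by OmegaConf when the value is accessed, so it can be tried in isolation. A minimal sketch using the same interpolation as above:

```python
import os
from omegaconf import OmegaConf

os.environ["DATA_ROOT"] = "/mnt/datasets"

cfg = OmegaConf.create(
    {"cu3s_file_path": "${oc.env:DATA_ROOT,./data/Lentils}/Lentils_000.cu3s"}
)
print(cfg.cu3s_file_path)  # -> /mnt/datasets/Lentils_000.cu3s
```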
TrainingConfig¶
The TrainingConfig dataclass wraps all training parameters:
```python
from cuvis_ai_core.training.config import TrainingConfig, TrainerConfig, OptimizerConfig

config = TrainingConfig(
    seed=42,
    trainer=TrainerConfig(
        max_epochs=10,
        accelerator="gpu",
        devices=1,
        precision="16-mixed",
        log_every_n_steps=10,
    ),
    optimizer=OptimizerConfig(
        name="adam",
        lr=0.001,
        weight_decay=0.0,
        betas=(0.9, 0.999),
    ),
)
```
Trainer Parameters¶
Based on TrainerConfig schema from cuvis-ai-schemas:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_epochs` | int | 100 | Maximum number of epochs (1-10000) |
| `accelerator` | str | "auto" | Accelerator type: "auto", "cpu", "gpu", "cuda" |
| `devices` | int\|str\|None | None | Number of devices or device IDs |
| `precision` | str\|int | "32-true" | Precision: "32-true", "16-mixed", "bf16-mixed" |
| `log_every_n_steps` | int | 50 | Logging frequency (steps) |
| `val_check_interval` | float\|int | 1.0 | Validation check interval |
| `check_val_every_n_epoch` | int | 1 | Validate every n epochs |
| `gradient_clip_val` | float\|None | None | Gradient clipping threshold |
| `accumulate_grad_batches` | int | 1 | Accumulate gradients over n batches |
| `enable_progress_bar` | bool | True | Show progress bar |
| `enable_checkpointing` | bool | False | Enable model checkpointing |
| `deterministic` | bool | False | Deterministic mode |
| `benchmark` | bool | False | Enable cudnn benchmark |
Optimizer Parameters¶
Based on OptimizerConfig schema from cuvis-ai-schemas:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str | "adamw" | Optimizer: "adamw", "adam", "sgd" |
| `lr` | float | 0.001 | Learning rate (1e-6 to 1.0) |
| `weight_decay` | float | 0.0 | L2 regularization (0.0 to 1.0) |
| `momentum` | float\|None | 0.9 | Momentum for SGD (0.0 to 1.0) |
| `betas` | tuple\|None | None | Adam betas (beta1, beta2), e.g., [0.9, 0.999] |
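For illustration, a minimal sketch of how an optimizer section with these fields could be mapped onto `torch.optim`. The `build_optimizer` helper is hypothetical and mirrors the table above; it is not necessarily how cuvis.ai constructs optimizers internally:

```python
import torch

def build_optimizer(params, name="adamw", lr=1e-3, weight_decay=0.0, momentum=0.9, betas=None):
    # Map the config's optimizer name onto the corresponding torch.optim class
    adam_betas = tuple(betas) if betas else (0.9, 0.999)
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay, betas=adam_betas)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, weight_decay=weight_decay, betas=adam_betas)
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, weight_decay=weight_decay, momentum=momentum)
    raise ValueError(f"Unsupported optimizer: {name}")

# Example usage (unpacking the composed optimizer section):
# optimizer = build_optimizer(model.parameters(), **cfg.training.optimizer)
```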
Scheduler Parameters¶
Based on SchedulerConfig schema from cuvis-ai-schemas:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | str\|None | None | Scheduler: "cosine", "step", "exponential", "reduce_on_plateau" |
| `warmup_epochs` | int | 0 | Warmup epochs |
| `min_lr` | float | 1e-6 | Minimum learning rate |
| `monitor` | str\|None | None | Metric to monitor (for plateau scheduler) |
| `mode` | str | "min" | Monitor mode: "min" or "max" |
| `factor` | float | 0.1 | LR reduction factor |
| `patience` | int | 10 | Patience epochs for plateau |
| `step_size` | int\|None | None | LR decay period (for step scheduler) |
| `gamma` | float\|None | None | LR decay multiplier |
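Similarly, a hedged sketch of how these scheduler fields could map onto `torch.optim.lr_scheduler` (the `build_scheduler` helper is hypothetical; warmup handling is omitted, and the library's internal construction may differ):

```python
import torch

def build_scheduler(optimizer, name=None, min_lr=1e-6, mode="min", factor=0.1,
                    patience=10, step_size=None, gamma=None, max_epochs=100, **_):
    if name is None:
        return None  # no scheduler configured
    if name == "cosine":
        return torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs, eta_min=min_lr)
    if name == "step":
        return torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma or 0.1)
    if name == "exponential":
        return torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma or 0.95)
    if name == "reduce_on_plateau":
        # `monitor` is consumed by the training loop, not by the scheduler itself
        return torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode=mode, factor=factor, patience=patience)
    raise ValueError(f"Unsupported scheduler: {name}")
```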
Training Config Parameters¶
Based on TrainingConfig schema from cuvis-ai-schemas:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `seed` | int | 42 | Random seed (≥0) |
| `max_epochs` | int | 100 | Maximum epochs (1-10000) |
| `batch_size` | int | 32 | Batch size (≥1) |
| `num_workers` | int | 4 | Data loading workers (≥0) |
| `gradient_clip_val` | float\|None | None | Gradient clipping (≥0.0) |
| `accumulate_grad_batches` | int | 1 | Gradient accumulation batches |
Configuration Recipes¶
Development (Fast Iteration)¶
```yaml
# Training configuration
training:
  seed: 42
  batch_size: 2
  num_workers: 0
  trainer:
    max_epochs: 2
    accelerator: cpu
    devices: 1
    enable_progress_bar: true

# Data configuration
data:
  batch_size: 2
  cu3s_file_path: data/Lentils/Lentils_000.cu3s
  annotation_json_path: data/Lentils/Lentils_000.json
  train_ids: [0]
  val_ids: [1]
  test_ids: [1]
```
Production (Full Training)¶
```yaml
# Training configuration
training:
  seed: 42
  batch_size: 16
  num_workers: 4
  trainer:
    max_epochs: 50
    accelerator: gpu
    devices: 1
    precision: "16-mixed"
    log_every_n_steps: 10
  optimizer:
    name: adamw
    lr: 0.001
    weight_decay: 0.01

# Data configuration
data:
  batch_size: 16
  cu3s_file_path: data/Lentils/Lentils_000.cu3s
  annotation_json_path: data/Lentils/Lentils_000.json
  train_ids: [0, 2, 3]
  val_ids: [1, 5]
  test_ids: [1, 5]
```
Multi-GPU Training¶
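A minimal sketch for running on multiple GPUs, assuming two GPUs are available and using only the `TrainerConfig` fields listed above (PyTorch Lightning typically selects its default multi-device strategy when more than one device is requested):

```python
from cuvis_ai_core.training.config import TrainerConfig

# Request two GPUs; remaining fields keep their documented defaults
trainer_cfg = TrainerConfig(
    max_epochs=50,
    accelerator="gpu",
    devices=2,
    precision="16-mixed",
)
```

The same can be set from the command line with `training.trainer.accelerator=gpu training.trainer.devices=2`.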
Configuration Validation¶
Configurations are validated at runtime by loading them with OmegaConf and constructing the typed config objects:
```python
from omegaconf import OmegaConf
from cuvis_ai_core.training.config import TrainingConfig

# Load and validate
cfg = OmegaConf.load("config.yaml")
training_cfg = TrainingConfig.from_dict_config(cfg.training)
```
Best Practices¶
1. Use Composition¶
Break large configs into reusable pieces:
```yaml
# base.yaml
defaults:
  - general
  - monitoring/wandb
```

```yaml
# experiment.yaml
defaults:
  - base
  - _self_

training:
  trainer:
    max_epochs: 100
```
2. Version Control Configs¶
Commit configuration files for reproducibility.
3. Use Structured Configs¶
Define configs as dataclasses for type safety:
```python
from dataclasses import dataclass
from hydra.core.config_store import ConfigStore

@dataclass
class ModelConfig:
    hidden_size: int = 128
    num_layers: int = 3

cs = ConfigStore.instance()
cs.store(name="model_config", node=ModelConfig)
```
4. Document Custom Configs¶
Add comments explaining parameters:
```yaml
nodes:
  - name: PCA
    params:
      n_components: 3  # Number of principal components to retain
      trainable: true  # Enable gradient-based fine-tuning
```
Troubleshooting¶
Config Not Found¶
Solution: Ensure the config file is in the correct directory and that config_path/config_name point to it.
Override Parse Error¶
Solution: Use `=`, not `:`, for overrides: `training.trainer.max_epochs=10`
Type Mismatch¶
Solution: Ensure the override value has the correct type: `max_epochs=10`, not `max_epochs="10"`
Next Steps¶
- Quickstart: See configuration in action
- Tutorials: Phase-specific configurations
- API Reference: TrainingConfig API details