spine.io.factories

Functions that instantiate IO tools from configuration blocks.

Functions

`collate_factory`(collate_cfg, data_types, ...)	Instantiate a collate function from configuration.
`dataset_factory`(dataset_cfg[, entry_list, dtype])	Instantiate a dataset from configuration.
`loader_factory`(dataset, dtype[, batch_size, ...])	Instantiate a PyTorch `DataLoader` from configuration.
`reader_factory`(reader_cfg)	Instantiate a reader from a configuration block.
`sampler_factory`(sampler_cfg, dataset, ...[, ...])	Instantiate a sampler from configuration.
`writer_factory`(writer_cfg[, prefix, split])	Instantiate a writer from a configuration block.

spine.io.factories.reader_factory(reader_cfg: Mapping[str, Any] | str) → Any

Instantiate a reader from a configuration block.

The configured name must match a reader class exported from spine.io.read.

Parameters:

reader_cfg (Mapping[str, Any] or str) – Reader configuration mapping or the short reader name.

Returns:

Instantiated reader object.

Return type:

object
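The name-based lookup described above can be illustrated with a minimal, self-contained sketch. It assumes that a mapping configuration carries the class name under a `name` key and that the remaining keys are constructor keyword arguments. `CSVReader`, `JSONReader`, the `READERS` registry and `toy_reader_factory` are hypothetical stand-ins for the classes exported from spine.io.read, not the library's actual implementation.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


# Hypothetical reader classes standing in for those exported from spine.io.read
class CSVReader:
    def __init__(self, file_keys: str = "") -> None:
        self.file_keys = file_keys


class JSONReader:
    def __init__(self, file_keys: str = "") -> None:
        self.file_keys = file_keys


# Hypothetical registry mapping short names to reader classes
READERS: dict[str, type] = {"csv": CSVReader, "json": JSONReader}


def toy_reader_factory(reader_cfg: Mapping[str, Any] | str) -> Any:
    """Build a reader from a short name or a mapping with a 'name' key (assumed)."""
    # A bare string is treated as a configuration with no extra arguments
    if isinstance(reader_cfg, str):
        reader_cfg = {"name": reader_cfg}

    # Copy so the caller's configuration is left untouched
    cfg = dict(reader_cfg)
    name = cfg.pop("name")
    if name not in READERS:
        raise ValueError(f"Unknown reader {name!r}; expected one of {sorted(READERS)}")

    # Remaining keys become constructor keyword arguments
    return READERS[name](**cfg)
```

For example, `toy_reader_factory("csv")` and `toy_reader_factory({"name": "json", "file_keys": "data.json"})` both resolve through the same registry, so the short-name form is just a mapping with no extra arguments.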

spine.io.factories.writer_factory(writer_cfg: Mapping[str, Any] | str, prefix: str | list[str] | None = None, split: bool = False) → Any

Instantiate a writer from a configuration block.

The configured name must match a writer class exported from spine.io.write.

Parameters:

writer_cfg (Mapping[str, Any] or str) – Writer configuration mapping or the short writer name.
prefix (str or list[str], optional) – Input file prefix or per-file list of prefixes used to derive output names when the writer supports prefix-based naming.
split (bool, default False) – If True, request one output file per input file. Writers that do not support unsplit output may explicitly reject split=False.

Returns:

Instantiated writer object.

Return type:

object
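The interplay between prefix and split can be sketched with a hypothetical writer that only supports split output. `SplitOnlyWriter` and its `{prefix}_out.h5` naming scheme are illustrative assumptions, not writers or conventions from spine.io.write; the sketch only shows a per-file list of prefixes mapping to one output name each, and an explicit rejection of split=False.

```python
from __future__ import annotations


class SplitOnlyWriter:
    """Hypothetical writer that only supports one output file per input file."""

    def __init__(self, prefix: str | list[str] | None = None, split: bool = False) -> None:
        # This writer cannot merge inputs, so it rejects unsplit output explicitly
        if not split:
            raise ValueError("SplitOnlyWriter requires split=True")
        if prefix is None:
            raise ValueError("SplitOnlyWriter needs input prefixes to name its outputs")

        # Normalize a single prefix to a one-element list
        prefixes = [prefix] if isinstance(prefix, str) else list(prefix)

        # Hypothetical naming scheme: one output file per input prefix
        self.file_names = [f"{p}_out.h5" for p in prefixes]
```

Failing at construction time, rather than partway through writing, surfaces an incompatible split setting before any output is produced.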

spine.io.factories.loader_factory(dataset: Mapping[str, Any] | str, dtype: str, batch_size: int | None = None, minibatch_size: int | None = None, shuffle: bool = True, sampler: Mapping[str, Any] | str | None = None, num_workers: int = 0, collate_fn: Mapping[str, Any] | str | None = None, entry_list: list[int] | None = None, distributed: bool = False, world_size: int = 0, rank: int | None = None, **kwargs: Any) → Any

Instantiate a PyTorch DataLoader from configuration.

Parameters:

dataset (mapping or str) – Dataset configuration mapping or short dataset name.
dtype (str) – Floating-point dtype passed to the dataset factory.
batch_size (int, optional) – Global batch size. Mutually exclusive with minibatch_size.
minibatch_size (int, optional) – Per-process batch size. Mutually exclusive with batch_size.
shuffle (bool, default True) – Whether to shuffle batches in the underlying loader.
sampler (mapping or str, optional) – Sampler configuration mapping or short sampler name.
num_workers (int, default 0) – Number of loader worker processes.
collate_fn (mapping or str, optional) – Collate function configuration mapping or short collate name.
entry_list (list[int], optional) – Explicit subset of dataset entries to expose.
distributed (bool, default False) – If True, wrap the sampler for distributed loading.
world_size (int, default 0) – Number of distributed processes/devices.
rank (int, optional) – Distributed process rank. Required when distributed=True.
**kwargs (dict) – Extra keyword arguments forwarded to torch.utils.data.DataLoader.

Returns:

Instantiated data loader.

Return type:

torch.utils.data.DataLoader
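Because batch_size is global and minibatch_size is per process, the two are related through world_size. A minimal sketch of that bookkeeping, assuming the global batch is split evenly across processes and that world_size=0 denotes a single non-distributed process; `resolve_minibatch_size` is a hypothetical helper, not part of spine.

```python
from __future__ import annotations


def resolve_minibatch_size(
    batch_size: int | None = None,
    minibatch_size: int | None = None,
    world_size: int = 0,
) -> int:
    """Return the per-process batch size (hypothetical helper, not part of spine).

    Assumes world_size=0 denotes a single, non-distributed process.
    """
    # Exactly one of the two sizes must be given
    if (batch_size is None) == (minibatch_size is None):
        raise ValueError("Provide exactly one of batch_size or minibatch_size")
    if minibatch_size is not None:
        return minibatch_size

    # Split the global batch evenly across processes
    num_procs = max(1, world_size)
    if batch_size % num_procs != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by world_size={num_procs}"
        )
    return batch_size // num_procs
```

Under these assumptions, a global batch of 32 on 4 devices gives each process 8 entries, while passing minibatch_size=8 directly yields the same per-process size without any division.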

spine.io.factories.dataset_factory(dataset_cfg: Mapping[str, Any] | str, entry_list: list[int] | None = None, dtype: str | None = None) → Any

Instantiate a dataset from configuration.

Parameters:

dataset_cfg (Mapping[str, Any] or str) – Dataset configuration mapping or short dataset name.
entry_list (list[int], optional) – Explicit subset of dataset entries to expose. When provided here, it overrides any entry_list already present in dataset_cfg.
dtype (str, optional) – Floating-point dtype forwarded to the dataset constructor.

Returns:

Instantiated dataset object.

Return type:

object
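The precedence rule for entry_list can be sketched as a copy-then-override of the configuration. `ToyDataset` and `toy_dataset_factory` are hypothetical; the sketch handles only the mapping form of the configuration and assumes that dataset options are passed to the constructor as keyword arguments.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


class ToyDataset:
    """Hypothetical dataset that exposes only the requested entries."""

    def __init__(self, num_entries: int, entry_list: list[int] | None = None,
                 dtype: str | None = None) -> None:
        self.dtype = dtype
        # Expose every entry unless an explicit subset is requested
        self.entries = list(range(num_entries)) if entry_list is None else list(entry_list)

    def __len__(self) -> int:
        return len(self.entries)


def toy_dataset_factory(dataset_cfg: Mapping[str, Any], entry_list: list[int] | None = None,
                        dtype: str | None = None) -> ToyDataset:
    # Work on a copy so the caller's configuration is not modified
    cfg = dict(dataset_cfg)
    # An explicit argument takes precedence over the value in the configuration
    if entry_list is not None:
        cfg["entry_list"] = entry_list
    # Forward dtype to the constructor only when it is given
    if dtype is not None:
        cfg["dtype"] = dtype
    return ToyDataset(**cfg)
```

Copying first keeps a shared configuration reusable, for example when the same block builds several datasets with different entry subsets.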

spine.io.factories.sampler_factory(sampler_cfg: Mapping[str, Any] | str, dataset: Any, minibatch_size: int, distributed: bool = False, num_replicas: int = 1, rank: int | None = None) → Any

Instantiate a sampler from configuration.

Parameters:

sampler_cfg (mapping or str) – Sampler configuration mapping or short sampler name.
dataset (object) – Dataset instance used to initialize the sampler.
minibatch_size (int) – Per-process batch size passed to the sampler.
distributed (bool, default False) – If True, wrap the sampler in DistributedProxySampler.
num_replicas (int, default 1) – Number of distributed processes/devices.
rank (int, optional) – Distributed process rank. Required when distributed=True.

Returns:

Instantiated sampler object, optionally wrapped for distributed loading.

Return type:

object
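The distributed wrapping step can be pictured with the common proxy-sampler strategy: take the wrapped sampler's index order, pad it so its length divides evenly by num_replicas, then give each rank an interleaved slice. This is a plain-Python sketch under that assumption; `shard_indices` is a hypothetical helper, and the DistributedProxySampler used here may differ in details such as padding.

```python
from __future__ import annotations

import math
from collections.abc import Iterable


def shard_indices(sampler: Iterable[int], num_replicas: int, rank: int) -> list[int]:
    """Return the subset of sampler indices assigned to one process (sketch)."""
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank must be in [0, {num_replicas}), got {rank}")

    # Materialize the wrapped sampler's ordering
    indices = list(sampler)
    if not indices:
        return []

    # Pad by repeating indices so every process receives the same number of entries
    total_size = math.ceil(len(indices) / num_replicas) * num_replicas
    while len(indices) < total_size:
        indices += indices[: total_size - len(indices)]

    # Interleave: process `rank` takes every num_replicas-th index starting at rank
    return indices[rank:total_size:num_replicas]
```

With 10 entries and 4 replicas, each rank receives 3 indices and two entries are seen twice per epoch; equal shard sizes keep processes in lockstep at the cost of those duplicates.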

spine.io.factories.collate_factory(collate_cfg: Mapping[str, Any] | str, data_types: Mapping[str, str], overlay_methods: Mapping[str, str]) → Any

Instantiate a collate function from configuration.

Parameters:

collate_cfg (Mapping[str, Any] or str) – Collate configuration mapping or short collate function name.
data_types (Mapping[str, str]) – Mapping from parser output keys to their declared data type.
overlay_methods (Mapping[str, str]) – Mapping from parser output keys to the overlay method used when combining data from multiple sources.

Returns:

Instantiated collate callable.

Return type:

collections.abc.Callable
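To show what a collate callable driven by data_types might look like, here is a minimal sketch in which the factory returns a closure that merges a list of per-entry dictionaries key by key. The data type names `"scalar"` and `"list"`, and `toy_collate_factory` itself, are hypothetical; the sketch ignores collate_cfg and does not implement overlay handling.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


def toy_collate_factory(
    data_types: Mapping[str, str],
    overlay_methods: Mapping[str, str],
) -> Callable[[list[dict[str, Any]]], dict[str, Any]]:
    """Return a collate callable keyed by declared data types (hypothetical sketch)."""
    # Overlay handling (combining data from multiple sources) is omitted here;
    # overlay_methods is accepted only to mirror the documented signature

    def collate(batch: list[dict[str, Any]]) -> dict[str, Any]:
        out: dict[str, Any] = {}
        for key, dtype in data_types.items():
            values = [sample[key] for sample in batch]
            if dtype == "scalar":
                # Hypothetical type: keep one value per entry
                out[key] = values
            elif dtype == "list":
                # Hypothetical type: concatenate per-entry lists into one flat list
                out[key] = [item for value in values for item in value]
            else:
                raise ValueError(f"Unsupported data type {dtype!r} for key {key!r}")
        return out

    return collate
```

Returning a closure lets the type information be bound once at construction time, so the loader can call the collate function with just the batch.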