spine.io.factories

Functions that instantiate IO tools from configuration blocks.

Functions

collate_factory(collate_cfg, data_types, ...)

Instantiate a collate function from configuration.

dataset_factory(dataset_cfg[, entry_list, dtype])

Instantiate a dataset from configuration.

loader_factory(dataset, dtype[, batch_size, ...])

Instantiate a PyTorch DataLoader from configuration.

reader_factory(reader_cfg)

Instantiate a reader from a configuration block.

sampler_factory(sampler_cfg, dataset, ...[, ...])

Instantiate a sampler from configuration.

writer_factory(writer_cfg[, prefix, split])

Instantiate a writer from a configuration block.

spine.io.factories.reader_factory(reader_cfg: Mapping[str, Any] | str) → Any

Instantiate a reader from a configuration block.

The configured name must match a reader class exported from spine.io.read.

Parameters:

reader_cfg (Mapping[str, Any] or str) – Reader configuration mapping or the short reader name.

Returns:

Instantiated reader object.

Return type:

object
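The name-based lookup described above can be sketched as follows. This is a minimal illustration of the pattern, not SPINE's implementation: the `name` key, the `READERS` registry, the `DummyReader` class, and the `make_from_config` helper are all hypothetical stand-ins for whatever spine.io.read actually exports.

```python
from collections.abc import Mapping


class DummyReader:
    """Hypothetical reader class used only for this illustration."""

    def __init__(self, file_keys=None):
        self.file_keys = file_keys


# Hypothetical registry standing in for the classes exported by a reader module.
READERS = {"dummy": DummyReader}


def make_from_config(cfg, registry):
    """Instantiate a registered class from a mapping or from a bare name."""
    if isinstance(cfg, str):
        # A bare string is treated as the class name with no extra arguments.
        name, kwargs = cfg, {}
    elif isinstance(cfg, Mapping):
        kwargs = dict(cfg)  # copy so the caller's mapping is not mutated
        name = kwargs.pop("name")  # assumed key holding the class name
    else:
        raise TypeError(f"Expected a mapping or str, got {type(cfg).__name__}")

    if name not in registry:
        raise KeyError(f"Unknown name {name!r}; available: {sorted(registry)}")

    # Remaining entries are forwarded as constructor keyword arguments.
    return registry[name](**kwargs)


reader = make_from_config({"name": "dummy", "file_keys": ["a.h5"]}, READERS)
short_reader = make_from_config("dummy", READERS)
```

Accepting either form keeps short configurations terse while still allowing constructor arguments when they are needed.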

spine.io.factories.writer_factory(writer_cfg: Mapping[str, Any] | str, prefix: str | list[str] | None = None, split: bool = False) → Any

Instantiate a writer from a configuration block.

The configured name must match a writer class exported from spine.io.write.

Parameters:
  • writer_cfg (Mapping[str, Any] or str) – Writer configuration mapping or the short writer name.

  • prefix (str or list[str], optional) – Input file prefix or per-file list of prefixes used to derive output names when the writer supports prefix-based naming.

  • split (bool, default False) – Request one output file per input file. Writers that do not support unsplit output may reject split=False explicitly.

Returns:

Instantiated writer object.

Return type:

object

spine.io.factories.loader_factory(dataset: Mapping[str, Any] | str, dtype: str, batch_size: int | None = None, minibatch_size: int | None = None, shuffle: bool = True, sampler: Mapping[str, Any] | str | None = None, num_workers: int = 0, collate_fn: Mapping[str, Any] | str | None = None, entry_list: list[int] | None = None, distributed: bool = False, world_size: int = 0, rank: int | None = None, **kwargs: Any) → Any

Instantiate a PyTorch DataLoader from configuration.

Parameters:
  • dataset (Mapping[str, Any] or str) – Dataset configuration mapping or short dataset name.

  • dtype (str) – Floating-point dtype passed to the dataset factory.

  • batch_size (int, optional) – Global batch size. Mutually exclusive with minibatch_size.

  • minibatch_size (int, optional) – Per-process batch size. Mutually exclusive with batch_size.

  • shuffle (bool, default True) – Whether to shuffle batches in the underlying loader.

  • sampler (Mapping[str, Any] or str, optional) – Sampler configuration mapping or short sampler name.

  • num_workers (int, default 0) – Number of loader worker processes.

  • collate_fn (Mapping[str, Any] or str, optional) – Collate function configuration mapping or short collate name.

  • entry_list (list[int], optional) – Explicit subset of dataset entries to expose.

  • distributed (bool, default False) – If True, wrap the sampler for distributed loading.

  • world_size (int, default 0) – Number of distributed processes/devices.

  • rank (int, optional) – Distributed process rank. Required when distributed=True.

  • **kwargs (dict) – Extra keyword arguments forwarded to torch.utils.data.DataLoader.

Returns:

Instantiated data loader.

Return type:

torch.utils.data.DataLoader
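The relationship between batch_size, minibatch_size, and world_size can be sketched as below. This is an illustration under stated assumptions, not the library's logic: `resolve_minibatch_size` is a hypothetical helper, the sketch assumes exactly one of the two sizes must be given, that the global batch is divided evenly across processes, and that world_size=0 means a single non-distributed process.

```python
def resolve_minibatch_size(batch_size=None, minibatch_size=None, world_size=0):
    """Return the per-process batch size implied by the arguments."""
    # Assumption: exactly one of the two sizes must be provided.
    if (batch_size is None) == (minibatch_size is None):
        raise ValueError("Provide exactly one of batch_size or minibatch_size.")

    if minibatch_size is not None:
        # The per-process size was given directly; nothing to derive.
        return minibatch_size

    # Assumption: world_size=0 is treated as one (non-distributed) process.
    n_processes = max(world_size, 1)
    if batch_size % n_processes != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by {n_processes} processes."
        )
    return batch_size // n_processes


# A global batch of 64 samples spread over 4 processes.
per_process = resolve_minibatch_size(batch_size=64, world_size=4)
```

Rejecting ambiguous input early, rather than silently preferring one argument, makes a misconfigured batch size visible before any data is loaded.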

spine.io.factories.dataset_factory(dataset_cfg: Mapping[str, Any] | str, entry_list: list[int] | None = None, dtype: str | None = None) → Any

Instantiate a dataset from configuration.

Parameters:
  • dataset_cfg (Mapping[str, Any] or str) – Dataset configuration mapping or short dataset name.

  • entry_list (list[int], optional) – Explicit subset of dataset entries to expose. When provided here, it overrides any entry_list already present in dataset_cfg.

  • dtype (str, optional) – Floating-point dtype forwarded to the dataset constructor.

Returns:

Instantiated dataset object.

Return type:

object
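The entry_list precedence rule above can be illustrated with a short sketch. `merge_entry_list` is a hypothetical helper written for this example; only the override behavior (an explicit argument wins over the value in the configuration) comes from the parameter description.

```python
def merge_entry_list(dataset_cfg, entry_list=None):
    """Return a copy of the configuration with the effective entry_list."""
    cfg = dict(dataset_cfg)  # shallow copy so the caller's config is untouched
    if entry_list is not None:
        # An explicit argument overrides whatever the configuration contains.
        cfg["entry_list"] = list(entry_list)
    return cfg


base_cfg = {"name": "example", "entry_list": [0, 1, 2]}  # hypothetical config
overridden = merge_entry_list(base_cfg, entry_list=[7, 8])
unchanged = merge_entry_list(base_cfg)
```

Working on a copy means the same configuration block can be reused for several datasets without one call leaking its entry_list into the next.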

spine.io.factories.sampler_factory(sampler_cfg: Mapping[str, Any] | str, dataset: Any, minibatch_size: int, distributed: bool = False, num_replicas: int = 1, rank: int | None = None) → Any

Instantiate a sampler from configuration.

Parameters:
  • sampler_cfg (Mapping[str, Any] or str) – Sampler configuration mapping or short sampler name.

  • dataset (object) – Dataset instance used to initialize the sampler.

  • minibatch_size (int) – Per-process batch size passed to the sampler.

  • distributed (bool, default False) – If True, wrap the sampler in DistributedProxySampler.

  • num_replicas (int, default 1) – Number of distributed processes/devices.

  • rank (int, optional) – Distributed process rank. Required when distributed=True.

Returns:

Instantiated sampler object, optionally wrapped for distributed loading.

Return type:

object
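The role of num_replicas and rank in distributed loading can be pictured with a small, dependency-free sketch. It is not the DistributedProxySampler implementation: `shard_indices` is a hypothetical helper, and the padding and rank-striding scheme shown here is an assumption based on how distributed samplers commonly partition indices.

```python
def shard_indices(indices, num_replicas=1, rank=0):
    """Return the subset of sampler indices assigned to one rank."""
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank must be in [0, {num_replicas}), got {rank}")

    indices = list(indices)
    if not indices:
        return []

    # Pad by cycling through the indices so every rank gets the same count.
    per_rank = -(-len(indices) // num_replicas)  # ceiling division
    total = per_rank * num_replicas
    padded = [indices[i % len(indices)] for i in range(total)]

    # Each rank takes every num_replicas-th index, starting at its own rank.
    return padded[rank:total:num_replicas]


# Ten indices shared among four processes: each receives three.
shards = [shard_indices(range(10), num_replicas=4, rank=r) for r in range(4)]
```

Padding to an equal count per rank keeps all processes in step, at the cost of a few repeated entries when the dataset size is not a multiple of num_replicas.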

spine.io.factories.collate_factory(collate_cfg: Mapping[str, Any] | str, data_types: Mapping[str, str], overlay_methods: Mapping[str, str]) → Any

Instantiate a collate function from configuration.

Parameters:
  • collate_cfg (Mapping[str, Any] or str) – Collate configuration mapping or short collate function name.

  • data_types (Mapping[str, str]) – Mapping from parser output keys to their declared data type.

  • overlay_methods (Mapping[str, str]) – Mapping from parser output keys to the overlay method used when combining data from multiple sources.

Returns:

Instantiated collate callable.

Return type:

collections.abc.Callable