spine.io.dataset.HDF5Dataset

class spine.io.dataset.HDF5Dataset(dtype: str | None = None, staged: bool = False, stage: str | None = None, schema: Mapping[str, Mapping[str, Any]] | None = None, keys: Sequence[str] | None = None, skip_keys: Sequence[str] | None = None, data_types: Mapping[str, str] | None = None, overlay_methods: Mapping[str, str] | None = None, augment: Mapping[str, Any] | None = None, **kwargs: Any)

Torch dataset wrapper around flat or staged HDF5 readers.

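The dataset can operate in two modes:

  • flat (staged=False, the default), backed by HDF5Reader

  • staged (staged=True), backed by StageHDF5Reader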

In both cases the dataset exposes a uniform parser-driven interface to the DataLoader layer. Reader-produced metadata such as entry indexes and source provenance are forwarded automatically alongside any parsed products.
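As a rough illustration of how the backend is selected, the sketch below assembles constructor arguments for the flat and the staged case. Only the parameter names (staged, stage, schema, keys) come from the signature above. The product names, the stage names "reco" and "truth", and the schema entry contents are hypothetical placeholders, and the reader-specific keyword arguments are left out because they are not documented on this page.

```python
from typing import Any

# Arguments selecting the flat HDF5Reader backend (staged defaults to False).
# The product names are hypothetical placeholders.
flat_args: dict[str, Any] = {
    "staged": False,
    "keys": ["product_a", "product_b"],
}

# Arguments selecting the StageHDF5Reader backend.
# "reco" and "truth" are hypothetical stage names, and the schema entry is
# illustrative only (a real entry likely needs more fields than "stage").
staged_args: dict[str, Any] = {
    "staged": True,
    "stage": "reco",
    "schema": {"product_a": {"stage": "truth"}},
}


def backend_name(args: dict[str, Any]) -> str:
    """Name the reader class that the documented `staged` flag selects."""
    return "StageHDF5Reader" if args.get("staged", False) else "HDF5Reader"
```

With the reader-specific arguments supplied, the dataset would presumably be built as HDF5Dataset(**flat_args, **reader_kwargs), where reader_kwargs stands for whatever the selected reader requires.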


__init__(dtype: str | None = None, staged: bool = False, stage: str | None = None, schema: Mapping[str, Mapping[str, Any]] | None = None, keys: Sequence[str] | None = None, skip_keys: Sequence[str] | None = None, data_types: Mapping[str, str] | None = None, overlay_methods: Mapping[str, str] | None = None, augment: Mapping[str, Any] | None = None, **kwargs: Any) -> None

Instantiate the HDF5-backed dataset.

Parameters:
  • dtype (str, optional) – Floating-point dtype forwarded to parser factories.

  • staged (bool, default False) – If True, use StageHDF5Reader as the backend instead of the flat HDF5Reader.

  • stage (str, optional) – Default stage name to read when staged=True. Individual schema entries may override this with their own stage field.

  • schema (mapping, optional) – Parser schema used to reconstruct higher-level products.

  • keys (sequence[str], optional) – Explicit list of raw HDF5 products to keep.

  • skip_keys (sequence[str], optional) – Explicit list of raw HDF5 products to drop.

  • data_types (mapping, optional) – Explicit collate type overrides for raw-product mode.

  • overlay_methods (mapping, optional) – Explicit overlay-method overrides for raw-product mode.

  • augment (mapping, optional) – Augmentation applied to each loaded sample.

  • **kwargs (Any) – Reader-specific keyword arguments forwarded to the selected HDF5 backend reader.
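The interaction between stage and the per-entry stage field of schema can be pictured with a small stand-alone sketch. This is not the library's implementation, only the override rule as described above: an entry's own stage field wins, otherwise the default applies. The helper resolve_stages, the entry names, the "parser" field, and the stage names are all hypothetical.

```python
from typing import Any, Mapping, Optional


def resolve_stages(
    schema: Mapping[str, Mapping[str, Any]],
    default_stage: Optional[str] = None,
) -> dict[str, Optional[str]]:
    """Map each schema entry name to the stage it would be read from."""
    # An entry's own "stage" field takes precedence over the default stage.
    return {
        name: entry.get("stage", default_stage)
        for name, entry in schema.items()
    }


# Hypothetical schema: entry names, the "parser" field, and the stage
# names are placeholders, not values taken from the library.
schema = {
    "clusters": {"parser": "some_parser"},
    "labels": {"parser": "some_parser", "stage": "truth"},
}

stages = resolve_stages(schema, default_stage="reco")
```

Here "clusters" falls back to the default stage while "labels" keeps its own override.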

Methods

__init__([dtype, staged, stage, schema, ...])

Instantiate the HDF5-backed dataset.

apply_augmenter(data)

Apply the configured augmenter, if present.

build_augmenter(augment)

Instantiate the configured augmenter, if any.

index_data_types()

Return the standard collate types for metadata keys.

index_overlay_methods()

Return the standard overlay methods for metadata keys.

metadata_dict(data)

Extract standard dataset metadata from one reader output.

Attributes

data_keys

Return the names of all data products exposed by the dataset.

data_types

Return the collate type for each exposed HDF5 product.

name

overlay_methods

Return the overlay method for each exposed HDF5 product.

parsers

reader

augmenter

name: ClassVar[str] = 'hdf5'

parsers: dict[str, Any]

reader: HDF5Reader | StageHDF5Reader

property data_types: dict[str, str]

Return the collate type for each exposed HDF5 product.

Returns:

Mapping from dataset output key to collate type.

Return type:

dict[str, str]

property overlay_methods: dict[str, str]

Return the overlay method for each exposed HDF5 product.

Returns:

Mapping from dataset output key to overlay strategy.

Return type:

dict[str, str]

property data_keys: tuple[str, ...]

Return the names of all data products exposed by the dataset.

Returns:

Ordered tuple of metadata and parser-product keys.

Return type:

tuple[str, ...]
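The three properties can be read together on the consumer side. The sketch below works on plain values shaped like the documented return types (a tuple of keys and two dict[str, str] mappings) rather than on a live dataset. It assumes, without confirmation from this page, that each key in data_keys normally has an entry in both mappings; missing entries are reported as None. The helper describe_products and all key, collate-type, and overlay-method strings are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def describe_products(
    data_keys: Sequence[str],
    data_types: Mapping[str, str],
    overlay_methods: Mapping[str, str],
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """List (key, collate type, overlay method) in data_keys order."""
    # .get() yields None for a key that is absent from either mapping.
    return [
        (key, data_types.get(key), overlay_methods.get(key))
        for key in data_keys
    ]


# Placeholder values shaped like dataset.data_keys, dataset.data_types,
# and dataset.overlay_methods; none of these strings come from the library.
example_keys = ("index", "product_a")
example_types = {"index": "scalar", "product_a": "tensor"}
example_overlays = {"index": "first"}

summary = describe_products(example_keys, example_types, example_overlays)
```

On a real instance the same helper would be called as describe_products(dataset.data_keys, dataset.data_types, dataset.overlay_methods).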