spine.io.dataset.MixedDataset

class spine.io.dataset.MixedDataset(larcv: Mapping[str, Any], hdf5: Mapping[str, Any], dtype: str, augment: Mapping[str, Any] | None = None, align_keys: Sequence[str] = ('file_index', 'file_entry_index'), hdf5_align_keys: Mapping[str, str] | None = None, hdf5_key_map: Mapping[str, str] | None = None, allow_overwrite: bool = False, **kwargs: Any)

Torch dataset that merges aligned samples from LArCV and HDF5.

The LArCV dataset is treated as the primary source of iteration order and truth products. The HDF5 dataset acts as an aligned cache or augmentation source whose products are merged into the primary sample only after metadata and source provenance checks pass.
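The per-entry flow described above can be sketched as follows. This is a minimal illustration, not the spine implementation: `get_mixed_item` and `same_entry` are hypothetical helpers, the two sources are plain lists of dicts, and the validation steps are passed in as callables. On a key collision the sketch simply keeps the LArCV value, which is a simplification of the documented `allow_overwrite` behavior.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence


def get_mixed_item(
    idx: int,
    primary: Sequence[dict[str, Any]],
    cache: Sequence[dict[str, Any]],
    checks: Sequence[Callable[[int, dict, dict], None]] = (),
    augment: Callable[[dict], dict] | None = None,
) -> dict[str, Any]:
    """Return one merged sample (hypothetical sketch, not the spine code)."""
    base = primary[idx]  # LArCV sample: sets iteration order and truth products
    extra = cache[idx]   # HDF5 sample read at the same dataset index

    # Every check (alignment, provenance, ...) must pass before merging.
    for check in checks:
        check(idx, base, extra)

    merged = dict(base)
    for key, value in extra.items():
        merged.setdefault(key, value)  # simplification: LArCV values win

    # Augmentation runs once, on the fully merged sample.
    return augment(merged) if augment is not None else merged


def same_entry(idx: int, p: dict, c: dict) -> None:
    """Example check: both samples must come from the same file entry."""
    if p["file_entry_index"] != c["file_entry_index"]:
        raise ValueError(f"entry {idx} is misaligned")


primary = [{"file_entry_index": 0, "truth": 1}]
cache = [{"file_entry_index": 0, "reco": 2}]
print(get_mixed_item(0, primary, cache, checks=[same_entry]))
# {'file_entry_index': 0, 'truth': 1, 'reco': 2}
```

The ordering matters: merging only after every check passes means a misaligned cache entry can never leak products into the sample that is later augmented and collated.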

Attributes:

data_keys: Return the names of all merged data products.
data_types: Return the collate type for each merged product.
overlay_methods: Return the overlay method for each merged product.

Methods

`apply_augmenter`(data)	Apply the configured augmenter, if present.
`build_augmenter`(augment)	Instantiate the configured augmenter, if any.
`index_data_types`()	Return the standard collate types for metadata keys.
`index_overlay_methods`()	Return the standard overlay methods for metadata keys.
`merge_cache`(merged, cache)	Merge one cached HDF5 sample into an existing LArCV sample.
`metadata_dict`(data)	Extract standard dataset metadata from one reader output.
`resolve_cache_align_key`(key, cache)	Return the HDF5 key used to align one LArCV index field.
`validate_alignment`(idx, primary, cache)	Ensure the configured alignment keys match between both sources.
`validate_source_alignment`(idx, primary, cache)	Validate cache-file provenance against the current LArCV source file.

__init__(larcv: Mapping[str, Any], hdf5: Mapping[str, Any], dtype: str, augment: Mapping[str, Any] | None = None, align_keys: Sequence[str] = ('file_index', 'file_entry_index'), hdf5_align_keys: Mapping[str, str] | None = None, hdf5_key_map: Mapping[str, str] | None = None, allow_overwrite: bool = False, **kwargs: Any) → None

Instantiate the mixed dataset.

Parameters:

larcv (dict) – Configuration block for the LArCV-backed sample source.
hdf5 (dict) – Configuration block for the HDF5-backed cache source.
dtype (str) – Floating-point dtype used by the parser factories.
augment (dict, optional) – Augmentation configuration, applied once to the merged sample.
align_keys (sequence[str], default ("file_index", "file_entry_index")) – Keys that must match between the LArCV and HDF5 samples.
hdf5_align_keys (dict, optional) – Mapping from LArCV alignment keys to HDF5 alignment keys. If not provided, the dataset uses source_<key> when that key is present in the HDF5 sample and otherwise falls back to <key>.
hdf5_key_map (dict, optional) – Rename map applied to HDF5 product keys before merging.
allow_overwrite (bool, default False) – If True, allow HDF5 products to overwrite colliding LArCV keys.
**kwargs (Any) – Shared keyword arguments forwarded to both underlying dataset constructors. This is primarily used for reader-level options, such as entry-list filtering, that must remain aligned across sources.
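Why shared keyword arguments go to both constructors can be shown with toy classes. `ToyReader`, `ToyMixed`, and the `entry_list` option are hypothetical stand-ins for the real readers and their entry-list filtering; the sketch only demonstrates that applying the same filter to both sources keeps entry i pointing at the same event on each side.

```python
from __future__ import annotations

from typing import Any


class ToyReader:
    """Hypothetical stand-in for an underlying LArCV or HDF5 dataset."""

    def __init__(self, entries: list[int], entry_list: list[int] | None = None) -> None:
        # Keep only the requested entries, in the requested order.
        if entry_list is None:
            self.entries = list(entries)
        else:
            self.entries = [entries[i] for i in entry_list]


class ToyMixed:
    """Hypothetical wrapper mirroring the documented **kwargs forwarding."""

    def __init__(self, larcv: dict[str, Any], hdf5: dict[str, Any], **kwargs: Any) -> None:
        # The same reader-level options reach both sources, so a filtered
        # entry i still refers to the same event on both sides.
        self.primary = ToyReader(**larcv, **kwargs)
        self.cache = ToyReader(**hdf5, **kwargs)


mixed = ToyMixed({"entries": [10, 11, 12]}, {"entries": [20, 21, 22]}, entry_list=[2, 0])
print(mixed.primary.entries, mixed.cache.entries)  # [12, 10] [22, 20]
```

Had the filter been passed to only one source, index 0 would map to entry 12 on the LArCV side and entry 20 on the HDF5 side, and the alignment check would reject it.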


Attributes

`name`	Dataset name ('mixed').
`primary`	Primary LArCV dataset that defines iteration order.
`cache`	HDF5 dataset whose aligned products are merged into each primary sample.
`reader`
`augmenter`	Augmenter built from the augment configuration, if one is configured.

name: ClassVar[str] = 'mixed'

primary: LArCVDataset

cache: HDF5Dataset

reader: Any

validate_alignment(idx: int, primary: dict[str, Any], cache: dict[str, Any]) → None

Ensure the configured alignment keys match between both sources.

Parameters:

idx (int) – Dataset entry index being validated.
primary (dict) – Sample returned by the primary LArCV dataset.
cache (dict) – Sample returned by the HDF5 cache dataset.
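A minimal sketch of the alignment check, assuming scalar index values and a pluggable function that chooses the HDF5-side key name. `check_alignment` and its `resolve` argument are hypothetical, and the exception types are illustrative; the real method's error handling may differ.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping, Sequence


def check_alignment(
    idx: int,
    primary: Mapping[str, Any],
    cache: Mapping[str, Any],
    align_keys: Sequence[str] = ("file_index", "file_entry_index"),
    resolve: Callable[[str, Mapping[str, Any]], str] = lambda key, sample: key,
) -> None:
    """Raise if any alignment key differs between the two samples (sketch)."""
    for key in align_keys:
        cache_key = resolve(key, cache)  # HDF5-side name for this key
        if cache_key not in cache:
            raise KeyError(f"Entry {idx}: HDF5 sample has no {cache_key!r} field")
        # Assumes scalar index values; arrays would need an explicit comparison.
        if primary[key] != cache[cache_key]:
            raise ValueError(
                f"Entry {idx}: {key}={primary[key]!r} (LArCV) != "
                f"{cache_key}={cache[cache_key]!r} (HDF5)"
            )
```

Checking every configured key, rather than stopping at the first match, is what catches a cache that points at the right file but the wrong entry within it.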

validate_source_alignment(idx: int, primary: dict[str, Any], cache: dict[str, Any]) → None

Validate cache-file provenance against the current LArCV source file.

This check is only applied when the HDF5 sample exposes staged-cache provenance keys. In that case the cache is expected to correspond to exactly one original source file, identified by file name, file size, and modification time.

Parameters:

idx (int) – Dataset entry index being validated.
primary (dict) – Sample returned by the primary LArCV dataset.
cache (dict) – Sample returned by the HDF5 cache dataset.
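A sketch of the provenance check, under clearly stated assumptions: the provenance field names (`source_file_name`, `source_file_size`, `source_file_mtime`) are hypothetical, the LArCV source path is passed in directly, and the modification time is compared exactly. Only the three identifying properties (file name, file size, modification time) and the "skip when absent" rule come from this page; `check_source_provenance` is a hypothetical helper.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

# Hypothetical provenance field names; the real HDF5 keys are not documented here.
PROVENANCE_KEYS = ("source_file_name", "source_file_size", "source_file_mtime")


def check_source_provenance(idx: int, source_path: str, cache: Mapping[str, Any]) -> None:
    """Raise if a staged cache does not describe the given LArCV file (sketch)."""
    if not all(key in cache for key in PROVENANCE_KEYS):
        return  # no staged-cache provenance: the check does not apply

    stat = os.stat(source_path)
    expected = {
        "source_file_name": os.path.basename(source_path),
        "source_file_size": stat.st_size,
        "source_file_mtime": stat.st_mtime,  # exact match; a real check may round
    }
    for key, want in expected.items():
        if cache[key] != want:
            raise ValueError(
                f"Entry {idx}: cache {key}={cache[key]!r} does not match "
                f"the LArCV source file ({want!r})"
            )
```

Size and modification time together catch the common failure where a source file was regenerated under the same name after the cache was staged.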

resolve_cache_align_key(key: str, cache: dict[str, Any]) → str

Return the HDF5 key used to align one LArCV index field.

Parameters:

key (str) – Alignment key expected on the primary dataset side.
cache (dict) – Cache sample dictionary used to determine whether a source_<key> variant is available.

Returns:

HDF5-side key name that should match the primary key.

Return type:

str
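The key-resolution rule from the hdf5_align_keys parameter description can be written as a small function. `resolve_cache_key` is a hypothetical helper; it assumes that an explicit mapping entry takes precedence and that keys missing from the mapping fall back to the default source_<key>-then-<key> rule.

```python
from __future__ import annotations

from typing import Any, Mapping


def resolve_cache_key(
    key: str,
    cache: Mapping[str, Any],
    hdf5_align_keys: Mapping[str, str] | None = None,
) -> str:
    """Return the HDF5-side name for one LArCV alignment key (sketch)."""
    # 1. An explicit user mapping wins (assumed precedence).
    if hdf5_align_keys and key in hdf5_align_keys:
        return hdf5_align_keys[key]
    # 2. Otherwise prefer the source_<key> variant when the cache sample has it.
    prefixed = f"source_{key}"
    if prefixed in cache:
        return prefixed
    # 3. Fall back to the same key name on both sides.
    return key


print(resolve_cache_key("file_index", {"source_file_index": 4, "file_index": 0}))
# source_file_index
```

Preferring source_<key> matters for staged caches, where the plain key may describe the cache file itself rather than the original LArCV file.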

merge_cache(merged: dict[str, Any], cache: dict[str, Any]) → None

Merge one cached HDF5 sample into an existing LArCV sample.

Parameters:

merged (dict) – Mutable sample dictionary initially populated from the primary dataset.
cache (dict) – HDF5 cache sample to merge into merged.
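A sketch of the merge step, combining the hdf5_key_map rename with the allow_overwrite policy. `merge_cache_sketch` is hypothetical; raising `KeyError` on a collision, and the `skip` set for keys (such as alignment metadata) that stay with the LArCV sample, are assumptions rather than documented behavior. Like the documented method, it mutates `merged` in place and returns `None`.

```python
from __future__ import annotations

from typing import Any, Collection, Mapping


def merge_cache_sketch(
    merged: dict[str, Any],
    cache: Mapping[str, Any],
    key_map: Mapping[str, str] | None = None,
    allow_overwrite: bool = False,
    skip: Collection[str] = (),
) -> None:
    """Merge an HDF5 sample into a LArCV sample in place (hypothetical sketch)."""
    key_map = key_map or {}
    for key, value in cache.items():
        if key in skip:
            continue  # e.g. alignment metadata already carried by the LArCV sample
        out_key = key_map.get(key, key)  # apply the hdf5_key_map-style rename
        if out_key in merged and not allow_overwrite:
            raise KeyError(f"HDF5 product {key!r} would overwrite LArCV key {out_key!r}")
        merged[out_key] = value
```

Renaming before the collision test means a product that shares a name with a LArCV truth product can still be merged by mapping it to a distinct key, without enabling allow_overwrite.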

property data_types: dict[str, str]

Return the collate type for each merged product.

Returns:

Mapping from merged output key to collate type.

Return type:

dict[str, str]

property overlay_methods: dict[str, str]

Return the overlay method for each merged product.

Returns:

Mapping from merged output key to overlay strategy.

Return type:

dict[str, str]

property data_keys: tuple[str, ...]

Return the names of all merged data products.

Returns:

Ordered tuple of keys exposed by the merged dataset.

Return type:

tuple[str, ...]
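The three merged properties (data_types, overlay_methods, data_keys) plausibly share one construction: per-key values for the standard metadata keys, then the LArCV products, then the renamed HDF5 products. The sketch below assumes that ordering, keeps the first entry on a key collision (an allow_overwrite merge would instead take the HDF5 value), and uses placeholder type strings; `merged_property` is a hypothetical helper, and a data_keys analogue falls out as the key order of the resulting dict.

```python
from __future__ import annotations

from typing import Mapping


def merged_property(
    index_map: Mapping[str, str],
    primary_map: Mapping[str, str],
    cache_map: Mapping[str, str],
    key_map: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Combine per-key collate types or overlay methods (hypothetical sketch)."""
    key_map = key_map or {}
    merged = dict(index_map)  # standard metadata keys come first
    for key, value in primary_map.items():
        merged.setdefault(key, value)  # then LArCV products
    for key, value in cache_map.items():
        merged.setdefault(key_map.get(key, key), value)  # then renamed HDF5 products
    return merged


# Placeholder type strings; the real collate types are not listed on this page.
types = merged_property({"index": "scalar"}, {"data": "tensor"}, {"reco": "tensor"}, {"reco": "reco_cached"})
keys = tuple(types)  # data_keys analogue: insertion order of the merged mapping
print(keys)  # ('index', 'data', 'reco_cached')
```

Because Python dicts preserve insertion order, deriving the key tuple from the same mapping keeps data_keys consistent with data_types and overlay_methods by construction.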