spine.io.read.HDF5Reader

class spine.io.read.HDF5Reader(file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, run_event_list: list[list[int]] | None = None, skip_run_event_list: list[list[int]] | None = None, create_run_map: bool = False, build_classes: bool = True, skip_unknown_attrs: bool = False, run_info_key: str = 'run_info', allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False)

Class which reads information stored in HDF5 files.

This class inherits from the ReaderBase class. It provides methods to load HDF5 files and extract their data products. The files must be structured as follows:

  • An events dataset with all the region references

  • One dataset per data product corresponding to each region reference in the events dataset
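The layout above can be pictured with a minimal pure-Python sketch that does not touch HDF5 at all. Python `slice` objects stand in for region references, plain lists stand in for datasets, and the product names (`points`, `labels`) and the `read_event` helper are hypothetical, not part of SPINE.

```python
# Emulated file: one flat "dataset" per data product, plus an "events"
# table whose rows hold one reference (here a slice) per product.
file_like = {
    "points": [0.1, 0.2, 0.3, 0.4, 0.5],  # hypothetical data product
    "labels": [1, 1, 0, 2, 2],            # hypothetical data product
    "events": [
        {"points": slice(0, 2), "labels": slice(0, 2)},  # event 0
        {"points": slice(2, 5), "labels": slice(2, 5)},  # event 1
    ],
}


def read_event(file_like, idx):
    """Resolve every reference of one event into the data it points to."""
    event = file_like["events"][idx]
    return {key: file_like[key][ref] for key, ref in event.items()}


first = read_event(file_like, 0)
second = read_event(file_like, 1)
```

Each event only stores where its data lives, so products of different lengths per event can share one flat dataset.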

Attributes:
run_info
run_map

Methods

close()

Close any persistent HDF5 handles owned by this reader.

get(idx)

Returns a specific entry in the file.

get_file_entry_index(idx)

Returns the index of an entry within the file it lives in, provided a global index over the list of files.

get_file_index(idx)

Returns the index of the file corresponding to a specific entry.

get_file_path(idx)

Returns the path to the file corresponding to a specific entry.

get_run_event(run, subrun, event)

Returns an entry corresponding to a specific (run, subrun, event) triplet.

get_run_event_index(run, subrun, event)

Returns an entry index corresponding to a specific (run, subrun, event) triplet.

get_source_provenance(file_idx, file_entry_idx)

Return lightweight source-file provenance for one entry.

is_remote_path(path)

Checks whether a path points to a remote resource.

load_key(in_file, event, data, key)

Fetch a specific key for a specific event.

parse_entry_list(list_source)

Parses a list into an np.ndarray.

parse_run_event_list(list_source)

Parses a list of (run, subrun, event) triplets into an np.ndarray.

process_cfg()

Fetches the SPINE configuration used to produce the HDF5 file.

process_entry_list([n_entry, n_skip, ...])

Create a list of entries that can be accessed by __getitem__().

process_file_paths([file_keys, file_list, ...])

Process list of files.

process_run_info()

Process the run information.

process_version()

Returns the SPINE release version used to produce the HDF5 file.

resolve_object_class(class_name, array)

Resolve an HDF5 object class name to the concrete SPINE class.
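The (run, subrun, event) lookups listed above rely on a map from triplets to entry indexes. The following is a rough sketch of that idea only, assuming the run information is available as one triplet per entry; `build_run_map` and `lookup` are hypothetical helpers, not the reader's actual methods.

```python
def build_run_map(run_info):
    """Map (run, subrun, event) triplets to entry indexes."""
    return {tuple(triplet): idx for idx, triplet in enumerate(run_info)}


def lookup(run_map, run, subrun, event):
    """Return the entry index of a triplet, or raise a clear error."""
    key = (run, subrun, event)
    if key not in run_map:
        raise KeyError(f"No entry for (run, subrun, event) = {key}")
    return run_map[key]


# Hypothetical run information, one triplet per entry
run_info = [(1, 0, 10), (1, 0, 11), (2, 3, 5)]
run_map = build_run_map(run_info)
index = lookup(run_map, 2, 3, 5)
```

Building such a map requires visiting every entry once, which is consistent with the cost warning attached to `create_run_map`.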

__init__(file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, run_event_list: list[list[int]] | None = None, skip_run_event_list: list[list[int]] | None = None, create_run_map: bool = False, build_classes: bool = True, skip_unknown_attrs: bool = False, run_info_key: str = 'run_info', allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False) -> None

Initialize the HDF5 file reader.

Parameters:
  • file_keys (str or list[str], optional) – Path or list of paths to the HDF5 files to be read

  • file_list (str, optional) – Path to a text file containing a list of file paths to be read

  • limit_num_files (int, optional) – Maximum number of files to take per data directory

  • max_print_files (int, default 10) – Maximum number of loaded file names to be printed

  • n_entry (int, optional) – Maximum number of entries to load

  • n_skip (int, optional) – Number of entries to skip at the beginning

  • entry_list (list[int], optional) – List of integer entry IDs to add to the index

  • skip_entry_list (list[int], optional) – List of integer entry IDs to skip from the index

  • run_event_list (list[list[int]], optional) – List of (run, subrun, event) triplets to add to the index

  • skip_run_event_list (list[list[int]], optional) – List of (run, subrun, event) triplets to skip from the index

  • create_run_map (bool, default False) – Initialize a map between (run, subrun, event) triplets and entries. For large files, this can be quite expensive (must load every entry).

  • build_classes (bool, default True) – If the stored object is a class instance, rebuild it when loading

  • skip_unknown_attrs (bool, default False) – If True, allow a loaded object to have unrecognized attributes. This allows backward compatibility with old files, but use with extreme caution, as this might hide a fundamental issue with your code.

  • run_info_key (str, default 'run_info') – Name of the data product which contains the run info of the event

  • allow_missing (bool, default False) – If True, allows missing entries in the entry or event list

  • keep_open (bool, default True) – If True, keep one read-only HDF5 handle open per file and per process. This avoids reopening files for every event access. If False, open and close the file on each get call.

  • swmr (bool, default False) – If True, open files in HDF5 single-writer/multiple-reader mode. This is only relevant when reading files produced by a writer that was configured for SWMR-safe operation.

  • ignore_incomplete (bool, default False) – If True, allow opening files marked as incomplete. By default, files with an explicit info.attrs["complete"] = False marker are rejected.
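The entry-selection parameters above (`n_entry`, `n_skip`, `entry_list`, `skip_entry_list`, `allow_missing`) can be combined as in the sketch below. It is illustrative only: `select_entries` is a hypothetical function, and the order of operations (skip first, then limit, then remove skipped IDs) is an assumption, so the real reader may validate or combine these options differently.

```python
def select_entries(num_entries, n_entry=None, n_skip=None,
                   entry_list=None, skip_entry_list=None,
                   allow_missing=False):
    """Return the list of entry IDs to expose (illustrative only)."""
    if entry_list is not None:
        # Explicit list: keep the IDs that exist, optionally tolerate the rest
        missing = [i for i in entry_list if not 0 <= i < num_entries]
        if missing and not allow_missing:
            raise ValueError(f"Entries not found: {missing}")
        entries = [i for i in entry_list if 0 <= i < num_entries]
    else:
        # Contiguous range: skip the first n_skip, keep at most n_entry
        start = n_skip or 0
        stop = num_entries if n_entry is None else start + n_entry
        entries = list(range(start, min(stop, num_entries)))

    if skip_entry_list is not None:
        skipped = set(skip_entry_list)
        entries = [i for i in entries if i not in skipped]

    return entries


window = select_entries(10, n_entry=3, n_skip=2)
filtered = select_entries(5, skip_entry_list=[1, 3])
tolerant = select_entries(5, entry_list=[4, 7], allow_missing=True)
```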

Attributes

name

run_info

run_map

source_keys

file_paths

file_index

file_offsets

entry_index

num_entries

name: str = 'hdf5'

close() -> None

Close any persistent HDF5 handles owned by this reader.

This only affects handles cached in the current process. It is safe to call repeatedly.
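The per-process caching behavior described here can be sketched with a small cache keyed on the process ID and the file path. `HandleCache` is a hypothetical class written for illustration, and `io.StringIO` merely stands in for an open file handle; none of this is the reader's actual implementation.

```python
import io
import os


class HandleCache:
    """Keeps one open handle per (process, path) pair."""

    def __init__(self, opener):
        self._opener = opener  # callable that opens a path
        self._handles = {}     # (pid, path) -> open handle

    def get(self, path):
        """Return the cached handle for this process, opening it if needed."""
        key = (os.getpid(), path)
        if key not in self._handles:
            self._handles[key] = self._opener(path)
        return self._handles[key]

    def close(self):
        """Close handles owned by this process; safe to call repeatedly."""
        pid = os.getpid()
        for key in [k for k in self._handles if k[0] == pid]:
            self._handles.pop(key).close()


cache = HandleCache(opener=lambda path: io.StringIO(path))
handle = cache.get("a.h5")
same = cache.get("a.h5")  # reuses the cached handle
cache.close()
cache.close()             # second call finds nothing left to close
```

Keying on the process ID means a forked worker opens its own handle instead of sharing the parent's.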

process_cfg() -> dict[str, Any] | None

Fetches the SPINE configuration used to produce the HDF5 file.

Returns:

Configuration dictionary

Return type:

dict or None

process_version() -> str

Returns the SPINE release version used to produce the HDF5 file.

Returns:

SPINE release tag

Return type:

str

get(idx: int) -> dict[str, Any]

Returns a specific entry in the file.

Parameters:

idx (int) – Integer entry ID to access

Returns:

data – Dictionary of data products corresponding to one event

Return type:

dict
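Because `idx` runs over all loaded files, it has to be split into a file index and an index within that file. One plausible way to do this is sketched below with cumulative offsets and a binary search. The helpers `build_offsets` and `locate` are hypothetical, and the offsets list is not claimed to match the reader's `file_offsets` attribute.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(entries_per_file):
    """Global index of the first entry of each file."""
    return [0] + list(accumulate(entries_per_file))[:-1]


def locate(idx, offsets):
    """Split a global entry index into (file index, in-file index)."""
    file_idx = bisect_right(offsets, idx) - 1
    return file_idx, idx - offsets[file_idx]


offsets = build_offsets([3, 2, 4])  # three hypothetical files
location = locate(4, offsets)       # second file, second entry
```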

static resolve_object_class(class_name: str, array: ndarray) -> type

Resolve an HDF5 object class name to the concrete SPINE class.

This keeps backward-compatibility quirks localized in the reader. In particular, older HDF5 files stored image metadata with class_name="Meta". Newer files store the explicit ImageMeta2D / ImageMeta3D class names instead.

Parameters:
  • class_name (str) – Class name stored in the HDF5 dataset metadata

  • array (np.ndarray) – Structured array slice containing the serialized objects

Returns:

Concrete SPINE data class to reconstruct

Return type:

type
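As a rough sketch of this kind of name resolution, the example below maps the legacy "Meta" name to a 2D or 3D class. Everything in it is an assumption made for illustration: the stand-in classes are not the SPINE classes, the `lower` field is a hypothetical attribute, and inferring the dimension from its length is a guess about how the array argument might be used.

```python
class ImageMeta2D:  # stand-in, not the SPINE class
    pass


class ImageMeta3D:  # stand-in, not the SPINE class
    pass


KNOWN_CLASSES = {"ImageMeta2D": ImageMeta2D, "ImageMeta3D": ImageMeta3D}


def resolve_class(class_name, records):
    """Map a stored class name to a class, handling the legacy "Meta" name."""
    if class_name == "Meta":
        # Assumption: the dimension is inferred from the stored lower bounds
        dim = len(records[0]["lower"])
        class_name = {2: "ImageMeta2D", 3: "ImageMeta3D"}[dim]
    return KNOWN_CLASSES[class_name]


legacy = resolve_class("Meta", [{"lower": [0.0, 0.0, 0.0]}])
modern = resolve_class("ImageMeta2D", [])
```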

load_key(in_file: File, event: dict[str, Any], data: dict[str, Any], key: str) -> None

Fetch a specific key for a specific event.

Parameters:
  • in_file (h5py.File) – HDF5 file instance

  • event (dict) – Dictionary of objects that make up one event

  • data (dict) – Dictionary of data products corresponding to one event

  • key (str) – Name of the dataset in the entry