spine.io.read.HDF5Reader
- class spine.io.read.HDF5Reader(file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, run_event_list: list[list[int]] | None = None, skip_run_event_list: list[list[int]] | None = None, create_run_map: bool = False, build_classes: bool = True, skip_unknown_attrs: bool = False, run_info_key: str = 'run_info', allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False)
Class which reads information stored in HDF5 files.
This class inherits from the ReaderBase class. It provides methods to load HDF5 files and extract their data products. The files must be structured as follows:
- An events dataset with all the region references
- One dataset per data product corresponding to each region reference in the events dataset
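To make this layout concrete, here is a minimal in-memory sketch. It models the file as a plain dictionary and uses Python `slice` objects as stand-ins for HDF5 region references. The product names (`points`, `labels`) and the `read_entry` helper are hypothetical and are not part of SPINE.

```python
# Toy stand-in for the HDF5 layout; slices play the role of region references.
toy_file = {
    # One dataset per data product, with all events stored back to back
    "points": [0.1, 0.2, 0.3, 0.4, 0.5],
    "labels": [1, 1, 0, 2, 2],
    # One row per event; each row points at that event's region in each product
    "events": [
        {"points": slice(0, 3), "labels": slice(0, 3)},
        {"points": slice(3, 5), "labels": slice(3, 5)},
    ],
}


def read_entry(file, idx):
    """Gather every data product of event `idx` by following its references."""
    event = file["events"][idx]
    return {key: file[key][region] for key, region in event.items()}


# Load the second event (hypothetical helper, not the SPINE API)
entry = read_entry(toy_file, 1)
```

In a real file the references would be HDF5 region references stored in the events dataset, but the lookup pattern is the same: read one row of events, then dereference it once per data product.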
- Attributes:
- run_info
- run_map
Methods
- close(): Close any persistent HDF5 handles owned by this reader.
- get(idx): Returns a specific entry in the file.
- get_file_entry_index(idx): Returns the index of an entry within the file it lives in, provided a global index over the list of files.
- get_file_index(idx): Returns the index of the file corresponding to a specific entry.
- get_file_path(idx): Returns the path to the file corresponding to a specific entry.
- get_run_event(run, subrun, event): Returns an entry corresponding to a specific (run, subrun, event) triplet.
- get_run_event_index(run, subrun, event): Returns an entry index corresponding to a specific (run, subrun, event) triplet.
- get_source_provenance(file_idx, file_entry_idx): Return lightweight source-file provenance for one entry.
- is_remote_path(path): Checks whether a path points to a remote resource.
- load_key(in_file, event, data, key): Fetch a specific key for a specific event.
- parse_entry_list(list_source): Parses a list into an np.ndarray.
- parse_run_event_list(list_source): Parses a list of (run, subrun, event) triplets into an np.ndarray.
- process_cfg(): Fetches the SPINE configuration used to produce the HDF5 file.
- process_entry_list([n_entry, n_skip, ...]): Create a list of entries that can be accessed by __getitem__().
- process_file_paths([file_keys, file_list, ...]): Process list of files.
- process_run_info(): Process the run information.
- process_version(): Returns the SPINE release version used to produce the HDF5 file.
- resolve_object_class(class_name, array): Resolve an HDF5 object class name to the concrete SPINE class.
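The relationship between a global entry index, the file it lives in, and its position within that file can be sketched as follows. This is an illustration only: it assumes that an offsets list stores the global index of the first entry of each file, which is not confirmed here, and the entry counts are made up.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical per-file entry counts for three input files
entries_per_file = [4, 2, 5]

# Global index of the first entry of each file: [0, 4, 6]
file_offsets = [0, *accumulate(entries_per_file)][:-1]


def file_index(idx):
    """Index of the file that holds global entry `idx`."""
    return bisect_right(file_offsets, idx) - 1


def file_entry_index(idx):
    """Index of global entry `idx` within its own file."""
    return idx - file_offsets[file_index(idx)]
```

With these counts, global entry 4 is the first entry of the second file. The sketch does no bounds checking, so an index past the last entry would silently map to the last file.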
- __init__(file_keys: str | list[str] | None = None, file_list: str | None = None, limit_num_files: int | None = None, max_print_files: int = 10, n_entry: int | None = None, n_skip: int | None = None, entry_list: list[int] | None = None, skip_entry_list: list[int] | None = None, run_event_list: list[list[int]] | None = None, skip_run_event_list: list[list[int]] | None = None, create_run_map: bool = False, build_classes: bool = True, skip_unknown_attrs: bool = False, run_info_key: str = 'run_info', allow_missing: bool = False, keep_open: bool = True, swmr: bool = False, ignore_incomplete: bool = False) -> None
Initialize the HDF5 file reader.
- Parameters:
file_keys (str or list[str], optional) – Path or list of paths to the HDF5 files to be read
file_list (str, optional) – Path to a text file containing a list of file paths to be read
limit_num_files (int, optional) – Integer limiting number of files to be taken per data directory
max_print_files (int, default 10) – Maximum number of loaded file names to be printed
n_entry (int, optional) – Maximum number of entries to load
n_skip (int, optional) – Number of entries to skip at the beginning
entry_list (list[int], optional) – List of integer entry IDs to add to the index
skip_entry_list (list[int], optional) – List of integer entry IDs to skip from the index
run_event_list (list[list[int]], optional) – List of (run, subrun, event) triplets to add to the index
skip_run_event_list (list[list[int]], optional) – List of (run, subrun, event) triplets to skip from the index
create_run_map (bool, default False) – Initialize a map between (run, subrun, event) triplets and entries. For large files, this can be quite expensive (must load every entry).
build_classes (bool, default True) – If the stored object is a class, build it back
skip_unknown_attrs (bool, default False) – If True, allow a loaded object to have unrecognized attributes. This allows backward compatibility with old files, but use with extreme caution, as this might hide a fundamental issue with your code.
run_info_key (str, default 'run_info') – Name of the data product which contains the run info of the event
allow_missing (bool, default False) – If True, allows missing entries in the entry or event list
keep_open (bool, default True) – If True, keep one read-only HDF5 handle open per file and per process. This avoids reopening files for every event access. If False, open and close the file on each get call.
swmr (bool, default False) – If True, open files in HDF5 single-writer/multiple-reader mode. This is only relevant when reading files produced by a writer that was configured for SWMR-safe operation.
ignore_incomplete (bool, default False) – If True, allow opening files marked as incomplete. By default, files with an explicit info.attrs["complete"] = False marker are rejected.
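The entry-selection parameters above can be read as filters on the global entry index. The sketch below shows one plausible way they could combine; the precedence between options and every function name here are assumptions for illustration, not the behavior of process_entry_list.

```python
def select_entries(num_entries, n_entry=None, n_skip=None,
                   entry_list=None, skip_entry_list=None):
    """Build the list of accessible entry indices (assumed semantics)."""
    if entry_list is not None:
        # Explicit selection takes the place of the n_skip / n_entry window
        selected = list(entry_list)
    else:
        start = n_skip or 0
        stop = num_entries if n_entry is None else min(num_entries, start + n_entry)
        selected = list(range(start, stop))
    if skip_entry_list is not None:
        skipped = set(skip_entry_list)
        selected = [i for i in selected if i not in skipped]
    return selected


def build_run_map(run_info):
    """Map each (run, subrun, event) triplet to its entry index."""
    return {tuple(triplet): idx for idx, triplet in enumerate(run_info)}


def resolve_run_events(run_map, run_event_list, allow_missing=False):
    """Translate triplets into entry indices, optionally tolerating misses."""
    entries = []
    for triplet in run_event_list:
        key = tuple(triplet)
        if key in run_map:
            entries.append(run_map[key])
        elif not allow_missing:
            raise KeyError(f"No entry for (run, subrun, event) = {key}")
    return entries


# Hypothetical run information for three entries
run_map = build_run_map([(1, 0, 10), (1, 0, 11), (2, 3, 5)])
```

Building such a map needs the run information of every entry, which is consistent with the note that create_run_map can be expensive for large files.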
Attributes
- run_info
- run_map
- source_keys
- file_paths
- file_index
- file_offsets
- entry_index
- num_entries
- name: str = 'hdf5'
- close() -> None
Close any persistent HDF5 handles owned by this reader.
This only affects handles cached in the current process. It is safe to call repeatedly.
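A per-process handle cache with a repeatable close can be sketched as below. This is not the SPINE implementation: the `HandleCache` class is hypothetical, and ordinary binary file handles stand in for read-only HDF5 handles. Keying the cache by process ID is an assumption about how "per file and per process" might be realized.

```python
import os
import tempfile


class HandleCache:
    """Keeps one open read-only handle per (process ID, path) pair."""

    def __init__(self):
        self._handles = {}

    def get(self, path):
        """Return the cached handle for `path`, opening it on first use."""
        key = (os.getpid(), path)
        if key not in self._handles:
            # Stand-in for opening an HDF5 file in read-only mode
            self._handles[key] = open(path, "rb")
        return self._handles[key]

    def close(self):
        """Close the handles opened by the current process; safe to repeat."""
        pid = os.getpid()
        for key in [k for k in self._handles if k[0] == pid]:
            self._handles.pop(key).close()


# Demonstration on a temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"payload")

cache = HandleCache()
first = cache.get(tmp.name)
second = cache.get(tmp.name)  # Served from the cache, no second open
cache.close()
cache.close()                 # Nothing left to close, so this is a no-op
os.unlink(tmp.name)
```

Because close removes each handle from the cache as it closes it, a second call has no entries left to act on, which is what makes repeated calls harmless in this sketch.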
- process_cfg() -> dict[str, Any] | None
Fetches the SPINE configuration used to produce the HDF5 file.
- Returns:
Configuration dictionary
- Return type:
dict or None
- process_version() -> str
Returns the SPINE release version used to produce the HDF5 file.
- Returns:
SPINE release tag
- Return type:
str
- get(idx: int) -> dict[str, Any]
Returns a specific entry in the file.
- Parameters:
idx (int) – Integer entry ID to access
- Returns:
data – Dictionary of data products corresponding to one event
- Return type:
dict
- static resolve_object_class(class_name: str, array: ndarray) -> type
Resolve an HDF5 object class name to the concrete SPINE class.
This keeps backward-compatibility quirks localized in the reader. In particular, older HDF5 files stored image metadata with class_name="Meta". Newer files store the explicit ImageMeta2D/ImageMeta3D class names instead.
- Parameters:
class_name (str) – Class name stored in the HDF5 dataset metadata
array (np.ndarray) – Structured array slice containing the serialized objects
- Returns:
Concrete SPINE data class to reconstruct
- Return type:
type
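The backward-compatibility idea can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: the two classes are empty stand-ins for the SPINE data classes, the records are plain dictionaries rather than a structured array, and the rule used to tell 2D from 3D (the length of a `lower` field) is an assumption chosen for illustration, not the rule SPINE applies.

```python
class ImageMeta2D:
    """Stand-in for the SPINE 2D image metadata class."""


class ImageMeta3D:
    """Stand-in for the SPINE 3D image metadata class."""


# Class names that can be resolved directly
KNOWN_CLASSES = {"ImageMeta2D": ImageMeta2D, "ImageMeta3D": ImageMeta3D}


def resolve_object_class(class_name, records):
    """Return the concrete class for a stored class name (assumed logic)."""
    if class_name == "Meta":
        # ASSUMPTION: the legacy name carries no dimension, so infer it
        # from the number of coordinates in the first stored record
        dim = len(records[0]["lower"])
        return {2: ImageMeta2D, 3: ImageMeta3D}[dim]
    return KNOWN_CLASSES[class_name]
```

Keeping the legacy branch inside a single function means the rest of the loading code only ever sees the explicit class names.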
- load_key(in_file: File, event: dict[str, Any], data: dict[str, Any], key: str) -> None
Fetch a specific key for a specific event.
- Parameters:
in_file (h5py.File) – HDF5 file instance
event (dict) – Dictionary of objects that make up one event
data (dict) – Dictionary of data products corresponding to one event
key (str) – Name of the dataset in the entry