spine.io.parse.clean_data

Module which contains functions used to clean up cluster data.

When loading larcv.Cluster3DVoxelTensor objects into tensors, there can be duplicate voxels. These routines remove the duplicates and enforce a consistent ordering of the output.

Functions

`aggregate_features`(data, groups, cols)	Aggregate features by summing the requested columns over groups.
`aggregate_mean_features`(data, groups, cols)	Average the information in pre-defined voxel groups.
`aggregate_sum_features`(data, groups, cols)	Aggregate the information in pre-defined voxel groups.
`clean_sparse_data`(cluster_voxels, cluster_data[, ...])	Clean and align cluster voxels against an optional sparse reference.
`filter_duplicate_voxels`(data)	Returns a mask of non-duplicate voxels.
`filter_duplicate_voxels_group`(data[, ...])	Returns a mask of non-duplicate voxels and a list of duplicate groups.
`filter_voxels_ref`(data, reference)	Removes voxels that do not appear in a reference tensor.

spine.io.parse.clean_data.clean_sparse_data(cluster_voxels: ndarray, cluster_data: ndarray, sparse_voxels: ndarray | None = None, sum_cols: ndarray | None = None, avg_cols: ndarray | None = None, prec_col: int | None = SHAPE_COL, precedence: ndarray | list[int] | tuple[int, ...] | None = SHAPE_PREC, *, return_index: Literal[False] = False) → tuple[ndarray, ndarray]

spine.io.parse.clean_data.clean_sparse_data(cluster_voxels: ndarray, cluster_data: ndarray, sparse_voxels: ndarray | None = None, sum_cols: ndarray | None = None, avg_cols: ndarray | None = None, prec_col: int | None = SHAPE_COL, precedence: ndarray | list[int] | tuple[int, ...] | None = SHAPE_PREC, *, return_index: Literal[True]) → tuple[ndarray, ndarray, ndarray]

Clean and align cluster voxels against an optional sparse reference.

This function does the following:

1. Lexicographically sort group data (images are lexicographically sorted)
2. Choose only one group per voxel (by lexicographic order or precedence)
3. Remove voxels from cluster data that are not in the image data (optional)

The set of sparse voxels must be a subset of the set of cluster voxels and must not contain any duplicates. If sparse_voxels is not provided, the reference filtering step is skipped, so this function can also be used to remove duplicates when overlaying multiple images together.
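The three steps above can be sketched in plain NumPy. This is an illustration only, not the SPINE implementation: the function name `clean_sketch` is hypothetical, duplicates are resolved by keeping the last row (no precedence handling), and no columns are summed or averaged.

```python
import numpy as np

def clean_sketch(cluster_voxels, cluster_data, sparse_voxels=None):
    """Hypothetical stand-in for the three steps; not the SPINE code."""
    # 1. Lexicographic sort on (x, y, z). np.lexsort treats the LAST key
    #    as the primary one, hence the reversed column order.
    order = np.lexsort(cluster_voxels.T[::-1])
    voxels, data = cluster_voxels[order], cluster_data[order]

    # 2. Keep only the last row in each run of identical coordinates
    keep = np.ones(len(voxels), dtype=bool)
    keep[:-1] = np.any(voxels[1:] != voxels[:-1], axis=1)
    voxels, data, index = voxels[keep], data[keep], order[keep]

    # 3. Optionally drop voxels that are absent from the reference
    if sparse_voxels is not None:
        ref = {tuple(v) for v in sparse_voxels}
        in_ref = np.array([tuple(v) in ref for v in voxels], dtype=bool)
        voxels, data, index = voxels[in_ref], data[in_ref], index[in_ref]

    return voxels, data, index

# Toy input: one duplicated voxel and one voxel missing from the reference
cluster_voxels = np.array([[1, 0, 0], [0, 0, 0], [1, 0, 0], [2, 2, 2]])
cluster_data = np.array([[10.0], [20.0], [30.0], [40.0]])
sparse_voxels = np.array([[0, 0, 0], [1, 0, 0]])
voxels, data, index = clean_sketch(cluster_voxels, cluster_data, sparse_voxels)
```

The returned `index` plays the role described for return_index: it maps each surviving row back to its position in the original input.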

Parameters:

cluster_voxels (np.ndarray) – (N, 3) Matrix of voxel coordinates in the cluster3d tensor
cluster_data (np.ndarray) – (N, F) Matrix of voxel values corresponding to each voxel in the cluster3d tensor
sparse_voxels (np.ndarray, optional) – (M, 3) Matrix of voxel coordinates in the reference sparse tensor
sum_cols (np.ndarray, optional) – List of feature columns to sum when removing duplicates
avg_cols (np.ndarray, optional) – List of feature columns to average when removing duplicates
prec_col (int, default SHAPE_COL) – Column in the input feature tensor to use as a precedence source
precedence (np.ndarray or list[int], default SHAPE_PREC) – Array of classes in the reference array, ordered by precedence
return_index (bool, default False) – If True, also return the selected row indexes into the original cluster_voxels / cluster_data arrays.

Returns:

cluster_voxels (np.ndarray) – (M, 3) Ordered and filtered set of voxel coordinates
cluster_data (np.ndarray) – (M, F) Ordered and filtered set of voxel values
index (np.ndarray, optional) – (M) Selected row indexes in the original input ordering. Only returned when return_index is True.

spine.io.parse.clean_data.filter_duplicate_voxels(data: ndarray) → ndarray

Returns a mask of non-duplicate voxels.

If there are multiple voxels with the same coordinates, this algorithm simply keeps the last one in the list.

Parameters:

data (np.ndarray) – (N, 3) Lexicographically sorted matrix of voxel coordinates

Returns:

Boolean mask which is False for pixels to remove

Return type:

np.ndarray
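A minimal sketch of the keep-the-last rule on a sorted coordinate matrix, assuming plain NumPy. The helper name `keep_last_mask` is hypothetical and the real routine may be implemented differently.

```python
import numpy as np

def keep_last_mask(voxels):
    """Hypothetical helper: True for the last row of each run of duplicates."""
    mask = np.ones(len(voxels), dtype=bool)
    # A row is dropped when the next row has exactly the same coordinates
    mask[:-1] = np.any(voxels[1:] != voxels[:-1], axis=1)
    return mask

# Input must already be lexicographically sorted
sorted_voxels = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 1],
                          [1, 2, 3], [1, 2, 3], [1, 2, 3]])
mask = keep_last_mask(sorted_voxels)
unique_voxels = sorted_voxels[mask]
```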

spine.io.parse.clean_data.filter_duplicate_voxels_group(data: ndarray, reference: ndarray | None = None, precedence: list[int] | None = None) → tuple[ndarray, dict[int, ndarray]]

Returns a mask of non-duplicate voxels and a list of duplicate groups.

If no precedence is defined and there are multiple voxels with the same coordinates, this algorithm simply keeps the last one in the list.

If a precedence is defined and there are multiple voxels with the same coordinates, this algorithm picks the voxel which has the label that comes first in order of precedence. If multiple voxels with the same precedence index share voxel coordinates, the last one is picked.

The duplicate voxel groups map the chosen voxel indices to the set of voxels which share voxel coordinates.

Parameters:

data (np.ndarray) – (N, 3) Lexicographically sorted matrix of voxel coordinates
reference (np.ndarray, optional) – Array of values which have to follow the precedence order
precedence (list, optional) – Array of classes in the reference array, ordered by precedence

Returns:

np.ndarray – Boolean mask which is False for pixels to remove
dict[int, np.ndarray] – Map between kept voxel indexes onto voxels which share the same coordinates
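The precedence rule can be illustrated with the sketch below. It is not the SPINE code: the name `dedup_with_precedence` is hypothetical, every label in `reference` is assumed to appear in `precedence`, and each group is assumed to include the chosen voxel itself.

```python
import numpy as np

def dedup_with_precedence(voxels, reference=None, precedence=None):
    """Hypothetical sketch: returns (mask, groups) for sorted voxels."""
    n = len(voxels)
    mask = np.ones(n, dtype=bool)
    groups = {}

    # Rank of each row's label in the precedence list (lower rank wins)
    rank = None
    if reference is not None and precedence is not None:
        lookup = {label: i for i, label in enumerate(precedence)}
        rank = np.array([lookup[int(r)] for r in reference])

    start = 0
    while start < n:
        # Extend the run while the coordinates stay identical
        stop = start + 1
        while stop < n and np.array_equal(voxels[stop], voxels[start]):
            stop += 1
        if stop - start > 1:
            run = np.arange(start, stop)
            if rank is None:
                chosen = stop - 1  # no precedence: keep the last row
            else:
                best = rank[run].min()
                chosen = run[rank[run] == best][-1]  # last among ties
            mask[run] = False
            mask[chosen] = True
            groups[int(chosen)] = run
        start = stop
    return mask, groups

voxels = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1]])
labels = np.array([2, 0, 0, 1])
mask, groups = dedup_with_precedence(voxels, labels, precedence=[0, 1, 2])
```

In this toy input, rows 1 and 2 both carry the highest-precedence label, so the later of the two (row 2) is kept.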

spine.io.parse.clean_data.filter_voxels_ref(data: ndarray, reference: ndarray) → ndarray

Removes voxels that do not appear in a reference tensor.

Returns a mask which discards any voxels that do not belong to the reference array. The reference array must contain a subset of the voxels in the array to be filtered.

Assumes both arrays are lexicographically sorted, the reference matrix contains no duplicates and is a subset of the matrix to be filtered.

Parameters:

data (np.ndarray) – (N, 3) Lexicographically sorted matrix of voxel coordinates to filter
reference (np.ndarray) – (M, 3) Lexicographically sorted matrix of voxel coordinates to match

Returns:

Boolean mask which is False for pixels to remove

Return type:

np.ndarray
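Because both inputs are sorted and the reference is a duplicate-free subset, the mask can be built in a single pass with two cursors. The sketch below shows the idea; the name `ref_mask` is hypothetical and this is not necessarily how the library does it.

```python
import numpy as np

def ref_mask(data, reference):
    """Hypothetical sketch: True where a data voxel appears in the reference."""
    mask = np.zeros(len(data), dtype=bool)
    r = 0  # cursor into the reference matrix
    for d in range(len(data)):
        # Advance the reference cursor only when the current rows match
        if r < len(reference) and np.array_equal(data[d], reference[r]):
            mask[d] = True
            r += 1
    return mask

data = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]])
reference = np.array([[0, 0, 1], [1, 0, 0]])
mask = ref_mask(data, reference)
```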

spine.io.parse.clean_data.aggregate_features(data: ndarray, groups: dict[int, ndarray], cols: ndarray) → ndarray

Aggregate features by summing the requested columns over groups.

spine.io.parse.clean_data.aggregate_sum_features(data: ndarray, groups: dict[int, ndarray], cols: ndarray) → ndarray

Aggregate the information in pre-defined voxel groups.

Parameters:

data (np.ndarray) – (N, F) Matrix of voxel features to aggregate
groups (dict[int, np.ndarray]) – Map between kept voxel indexes onto voxels which share the same coordinates
cols (np.ndarray) – List of feature columns to modify

Returns:

(N, F) Matrix of aggregated voxel features

Return type:

np.ndarray
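A minimal sketch of summing selected columns over duplicate groups, assuming plain NumPy. The name `sum_over_groups` is hypothetical, each group is assumed to contain the kept voxel itself, and the sketch works on a copy, which the real function may not do.

```python
import numpy as np

def sum_over_groups(data, groups, cols):
    """Hypothetical sketch: write per-group column sums into the kept rows."""
    out = data.copy()
    for kept, members in groups.items():
        # Sum the requested columns over all rows sharing the coordinates
        out[kept, cols] = data[members][:, cols].sum(axis=0)
    return out

data = np.array([[1.0, 10.0, 5.0],
                 [2.0, 20.0, 6.0],
                 [3.0, 30.0, 7.0]])
groups = {1: np.array([0, 1])}  # rows 0 and 1 share coordinates, row 1 is kept
summed = sum_over_groups(data, groups, cols=np.array([0, 1]))
```

Only the kept row changes; applying the duplicate mask afterwards removes the rows that were merged into it.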

spine.io.parse.clean_data.aggregate_mean_features(data: ndarray, groups: dict[int, ndarray], cols: ndarray) → ndarray

Average the information in pre-defined voxel groups.