spine.math.metrics

Numba JIT compiled implementation of clustering evaluation metrics.

This module provides efficient implementations of:

- Adjusted Rand Index (ARI)
- Adjusted Mutual Information (AMI)

Functions

`adjusted_mutual_info_score`(labels_true, ...)	Compute the Adjusted Mutual Information (AMI) between two clusterings.
`adjusted_rand_score`(labels_true, labels_pred)	Compute the Adjusted Rand Index (ARI) between two clusterings.

spine.math.metrics.adjusted_rand_score(labels_true, labels_pred)

Compute the Adjusted Rand Index (ARI) between two clusterings.

The Adjusted Rand Index measures the similarity between two clusterings of the same data. It ignores permutations of the label values and corrects for chance agreement.

The ARI is bounded between -1 and 1:

- 1.0 indicates perfect clustering agreement
- 0.0 indicates random clustering (expected value for independent labelings)
- Negative values indicate worse than random clustering

The formula is:

ARI = (RI - E[RI]) / (max(RI) - E[RI])

where RI is the Rand Index and E[RI] is the expected Rand Index under random labelings.
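The formula above can be sketched in its equivalent pair-counting form, where the index is the number of sample pairs grouped together in both labelings. This is a minimal standard-library illustration, not the module's numba code; the name `ari_sketch` and the handling of degenerate inputs are assumptions made for the example.

```python
from collections import Counter
from math import comb


def ari_sketch(labels_true, labels_pred):
    """Illustrative pair-counting ARI (hypothetical helper, not spine's code)."""
    n = len(labels_true)
    if n < 2:
        # Assumption: with fewer than two samples there are no pairs to compare.
        return 1.0

    # Contingency counts: samples sharing each (true, pred) label combination.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs placed together in both / in each labeling.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)

    # Expected and maximum value of the index under random labelings
    # with the same cluster sizes.
    expected = together_true * together_pred / total_pairs
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        # Assumption: degenerate inputs (e.g. one cluster in both labelings)
        # are treated as perfect agreement.
        return 1.0
    return (together_both - expected) / (maximum - expected)
```

Because only co-membership of pairs matters, `ari_sketch([0, 0, 1, 1], [1, 1, 0, 0])` scores the same as comparing a labeling with itself.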

Parameters:

labels_true (ndarray of shape (n_samples,)) – Ground truth class labels to be used as a reference.
labels_pred (ndarray of shape (n_samples,)) – Cluster labels to evaluate.

Returns:

ari – Adjusted Rand Index. A labeling that matches labels_true up to a permutation of the label values has a score of 1.0.

Return type:

float

Notes

This implementation uses a fast numba-compiled algorithm that avoids constructing the full pairwise similarity matrix.
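To illustrate why the pairwise matrix is not needed, the sketch below computes the unadjusted Rand Index twice: once by inspecting every pair of samples, and once from contingency counts alone. Both function names are hypothetical and the code uses only the standard library; it is an illustration of the idea, not the numba implementation. Both functions assume at least two samples.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index_pairs(labels_true, labels_pred):
    """Brute force: inspect every pair of samples (quadratic in n)."""
    n = len(labels_true)
    agree = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair agrees if both labelings group it, or both separate it.
        agree += same_true == same_pred
    return agree / comb(n, 2)


def rand_index_table(labels_true, labels_pred):
    """Same quantity from contingency counts, without enumerating pairs."""
    n = len(labels_true)
    both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(n, 2)
    # Pairs together in both labelings plus pairs apart in both labelings.
    return (total + 2 * both - in_true - in_pred) / total
```

The table-based version needs one pass over the samples to build the counts, after which the work depends only on the number of distinct label combinations.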

References

[1] Hubert, L. and Arabie, P. (1985). “Comparing partitions.” Journal of Classification 2(1): 193-218.

[2] https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index

Examples

Perfect clustering:

>>> labels_true = [0, 0, 1, 1]
>>> labels_pred = [0, 0, 1, 1]
>>> adjusted_rand_score(labels_true, labels_pred)
1.0

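Discordant clustering (agreement below the chance level; for these labels the ARI formula evaluates to -0.5, not 0.0):

>>> labels_true = [0, 0, 1, 1]
>>> labels_pred = [0, 1, 0, 1]
>>> adjusted_rand_score(labels_true, labels_pred)  # doctest: +ELLIPSIS
-0.5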

spine.math.metrics.adjusted_mutual_info_score(labels_true, labels_pred)

Compute the Adjusted Mutual Information (AMI) between two clusterings.

The Adjusted Mutual Information is a measure of agreement between two partitions, adjusted for chance. It employs the expected mutual information under a hypergeometric model of randomness.

The AMI is at most 1 and is adjusted so that chance agreement scores about 0:

- 1.0 indicates perfect clustering agreement
- 0.0 is the expected value for independent (random) labelings
- Negative values can occur when agreement falls below the chance level

The formula is:

AMI = (MI - E[MI]) / (max(H(U), H(V)) - E[MI])

where MI is the mutual information, E[MI] is the expected mutual information, and H(U), H(V) are the entropies of the two labelings.
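The ingredients of this formula can be sketched as follows. The helpers `entropy`, `mutual_info`, and `ami_from_parts` are hypothetical names for illustration; they use natural logarithms and the standard library only, and take E[MI] as a precomputed input rather than deriving it.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return sum(-(c / n) * log(c / n) for c in Counter(labels).values())


def mutual_info(labels_true, labels_pred):
    """Mutual information (in nats) computed from contingency counts."""
    n = len(labels_true)
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    cells = Counter(zip(labels_true, labels_pred))
    # Only non-empty cells appear in `cells`, so log() never sees zero.
    return sum(
        (nij / n) * log(n * nij / (rows[u] * cols[v]))
        for (u, v), nij in cells.items()
    )


def ami_from_parts(mi, emi, h_true, h_pred):
    """Apply the max-entropy normalization, given a precomputed E[MI]."""
    return (mi - emi) / (max(h_true, h_pred) - emi)
```

For identical labelings the mutual information equals the entropy of either labeling, so the numerator and denominator coincide and the score is 1.0 whatever the value of E[MI].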

Parameters:

labels_true (ndarray of shape (n_samples,)) – Ground truth class labels to be used as a reference.
labels_pred (ndarray of shape (n_samples,)) – Cluster labels to evaluate.

Returns:

ami – Adjusted Mutual Information score. Perfect labelings are scored 1.0. Bad labelings or independent labelings have non-positive scores.

Return type:

float

Notes

This implementation uses a fast numba-compiled algorithm that computes the hypergeometric expected mutual information directly from the contingency table.
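A minimal sketch of the expected mutual information under the hypergeometric model, following the sum given by Vinh et al. (2010), is shown below. The function names are hypothetical, and working with log-factorials via `math.lgamma` is a choice made for this example, not necessarily what the numba code does.

```python
from math import exp, lgamma, log


def log_factorial(k):
    """log(k!) via the log-gamma function, to avoid huge intermediate values."""
    return lgamma(k + 1)


def expected_mutual_info(row_sums, col_sums):
    """E[MI] in nats, given the marginal cluster sizes of a contingency table."""
    n = sum(row_sums)
    emi = 0.0
    for a in row_sums:
        for b in col_sums:
            # Feasible overlap sizes; an overlap of 0 contributes nothing.
            lo = max(1, a + b - n)
            hi = min(a, b)
            for nij in range(lo, hi + 1):
                term = (nij / n) * log(n * nij / (a * b))
                # Log of the hypergeometric probability of an overlap of nij.
                log_prob = (
                    log_factorial(a) + log_factorial(b)
                    + log_factorial(n - a) + log_factorial(n - b)
                    - log_factorial(n) - log_factorial(nij)
                    - log_factorial(a - nij) - log_factorial(b - nij)
                    - log_factorial(n - a - b + nij)
                )
                emi += term * exp(log_prob)
    return emi
```

Only the row and column sums of the contingency table enter the computation, which is why no per-sample data is needed once the table has been built.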

References

[1] Vinh, N. X., Epps, J., & Bailey, J. (2010). “Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance.” Journal of Machine Learning Research, 11, 2837-2854.

[2] https://en.wikipedia.org/wiki/Adjusted_mutual_information

Examples

Perfect clustering:

>>> labels_true = [0, 0, 1, 1]
>>> labels_pred = [0, 0, 1, 1]
>>> adjusted_mutual_info_score(labels_true, labels_pred)
1.0

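Discordant clustering (MI is 0 while E[MI] is positive, so the AMI formula evaluates to -0.5 up to floating-point rounding, not 0.0):

>>> labels_true = [0, 0, 1, 1]
>>> labels_pred = [0, 1, 0, 1]
>>> adjusted_mutual_info_score(labels_true, labels_pred)  # doctest: +ELLIPSIS
-0.5...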