The cluster module¶

Inside the cluster module one can find the Cluster class, which is the main class of the package. It is inisialised with a pair of adjacency matrices (one for the positive and one for the negative graph) and it contain all the implemented algorithms as class methods.

class cluster.Cluster(data)¶

Class containing all clustering algorithms for signed networks.

This should be initialised with a tuple of two csc matrices, representing positive and negative adjacency matrix respectively (A^+ and A^-). It contains clustering algorithms as methods and graph specifications as attributes.

Parameters:	data (tuple) – Tuple containing positive and negative adjacency matrix (A^+, A^-).

p¶

positive adjacency matrix.

Type:	csc matrix

n¶

negative adjacency matrix.

Type:	csc matrix

A¶

total adjacency matrix.

Type:	csc matrix

D_p¶

diagonal degree matrix of positive adjacency.

Type:	csc matrix

D_n¶

diagonal degree matrix of negative adjacency.

Type:	csc matrix

Dbar¶

diagonal signed degree matrix.

Type:	csc matrix

normA¶

symmetrically normalised adjacency matrix.

Type:	csc matrix

size¶

number of nodes in network

Type:	int

SDP_cluster(k, solver='BM_proj_grad', normalisation='sym_sep')¶

Clustering based on a SDP relaxation of the clustering problem.

A low dimensional embedding is obtained via the lowest eigenvectors of positive-semidefinite matrix Z which maximises its Frobenious product with the adjacency matrix and k-means is performed in this space.

Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. solver (str) – Type of solver for the SDP formulation. ‘interior_point_method’ - Interior point method. ‘BM_proj_grad’ - Burer Monteiro method using projected gradient updates. ‘BM_aug_lag’ - Burer Monteiro method using augmented Lagrangian updates.
Returns:	Label assignments.
Return type:	array of int, or list of array of int

SPONGE(k=4, tau_p=1, tau_n=1, eigens=None, mi=None)¶

Clusters the graph using the Signed Positive Over Negative Generalised Eigenproblem (SPONGE) clustering.

The algorithm tries to minimises the following ratio (Lbar^+ + tau_n D^-)/(Lbar^- + tau_p D^+). The parameters tau_p and tau_n can be typically set to one.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. tau_n (float) – regularisation of the numerator tau_p (float) – regularisation of the denominator
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes. nudge (int) – Amount added to diagonal to bound eigenvalues away from 0.

SPONGE_sym(k=4, tau_p=1, tau_n=1, eigens=None, mi=None)¶

Clusters the graph using the symmetric normalised version of the SPONGE clustering algorithm.

The algorithm tries to minimises the following ratio (Lbar_sym^+ + tau_n Id)/(Lbar_sym^- + tau_p Id). The parameters tau_p and tau_n can be typically set to one.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. tau_n (float) – regularisation of the numerator tau_p (float) – regularisation of the denominator
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes. nudge (int) – Amount added to diagonal to bound eigenvalues away from 0.

find_eigenvalues(k=100, matrix='laplacian')¶

Find top or bottom k eigenvalues of adjacency or laplacian matrix.

The list of the top (bottom) k eigenvalues of the adjacency (laplacian) matrix is returned. This can be useful in identifying the number of clusters.

Note

The Laplacian matrix used is the signed symmetric Laplacian.

Parameters:

k (int) – Number of eigenvalues to return
matrix (str) – Type of matrix to diagonalise (either ‘adjacency’ or ‘laplacian’)

Returns:

An array of the first k eigenvalues, ordered in ascending or descending order: (depending on the matrix type)

Return type:

array of float

geproblem_adjacency(k=4, normalisation='multiplicative', eigens=None, mi=None, nudge=0.5)¶

Clusters the graph by solving a adjacency-matrix-based generalised eigenvalue problem.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘additive’ - add degree matrices appropriately. ‘multiplicative’ - multiply by degree matrices appropriately.
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes. nudge (int) – Amount added to diagonal to bound eigenvalues away from 0.

geproblem_laplacian(k=4, normalisation='multiplicative', eigens=None, mi=None, tau=1.0)¶

Clusters the graph by solving a Laplacian-based generalised eigenvalue problem.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘additive’ - add degree matrices appropriately. ‘multiplicative’ - multiply by degree matrices appropriately.
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes. nudge (int) – Amount added to diagonal to bound eigenvalues away from 0.

spectral_cluster_adjacency(k=2, normalisation='sym_sep', eigens=None, mi=None)¶

Clusters the graph using eigenvectors of the adjacency matrix.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘sym’ - symmetric normalisation. ‘rw’ - random walk normalisation. ‘sym_sep’ - separate symmetric normalisation of positive and negative parts. ‘rw_sep’ - separate random walk normalisation of positive and negative parts.
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes.

spectral_cluster_adjacency_reg(k=2, normalisation='sym_sep', tau_p=None, tau_n=None, eigens=None, mi=None)¶

Clusters the graph using eigenvectors of the regularised adjacency matrix.

Other Parameters:
Parameters:	k (int) – The number of clusters to identify. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘sym’ - symmetric normalisation. ‘rw’ - random walk normalisation. ‘sym_sep’ - separate symmetric normalisation of positive and negative parts. ‘rw_sep’ - separate random walk normalisation of positive and negative parts. tau_p (int) – Regularisation coefficient for positive adjacency matrix. tau_n (int) – Regularisation coefficient for negative adjacency matrix.
Returns:	Output assignment to clusters.
Return type:	array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes.

spectral_cluster_bethe_hessian(k, mi=1000, r=None, justpos=True)¶

Clustering based on signed Bethe Hessian.

A low dimensional embedding is obtained via the lowest eigenvectors of the signed Bethe Hessian matrix Hbar and k-means is performed in this space.

Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. mi (int) – Maximum number of iterations of the eigensolver. type (str) – Types of normalisation of the Laplacian matrix. ‘unnormalised’, ‘symmetric’, ‘random_walk’.
Returns:	Label assignments. int: Suggested number of clusters for network.
Return type:	array of int, or list of array of int

spectral_cluster_bnc(k=2, normalisation='sym', eigens=None, mi=None)¶

Clusters the graph by using the Balance Normalised Cut or Balance Ratio Cut objective matrix.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘sym’ - symmetric normalisation. ‘rw’ - random walk normalisation.
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes.

spectral_cluster_laplacian(k=2, normalisation='sym_sep', eigens=None, mi=None)¶

Clusters the graph using the eigenvectors of the graph signed Laplacian.

Other Parameters:
Parameters:	k (int, or list of int) – The number of clusters to identify. If a list is given, the output is a corresponding list. normalisation (string) – How to normalise for cluster size: ‘none’ - do not normalise. ‘sym’ - symmetric normalisation. ‘rw’ - random walk normalisation. ‘sym_sep’ - separate symmetric normalisation of positive and negative parts. ‘rw_sep’ - separate random walk normalisation of positive and negative parts.
Returns:	Output assignment to clusters.
Return type:	array of int, or list of array of int
	eigens (int) – The number of eigenvectors to take. Defaults to k. mi (int) – The maximum number of iterations for which to run eigenvlue solvers. Defaults to number of nodes.

waggle(k, labs, matrix=None, rounds=50, mini=False)¶

Postprocessing based on iteratively merging and cutting clusters of the provided solution.

Pairs of clusters are merged randomly. Merged clusters are then partitioned in two by spectral clustering on input matrix.

Other Parameters:
Parameters:	k (int) – The number of clusters to identify. labs (array of int) – Initial assignment to clusters. matrix (csc matrix) – Matrix to use for partitioning. Defaults to un-normalised adjacency matrix.
Returns:	Output assignment to clusters.
Return type:	array of int
	rounds (int) – Number of iterations to perform. mini (boolean) – Whether to minimise (rather than maximise) the input matrix objective when partitioning.