structural_diversity_index package

Submodules

structural_diversity_index.MeetingTimeEstimator module

class structural_diversity_index.MeetingTimeEstimator.MeetingTimeEstimator

Bases: object

The class MeetingTimeEstimator implements some methods that allow to estimate the value of a meeting time sample, given that the walks simulated to compute that sample did not meet. The main idea behind these methods is to use the fact that meeting times are approximately geometrically distributed

static estimate_meeting_times_unmet_random_walks(meeting_times: numpy.ndarray, max_time_steps: int) → numpy.ndarray

Given samples of the meeting times of two random walks containing some samples in which the walks did not meet, this method estimates the value of the samples in which the walks did not meet.

Our estimate for the meeting time is based on the following reasoning. If the meeting time is approximately geometrically distributed it suffices to:

Estimate the parameter p of the geometric distribution which best approximate the distribution of the meeting time
Replace each -1 (i.e. each sample of the meeting time to be estimated) with: v + max_time_steps + 1, where v is a value sampled from the geometric distribution with parameter p. Here we are using the memory-less property of the geometric distribution.

Parameters

meeting_times (1D np.ndarray) – a 1D np.ndarray containing samples of the meeting times of two random walks. Indicate by -1 the samples which you want to be estimated
max_time_steps (int) – The number of time steps that the random walks have been run in order to compute the samples of the meeting times passed in the array meeting_times. For the samples with a -1 we assume that the walks took more than max_time_steps to meet.

Returns

a 1D np.ndarray containing samples of the meeting times of two random walks. The -1 in the array passed as parameter are replaced by estimates for the meeting time.

Return type

(1D np.ndarray)

static estimate_p_geometric_approximation(meeting_times: numpy.ndarray, max_time_steps: int) → float

Computes the parameter p of the geometric distribution which best approximate the distribution of the meeting time.

The computation is done by using the following equation: P[T leq t] = (1 - p)^t. Here T is the meeting time of the walks. If T is geometrically distributed the equation above should hold for some p. To compute this p, we compute all terms in the equation and then isolate p. Here t = max_time_steps and P[T leq t] is computed by counting the number of walks that did not meet.

Parameters

meeting_times (1D np.ndarray) – a 1D np.ndarray containing samples of the meeting times of two random walks. Indicate by -1 the samples which you want to be estimated
max_time_steps (int) – The number of time steps that the random walks have been run in order to compute the samples of the meeting times passed in the array meeting_times. For the samples with a -1 we assume that the walks took more than max_time_steps to meet.

Returns

the parameter p of the geometric distribution which best approximate the distribution of the meeting time

Return type

(float)

structural_diversity_index.MeetingTimeUI module

class structural_diversity_index.MeetingTimeUI.MeetingTimesUI

Bases: object

This class provides a user interface for the code computing the structural diversity index.

static _get_random_walk_simulator(adjacency: scipy.sparse.csr_matrix, degrees: numpy.ndarray, on_cuda: bool)

Given an adjacency matrix and a degree sequence of a graph it gets the random walk simulator for that graph.

Parameters

adjacency (sparse.csr_matrix) – the adjacency matrix of the graph
degrees (np.ndarray) – the degree sequence of the graph
on_cuda (bool) – if True, uses cuda to make computations

Returns

An instance of the class RandomWalkSimulator or RandomWalkSimulatorCUDA

static _get_random_walk_simulator_networkx(g: networkx.Graph, on_cuda: bool)

Given a graph it gets the random walk simulator for that graph.

Parameters

g (nx.Graph) – the graph on which you want to simulate a random walk
on_cuda (bool) – if True, uses cuda to make computations

Returns

An instance of the class RandomWalkSimulator or RandomWalkSimulatorCUDA

static get_meeting_times(g: networkx.Graph, max_time_steps: Optional[int] = None, n_samples: int = 1000, on_cuda: bool = False) → numpy.ndarray

Returns the n_samples of the meeting time of two uniformly started random walks on the graph g

Args:
g (nx.Graph): the graph for which you want the meeting time samples max_time_steps (int): the maximum number of time steps to run the random walks to compute the meeting time (default = 10*g.number_of_nodes()) n_samples (int): the number of samples of the meeting time (default = 1000) on_cuda (bool): if True, uses cuda to make computations (default = False)

Returns:
(np.ndarray): An array in which each entry is one sample of the meeting time on the graph g. Note: To estimate the meeting time of random walks that do not meet within max_times_steps steps we use the method MeetingTimeEstimator.estimate_meeting_times_unmet_random_walks(). See the class for a description.

static get_structural_diversity_index(g: networkx.Graph, max_time_steps: Optional[int] = None, n_samples: int = 100, on_cuda: bool = False) → Tuple[float, float]

Returns the structural diversity index of a graph g

Args:
g (nx.Graph): the graph for which you want the structural diversity index max_time_steps (int): the maximum number of time steps to run the random walks to compute the meeting time (default = 10*g.number_of_nodes()) n_samples (int): the number of samples of the meeting time (default = 1000) on_cuda (bool): if True, uses cuda to make computations (default = False)

Returns:
(float, float): The structural diversity index of the graph g and the standard deviation in the samples used for the computation. This index is computed by obtaining n_samples of the meeting time of the graph g and averaging over them. Note: To estimate the meeting time of random walks that do not meet within max_times_steps steps we use the method MeetingTimeEstimator.estimate_meeting_times_unmet_random_walks(). See the class for a description.

structural_diversity_index.RandomWalkSimulator module

class structural_diversity_index.RandomWalkSimulator.RandomWalkSimulator(random_walk_matrix: scipy.sparse.csr_matrix)

Bases: object

The class RandomWalkSimulator is designed to run a fast simulations of a random walk on a graph and compute the meeting times of two walks

static _check_complete(meeting_times: numpy.ndarray) → bool

This method check if all the walks have met

Parameters: meeting_times (2D np.ndarray) – the matrix of meeting times. Entry i, j is the meeting time of walk i with walk j. The default value of -1 is set when the walks have not met yet.
Returns: True if all the walks have met and False otherwise
Return type: (bool)

static _check_meetings(current_pos: numpy.ndarray, meeting_times: numpy.ndarray) → numpy.ndarray

This method checks if the walks meet for the first time at the current step.

Parameters

current_pos (1D np.ndarray) – the current position of the walks (i.e. the names of the vertices at which the walks are)
meeting_times (2D np.ndarray) – the matrix of meeting times. Entry i, j is the meeting time of walk i with walk j. The default value of -1 is set when the walks have not met yet.

Returns

A matrix with entry (i,j) being true if walk i has met walk j for the first time at current_position and false otherwise (i.e. either the walks are not at the same position, or they have met before).

Return type

(2D np.ndarray of boolean values)

static _compute_next_position_from_proba(proba_next_pos: scipy.sparse.csr_matrix) → numpy.ndarray

For each row i of proba_next_pos, corresponding to one random walk, it samples the column index j with the probability specified in the entry proba_next_pos[i][j]. The index j corresponds to the next position of the walk

Parameters: proba_next_pos (2D scipy.sparse.csr_matrix) – a matrix with entry i,j being the probability of moving from i to j for the random walk. The matrix has axis 0 of length n_samples (i.e. number of walks) and axis 1 of length n_nodes (i.e. the probability for the walk to move at each one of those nodes)
Returns: the next position of the walks, randomly sampled from proba_next_pos
Return type: (1D np.ndarray)

_get_meeting_times(max_time_steps: int, start_position: numpy.ndarray) → numpy.ndarray

This method simulates random walks started at start_position and keeps track of their meeting times. Then it returns these meeting times, once all the walks have met or max_time_steps has expired.

Parameters

max_time_steps (int) – the number of time steps for which you want to simulate the random walks at most
start_position (1D np.ndarray, optional) – a 1D np.ndarray in which each entry is the starting positions of one sample of the random walk.

Returns

a 2D np.ndarray with entry m,n being the meeting time between sample walk m and sample walk n. If two walks never meet the value of the m,n entry is -1

Return type

(2D np.ndarray)

_next_step(array_current: scipy.sparse.csr_matrix) → Tuple[numpy.ndarray, scipy.sparse.csr_matrix]

Given the current step of all the samples of the random walk, computes the next step for all the samples of the random walk

Parameters

array_current (sparse.csr_matrix) – A matrix with rows indicating the current position of the random walk. Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the current position of the random walk.

Returns

A tuple consisting of:

A 1D np.ndarray with for each sample of the random walk the index of the vertex where the rw will jump next
A 2D scipy.sparse.csr_matrix with rows indicates the next position of the random walk of a sample.

Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the next position of the random walk.

Return type

(np.ndarray, scipy.sparse.csr_matrix)

_start_position(n_samples: int, start_position: numpy.ndarray) → Tuple[numpy.ndarray, scipy.sparse.csr_matrix]

Randomly chooses starting vertices for each sample of the random walks

Parameters

n_samples (int) – the number of random walks you want to sample
start_position (1D np.ndarray) – starting positions for each sample of the random walk

Returns

A tuple consisting of:

A 1D np.ndarray with for each sample of the random walks the index of the starting vertex of the random walks
A 2D scipy.sparse.csr_matrix with rows indicates the starting position of the random walk of a sample.

Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the position of the random walk.

Return type

[(np.ndarray, scipy.sparse.csr_matrix)]

get_meeting_times(max_time_steps: int, n_samples: int) → numpy.ndarray

Gets the meeting times necessary to compute delta_g, i.e., it compute n_samples of the meeting time of two randomly started walks.

Parameters

max_time_steps (int) – the number of time steps for which you want to simulate the random walks at most
n_samples (int) – the number of samples of the meeting time that you want

Returns

a 1D np.ndarray in which each entry is one sample of the meeting time of two randomly started walks. If the walks met after max_time_steps the entry is equal to -1 by default.

Return type

(1D np.ndarray)

static random_walk_matrix_from_adjacency(adjacency: scipy.sparse.csr_matrix, degrees: numpy.ndarray) → scipy.sparse.csr_matrix

Returns the transition matrix of the random walk

Parameters

adjacency (scipy.sparse.csr_matrix) – the adjacency matrix of the graph for which we want to build the transition matrix
degrees (np.ndarray) – the degrees of the nodes in the graph

Returns

the transition matrix of the random walk. This is defined as P(i,j) = 1/deg(i)

Return type

(scipy.sparse.csr_matrix)

structural_diversity_index.RandomWalkSimulatorCUDA module

class structural_diversity_index.RandomWalkSimulatorCUDA.RandomWalkSimulator(random_walk_matrix: cupyx.scipy.sparse.csr_matrix)

Bases: object

The class RandomWalkSimulator is designed to run a fast simulations of a random walk on a graph and compute the meeting times of two walks

static _check_complete(meeting_times: cupy.ndarray) → bool

This method check if all the walks have met

Parameters: meeting_times (2D cp.ndarray) – the matrix of meeting times. Entry i, j is the meeting time of walk i with walk j. The default value of -1 is set when the walks have not met yet.
Returns: True if all the walks have met and False otherwise
Return type: (bool)

static _check_meetings(current_pos: cupy.ndarray, meeting_times: cupy.ndarray) → cupy.ndarray

This method checks if the walks meet for the first time at the current step.

Parameters

current_pos (1D cp.ndarray) – the current position of the walks (i.e. the names of the vertices at which the walks are)
meeting_times (2D cp.ndarray) – the matrix of meeting times. Entry i, j is the meeting time of walk i with walk j. The default value of -1 is set when the walks have not met yet.

Returns

A matrix with entry (i,j) being true if walk i has met walk j for the first time at current_position and false otherwise (i.e. either the walks are not at the same position, or they have met before).

Return type

(2D cp.ndarray of boolean values)

static _compute_next_position_from_proba(proba_next_pos: cupyx.scipy.sparse.csr_matrix) → cupy.ndarray

For each row i of proba_next_pos, corresponding to one random walk, it samples the column index j with the probability specified in the entry proba_next_pos[i][j]. The index j corresponds to the next position of the walk

Parameters: proba_next_pos (2D cupyx.scipy.sparse.csr_matrix) – a matrix with entry i,j being the probability of moving from i to j for the random walk. The matrix has axis 0 of length n_samples (i.e. number of walks) and axis 1 of length n_nodes (i.e. the probability for the walk to move at each one of those nodes)
Returns: the next position of the walks, randomly sampled from proba_next_pos
Return type: (1D cp.ndarray)

_get_meeting_times(max_time_steps: int, start_position: cupy.ndarray) → cupy.ndarray

This method simulates random walks started at start_position and keeps track of their meeting times. Then it returns these meeting times, once all the walks have met or max_time_steps has expired.

Parameters

max_time_steps (int) – the number of time steps for which you want to simulate the random walks at most
start_position (1D cp.ndarray, optional) – a 1D cp.ndarray in which each entry is the starting positions of one sample of the random walk.

Returns

a 2D cp.ndarray with entry m,n being the meeting time between sample walk m and sample walk n. If two walks never meet the value of the m,n entry is -1

Return type

(2D cp.ndarray)

_next_step(array_current: cupyx.scipy.sparse.csr_matrix) → Tuple[cupy.ndarray, cupyx.scipy.sparse.csr_matrix]

Given the current step of all the samples of the random walk, computes the next step for all the samples of the random walk

Parameters

array_current (cupyx.scipy.sparse.csr_matrix) – A matrix with rows indicating the current position of the random walk. Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the current position of the random walk.

Returns

A tuple consisting of:

A 1D cp.ndarray with for each sample of the random walk the index of the vertex where the rw will jump next
A 2D cupyx.scipy.sparse.csr_matrix with rows indicates the next position of the random walk of a sample.

Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the next position of the random walk.

Return type

(cp.ndarray, cupyx.scipy.sparse.csr_matrix)

_start_position(n_samples: int, start_position: cupy.ndarray) → Tuple[cupy.ndarray, cupyx.scipy.sparse.csr_matrix]

Randomly chooses starting vertices for each sample of the random walks

Parameters

n_samples (int) – the number of random walks you want to sample
start_position (1D cp.ndarray) – starting positions for each sample of the walk (if None random starting positions are selected)

Returns

A tuple consisting of:

A 1D cp.ndarray with for each sample of the random walks the index of the starting vertex of the random walks
A 2D cupyx.scipy.sparse.csr_matrix with rows indicates the starting position of the random walk of a sample.

Each row is of the form [0,0, …, 0, 1, 0, …, 0]. The entry 1 indicates the position of the random walk.

Return type

[(cp.ndarray, cupyx.scipy.sparse.csr_matrix)]

get_meeting_times(max_time_steps: int, n_samples: int) → numpy.ndarray

Gets the meeting times necessary to compute delta_g, i.e., it computes n_samples of the meeting time of two randomly started walks.

Parameters

max_time_steps (int) – the number of time steps for which you want to simulate the random walks at most
n_samples (int) – the number of samples of the meeting time that you want

Returns

a 1D np.ndarray in which each entry is one sample of the meeting time of two randomly started walks. If the walks met after max_time_steps the entry is equal to -1 by default.

Return type

(1D np.ndarray)

static random_walk_matrix_from_adjacency(adjacency: scipy.sparse.csr_matrix, degrees: numpy.ndarray) → cupyx.scipy.sparse.csr_matrix

Returns the transition matrix of the random walk

Parameters

adjacency (scipy.sparse.csr_matrix) – the adjacency matrix of the graph for which we want to build the transition matrix
degrees (np.array) – the degrees of the nodes in the graph

Returns

the transition matrix of the random walk. This is defined as P(i,j) = 1/deg(i)

Return type

(cupy.scipy.sparse.csr_matrix)

Module contents

structural_diversity_index.UI: alias of MeetingTimesUI