recurrent-segmentation-sbm
Reccurent segmentation meets block models in temporal networks
Overview of the code
The file sbm_core.py
is the main part of the code, which contains utility functions for main algorithms.(K-segmentation algorithm and two variants of (K-H) segmentation algorithm).
The class optimize.py
contains code of three main algorithms. It uses the functions contained in sbm_core.py
.
The estimated intensity functions, groups/clusters and change-points can be obtained from calling functions in optimize.py
.
The file utils.py
contains the utility code to read data, SMAWK utilities, creating networkx graph object etc.
The file experiments.py
contains required functions to run the simulations.
Experimental files and Running the code
num_roles=1
num_segments=1
num_levels=1
algo_ver= 3
The files bikes_santander.py
, collegeMsg.py
, bitcoin.py
, eu_email_dep1.py
,eu_email_dep2.py
, mathoverflow.py
, and mooc.py
provide examples on real world dynamic networks.
This zip folder consists experiments for both for Synthetic and real-world datasets. The data we have used is publicly available.
There are four main user parameters. Example is as follows.
num_roles=2
num_segments=3
num_levels=2
algo_ver= 3
dest_folder='./Results/bikes/'
algo_ver 3
is dedicated to (K-H)-segmentation algorithm using SMAWK.
algo_ver 2
is dedicated to (K-H)-segmentation algorithm using naive segmentation.
There are two tuning parameters. Example is as follows.
theta = 1e-5
eta = 1
All the experimental files can be run directly to do the simulations.
From the file synthetic_experiment_1.py
to synthetic_experiment_5.py
contain code to simulate synthetic datasets using our algorithms.
These files contain the code to generate synthetic data as well.
The files likelihood_vs_H_bikes.py
,likelihood_vs_H_bitcoin.py
,likelihood_vs_H_dep2.py
return normalized likelihood values for
a set of given H
levels. eg: current_h
= 1,2,3,4,5,6,7,8,...20;
To reproduce the results of the paper just run synthetic_experiment_likelihood_vs_H_?.py
files.
Note that Normalized log-likelihood is the ratio between a particular likelihood and
the likelihoodvalue which corresponds to a single group and a single segment.
The file bike_times_edges_large.py
returns the running time and edges for given a fraction of edges.
To reproduce the results of the paper, eg. set _frac
= .4, .5, .8 or 1; one at a time. Note that running time can be dependent on the machine you run.
To switch the algorithm you choose, change algo_ver
parameter to either 2 or 3.
The file synthetic_experiment_time_vs_edges.py
returns the running time and edges for given NO_SAMPLES
.
To reproduce the results of the paper, eg. set NO_SAMPLES
= 50 from the list_samples = [50,100,150,200] ; one at a time.
Level-dependent Membership extension:
New algo_ver 4
is dedicated to level-dependent (K-H)-segmentation algorithm variant.
From the file ld-1-syn.py
to ld-5-syn.py
contain code to simulate synthetic datasets using our level-dependent algorithms.
These files contain the code to generate synthetic data as well.
The files ld-bikes_santander.py
, ld-bitcoin.py
, ld-eu_email_dep1.py
,ld-eu_email_dep2.py
, mathoverflow.py
, and mooc.py
provide examples on real world dynamic networks for level-dependent variant.
References
SMAWK code: https://www.ics.uci.edu/~eppstein/PADS/SMAWK.py