README.md


# recurrent-segmentation-sbm

Reccurent segmentation meets block models in temporal networks

# Overview of the code

The file `sbm_core.py` is the main part of the code, which contains utility functions for main algorithms.(K-segmentation algorithm and two variants of (K-H) segmentation algorithm).
The class `optimize.py` contains code of three main algorithms. It uses the functions contained in `sbm_core.py`.
The estimated intensity functions, groups/clusters and change-points can be obtained from calling functions in `optimize.py`.
The file `utils.py` contains the utility code to read data, SMAWK utilities, creating networkx graph object etc.
The file `experiments.py` contains required functions to run the simulations.

# Experimental files and Running the code

```
num_roles=1
num_segments=1
num_levels=1
algo_ver= 3
```

The files `bikes_santander.py` , `collegeMsg.py` , `bitcoin.py` ,  `eu_email_dep1.py`,`eu_email_dep2.py`, `mathoverflow.py`, and `mooc.py`
provide examples on real world dynamic networks.

This zip folder consists experiments for both  for Synthetic and real-world datasets.
The data we have used is publicly available.

There are four main user parameters.  Example is as follows.
```
num_roles=2
num_segments=3
num_levels=2
algo_ver= 3
dest_folder='./Results/bikes/'
```

algo_ver `3` is dedicated to  (K-H)-segmentation algorithm using SMAWK.
algo_ver `2` is dedicated to  (K-H)-segmentation algorithm using naive segmentation.

There are two tuning parameters. Example is as follows.
```
theta = 1e-5
eta = 1
```
All the experimental  files  can be run directly to do the simulations.

From the file `synthetic_experiment_1.py` to `synthetic_experiment_5.py` contain code to simulate synthetic datasets using our algorithms.
These files contain the code to generate synthetic data as well.

The files `likelihood_vs_H_bikes.py`,`likelihood_vs_H_bitcoin.py`,`likelihood_vs_H_dep2.py`  return normalized likelihood values for
a set of given `H` levels. eg: `current_h` = 1,2,3,4,5,6,7,8,...20; 
To reproduce the results of the paper just run `synthetic_experiment_likelihood_vs_H_?.py`files.

Note that Normalized log-likelihood is the  ratio  between  a  particular  likelihood  and  
the  likelihoodvalue  which  corresponds  to  a  single  group  and  a  single segment.

The file `bike_times_edges_large.py`  returns the running time and edges for given a fraction of edges. 
To reproduce the results of the paper, eg. set `_frac` = .4, .5, .8 or 1; one at a time. Note that running time can be dependent on the machine you run.
To switch the algorithm you choose, change `algo_ver` parameter to either 2 or 3.

The file `synthetic_experiment_time_vs_edges.py`  returns the running time and edges for given `NO_SAMPLES`.
To reproduce the results of the paper, eg. set `NO_SAMPLES` = 50 from the list_samples =  [50,100,150,200] ; one at a time.

Level-dependent Membership extension:

New algo_ver `4` is dedicated to  level-dependent  (K-H)-segmentation algorithm variant.

From the file `ld-1-syn.py` to `ld-5-syn.py` contain code to simulate synthetic datasets using our level-dependent algorithms.
These files contain the code to generate synthetic data as well.

The files `ld-bikes_santander.py` , `ld-bitcoin.py` ,  `ld-eu_email_dep1.py`,`ld-eu_email_dep2.py`, `mathoverflow.py`, and `mooc.py`
provide examples on real world dynamic networks for level-dependent variant.


## References

SMAWK code: https://www.ics.uci.edu/~eppstein/PADS/SMAWK.py