Multiplicity in the Partitioning of Signed Graphs

Arinik, Nejat

doi:10.6084/m9.figshare.14551113.v3

these_materials.zip (4.83 GB)

Version 3 2021-11-28, 22:23

Version 2 2021-09-01, 15:39

Version 1 2021-05-06, 20:46

thesis

posted on 2021-11-28, 22:23, authored by Nejat Arinik

This repository contains all datasets and results used in the following PhD thesis: N. Arınık, Multiplicity in the Partitioning of Signed Graphs, Avignon Université, 2021

Chapter 2: Correlation Clustering Resolution Methods

data&results/Dataset 2.1: This dataset is used to assess the performance of the exact methods. We generate these complete and incomplete networks with our random signed network generator, which is publicly available online (https://github.com/CompNet/SignedBenchmark). For complete unweighted signed networks, this model relies on only three parameters: n (number of vertices), \ell_0 (initial number of modules) and q_{m} (proportion of misplaced edges, i.e. edges meant to be frustrated by construction). Moreover, we assume that the proportion of misplaced edges is the same inside and between the modules. For incomplete unweighted signed networks, we introduce two more parameters: the density d of the graph and the proportion q_{neg} of negative edges. The latter controls the ratio of positive to negative edges.

Comparison between BD(F*_e) and B&C(F_e): containing the network and result files regarding the comparison of two exact methods relying on the ILP formulation with edge variables.
Comparison between B&C(F_v) and B&C(F_e): containing the network and result files regarding the comparison of two exact methods relying on two ILP formulations: vertex pair vs. edge variables.
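
The generative model described for complete unweighted signed networks can be illustrated with a minimal sketch. This is not the SignedBenchmark implementation available at the link above: the function name, the equal-size modules and the way misplaced edges are sampled are assumptions made for illustration only.

```python
import random
from itertools import combinations

def generate_complete_signed_network(n, l0, q_m, seed=0):
    """Return (membership, signs) for a complete unweighted signed network.

    membership[v] is the planted module of vertex v;
    signs[(u, v)] is +1 or -1 for every vertex pair u < v.
    """
    rng = random.Random(seed)
    # Assumption of this sketch: modules of (almost) equal size.
    membership = [v % l0 for v in range(n)]
    internal, external = [], []
    for u, v in combinations(range(n), 2):
        if membership[u] == membership[v]:
            internal.append((u, v))
        else:
            external.append((u, v))
    # Perfectly balanced start: positive inside modules, negative between them.
    signs = {e: +1 for e in internal}
    signs.update({e: -1 for e in external})
    # Misplace the same proportion q_m of edges inside and between the modules.
    for edges in (internal, external):
        for e in rng.sample(edges, round(q_m * len(edges))):
            signs[e] = -signs[e]
    return membership, signs

membership, signs = generate_complete_signed_network(n=15, l0=3, q_m=0.2)
# Edges that disagree with the planted modules are the misplaced ones.
frustrated = sum(
    1 for (u, v), s in signs.items()
    if (membership[u] == membership[v]) != (s > 0)
)
print(len(signs), frustrated)
```

With \ell_0 = 3 equal-size modules, about two thirds of the vertex pairs lie between modules, so the proportion of negative edges stays close to that value whenever the same q_m is applied inside and between the modules.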

data&results/Dataset 2.2: This dataset is used to assess the performance of the heuristic methods. We generate these complete and incomplete networks through the same random signed network generator as in Dataset 2.1.

networks: For complete unweighted signed networks with d = 1, we generate 20 replications for parameter values \ell_0 = 3, n = 50 and q_m = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}. In these networks, the value of q_{neg} with the considered parameters is approximately 0.7. For incomplete unweighted signed networks with d = {0.25, 0.50}, we generate 20 replications for parameter values \ell_0 = 3, n = 50, q_{m} = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6} and q_{neg} = {0.3, 0.5, 0.7}. In total, we produce 180 and 720 instances for complete and incomplete networks, respectively, which makes a total of 900 instances.
partitions: containing optimal partitions found by B&C(F_v) and the partitions found by the considered 10 heuristics.
evaluate-partitions: containing the evaluation results for each heuristic method (e.g. imbalance)
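
As an illustration of the kind of score stored in evaluate-partitions, the sketch below computes the imbalance of a partition on an unweighted signed graph, i.e. the number of frustrated edges (negative edges inside a cluster plus positive edges between clusters). The function name, the edge dictionary format and the toy graph are hypothetical and do not reflect the file formats used in this repository.

```python
def imbalance(signs, membership):
    """Count frustrated edges of a partition on an unweighted signed graph.

    signs maps an edge (u, v) to +1 or -1; membership[v] is the cluster of v.
    """
    count = 0
    for (u, v), sign in signs.items():
        same_cluster = membership[u] == membership[v]
        # Frustrated: negative inside a cluster or positive between clusters.
        if (sign < 0 and same_cluster) or (sign > 0 and not same_cluster):
            count += 1
    return count

# Toy signed graph on 4 vertices (hypothetical data).
signs = {(0, 1): +1, (0, 2): -1, (1, 2): -1, (2, 3): +1, (1, 3): +1}
two_clusters = imbalance(signs, [0, 0, 1, 1])  # only edge (1, 3) is frustrated
one_cluster = imbalance(signs, [0, 0, 0, 0])   # edges (0, 2) and (1, 2) are frustrated
print(two_clusters, one_cluster)
```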

data&results/Dataset 2.3: This dataset is used to assess the performance of the heuristic methods. We generate these complete and incomplete networks with another random signed network generator of ours, which is publicly available online (https://github.com/arinik9/SignedStabilityBenchmark). The optimal solution for a generated network is known by construction. For given n, d and \ell_0, we first create a perfectly structurally balanced signed network with a built-in module structure. The underlying module structure constitutes the optimal partition. Then, to take into account different positive-to-negative ratios for internal and external edges, we generate several signed networks by perturbing the initial signed network without affecting its underlying optimal partition, thanks to its definition of a stability range.

networks: We generate signed networks with parameter values n = 150, d \in {0.25, 1.00} and \ell_0 = {6, 8, 10}. In total, we produce 13 and 49 instances for complete and incomplete networks, respectively, which makes a total of 62 instances.
partitions: containing the partitions found by the considered 10 heuristics. (Our source code: https://github.com/arinik9/BenchmarkCC)
evaluate-partitions: containing the evaluation results for each heuristic method (e.g. imbalance)
output csv

data&results/Dataset 2.4: This dataset is used to assess the performance of the heuristic methods based on 6 real signed networks.

networks: containing 6 real signed networks, which are 'Wikipedia Election', 'Slashdot', 'Yeast', 'E. coli', 'EGFR' and 'Macrophage'.
results/partitions:
results/evaluate-partitions:

figs: containing all figures used in Chapter 2.

Chapter 3: Characterizing Measures for Partition Comparison

data&results

input: containing 6 csv files, each associated with an external measure. Each csv file contains the measure scores obtained for a set of synthetic partition pairs. The generation of these synthetic partitions is explained in Section 3.3.1.
output: containing raw and processed regression and sensitivity analysis results.
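
An external measure scores the agreement between two partitions of the same set of elements. The sketch below uses the Rand index purely as an example of such a measure; this description does not list the six measures covered by the csv files, so the choice of measure, the function name and the toy partitions are assumptions.

```python
from itertools import combinations

def rand_index(p1, p2):
    """Fraction of element pairs on which two partitions agree.

    p1[i] and p2[i] are the cluster labels of element i in each partition.
    A pair agrees when it is grouped together in both partitions or
    separated in both. Requires at least two elements.
    """
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(p1)), 2):
        pairs += 1
        if (p1[i] == p1[j]) == (p2[i] == p2[j]):
            agree += 1
    return agree / pairs

# Two toy partitions of 4 elements (hypothetical data).
a = [0, 0, 1, 1]
b = [0, 0, 1, 2]
score = rand_index(a, b)  # 5 of the 6 pairs agree; only pair (2, 3) differs
print(score)
```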

figs: containing all figures used in Chapter 3.

Chapter 4: Multiple Partitioning of Multiplex Signed Networks

data: We use the voting dataset of the 7th term of the European Parliament, extracted from the website itsyourparliament.

raw data: the whole raw data regarding the 7th term (2009-14)
rollcall-networks: focusing on a subset of the whole data and containing the networks of French and Italian MEPs voting on "Agriculture and Rural Development" (AGRI) roll-calls during 2012-13.

results:

rollcall-partitions: containing partitioning results, so-called patterns, by applying an exact method onto the networks of French and Italian MEPs, as explained in Section 4.3.1.
rollcall-clustering&characterization: containing the clustering&characterization results and final characteristic patterns, as explained in Sections 4.3.2, 4.3.3 and 4.3.4.

figs: containing all figures used in Chapter 4.

Chapter 5: Enumeration of the Space of Optimal Solutions For the Correlation Clustering Problem

data&results/Dataset 5.1 experiments (Our source code: https://github.com/arinik9/BenchmarkCC)

networks: We generate these complete and incomplete networks through the same random signed network generator as in Dataset 2.1. For complete unweighted signed networks with d = 1, we generate 20 replications for parameter values \ell_0 = 3, n \in {32, 36, 40, 45, 50} and q_m = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6}. In these networks, the value of q_{neg} with the considered parameters is approximately 0.7. For incomplete unweighted signed networks with d = {0.25, 0.50}, we generate 20 replications for parameter values \ell_0 = 3, n \in {32, 36, 40}, q_{m} = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6} and q_{neg} = {0.3, 0.5, 0.7}. In total, we produce 600 and 1,080 instances for complete and incomplete networks, respectively, which makes a total of 1,680 instances.
partitions: folder containing the partitioning results of two methods: EnumCC(3) vs. OneTreeCC(). Note that the results of OneTreeCC() are not shown for space considerations, except for n=50.
results/delay_exec_time: all the results and plots regarding the difference of execution times between EnumCC(3) and OneTreeCC() (i.e., EnumCC(3) minus OneTreeCC()), represented on the log-scaled y-axis of the plots. When such difference takes a negative value, this means our proposed method EnumCC(3) runs faster than OneTreeCC().
results/EnumCC_nb-jumps: all the results regarding the number of jumps related to EnumCC(3), i.e. njump(EnumCC(3))
results/exec_time: all the results regarding the execution times of EnumCC(3) and OneTreeCC().
results/nb-sols: all the results regarding the number of optimal solutions based on EnumCC(3). Note that we show the results of OneTreeCC() only for those with n=50, since both methods run out of the time limit of 12h for several networks with n=50.
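
The sign convention used in results/delay_exec_time can be made concrete with a small sketch. The network names and timings below are hypothetical and are not taken from the result files; only the convention (EnumCC(3) minus OneTreeCC()) comes from the description above.

```python
def delay(enumcc_time, onetreecc_time):
    """Execution time difference: EnumCC(3) minus OneTreeCC(), in seconds."""
    return enumcc_time - onetreecc_time

# Hypothetical execution times in seconds: (EnumCC(3), OneTreeCC()).
times = {"net_a": (12.0, 30.0), "net_b": (45.0, 20.0)}
delays = {name: delay(e, o) for name, (e, o) in times.items()}

# A negative delay means EnumCC(3) ran faster than OneTreeCC().
faster_with_enumcc = [name for name, value in delays.items() if value < 0]
print(delays, faster_with_enumcc)
```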

data&figs/Dataset 5.1 extra experiment: containing the results of Table 5.3

networks
partitions

data&results/Dataset 5.2 experiment

benchmark networks: We generate these complete and incomplete networks through the same random signed network generator as in Dataset 2.3. We generate signed networks with parameter values n \in {30, 40, 50, 60, 70, 90}, d \in {0.25, 1.00} and \ell_0 = {2, 4, 6}. In total, we produce 214 and 184 instances for complete and incomplete networks, respectively, which makes a total of 398 instances.
benchmark partitions: folder containing the partitioning results of two methods: CoNS(r_{max}) with vs. without MVMO pruning, where r_{max} \in {3,4}.
results: two csv files containing benchmark results between CoNS(r_{max}) with vs. without MVMO pruning, where r_{max} \in {3,4}.

figs: containing all figures used in Chapter 5.

Chapter 6: Investigation of the Space of Optimal Solutions For the Correlation Clustering Problem

data&results/Dataset 5.1: We rely on the optimal partitions found for Dataset 5.1 in order to study the space of optimal solutions for the CC problem.

cluster analysis: containing the results of our clustering step, as explained in Section 6.4.3.
cluster characterization: containing the results of our cluster characterization step, as explained in Section 6.4.4.

data&results/Syrian Conflict Network & Partitions:
figs: containing all figures used in Chapter 6.