1 An Evolutionary Multiobjective Approach for Community Discovery in Dynamic Networks
Francesco Folino,Clara Pizzuti
Abstract—The discovery of evolving communities in dynamic networks is an important research topic that poses challenging tasks.Evolutionary clustering is a recent framework for clustering dynamic networks that introduces the concept of temporal smoothness inside the community structure detection method.Evolutionary-based clustering approaches try to maximize cluster accuracy with respect to incoming data of the current time step,and minimize clustering drift from one time step to the successive one.In order to optimize both these two competing objectives,an input parameter that controls the preference degree of a user towards either the snapshot quality or the temporal quality is needed.
In this paper the detection of communities with temporal smoothness is formulated as a multiobjective problem and a method based on genetic algorithms is proposed.The main advantage of the algorithm is that it automatically provides a solution representing the best trade-off between the accuracy of the clustering obtained,and the deviation from one time step to the successive.Experiments on synthetic data sets show the very good performance of the method when compared with state-of-
the-art approaches.
Index Terms—Evolutionary clustering,complex networks,dynamic networks,community discovery.
!
1I NTRODUCTION
The adaptability of networks to represent many real world complex systems,including those undergoing dynamic shifts of their structure,is generating a growing interest in the study of their topological https://www.wendangku.net/doc/f66387069.html,works are modeled as graphs,where nodes represent individual ob-jects,and edges represent interactions among these ob-jects.Individuals in a network interact with each other and exchange information by forming communities.The detection of community structure,i.e.the organization of nodes into groups having many connections inside the same cluster and relatively sparse connections between vertices of different communities,is a fundamental research topic in the study of complex networks.An important standpoint to analyze in networks is their dynamic behavior,i.e.the evolutions they go through over time.
Dynamic networks,in fact,capture the modi?cations of interconnections over time,allowing to trace the changes of network structure at different time steps.Many approaches have been proposed for the analysis and temporal evolution of dynamic networks[1],[2],[3],[4],[5],[6],[7],[8],[9], [10],[11],[12].Some of these methods[2],[5],[10],[3] employ the concept of evolutionary clustering,introduced by Chakrabarti et al.in[13],to catch the evolution of clusters in temporal data.
Evolutionary clustering groups data coming at different ?F.Folino is with National Research Council of Italy(CNR),Institute for High Performance Computing and networking(ICAR),Via Pietro Bucci,41C,87036Rende(CS),Italy,Email:{ffolino}@https://www.wendangku.net/doc/f66387069.html,r.it ?Clara Pizzuti is with National Research Council of Italy(CNR), Institute for High Performance Computing and networking (ICAR),Via Pietro Bucci,41C,87036Rende(CS),Italy,Email: {pizzuti}@https://www.wendangku.net/doc/f66387069.html,r.it time steps to produce a sequence of clusterings by in-troducing a framework called temporal smoothness.This framework assumes that abrupt changes of clustering in a short time period are not desirable,thus it smooths each community over time.Smoothness is realized by trading-off between two different criteria.The?rst,called snapshot quality,is that the clustering should re?ect as accurately as possible the data coming during the current time step.The second,called temporal cost,is that each clustering should not dramatically shift from one time step to the next one. In this paper a multiobjective approach,named DYN-MOGA(DYNamic MultiObjective Genetic Algorithms),to discover communities in dynamic networks by employing genetic algorithms[14],is proposed.The detection of community structure with temporal smoothness,in fact,can be formulated as a multiobjective optimization problem.The ?rst objective is the maximization of the snapshot quality, that measures how well the clustering found represents the data at the current time.The second objective is the minimization of the temporal cost,that measures the distance between two clusterings at consecutive time steps. In order to maximize the snapshot quality to measure the goodness of the division in communities of a network, different quality measures,used in the literature to capture the intuition of network community[15],are considered.In particular,we?x the attention on the concept of modularity, introduced by Newman and Girvan in[16].
To minimize the temporal cost we compute the Nor-malized Mutual Information(NMI for short),a well known entropy measure in information theory that measures the similarity of two clusterings,between the community structure obtained at the current time step with that obtained at the previous one[17].
DYNMOGA exploits the bene?ts of these two functions
Digital Object Indentifier 10.1109/TKDE.2013.1311041-4347/13/$31.00 ? 2013 IEEE
2
and discovers the communities in the network by selectively exploring the search space,without the need to know in advance the exact number of groups.This number is automatically determined by simultaneously optimizing the objectives.
Experiments on synthetic and real life networks show the capability of the multiobjective genetic approach to correctly detect communities with results competitive w.r.t. the state-of-the-art approaches.
It is worth to note that,though multiobjective evolution-ary algorithms have been proposed for partitioning static graphs[18],[19],[20],and for data clustering[21],their use for dynamic networks,however,has not been explored very much[22].
The main contributions of the paper can be summarized as follows:
?The detection of community structure in dynamic networks is formalized as a multiobjective optimiza-tion problem.The?rst objective searches for highly modular structures at the current time step,the second objective tries to minimize the differences between the community structure at the current time step and that obtained at the previous time step.
?The approach proposed can be considered as a general framework for evolutionary clustering.In fact,it is suf-?cient to change one of the two objective functions(or both)to implement and test different quality functions for analyzing dynamic networks.
?The method is parameter free.It does not need neither the parameter introduced by Chi et al.[2]to give different weight to the snapshot and temporal costs, nor the number of clusters to?nd.The former param-eter is automatically determined by the multiobjective method,while the number of communities to?nd at each time step is determined by the genetic represen-tation employed.
The paper is organized as follows.In Section2the concept of dynamic network is de?ned and the evolutionary cluster-ing problem is formalized.Section3gives a review of the evolutionary clustering approaches.Section4formulates the community detection problem in dynamic networks as a multiobjective optimization problem.Section5describes the method,the genetic representation adopted and the variation operators used.In section6the results of the method on synthetic and real life networks are presented, and a comparison with the approaches of[5]and[3]is reported.Section7,?nally,concludes the paper.
2P ROBLEM F ORMULATION
Let{1,...,T}be a?nite set of time steps and V= {1,...,n}be a set of individuals or objects.A static network N t at time t can be modeled as a graph G t= (V t,E t)where V t is a set of objects,called nodes or vertices,and E t is a set of links,called edges,that connect two elements of V t at time t.Thus G t is the graph representing a snapshot of the network N t at time t. V t?V is a subset of individuals V observed at time t.An edge(u t,v t)∈E t if individuals u and v have interacted at time t.
A community(also called cluster or module)in a static network N t is a group of vertices V t i?V t having a high density of edges inside the group,and a low density of edges with the remaining nodes V t/V t i.Let C t denote the sub-graph representing a community.
A clustering,or community structure,CR t= {C t1,...C t k}of a network N t at time t is a partitioning of G t in groups of nodes such that for each couple of communities C t i and C t j∈CR t,V t i∩V t j=?.
A dynamic network is a sequence N={N1,...,N T}of static networks,where each N t is a snapshot of individuals and connections among these individuals at time t.
3R ELATED WORK
Analyzing networks and their evolution is recently receiv-ing increasing interest from researchers[7],[23],[24].In fact,the representation of many complex systems through a static graph,even when the temporal dimension describing the varying interconnections among nodes is available,does not allow to study the network dynamics and the changes it incurs over time.Kumar et al.[4]studied the evolution of network properties of two large blogosphere networks, and tried to classify members of each network into groups and their changes.Sun et al.[9]introduced GraphScope, an ef?cient parameter-free method,based on the Minimum Description Length principle,to discover communities in evolving graphs.Asur et al.[1]have characterized the evolution of communities by de?ning critical events that occur in dynamic graphs.Tang et al.[25]studied multimode networks and introduced a spectral clustering framework to discover communities and?nd out how they evolve.There are several other methods for detecting communities in dynamic networks,not mentioned here.For a recent survey see[26].Many of the proposed approaches divide the community detection step from the temporal analysis of the network,i.e.?rst the communities are extracted,and then the structural differences over time are analyzed in order to determine the correspondences among communities in two consecutive time steps.
A different methodology,known as evolutionary cluster-ing,has been proposed by Chakrabarti et al.in[13].Since our method has been inspired by the evolutionary clustering framework,in the following lines a more detailed descrip-tion of evolutionary clustering approaches is reported. Chakrabarti et al.in[13]observe that,often,changes of connections in short time periods could be caused by noise.Thus,though clustering mainly depends on the current object connections,in many dynamic application domains,abrupt drifts from the most recent history should not be expected.At each time step a new clustering must be produced by simultaneously optimizing two con?icting criteria.The?rst is that the clustering should re?ect,as accurately as possible,the data coming during the current time step.The second is that each clustering should not shift dramatically from one time step to the successive.
3
To satisfy this last property a framework,called temporal
smoothness,is de?ned.This framework assumes that abrupt
change of clustering in a short time period is not desirable, thus it smooths each community over time.For smoothing,
a cost function composed by two sub-costs,the snapshot
cost(SC)and the temporal cost(T C),is de?ned.The snapshot cost SC measures how well a community structure CR t represents the data at time t.The temporal cost T C measures how similar the community structure CR t is to the
previous clustering CR t?1.As pointed out by the authors,
the clustering algorithm must trade-off the bene?t of main-taining a consistent clustering over time(temporal cost) with the cost of deviating from an accurate representation of the current data(snapshot cost)[13].The framework of evolutionary clustering is apt in those situations in which the clustering result is frequently and regularly consumed by a user.Thus mild changes are preferred over dramatic shifts because in such a way the user is not required to learn a new data segmentation.Evolutionary clustering will provide a smooth view of the transition for successive time steps.In this setting it is possible to associate clusters within the historical context,and thus to trace their evolution. Though evolutionary clustering is robust to noise,it does not allow that the number of communities varies over time, neither that new communities may appear,nor that existing communities may dissolve.
Chakrabarti et al.proposed the framework for generic
data clustering,and applied it to two well known clustering
methods,k-means and agglomerative hierarchical cluster-ing,to deal with evolving data.Thus they introduced the cost function for generic data objects.A specialized version of this function in the context of dynamic networks has been ?rst introduced by Chi et al.[2]and adopted also by Lin et al.[5],and Kim and Han[3].Chi et al.[2]de?ne the cost function as follows:
cost=α·SC+(1?α)·T C(1) whereαis an input parameter used by the user to empha-size one of the two objectives.Whenα=1the approach returns the clustering without temporal smoothing.When α=0,however,the same clustering of the previous time step is produced,i.e.CR t=CR t?1.Thus a value between 0and1is used to control the preference degree of each sub-cost.The authors proposed two evolutionary spectral clustering algorithms by using the normalized cut concept of Shi and Malik[27].The two methods differ in how the temporal cost T C is measured.
An evolutionary-based framework for analyzing com-munities and evolutions in dynamic networks,named F acetNet,has been proposed by Lin et al.[5].The framework employs a stochastic block model for gener-ating communities,and a probabilistic model based on the Dirichlet distribution to catch community evolutions.The authors de?ne the snapshot cost by using the KL-divergence between the observed node similarity matrix at time t and an approximate community structure computed by using a mixture model.F acetNet thus,at each iteration,updates the values of the approximate structure in order to decrease the cost function.Convergence to a local optimal solution is guaranteed by the monotonic decrease of the cost function.
F acetNet discovers communities that maximize the?t to the observed data and the temporal evolution.
Kim and Han[3]proposed a particle-and-density based clustering method based on the evolutionary approach of Chakrabarti et al.[13],extended to deal with a variable number of communities between different time steps.The method introduces the concept of nano-community and l-clique-by-clique(l-KK)to discover a variable number of communities that can evolve,form,and dissolve.A nano-community captures the evolution of a dynamic network over time at particle level.A community is modeled as a dense subset of nano-communities and l-KK.A biclique is a complete bipartite graph such that two nodes are connected if and only if they are in different partites.Being complete, each node in a partite is connected with all the nodes in the other partite.An l-clique-by-clique is an extension of a biclique to a number l of bicliques.A cost embedding technique to allow temporal smoothing and a density-based clustering method to?nd local clusters by optimizing the clustering modularity are proposed.Furthermore,to map communities between consecutive time periods,the mutual information concept among the clusterings obtained at time t?1and t is employed,by purifying the link distributions between consecutive networks in order to maximize the mutual information.The mapping can determine the stage of each community as evolving,forming,splitting.It is worth to note that the approach does not take into account neither the possibility of a community to split into multiple communities,nor that multiple communities merge into only one.
The described methods present two main limitations.The ?rst regards the number of clusters to?nd,the second is relative to the valueαto choose in order to apply temporal smoothness.The approach of Chi et al.allows for a varying number k of communities between successive time steps,however the value of k must be given as input parameter.Some heuristics are argued by the authors in order to determine its optimal value.F acetNet assumes a?xed number k of communities over time,though the authors suggest that,to detect the best community number at time t,the algorithms could be executed for a range of different k values,and that giving the best value of modularity could be chosen.Kim and Han’s approach does not need the number of communities,however,since their approach is density-based,they must set the density parameters.Another problem is the value to use for the temporal smoothness.All the approaches must give it as input parameter in order to control the preference degree of a user with respect to either the snapshot quality or the temporal quality.The two quality functions,however,are competing.In fact,optimizing one produces a degradation of the other.
In the following we propose a parameter-free method that automatically determines both the number k of communi-ties,and the optimalαvalue.
4
4M ULTIOBJECTIVE C LUSTERING
Given a static network N t,a multiobjective evolutionary clustering problem(Ω,F1,F2,...,F h)can be de?ned as min F i(CR t),i=1,...,h subject to CR t∈ΩwhereΩ={CR t1,...,CR t k}is the set of feasible cluster-
ings of N t at time stamp t,and F={F1,F2,...,F h}is a set of h single criterion functions.Each F i:Ω→R is a different objective function that determines the feasibility of the clustering obtained.Since F is a vector of competing objectives that must be simultaneously optimized,there is not one unique solution to the problem,but a set of solutions are found through the use of Pareto optimality theory[28].Given two solutions CR1and CR2∈Ω, solution CR1is said to dominate solution CR2,denoted as CR1?CR2,if and only if
?i:F i(CR1)≤F i(CR2)∧?i s.t.F i(CR1) The vector F maps the solution space into the objective function space.When the nondominated solutions are plot-ted in the objective space,they are called the Pareto front. Thus the Pareto front represents the compromise solutions satisfying all the objectives as best as possible. In the last few years many efforts have been devoted to the application of evolutionary computation to the de-velopment of multiobjective optimization algorithms.Evo-lutionary algorithms,in fact,proved very successful to solve multiobjective optimization problems because of the population-based nature of the approach that allows the generation of several elements of the Pareto set in a single run[29],[14]. In the next section we propose a multiobjective evolu-tionary community detection approach that tries to optimize both the snapshot cost and the temporal cost without the need to?x the control parameterα.The solutions contained on the Pareto front of the multiobjective optimization problem will represent the best compromise satisfying both the snapshot and temporal costs. It is worth to note that the word evolutionary has a different meaning with respect to the context in which it is used.For Chakrabarti et al.in[13]the term evolutionary is intended as temporal evolution.In the context of mul-tiobjective optimization it means evolutionary algorithms implementing the concept of Darwinian biological evolu-tion[30]. 5T HE DYNMOGA ALGORITHM The MultiObjective Genetic Algorithm(MOGA)we used is the Nondominated Sorting Genetic Algorithm(NSGA-II) proposed by Srinivas and Deb in[31]and implemented in the Genetic Algorithm and Direct Search Toolbox of MATLAB.NSGA-II builds a population of competing individuals and ranks them on the basis of nondominance (for a detailed description of the approach see[14]).In order to employ NSGA-II,DYNMOGA has been adapted with a customized population type that suitably repre-sents a partitioning of a network and endowed with two complementary objectives.In the following,we?rst give a brief introduction to Genetic Algorithms,and than the objective functions selected,the genetic encoding adopted, and the modi?ed variation operators used to work with this encoding are described. Genetic Algorithms(GAs)are a class of adaptive general-purpose search techniques inspired by natural evolution proposed by Holland[32]in the early1970s as computer programs that simulate the evolution process in nature.A standard Genetic Algorithm evolves a constant-size popu-lation of individuals(called chromosomes)by using the genetic operators of reproduction,crossover and mutation. Each chromosome is composed by a number of genes and represents a candidate solution to a given problem.An individual is associated with a?tness value that re?ects how good it is,with respect to the other solutions in the popula-tion.Reproduction operator copies elements of the current population into the next generation with a selected strategy. Crossover generates two new chromosomes by crossing two elements of the population.Mutation randomly alters genes of individuals. Objective Functions:As described in the previous section,we are interested in optimizing the cost function (formula(1))composed by the two competing objectives, the snapshot cost SC and the temporal cost T C.Since SC measures how well a community structure C t represents the data at time t,we need an objective function that maximizes the number of connections inside each community while minimizing the number of links between the communities. To this end we employ four quality measures,well known in the community detection?eld[15],that formalize the intuitive concept of community. Let CR t={C t1,...C t k}be a clustering of a network G t=(V t,E t)at time t with n nodes and m edges,C t a cluster having n s nodes and m s edges,m s(u)={v| v∈C t}be the number of nodes in C t connected with u,c s={(u,v)|u∈C t,v/∈C t}be the number of edges on the boundary of C t,l s the total number of edges joining vertices inside the module C t s,and d s the sum of the degrees of the nodes of C t s. The scores we consider are modularity[16], conductance[33],normalized cut[27],and community score[22].Their de?nition is as follows: Q= k s=1 [l s m ?(d s 2m )2](2) In modularity Q[16]the?rst term of each summand of is the fraction of edges inside a community,while the second one is the expected value of the fraction of edges that would be in the network if edges fall at random without regard 5 Fig.1.A network of7nodes partitioned in two communities{1,2,3,4} and{5,6,7},and the corresponding locus-based representation. to the community structure.Values approaching1indicate strong community structure. CO= k s=1 c s 2m s+c s(3) Conductance CO[33]measures the fraction of edges pointing outside the clustering. NC= k s=1 c s 2m s+c s +c s 2(m?m s)+c s(4) Normalized Cut NC[27]measures the fraction of total edge connections to all the nodes in the graph. CS= k s=1 ( u∈C t (m s (v) n s )2)× 2m s n s (5) Community Score CS[22]measures the fraction of internal edges of each cluster per nodes. The second objective must minimize the temporal cost T C,thus we need a metric to measure how similar the community structure CR t is to the previous clustering CR t?1.To this end we employ the Normalized Mutual Information,a well known entropy measure in information theory[17].Given two partitionings A={A1,...,A a} and B={B1,...,B b}of a network in communities,let C be the confusion matrix whose element C ij is the number of nodes of the community A i∈A that are also in the community B j∈B.The normalized mutual information NMI(A,B)is de?ned as: NMI(A,B)= ?2 c A i=1 c B j=1 C ij log(C ij N/C i.C.j) c A i=1 C i.log(C i./N)+ c B j=1 C.j log(C.j/N) (6) where c A(c B)is the number of groups in the partitioning A(B),C i.(C.j)is the sum of the elements of C in row i(column j),and N is the number of nodes.If A=B, NMI(A,B)=1.If A and B are completely different, NMI(A,B)=0.Thus our second objective at a generic time step t is to maximize NMI(CR t,CR t?1). Genetic representation:Our clustering algorithm uses the locus-based adjacency representation proposed in[34], adopted in[21]for data clustering and in[20]for com- munity detection in static networks.In this graph-based representation,an individual of the population consists of n genes g1,...,g n,where n is the number of nodes.Each gene can assume a value j in the range{1,...,n}.Genes represent nodes of the graph G=(V,E)modeling a network N,and a value j assigned to the i-th gene is interpreted as a link between the nodes i and j of V.This means that in the clustering solution found,i and j will be in the same cluster.A decoding step,however,is necessary to identify all the components of the corresponding graph. The nodes participating in the same component are assigned to one cluster.A main advantage of this representation is that the number k of clusters is automatically determined by the number of components contained in an individual and determined by the decoding step.Figure1shows a network partition and the corresponding encoded genotype. Input:Given a dynamic network N={N1,...,N T},the sequence of graphs G={G1,...,G T}modeling it,and the number T of time steps. Output:A clustering for each network N i of N. Method:Perform the following steps: 1Generate an initial clustering CR1={C11,...C1k}of the network N1without smoothing by optimizing only the?rst objective; 2for t=2to T 3Create a population of random individuals whose length equals the number n=|V t|of nodes of G t; 4while termination condition is not satis?ed do 5Decode each individual I={g1,...,g n}of the population to generate the partitioning CR t={C t1,...,C t k}of the graph G t in k connected components; 6Evaluate the two?tness values of the translated individuals; 7Assign a rank to each individual and sort them according to nondomination rank; 8Create a new population of offspring by applying the variation operators; 9Combine the parents and offspring into a new pool and partition it into fronts; 10Select points on the lower front(with lower rank)and apply the variation operators on them to create the next population; 11end while 12return the solution CR t={C t1,...C t k}of the Pareto front having the maximum modularity value; 13end for Fig.2.The pseudo-code of the DYNMOGA algorithm. Initialization:A population of random individuals is generated such that for each node i,the value of g i is randomly chosen among one of its neighboring nodes j. This means that the edge(i,j)exists. Uniform Crossover:Given two parents,a random binary mask is created.Uniform crossover(see Table1)then selects the genes where the mask is a0from the?rst parent,and the genes where the mask is a1from the second parent,and combines the genes to form the child.The child at each position i contains a value j coming from one of the two parents.Thus uniform crossover maintains node connections in the child individual since the edge(i,j) exists. TABLE1 Example of uniform crossover. Parent1:4322656 Parent2:3315476 Mask:0110011 O?spring4312676 6 Fig. 3.Normalized mutual information for different combinations of crossover and mutation rates for sysnthetic dataset#1. Mutation:The mutation operator,similarly to initializa-tion,for each node i randomly changes the value of g i with one of the neighbors of i.This mutation guarantees the generation of a mutated child in which each node is linked only with one of its neighbors. The pseudo-code of DYNMOGA is reported in Figure 2.Given a dynamic network N={N1,...,N T}and the sequence of graphs G={G1,...,G T}modeling it, DYNMOGA?nds a partitioning of the network N1by running the genetic algorithm that optimizes only the?rst objective.For a given number of time steps,the multiob-jective genetic algorithm creates a population of random individuals whose length is the number of nodes of the current graph G t.Then,for a?xed number of generations, it decodes the individuals to generate the partitioning at time step t,evaluates the objective values,assigns a rank to each individual according to Pareto dominance and sorts them.A new population is generated by applying the specialized variation operators described above.Parents and offspring are then combined,and the new pool is partitioned into fronts.The individuals with the lower rank are selected and variation operators are applied on them to create the new population.At the end of each time step DYNMOGA returns a set of solutions,i.e.all those contained in the Pareto front.Each of these solutions corresponds to a different trade-off between the two objectives and thus to diverse partitioning of the network consisting of various number of clusters.A criterion should be established to automatically select one solution with respect to another.To this end,we choose the partitioning having the highest value of modularity.This choice is motivated by the fact that, since the Pareto front already selected the nondominated solutions that satisfy the snapshot and the temporal cost at the best,a solution presenting a better community structure is preferable. Computational Complexity:DYNMOGA uses the Non-dominated Sorting Genetic Algorithm(NSGA-II)proposed by Deb et al.in[36].NSGA-II builds a population of competing individuals and ranks them on the basis of non-dominance.In[37]Jensen proposed an ef?cient algorithm for nondominated sorting that can be applied for NSGA-II. Jensen showed that the run-time complexity of the NSGA-II algorithm is O(gp log h?1p),where g is the number of generations,p is the population size,and h is the number of objective functions.Since DYNMOGA optimizes two objectives,its complexity will be O(gp log p).At each generation,however,genetic operators must be ex-ecuted.In particular,crossover needs O(n)time,mutation O(1)time,while?tness computation is composed of three terms:decoding of an individual,modularity and NMI computation.Decoding can be ef?ciently performed by using the disjoint set algorithm described in[35],that represents sets by rooted trees,where each set corresponds to a community.With this data structure the decoding step requires O(n log n)time[35]. To compute modularity we need to consider,for each node i its d i neighbors,thus the time complexity is O(m), where m is the number of edges.As regards normalized mutual information,it has been shown[38]that it can be ef?ciently computed in O(n)time.Fitness computation can thus be realized in O(n log n)+O(m)+O(n) time.Therefore,the overall complexity of DYNMOGA is O((gp log p)×(n log n+m)). In the next section we show that DYNMOGA is able to ?nd meaningful network structure for both synthetic and real life data sets. 6E XPERIMENTAL R ESULTS In this section we study the effectiveness of our approach by?rst employing modularity as objective function that optimizes the snapshot cost,and compare the results ob-tained by DYNMOGA w.r.t.the algorithm of Lin et al.[5] and Kim and Han[3]on synthetic networks for which the partitioning in communities is known.In such a case we show that our multiobjective genetic algorithm successfully detects the network structure and it is very competitive with respect to the other approaches.In Section6.7we then show the behavior of the method when conductance,normalized cut and community score are applied,and report the results obtained by each objective.Finally,we apply our method on two real-world networks. The DYNMOGA algorithm has been written in MAT-LAB,using both the Genetic Algorithms and Direct Search 2toolboxes.Parameter setting is a challenging research problem in evolutionary algorithms.In[39]the problem has been deeply investigated and the authors proved that, though it is possible to?nd good parameter values for a set of problems,general tuning that allows for good perfor-mance on a wide range of problems is dif?cult.As regards DYNMOGA,we employed a trial-and-error procedure on one of the synthetic dataset described below(dataset#1)by computing the normalized mutual information for different combinations of crossover and mutation rates.Figure3 shows the obtained values.It can be observed that they do not present high variation,thus we set crossover rate=0.8 and mutation rate=0.2,since,in general,high crossover rate and low mutation rate are suggested in the literature. Furthermore,we?xed elite reproduction=10%of the population size,roulette selection function,population size =100,and number of generations100. 6.1Evaluation measures In order to compare DYNMOGA and the other approaches on the synthetic data sets,two validation measures,the 7 normalized mutual information(NMI),already described in the previous section,and the error rate are employed.The error rate,as reported in[5],is computed by considering an n×k indicator matrix Z storing,for each of the n nodes, the community membership to one of the k communities obtained by the algorithm,and a similar indicator matrix G built for the ground truth.The error rate is then de?ned as the norm||ZZ T?GG T||,which measures the distance between the community structures represented by Z and G. 6.2Synthetic dataset#1 The?rst dataset we consider is the benchmark adopted by Lin et al.in[5].This data set is generated by the authors analogously to the classical benchmark proposed by Girvan and Newman in[40].The network consists of128nodes divided into four communities of32nodes each.Every node has a?xed average degree avgDegree,and shares a number z of links with the nodes not belonging to its community.Increasing z augments the noise level of the network.Two different values of avgDegree and z have been considered.When avgDegree=16a less dense network is generated,while with avgDegree=20we have a denser network.As regards z,a value z=5generates a more distinct community structure,while?xing z=6,the community structure becomes less clear.Edges are placed with higher probability between a pair of nodes in the same community,and with lower probability between nodes in different communities.These probabilities depend on the value of z.In order to introduce dynamics in G,nC%of nodes are moved among communities.To this purpose,two different cases have been considered.In the former,the community structure presents soft changes,thus the10% of nodes are randomly selected from each community,and randomly assigned to the other three communities.In the second case the changes are more substantial,thus the30% of nodes change their community at each time step.We considered20time steps and the values averaged over50 experiments are reported. Figures4and5show the error rate and the normalized mutual information obtained by DYNMOGA and F acetNet for the different networks built with the parameter values described.The results for F acetNet have been obtained forα=0.8,value chosen by the authors in their paper[5]. In particular,Figure4reports the error rates when(a)z= 5and percentage of node changes nC=10%,(b)z=5 and percentage of node changes nC=30%,(c)z=6and nC=10%,(d)z=6and nC=30%.Figure5reports the normalized mutual information for the same con?guration. These two?gures point out that DYNMOGA obtains lower error rate for all the4synthetic networks,at each time step.As regards the NMI,when the fuzziness is low and the community structure changes are mild(Figure5(a)),the NMI of the two methods are comparable.However,when the fuzziness increases,thus the networks present more dramatic changes,DYNMOGA is able to better detect the true community structure.This behavior is clear in Figures 5(c)and(d). F acetNet needs that the parameterαbe?xed by the user.Lin et al.[5]point out that automatically?nding the best value ofαhas not been investigated,and that setting a value pushes the algorithm to prefer a result more biased towards either snapshot quality or temporal smoothness. Figure6shows the error rate obtained by F acetNet when the average degree has been?xed to20,for increasing values of the parameterα,compared with that obtained by DYNMOGA.Figure6(a)shows the results for z=5and nC=30%,while Figure6(b)for z=6and nC=30%. In the former case,DYNMOGA outperforms F acetNet whenα≤0.7.In the latter case,the errors obtained by DYNMOGA are always lower,independently of the value ofα,except for slight differences at three time steps for α≥0.8.The?gure points out that the choice ofαis not an easy task because it in?uences the quality of the results.As already emphasized,DYNMOGA is able to automatically determine the best tradeoff between the two competing objectives. 6.3Synthetic dataset#2 The second synthetic dataset has been generated by taking into account some main events that may characterize the evolution of dynamic networks[7],[41],[1],[42].To this end we assumed four types of events,as introduced by Greene et al.in[42].The events are the following:?Birth and death:from the second time step on,10%of new communities are created by removing nodes from other existing communities,and randomly removing 10%of the existing communities. ?Expansion and contraction:10%of communities are randomly selected and expanded or contracted by25% of their size.When expanded,the new nodes are chosen at random from the other communities.?Intermittent communities:10%of communities from the?rst time step are hide. ?Merging and splitting:at each time step,10%of com-munities are split,10%of communities are chosen, and couples of communities are merged. We generated four synthetic data sets for the four dif-ferent types of events,for20time steps.The parameters to the generator have been set such that each network is constituted by1000nodes having mean degree of15 and maximum degree50,number of communities between 20and50,and mixing parameter(percentage of edges between communities)0.2.Figures7and8depict the error rate and the normalized mutual information on the four different data sets.The?gures clearly show that DYNMOGA outperforms F acetNet on all these four types of networks.In particular,it is worth to note that the error rate of DYNMOGA is the same or higher than that obtained by F acetNet at the?rst time step.However,from the second time step on,the error rate of DYNMOGA is sensibly lower,and the NMI value is notably higher.Note that that the behavior of F acetNet is rather different with respect to the type of event that can occur.In particular,in the case of birth/death of communities(Figure8(a))its NMI (a) (b) (c) (d) Fig.4. Error rate on synthetic dataset #1when avgDegree =16:(a)z =5and nC =10%,(b)z =5and nC =30%,(c)z =6and nC =10%,(d)z =6and nC =30%. (a)(b) (c) (d) Fig.5. NMI on synthetic dataset #1when avgDegree =16:(a)z =5and nC =10%,(b)z =5and nC =30%,(c)z =6and nC =10%,(d)z =6and nC =30%. (a) (b) Fig.6.Error rate on synthetic dataset #1when avgDegree =20:(a)z =5and nC =30%,(b)z =6and nC =30%,and αvarying between 0.1and 0.9.F N 0.i ,i=1,...9,stands for FacetNet result with input α=0.i (a)(b)(c)(d) Fig.7. Error rate on synthetic dataset #2:(a)birth and death,(b)expansion and contraction,(c)intermittent communities,(d)merging and splitting. drastically decreases from 0.99to 0.4over the 20time steps,while DYNMOGA maintains an NMI value near to 1.As regards the other three events,the NMI obtained by F acetNet diminishes in a more soft way,reaching a value not less than 0.80,while DYNMOGA does not obtain values lower than 0.92. As already reported,the results for F acetNet have been obtained by ?xing α=0.8.In order to check the in?uence of different values of αalso on this kind of data set,Figure 9shows the NMI values obtained when αassumes the values 0,0.4,0.6,and 1,respectively,in case of expansion and contraction events.When α=0the NMI obtained by F acetNet drastically decreases,since the cost function biases the method to ?nd solutions more similar to the previous time step,completely disregarding the current snapshot.The opposite behavior is obtained when α=1.Although in this case F acetNet has to ?nd the community structure that best ?ts the current snapshot,independently from the previous time step,it obtains NMI values between 0.97and 0.83,which are lower than that (a) (b) (c)(d) Fig.8.Normalized mutual information on synthetic dataset#2:(a)birth and death,(b)expansion and contraction,(c)intermittent communities,(d)merging and splitting. (a)α=0 (b)α=0. 4(c)α=0. 6 (d)α=1 Fig.9.NMIs on Derek-Green’s expansion and contraction datasets while varyingα. (a) (b)(c) Fig.10.Error(a)and Normalized Mutual Information(b),and degree distribution(c)in log-log scale on power law synthetic dataset(merging and splitting). obtained by DYNMOGA(above0.98from the second time step on). 6.4Power-law networks Lancichinetti et al.[43]proposed a new class of bench- marks(LFR benchmark)that extend the Girvan and New- man benchmark by introducing power law degree distri- butions and different community size.In[44]it has been experimented that many community detection algorithms perform well on the Girvan and Newman benchmark,but give poor results on the LFR benchmark.Although the network generator of Greene et al.in[42]is based on binary network generation tools proposed in[44],it does not allow to?x the exponent of the power law distribution from which assigning degrees to nodes.Thus,we generated an LFR benchmark constituted by1000nodes,average node degree20,maximum node degree50,exponent of degree distribution-2,community size distribution-1,and mixing parameter0.3.The generated networks have only one node with maximum degree50,while almost70% of nodes have degree lower than the average degree20. In order to introduce dynamics,for5time steps,10%of communities were chosen in a random way,and repeatedly they were split(steps2,4)and merged(steps3,5). Figure10(a)and Figure10(b)show error and nor- malized mutual information obtained by F acetNet and DYNMOGA.We can notice that,after the?rst time step, DYNMOGA is able to better recover the true evolving community structure,and it works well also on power law networks.Figure10(c)reports the degree distribution of networks.The?gure clearly shows the typical long tail of power law distribution. 6.5Synthetic dataset#3 In this section we compare DYNMOGA with the method of Kim and Han[3].It is worth to point out that the results reported for this latter approach are those appearing in[3], and provided by the authors.The comparison has been performed on two kinds of data sets.The?rst one is a dynamic network of a?xed number of communities(named SYN-FIX).The second one is a dynamic network with a variable number of communities(named SYN-V AR). SYN-FIX is similar to the synthetic dataset#1.The network consists of128nodes divided into four communi- ties of32nodes each.Every node has an average degree of16and shares a number z of links with the other nodes of the network.In order to introduce dynamics,3 nodes are randomly selected from each community in G t?1 and randomly assigned to the other three communities. SYN-V AR is obtained by modifying the generation method of SYN-FIX to introduce the forming and dissolving of communities and the attaching and detaching of nodes. The initial networks contains256nodes,divided in4 communities of64nodes each.10consecutive networks are generated by choosing8nodes from each community and generating a new community with these32nodes.This is done for5timestamps,then the nodes return to the original communities.Thus,the number of communities for the10 timestamps is4,5,6,7,8,8,7,6,5,4.The average degree (a)(b) (c)(d) Fig.11.Normalized mutual information on synthetic dataset #3:(a)SYN-FIX when z =3,(b)SYN-FIX when z =5,(c)SYN-VAR when z =3,(d)SYN-VAR when z =5. of each node in a cluster is set to the half of the size of this cluster.Furthermore,at each time step 16nodes are randomly deleted and 16new nodes are added to the network. We generated 10different networks for 10timestamps,run DYNMOGA on them,and computed the Normalized Mutual Information to measure the similarity between the true partitions and the detected ones. Figure 11shows the average normalized mutual infor-mation over the 10timestamps for SYN-FIX when z =3(Figure 11(a))and z =5(Figure 11(b)),for SYN-V AR when z =3(Figure 11(c))and z =5(Figure 11(d)). The ?gure shows the signi?cantly better results obtained by DYNMOGA with respect to Kim and Han’s algorithm.In fact,for SYN-FIX and SYN-V AR,when z =3,DYNMOGA obtains a value which is almost always 1,while the values obtained by Kim and Han’s algorithm are around 0.9for SYN-FIX and aroud 0.7for SYN-V AR.The differences,however,are much more remarkable when z =5.In this case DYNMOGA obtains values above 0.9for all the timestamps,while Kim and Han method fails to uncover the community structure.In fact,it returns values of normalized mutual information between 0.1and 0.2. (a) (b) Fig.12.Error and Normalized Mutual Information on synthetic dataset #2(merging and splitting),single objective versus multiobjective. 6.6Pareto front solution selection A characteristic of multiobjective optimization is the gen-eration of a set of solutions.Thus,a single solution out of this set must be selected.There has been a lot of re-search in this decision making problem,and many different approaches have been proposed [29].As already stated,DYNMOGA prefers,among the Pareto front solutions,that having community structure with the higher modularity value,since this concept has been recognized as the most suitable to interpret the intuitive idea of community.It is worth to note that,the optimization of the two objectives employed by DYNMOGA ,i.e.modularity and normalized mutual information,does not produce the same results of optimizing the single objective of modularity.In fact,in the former case,the Pareto front contains those non-dominated solutions that try to optimize both the snapshot cost (modularity)and the temporal cost (NMI),which is different than optimizing only the snapshot cost (modularity alone).In order to emphasize that optimizing two objectives gives better results than optimizing only modularity,Figure 12compares Error and NMI values obtained by DYNMOGA for the synthetic dataset #2(merging and splitting),with those returned when DYNMOGA is executed by ?xing the temporal cost at zero for all the time steps,thus completely disregarding NMI.The ?gure clearly points out the better performance of the multiobjective approach with respect to the single objective one. (a) (b) Fig.13.Normalized mutual information on synthetic dataset #2with different objective functions:(a)expansion and contraction,(b)merging and splitting. 6.7Changing objective function to compute snapshot cost Results presented so far have been obtained by employing modularity as snapshot cost function.As already pointed out,DYNMOGA can be considered as a general framework for evolutionary clustering.In fact,it is suf?cient to change one of the two objective functions (or both),to implement and test different approaches for analyzing dynamic net-works.In this section we consider the other three quality measures for community detection introduced in Section 5,and compare the normalized mutual information values computed by using these scores. 11 NMI has been computed for the synthetic dataset #2,in particular expansion and contraction,and merging and splitting networks.The results on the other networks are analogous,thus we do not report them for lack of space.Figure 13shows that modularity overcomes all the other three objectives,though community score works quite well and is comparable with it. It is worth to notice that it has been shown that the opti-mization of modularity has a resolution limit that depends on the total size of the network and the interconnections of the modules [45].This limit implies the important drawback that,searching for partitioning of maximum modularity,may lead to solutions in which important structures at small scales could not be discovered.However,to overcome this problem,Granell et al.[46]introduced a resolution control parameter γin the modularity formulation.The new formula being Q R = k s =1[l s m ?γ(d s 2m )2 ].When γ=1the original formulation is obtained,while for increasing values of γ,smaller groups of nodes can be found.Thus,despite the criticisms regarding resolution limit problem,the use of modularity as snapshot cost seems the best choice. (a)g =50(b)g =100(c)g =200 Fig.14. (a)Running time of DYNMOGA in seconds against number of nodes varying as {128,256,512,1024,2048,4096}with corresponding number of edges {1938,4018,8184,16158,33026,65256},on the synthetic dataset #1,z=5,nC=10%,for different combinations of population size p and number of generations g . 6.8Scalability Analysis One of the main criticisms in using Genetic Algorithms,compared with traditional optimization algorithms,is the high execution time required to generate a solution.The major limitation of evolutionary algorithms is,in fact,the repeated ?tness function evaluation that,for complex prob-lems could often be prohibitive.The problem is exacerbated when large populations of individuals are used,and,in particular for the multiobjective approach.In our method ?tness evaluation is rather ef?cient,thus the main problem comes from the network size. To evaluate the scalability of the method we used the synthetic dataset #1,avgDegree=16,z=5,nC=10%,with number of nodes increasing as {128,256,512,1024,2048,4096},and corresponding number of edges {1938,4018,8184,16158,33026,65256},population size p ,and number of generations g varying in the interval [50,100,200]. Figure 14reports the time requirements for one time step,for the different combinations of p and g .The ?gures show that the scaling of DYNMOGA is lower than quadratic in the number of nodes. We now want to study the in?uence of population size and number of generations on the accuracy of the method. (a) (b) Fig.15.Error variation (a)and execution times (b)for different combina-tions of population size p and number of generations g on synthetic dataset #2:merging and splitting,n=1000,m=7728. It is worth to point out that population size p can be viewed as a measure of the parallel search level a GA supports,since p different solutions are considered at the same time to ?nd the local optimum.The more complex the problem to solve,the larger the population to use because the chance of ?nding optimal solutions augments with increasing p .Because of the computation requirements,the choice must balance the quality of the solution versus the computation times.A similar reasoning applies to the number of generations.The longer the algorithm works,the ?tter the solution it can obtain.However,after a number of generations,that depends on the problem,continuing the execution does not provide any improvement of the local optimum obtained so far because the algorithm gets stuck in that local optimum.Figure 15(a)shows how the error varies for different combinations of population size p and number of generations g on synthetic dataset #2(merging and splitting).In particular both p and g assume values in the range [50,100,200].The ?gure points out that the error can drastically drop down from around 9500with p =50and g =50to around 600if p =200and g =200.However the running time (Figure 15(b))sensibly increases.Thus,if the computing resources are limited,the choice of p and g should be done by considering the trade-off between execution time and desired accuracy. It is worth to point out that,though Genetic Algorithms are naturally suited to be implemented on parallel architec-tures [47],the capabilities of the approach to deal with very large networks could be degraded because of the demanding computing requirements.6.9 Real-life data sets In this section we apply our method to two real-life dynamic networks:Cell Phone Calls and Enron mail .Cell Phone Calls:the former data set comes from the VAST 2008mini challenge 3:Cell Phone Calls 1and consists of cell call phone records among the members of the ?ctitious Paraiso movement,covering a period of ten days in June 2006.These records have been used to build a network where each node corresponds to a unique cell phone,and an edge between two cell phones is created when a phone call between the two cell phones occurs.For each edge,the day and time of the phone call are reported.The number of cell phones is 400.Thus ten weighted 1.https://www.wendangku.net/doc/f66387069.html,/hcil/V ASTchallenge08/ 12 TABLE2 Cell dataset statistics:T is the number of time steps(10days on June2006), M the modularity,|C|the number of communities,|V|the number of nodes, |E|the number of edges,|E|?the number of distinct undirected edges(i.e. number of different phone calls between two members),Z the average degree,CC the Watts-Strogatz clustering coef?cient,and B the betweenness of network. T M|C||V||E||E|?Z CC B 10.664032370987525 2.62500.03280.2992 20.656135373964499 2.49500.01920.2532 30.658730374953509 2.54500.01370.2594 40.6540313741013514 2.57000.01790.2289 50.662632373991508 2.54000.01600.2895 60.665125373963512 2.56000.02070.2246 70.657133367936498 2.49000.01180.2356 80.6329363651005511 2.55500.02030.3127 90.653834374982518 2.59000.02900.2991 100.6467323841040530 2.65000.00950.2253 networks can be built by considering the phone calls for each of the ten days.Each edge is labelled with the number of phone calls occurred in that day between two members of Paraiso movement.Five persons are considered the most important in the network,Ferdinando Catalano(node201) and his brother Estaban(node6),David Vidro(node2),and his two brothers Jorge and Juan(nodes3and4).These?ve core members changed their cell phone numbers between days7and8,therefore,for the last three days their node number changed from201,6,2,3,4to301,307,310,361, 398,respectively. Some statistical informations regarding the network are reported in Table2.Since the true community structure is not known,we followed the same approach of Lin et al. [5].We?rst considered the overall network and computed the community structure by applying only the?rst step of our method.The resulting network division had an average modularity value of0.52and an average number of communities equal to25. After that,the clustering obtained on the overall network was considered as the ground truth division,and both the error rate and the normalized mutual information have been computed by executing DYNMOGA on the ten networks. The values obtained are reported in Figure16.It can be observed that DYNMOGA?nds communities that constitute a balance between the snapshot and the temporal costs. In fact,the similarity among the groups obtained for the overall network and those computed for each time step is around0.5(Figure16(a)).It is worth to note that,for each time step,both the modularity values and the number of communities found are higher than those computed on the overall network.In fact we obtained a number of communities between30and36for the different time steps, with a modularity value varying from0.64to0.66(see Table2).This means that the execution of the method on the dynamic network provides a more structured division and allows for a deeper analysis of the network. Finally,in Figure17,the communities of the?ve most important members,described above,at the?rst time step are visualized,along with the evolution of their communi-ties at time steps7and8.The three?gures clearly point out that these?ve persons had a central role until the (a) NMI (b)Error Fig.16.Normalized mutual information and error rate of the CELL dataset. (a)Time step1 (b)Time step7 (c)Time step8 https://www.wendangku.net/doc/f66387069.html,munity snapshots on CELL data set at different time steps. seventh day,directly communicating with almost all the other members of the same group.At the8th day,instead, this peculiarity is lost,as expected and obtained by other studies relative to this network[48]. Enron mail data set:the second real dataset is an email collection from a US enterprise with potential anomalous email communications spanning over a time range of about 13 (a)NMI(b)Error Fig.18.Normalized mutual information and error rate for the Enron dataset. TABLE3 Enron dataset statistics:T is the number of time steps(2001monthly partitioning),M the modularity,|C|the number of communities,|V|the number of nodes,|E|the number of edges,|E|?the number of distinct undirected edges(i.e.number of different mails between two employees),Z the average degree,CC the Watts-Strogatz clustering coef?cient,and B the betweenness of network. T M|C||V||E||E|?Z CC B 10.636811961070180 2.38410.44150.0850 20.68037931559204 2.70200.54060.1002 30.655012971844218 2.88740.46460.0984 40.6571121081869257 3.40400.44300.1462 50.5699151251919292 3.86750.49930.4384 60.6943101201001231 3.05960.39590.1389 70.6530101091325252 3.33770.48820.1833 80.535691312270396 5.24500.47830.4331 90.6324101283152361 4.78150.49500.3070 100.53021313586935757.61590.49570.3494 110.570791276276469 6.21190.50620.1915 120.615281132146325 4.30460.45190.2467 3years(i.e.,from1999to2002).The original dataset2 contains around517,431emails from151users distributed in3500folders.For our tests,we considered a cleaned version of this corpus3described in[49],and containing a subset of252,759emails from151employees.We further reduced the size to about50,000messages by considering only emails exchanged among Enron’s employees. In order to perform a monthly-based analysis we concen-trated on the year2001since it encompasses the maximum number of emails(i.e.more than33,000).We split it in 12subsets,one for each month,by following the same approach used for the CELL dataset.The number of communities obtained on the overall network has been six. Figure18shows the NMI and Error for the12time steps. Also in this case,the dynamic approach provides a?ner division of the network,as pointed out in Table3,where some statistical informations regarding the network for each of the12time steps are reported. 7C ONCLUSIONS A multiobjective method based on genetic algorithms for detecting communities in dynamic networks has been pre-sented.The algorithm,at each time step,provides the so-lution representing the best trade-off between the accuracy of the clustering obtained with respect to the data at the current time step,and the drift from one time step to the successive.The proposed approach can be considered 2.available at https://www.wendangku.net/doc/f66387069.html,/~enron/ 3.available at ftp://https://www.wendangku.net/doc/f66387069.html,/sims/philpot/data/enron-mysqldump.sql.gz as a general framework for evolutionary clustering since changing one of the two objective functions(or both)allows to implement and test different criteria for the analysis of dynamic networks.Experimental results on several kinds of synthetic data sets showed the good performance of our approach compared with other state-of-the-art methods. It is worth to point out that genetic algorithms,com-pared with traditional optimization algorithms,require high execution time to generate a solution.Experiments showed that increasing population size p and number of generations g produces a positively in?uence on the accuracy of the method.However the running time sensibly increases. Thus,if the computing resources are limited,the choice of p and g should be done by considering the trade-off between execution time and desired accuracy. 8A CKOWLEDGMENTS We wish to thank Yu-Ru Lin for providing us the F acetNet software,including the generation of the synthetic data set #1,Derek Green for the software generator of the synthetic dataset#2,and Min-Soo Kim for the synthetic data set generator,and the results obtained by his method on these synthetic networks. R EFERENCES [1]S.Asur,S.Parthasarathy,and D.Ucar,“An event-based framework for characterizing the evolutionary behavior of interaction graphs,” ACM Transactions on Knowledge Discovery from Data,vol.3,no.4, p.Paper16,2009. [2]Y.Chi,X.Song,D.Zhou,K.Hino,and B.Tseng,“On evolutionary spectral clustering,”ACM Transactions on Knowledge Discovery from Data,vol.3,no.4,Article17,2009. [3]M.Kim and J.Han,“A particle-and-density based evolutionary clus- tering method for dynamic networks,”in Proc.of the International Conference on Very Large Data Bases(VLDB’09),2009,pp.–. [4]R.Kumar,J.Novak,and A.Tomkins,“Structure and evolution of online social networks,”in Proc.Int.Conf.on Knowledge Discovery and Data Mining(KDD’06),2006,pp.611–717. [5]Y.-R.Lin,S.Zhu,H.Sundaram,and B.L.Tseng,“Analyzing communities and their evolutions in dynamic social networks,”ACM Transactions on Knowledge Discovery from Data,vol.3,no.2, Article18,2009. [6]L.Leskovec,J.Kleinberg,and C.Faloutsos,“Graphs over time: densi?cation laws,shrinking diameters and possible explanations,” in Proc.Int.Conf.on Knowledge Discovery and Data Mining (KDD’05),2005,pp.177–187. [7]G.Palla, A.Barabasi,and T.Vicsek,“Quantifying social group evolution,”Nature,no.466,2007. [8]M.Spiliopoulou,I.Ntoutsi,Y.Theodoridis,and R.Schult,“Monic- modeling and monitoring cluster transitions,”in Proc.Int.Conf.on Know.Discovery and Data Mining(KDD’06),2006,pp.706–711. [9]J.Sun,C.Faloutsos,S.Papadimitriou,and P.Yu,“Graphscope: parameter-free of large time evolving-graphs,”in Proc.International Conference on Knowledge Discovery and Data Mining(KDD’05), 2005,pp.687–696. [10]L.Tang,H.Liu,J.Zhang,and Z.Nazeri,“Community evolution in dynamic multi-mode networks,”in Proc.International Conference on Knowledge Discovery and Data Mining(KDD’07),2007,pp. 677–685. [11]T.Xu,Z.Zhang,P.S.Yu,and B.Long,“Evolutionary clustering by hierarchical dirichlet process with hidden markov state,”in Proc.of the8th IEEE International Conference on Data Mining(ICDM’08), 2008,pp.658–667. [12]——,“Dirichlet process based evolutionary clustering,”in Proc.of the8th IEEE International Conference on Data Mining(ICDM’08), 2008,pp.648–657. 14 [13] D.Chakrabarti,R.Kumar,and A.Tomkins,“Evolutionary cluster- ing,”in Proc.of the12th ACM Int.Conf.on Knowledge Discovery and Data Mining(KDD’06),2006,pp.554–560. [14]K.Deb,Multi-Objective Optimization using Evolutionary Algo- rithms.John Wiley&Sons,Ltd,Chichester,England,2001. [15]J.Leskovec,https://www.wendangku.net/doc/f66387069.html,ng,and M.Mahoney,“Empirical comparison of algorithms for network community detection,”in Proc.Int.World Wide Web Conference(WWW2010),2010,pp.631–640. [16]M. E.J.Newman and M.Girvan,“Finding and evaluating community structure in networks,”Physical Review E,vol.69,p. 026113,2004.[Online].Available:https://www.wendangku.net/doc/f66387069.html,/abstract? id=oai:https://www.wendangku.net/doc/f66387069.html,:cond-mat/0308217 [17]L.Danon,A.Daz-Guilera,J.Duch,and A.Arenas,“Comparing community structure identi?cation,”Journal of Statistical Mechan-ics,vol.P09008,2005. [18] D.Datta,J.Figuera,C.Fonseca,and F.Tavares-Pereira,“Graph partitioning through a multi-objective evolutionary algorithm:A preliminary study,”in Proc.of the Genetic and Evolutionary Com-putation Conference(GECCO’08),2008,pp.625–632. [19]G.N.Demir, A.S.Uyar,and S.G.˙Oˇg˙u d˙u c˙u,“Multiobjective evolutionary clustering of web user sessions:a case study in web page recommendation,”Soft Computing,vol.14,no.6,pp.579–597,2010. [20] C.Pizzuti,“A multiobjective genetic algorithm to?nd communities in complex networks,”IEEE Transactions on Evolutionary Compu-tation,vol.16,no.3,pp.418–430,2012. [21]J.Handle and J.Knowles,“An evolutionary approach to multiobjec- tive clustering,”IEEE Transactions on Evolutionary Computation, vol.11,no.1,pp.56–76,2007. [22] F.Folino and C.Pizzuti,“A multiobjective and evolutionary clus- tering method for dynamic networks,”in Proc.of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM10),,2010,pp.256–263. [23]P.J.Mucha,T.Richardson,K.Macon,M.A.Porter,and J.Onnela, “Community structure in time-dependent,multiscale,and multiplex networks,”Science,vol.328,pp.876–878,2010. [24]P.Holme and J.Saramaki,“Temporal networks,”2011.[Online]. Available:https://www.wendangku.net/doc/f66387069.html,/pdf/1108.1780.pdf [25]L.Tang,H.Liu,J.Zhang,and Z.Nazeri,“Identifying evolving groups in dynamic multimode networks,”IEEE Transactions on Knowledge and Data Engineering,vol.24,no.1,pp.72–85,2007. [26]T.Y.Berger-Wolf,C.Tantipathananandh,and D.Kempe,“Dynamic community identi?cation,”in in P.S.Yu et al.(eds.)Link Mining: Models,algorithms,and Applications,2010,pp.307–336. [27]J.Shi and J.Malik,“Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.22,no.8,pp.888–905,2000. [28]M.Ehrgott,Multicriteria Optimization.Springer,Berlin,2nd edition,2005. [29] C.A.C.Coello,https://www.wendangku.net/doc/f66387069.html,mont,and D.A.V.Veldhuizen,Evolu- tionary Algorithms for Solving Multi-Objective Problems.Springer, 2007. [30] D.Goldberg,Genetic Algorithms in Search,Optimization,and Machine Learning.Addison-Wesley,1989. [31]N.Srinivas and K.Deb,“Multiobjective optimization using non- dominated sorting in genetic algorithms,”Evolutionary Computation, vol.2,no.3,pp.221–248,1994. [32]J.H.Holland,Adaptation in Natural and Arti?cial Systems.Univ. of Michigan Press,Ann Harbor Mich.,1975. [33]R.Kannan,S.Vempala,and A.Vetta,“On clustering:Good,bad and spectral,”Journal of the ACM,vol.51,no.3,pp.497–515,2004. [34]Y.Park and M.Song,“A genetic algorithm for clustering problems,” in Proc.of3rd Annual Conf.on Genetic Algorithms,1989,pp.2–9. [35]T.H.Cormen,C.E.Leiserson,R.L.Rivest,and C.Stein,Introduc- tion to Algorithms-Second Edition.Mit Press,2007. [36]K.Deb,A.Pratap,S.Agarwal,and T.Meyarivan,“A fast and elitist multi-objective genetic algorithm:NSGA-II,”IEEE Transactions on Evolutionary Computation,vol.6,no.2,pp.182–197,2002. [37]M.T.Jensen,“Reducing the run-time complexity of multiobjective eas:The nsga-ii and other algorithms,”IEEE Transactions on Evo-lutionary Computation,vol.7,no.5,pp.503–515,2003. [38]L.Can-Tao and H.Bao-Gang,“Mutual information based on renyi’s entropy feature selection,”in Proc.of the IEEE International Confer-ence on Intelligent Computing and Intelligent Systems(ICIS2009), 2009,pp.816–820.[39]S.K.Smit and A.E.Eiben,“Parameter tuning of evolutionary algorithms:Generalist vs.specialist,”in Applications of Evolutionary Computation,Springer,2010,pp.542–551. [40]M.Girvan and M.E.J.Newman,“Community structure in social and biological networks,”in Proc.National.Academy of Science. USA99,2002,pp.7821–7826. [41] C.Tantipathananandh,T.Y.Berger-Wolf,and D.Kempe,“A frame- work for community identi?cation in dynamic social networks,”in Proc.of the13th ACM SIGKDD Int.Conf.on Knowledge Discovery and Data Mining(KDD’07),2007,pp.217–226. [42] D.Greene, D.Doyle,and P.Cunningham,“Tracking the evolu- tion of communities in dynamic social networks,”in International Conference on Advances in Social network Analysis and Mining (ASONAM’10),2010,pp.176–183. [43] https://www.wendangku.net/doc/f66387069.html,ncichinetti,S.Fortunato,and F.Radicchi,“Benchmark graphs for testing community detection algorithms,”Physical Review E, vol.78,no.046110,2008. [44] https://www.wendangku.net/doc/f66387069.html,ncichinetti and S.Fortunato,“Community detection algorithms: a comparative analysis,”Phys.Rev.E,vol.80,no.056117,2009. [45]S.Fortunato and M.Barth′e lemy,“Resolution limit in community detection,”Proc.National Academy of Science,USA,vol.104, no.36,2007. [46] C.Granell,S.G′o mez,and A.Arenas,“Unsupervised clustering analisys:A multiscale complex network approach,”Journal of Bifurcation and Chaos,vol.22,no.7,p.1230023,2012. [47]M.Tomassini,Parallel and Distributed Evolutionary Algorithms:A Review.in Evolutionary Algorithms in Engineering and Computer Science,J.Wiley and Sons,Chichester et al.eds.,1999. [48] A.Perer,“Using socialaction to uncover structure in social networks over time,”in Proc.of IEEE Symposium on Visual Analytics Science and Technology(VAST’08),2008,pp.213–214. [49]J.Shetty and J.Adibi,“The enron email dataset database schema and brief statistical report,”2004.[Online].Available: https://www.wendangku.net/doc/f66387069.html,/~ enron/ Francesco Folino is currently researcher at the Institute of High Performance Computing and Networks(ICAR-CNR)of the National Research Council of Italy.He graduated in Computer Science Engineering in2003,and he holds a Ph.D.in Computer Science Engi- neering from the University of Calabria(Italy) in2006.His research interests mainly con- cern the?elds of Information Systems,Data Mining,Process Mining and Social Network Analysis.He is involved in several projects at ICAR-CNR concerning applications of data mining and knowledge discovery techniques. Clara Pizzuti received the Laurea degree in Mathematics from the University of Cal- abria,Italy.She is a senior researcher at the Institute of High Performance Computing and Networking(ICAR)of the Italian National Research Council(CNR).Since1995she is also a contract professor in the department of Computer Science at the University of Cal- abria.In the past,she worked in the research division of a software company on deductive databases,advanced logic based systems, and abduction.She has published more than seventy papers in conference proceedings and journals.Her research interests include knowledge discovery in databases,data mining,data streams,bioin-formatics,e-health,social network analysis,evolutionary computa-tion,genetic algorithms,and genetic programming.She is serving as program committee member of international conferences,and as reviewer for several international journals.