文档库 最新最全的文档下载
当前位置:文档库 › Spectral Analysis of Internet Topologies Abstract — We

Spectral Analysis of Internet Topologies Abstract — We

Spectral Analysis of Internet Topologies Christos Gkantsidis,Milena Mihail,and Ellen Zegura

College of Computing

Georgia Institute of Technology,Atlanta,Georgia30332

Email:{gantsich,mihail,ewz}@https://www.wendangku.net/doc/6511286355.html,

Abstract—We perform spectral analysis of the Internet topol-ogy at the AS level,by adapting the standard spectral?ltering method of examining the eigenvectors corresponding to the largest eigenvalues of matrices related to the adjacency matrix of the topology.We observe that the method suggests clusters of ASes with natural semantic proximity,such as geography or business interests.We examine how these clustering properties vary in the core and in the edge of the network,as well as across geographic areas,over time,and between real and synthetic data. We observe that these clustering properties may be suggestive of traf?c patterns and thus have direct impact on the link stress of the network.Finally,we use the weights of the eigenvector corresponding to the?rst eigenvalue to obtain an alternative hierarchical ranking of the ASes.

I.I NTRODUCTION

Studying and modeling network topologies is necessary for protocol performance evaluation and simulation of a variety of network problems.Early modeling efforts focused around random graphs with relatively regular degree distributions[6], [37],[40],[41].With the rapid growth of the network and the persistent effort of network measurement[13],[14],[34],real topology data started becoming available,in particular at the AS(Autonomous System)https://www.wendangku.net/doc/6511286355.html,ing such data Faloutsos et al.?rst observed that the degree distribution of the AS level topology is actually consistently highly skewed[11]. Consequently,the research community has shown considerable interest in obtaining topology models that better resemble the real data[2],[5],[18],[23],as well as understanding the impact of such network topologies on the performance of network protocols[27],[32].

This new generation of synthetic Internet topology models is strongly driven by the observed skewed statistics of the degree sequence and its evolution,and by even further observations of more detailed graph theoretic characteristics of the network. Most notably,following the natural intuition that,for example, geography must be relevant in the real Internet topology, [5]paid special attention to the“clustering”coef?cient;the observation of the signi?cance of geography has been also made in[39]and[21].

In this paper we revisit the issue of clustering.As opposed to previous work that has focused on the clustering coef?cient, our starting point is the method of spectral?ltering.This method examines the large eigenvalues of matrices related to the adjacency matrix,and looks for clusters in the eigenvectors The?rst and second authors were funded by NSF ITR-0220343;the third author was funded by NSF ANI-0081557.This work was also funded by a Georgia Tech Eden?eld Faculty Fellowship.associated with these eigenvalues.Indeed,the?rst reference to the large eigenvalues of the adjacency matrix of the AS Internet topology is the“eigenvalue power-law”which was reported together with the“degree power-law”in[11].The connection between spectral?ltering and graph connectivity, including clustering,has been extensively studied in discrete mathematics(e.g.see the books of[9],[29]and the further references that they point to),and has found very successful applications in information retrieval and data-mining where clusters represent groups of data with semantic proximity [3],[17],[20],[26],[28].Practical experience suggests that spectral analysis might be better suited for data that lack regularity(thus it has been extensively used in computer science),while clustering coef?cients are better suited for data that have stronger regularities(thus they have been extensively used by physicists who study lattices,crystals,etc.).Indeed,by de?nition,spectral?ltering yields a large number of clusters, and it can be applied iteratively in subgraphs of a network.By contrast,it is not clear how to grow clusters around nodes with large clustering coef?cient and this approach is not typical in information retrieval or data-mining1.

Our contributions include:

?The observation that the eigenvectors related to the largest eigenvalues of the adjacency matrix of the AS topology examined in[11]do not express interesting clusters.This is an experimental validation of the result of[24]who showed that the eigenvalues power law is a consequence of the degree power law.We thus conclude that further normalizations are needed to retrieve non-trivial cluster-ing properties.

?Adaptation of the spectral?ltering method in the con-text of the AS Internet topology,by(a)performing inverse frequency normalization via stochastic ma-trices,(b)considering similarity transformations and

(c)considering the entire topology as well as subgraphs of

the topology.As a result,we get non-trivial groupings of ASes with clear semantic proximities,such as geography and business interests.We note that without this adapta-tion,i.e.,by considering the eigenvectors corresponding to the eigenvalues of the adjacency matrix as in[11],we get trivial groupings corresponding to the large ISPs and their customers:this is indeed a restatement of the highest degrees.

1Though a related approach called“k-means”is quite common;but we do not expand further on it,since we do not use it in this work.

?The observation that the clustering properties

in the core and the edge of the network and geographic areas,(b)persist over time,and(c)are accurately matched by synthetic Internet topology erators,though the Power Law Random Graph model comes close[2].

?Study of the connection between the information

by spectral?ltering and link stress(link stress can thought of as a?rst approach towards congestion).

particular,we argue that the eigenvectors associated with the largest eigenvalues are suggestive of non-trivial in-tracluster traf?c patterns that cause signi?cant decrease in the link stress.The decrease is much more notable in the Internet than in any synthetic topology.If on the other hand the traf?c patterns become intercluster the link stress correspondingly increases.This reasoning is in line with[7],[10]which suggest that network characteristics should be studied in the context of the design problem they are trying to solve.

?A method to de?ne intracluster and intercluster“traf?c”

patterns.These are patterns that deviate from uniform treatment of all pairs of nodes,and may represent“good”

and“bad”test case for network performance.

?A detailed and ef?cient AS ranking method according to the?rst eigenvector of a suitably de?ned stochastic matrix,which has strong correlation with other known hierarchical assignments[31].This approach is an adap-tation of the pagerank used by Google[25].An adaptation of the same method for ranking links between ASes, found that rankings are highly correlated with link stress under uniform traf?c.A further adaptation of the method to obtain groups of ASes that correspond to seemingly highly stressed cuts.

The balance of the paper is as follows:In Section II we cover necessary primitives from linear algebra and highlight the intuition behind the spectral?ltering method.We also introduce normalizations and similarity transformations,and discuss their suitability and necessity for graphs with skewed statistics,like the Internet topology.In Section III we describe the spectral?ltering results for the AS Internet topology, and give the qualitative nature of the information retrieved by the eigenvectors.In Section IV we give an application of the information retrieved by the eigenvectors in terms of de?ning non-trivial traf?c patterns that deviate from uniform traf?c.In Section V we give a method of ranking ASes and links between ASes that is highly correlated with hierarchical assignments.We summarize in Section VI.

II.S PECTRAL A NALYSIS OF M ATRICES ARISING FROM

G RAPHS

In this Section we give a high level overview of the intuition and the primitives of spectral?ltering.We discuss the basics of eigenvalues and eigenvectors of matrices,some useful transformations and normalizations,and why the eigenvectors corresponding to the large eigenvalues contain information relevant to clustering.This motivates the processing that we Fig.1.The adjacency matrix(left)of a random graph on600vertices.There is a dot in position(i,j)iff there is a link between i and j.The?rst and second diagonal blocks correspond to subgraphs with high connectivity.Off-diagonal blocks represent sparse edges between the subgraphs.The second eigenvector(right)assigns positive weights to the nodes of the?rst block and negative weights to the nodes of the second block.

will do to the eigenvectors of the AS Internet topology in Section III.We also give a plausible explanation of the eigenvalue power-law of[11]as a restatement of the Zipf with exponent1rank-degree distribution;this serves as additional motivation for the processing in Section III,in the sense that without this processing the spectral method does not give non-trivial information.

A.The Spectral Filtering Method

Let G(V,E),|V|=n,be an undirected graph and let A be its adjacency matrix:a ij=1if(i,j)∈E,a ij=0otherwise. Since G is undirected,A is symmetric a ij=a ji.In general, the(i,j)-th entry of a symmetric matrix can be thought of as a measure of the correlation between parameters i and j. Let e be an n-dimension real vector; e can be thought of as a function on the vertices of G.We say that e is an eigenvector of A with eigenvalueλif and only if e A=λ e.It is a well known fact of linear algebra that every n×n real symmetric matrix A has a spectrum of n orthonormal eigenvectors e1, e2,..., e n with real eigenvaluesλ1≥λ2≥...≥λn[16],[38].The eigenvectors are unique up to degeneracies related to equal eigenvalues.In general,the spectral?ltering method can be applied with any matrix with real spectrum.

We demonstrate the essence of the spectral method with an example.The left panel of Figure1gives the adjacency matrix of a symmetric graph.A dot in position(i,j)in this graph corresponds to a link between i and j.There are two highly connected clusters in this graph;the?rst includes nodes1through200and the second all the other nodes.The two clusters are connected with a few links.The right panel of Figure1plots the weights assigned by the eigenvector which corresponds to the second largest eigenvalue.The nodes belonging to the?rst cluster were assigned positive weights and the nodes of the second cluster negative weights.Thus, an ef?cient heuristic to separate the two clusters is to examine the eigenvector.

In broad lines,the spectral?ltering method for an n×n symmetric matrix A proceeds as follows:

STEP1:Compute the k largest eigenvalues of A together with the corresponding eigenvectors.The parameter k depends on

Fig.2.Typical pro?le of the most positive weights assigned to nodes by the eigenvector corresponding to a large eigenvalue.This pro?le was taken from a principal eigenvector of the stochastic normalization of the AS topology. the application and the instance,but it is always one to two orders of magnitude smaller than n.

STEP2:For each i,1≤i≤k,let e i be the eigenvector associated withλi.Sort the vertices according to the weight assigned by e i.A typical pro?le of the sorted vertices is in Figure2.Cut towards the most positive end(or towards the most negative end),with special preference to sharp jumps, if they exist(a good example of a sharp jump can be found in Table II).These groups are candidates for clustering and/or semantic proximity.

In general,the eigenvectors corresponding to large eigen-values tend to capture global characteristics of the graph and its semantics,such as groups of nodes S?V for which the ratio

edges inside S edges incident to S =|{(i,j)∈E:i∈S,j∈S}|

|{(i,j)∈E:i∈S,j∈V}|(1)

is large,indicating clusters of relatively high connectivity and, thus,presumably further semantic proximity,not necessarily otherwise expressed in the data(the deep theory of“expander”graphs supporting this claim can be found,for example in[9], [29]).In addition,because there is no polynomial time algo-rithm to?nd a set S minimizing the above ratio,the spectral method is an ef?cient heuristic.Eigenvectors corresponding to small eigenvalues tend to capture noise,or local characteristics that are explicit or can be easily computed from the data. B.Algebraic Primitives of Spectral Filtering

More formally,we list a few technical facts which build the intuition behind the spectral?ltering method(the statements are straightforward,though some of the proofs to which we point are quite technical).

(a)The largest eigenvalueλ1of a d-regular graph is d and the corresponding eigenvector assigns uniform weights to all vertices[9],[22].All other eigenvaluesλi,2≤i≤n are small, |λi|≤O(√d),almost surely[9].

(b)The eigenvaluesλi,1≤i≤n,of a graph with m edges and maximum degree d are bounded by|λi|≤min{√m,d}[22].(c)The spectrum of the union of vertex disjoint graphs is the union of their spectra[9],[22].

(d)If A and B are the adjacency matrices of not necessarily disjoint graphs with eigenvaluesα1≥α2≥...≥αn and β1≥β2≥...≥βn,then the eigenvalues of their union C= A+B areγ1≥γ2≥...≥γn withαi+βn≤γi≤αi+β1, 1≤i≤n[16],[38].In addition,the corresponding invariant subspaces of C follow from the invariant subspaces of A perturbed by no more than the maximum invariant subspace of B[16],[30].

The intuition behind the spectral?ltering method is that, if we take the union of two vertex disjoint regular random graphs A1and A2and connect them with a few random edges B,then,combining Facts(a)through(e)above,the spectrum of C=A1+A2+B will haveγ1 γ2 d(corresponding to the largest eigenvalues of A1and A2)andγi O(√d), 3≤i≤n.Furthermore,we expect to identify the vertices of A1and A2by examining the eigenvectors corresponding to the?rst two eigenvalues.See Figure1.Indeed,the second eigenvector assigns mostly large negative weights on A1and mostly large positive weights on A2.

C.Similarity Transformation SIM(A)=A·A T

Now suppose that G(V,E),|V|=n,is a directed graph,and thus the adjacency matrix A is no longer symmetric.A is no longer guaranteed to have a complete real spectrum,and the notion of clustering is not well de?ned either.Let A T be the transpose of A,i.e.,a T ij=a ji.Notice that the product A·A T is a symmetric matrix.Notice further that its(i,j)-th entry is n k=1a ik a jk,measuring the number of nodes that i and j point to in common.In the case where the nodes represent ASes and edges are directed from customers to their providers,the above sum relates i and j to the number of their common providers.Similarly,the product A T·A relates i and j to the number of their common customers.The transformation A·A T is very common in spectral analysis.Depending on the application,it is called self-adjoint,co-citation,co-variance, or similarity transformation.Here we shall use the notation SIM(A)=A·A T.

D.Stochastic Normalization

The intuition behind the spectral?ltering method that we gave in the previous paragraphs referred to regular graphs.Indeed, in practice,the spectral?ltering method has been found to deteriorate rapidly when the frequencies of non-zero entries vary substantially[17],which is certainly the case with the very skewed degrees of Internet topologies.Inverse frequency normalization is a general approach to restore spectral?ltering in such cases.

In its simplest form,inverse frequency normalization divides each entry a ij with the sum j a ij of the entries of the corresponding row,thus obtaining a matrix where all the rows add up to1.Notice that this is now a stochastic matrix,in the sense that it describes the transition probabilities of a Markov chain in the natural way.If,in addition,we make all diagonal entries a ii=1/2and multiply all other entries by1/2the

range of the eigenvalues shifts to (0,1)2.Like symmetric matrices,such stochastic matrices have a complete spectrum of real eigenvalues and eigenvectors.For any matrix A ,we denote its stochastic normalization N(A ).In what follows,we may apply the stochastic normalization to either A or SIM(A ),thus getting N(A )or N(SIM(A )).

E.Faloutsos’Eigenvalue Power-Law

[11]examined the spectrum of the adjacency matrix of the AS Internet topology,without performing any normalization or other transformation.They reported a power-law on the twenty or so largest eigenvalues of this matrix with exponent between .45and .5.

[24]observe that Faloutsos’eigenvalue power-law is a direct consequence of the degree sequence power-law along the lines of Facts (d)and (e)of Section II.A,in the following sense (see also Figure 3):

STEP 1:Decompose an undirected AS topology A as A =F +E ,as follows.Initially F is the set of vertices that have the k highest degrees,and let d 1,d 2,...,d k be these degrees.Initially F contains no edges.Let E be the entire AS topology graph.Now we will remove some edges of E and add them to F ,so as to create k disjoint stars in F .We do this by the following process:For each vertex v that is not in F ,if v is incident to k v vertices in F ,pick one of these vertices u with probability proportional to the degree of u in the entire graph,make the edge {v,u }incident to the vertex u ∈F and remove the edge {v,u }from E .Notice that F is now a set of vertex

disjoint stars with degrees d 1,d 2,...,d

k ,and E is the initial AS topology where all edges belonging to the stars have been removed.

STEP 2:Notice that the eigenvalues of a star of degree d are ±√d and 0with multiplicity d ?1[22].Thus,by Fact (d)of Section II.A,the largest eigenvalues of F are d 1, d 2,..., d k .Also,by Fact (e)of Section II.A,the largest eigenvalues of A =F +E cannot be perturbed by more than the largest eigenvalues of E .

STEP 3:For typical AS topologies,we have found experi-mentally that the above procedure,for k =100,gives d i d i ,1≤i ≤k ,hence the largest eigenvalues of F are close to √d 1,√

d 2,...,√d k ,and th

e largest eigenvalues o

f E are,in the worst case strictly smaller than √d 1and on the average 1/5of √d 1.Now by Fact (e)of Section II.A,the largest eigenvalues of A =F +E can be understood to be close to √d 1,√d 2,...,√

d k .Hence,for graphs wher

e the largest degrees follow Zip

f with exponent close to 1,as [11]reported for the AS Internet,the largest eigenvalues follow a power-law with exponent close to .5,also as [11]reported for AS the Internet.We also refer to [12],[19],[24]for formal analysis of these results in stochastic models of power-law random graphs.

2On

the other hand,the eigenvectors of stochastic matrices are not

necessarily orthogonal,and sometimes additional normalizations that rectify orthogonality are necessary for good results.In our analysis this did not turn out to be necessary.We also note that there are many further normalization methods,including so-called Laplacians and divisions by logarithmic or other functions of j a ij

,but,again,we did not use them in our analysis.

Fig.3.and compare them to the square roots of the 100largest degrees.The eigenvalue power-law follows the degree power-law.Both axes are in log scale.

We may now conclude that by looking at the eigenvectors corresponding to the largest eigenvalues examined in [11]we should not hope to get information beyond the ASes of highest degree and their customers.Indeed,in experiment,we have found these eigenvectors to be highly concentrated on the large ISPs.Therefore,to obtain more interesting clusters,we will need the processing discussed in Section II-C.

III.S PECTRAL A NALYSIS OF AS I NTERNET T OPOLOGY In this Section we describe the spectral analysis that we performed on AS Internet topologies.We discuss the used data,the processing,the behavior of large eigenvalues,and the resulting groups of ASes from the corresponding eigenvectors.We show that clustering varies in the core and the edge of the network,as well as across different geographic areas.On the other hand,the clustering is consistent over time.Finally,we compare the spectral characteristics of the real AS topologies to synthetic topologies.

A.Data Used,Transformations and Normalizations We have used topology data from two sources.The ?rst source is the data of [1]who collect BGP routing information from many routers in the Internet and combine all the routing tables to reconstruct the undirected AS https://www.wendangku.net/doc/6511286355.html,ing the heuristics in [31],they also provide the information whether an edge of the undirected topology corresponds to a customer-provider or a peering relationship.Finally,[31]give a heuristic to assign the ASes to the levels of a 5-level hierarchy.The most important ASes,such as big ISPs in the core of the Internet,are assigned to level 1.The smallest ASes are assigned to level 5.The topological data from this effort dating on April 6,2002are the ones used most in our study 3.

The second set of data is from [13].Though this data is far less complete,it has the advantage that it spans the time period of 1997to date.We have thus used this data to study the evolution of clustering over time.[13]does not contain

3We

should note that perhaps the most complete set of data is in [8].It

was dif?cult to annotate these data with the AS hierarchy information of [1],and thus we did not use them.

information about the relationships between the ASes.We

used the algorithm of[15]to infer AS relationships4.

The data in both[1]and[13]are not perfectly accurate. do not believe though that this affects the results of our

in the sense that missing links would quite likely

the clustering?ndings.

An AS topology without AS relationships corresponds to undirected graph with a symmetric adjacency matrix A,in natural way.For such a topology we perform spectral ysis on the stochastic normalization N(A).An AS

with customer-provider or peer relationships corresponds to directed graph A ,where a ij=1and a ji=0if and only if i a customer of j and j is a provider of i,and a ij=a ji=1 and only if i and j are peers(in all other cases the entries 0).For such a topology we perform spectral analysis on stochastic normalization N(SIM(A )).

If we perform spectral analysis starting from the

undirected graph A or directed graph A we?nd that the clusters indicated by the eigenvectors associated with the large eigenvalues correspond to groups of nodes assigned levels3,4 and5of the hierarchy of[31],thus are away from the core of the network.This is intuitive,since we expect the edge of the network to have more areas with higher connectivity inside the area and relatively lower connectivity to the rest of the network,along(1)of Section II.Similarly,we expect that the core of the network is better connected,and thus the ratio(1) of Section II is consistently higher in the core.

To capture the clustering properties of the core of the network we have to explicitly isolate the core from the edge and analyze the core alone.We have used two methods to isolate the core.When information about the AS hierarchy is available,such as in[1],we de?ne the core to be the subgraph that contains only the ASes assigned to levels one through four. When the hierarchical information is not available,as in[13], we iteratively prune all the nodes in the graph that have degree one or two.The graph whose core we wish to?nd can be either directed or undirected.We denote the core as Core(A) and Core(A )depending on whether the original graph was undirected or directed respectively.As above,we perform spectral analysis to N(Core(A))and N(SIM(Core(A ))).

B.Results for the Entire AS Topology

Figure4shows the largest eigenvalues of the AS topology of [1].We have considered the adjacency matrices of the topology with and without AS relationships,for both the entire network and the core.The point to notice in this graph is that the eigenvalues are quite high,indicating the existence of clusters in the underlying topology.Another interesting observation is the drop in the eigenvalues between the entire topology and the core of the network.This is expected because the core was constructed by removing small ISP’s which tend to cluster more.

4In addition to customer-provider and peering,[15]includes sibling re-lationships;to be consistent with our?rst set of data,we replace sibling relationships with peering relationships.Fig.4.The largest eigenvalues of the AS topology.The top line corresponds to the entire topology without AS relationships N(A).The second line corresponds to the entire topology with AS relationships N(SIM(A )).The third line corresponds to the core without AS relationships N(Core(A)).The bottom line corresponds to the core with AS relationships N(SIM(Core(A ))).

TABLE I

A SAMPLE OF A CLUSTER FOUND IN THE N(Core(A ))TOPOLOGY.

AS Weight Level Description Country

32570.10962Tiscali Intl Network DE

33030.10712Swisscom Ltd CH

2930.10323ESnet US

55110.09861France Telecom,Worldwide IP Backbone FR

35490.09861Globalcrossing US

35820.09833University of Oregon US

45130.09721Globix Corporation US

64610.09673Primary AS for Abovenet US

16680.09172AOL Transit Data Network US

12990.09161TeliaNet Global Network SE

33560.09071Level3Communications North America US

7010.08971Alternet US

35610.08961Cable&Wireless(CW)US

63950.08892Broadwing Communications US

89180.08842Carrier1Autonomous System GB

45650.08692Epoch Internet US

12390.08671SprintLink Backbone US

60790.08663RCN Backbone AS US

62590.08623Fiber Network Solutions,https://www.wendangku.net/doc/6511286355.html,

24970.08522IIJNET JP

29140.08461Verio US

28280.08462XO Communications,https://www.wendangku.net/doc/6511286355.html,

25480.08422DIGEX-AS US

54590.08403London Internet Exchange Ltd.GB

56500.08402Electric Lightwave,https://www.wendangku.net/doc/6511286355.html,

Note:This cluster is taken using the eigenvector which corresponds to the highest eigenvalue.The ASes in this group are big ISP providers,mostly in North America and Europe.The weights of the eigenvector did not show a sharp jump.

Next we give some representative groups of nodes cor-

responding to the highest weights assigned by eigenvectors

corresponding to large eigenvalues.The?rst example was

taken using the N(SIM(Core(A ))).The group corresponds to the largest eigenvalue,which is1.0.In Table I,we list the

members of the group that take the highest weights in the

eigenvector.

In Table II,we give a group of ASes that belong to

Chinese ISP providers.This was taken from the eigenvector

of N(SIM(Core(A )))that corresponds to the6th largest eigenvalue with value0.8363.Notice that the clusters of relatively big ASes in Tables I and II(levels1through3of

TABLE II

A SAMPLE OF A CLUSTER FOUND IN THE N(Core(A ))TOPOLOGY

AS Weight Level Description Country

98100.50913China Netcom CN

98050.44633SIEMENS LTD.CHINA CN

93050.31414Beijing Feihua Communication Technol-

ogy Co,Ltd

CN

179690.30864AS OF VLINE CN

74670.2798421VIANET(CHINA),INC CN

176200.26223China Netcom CN

75490.17993The North China regional network of

CEInet

CN

48080.17434Chinanet Beijing Site AS CN

174310.16894Beijing TONEK Information Technology

Development Company

CN

93940.15533CHINA RAILW AY Internet(CRNET)CN

176220.14653China Netcom CN

99290.10763China Netcom CN

47990.09103Golden Bridge Network of China CN

102120.05243Optic Communications Co.,https://www.wendangku.net/doc/6511286355.html,

47740.05103Abone JP

48130.04954China Telecom GUANGDONG

PROVINCE BACKBONE NETWORK

CN

48120.04954China Telecom(Group),Shanghai Tele-

com Company

CN

174440.04753NWT IP Network HK

74740.04512Optus Communications AU

116080.04262Accretive Networks,https://www.wendangku.net/doc/6511286355.html,

69930.04222InterNAP US

https://www.wendangku.net/doc/6511286355.html, US

41340.04072Data Communications Bureau,MPT CN

56500.03782Electric Lightwave,https://www.wendangku.net/doc/6511286355.html,

16680.03582AOL Transit Data Network US

40580.00583LinkAGE Online Ltd.HK

Note:This group was found in the eigenvector corresponding to the6th largest eigenvalue.The last entry(AS4058)does not belong to the group.We have included it to indicate a typical sharp jump suggestive of where to cut a group.

TABLE III

A SAMPLE OF A CLUSTER FOUND IN THE N(SIM(A ))

AS Weight Level Description Country

15536-0.24725CEDEFOP GR

15948-0.21025ICE/HT fundamental and technological

research

GR

20813-0.21025Hellenic Open University GR

6802-0.18684National Educational and Research Infor-

mation Network

BG

3268-0.17665CYNET,Cyprus Academic Network,

Cyprus

CY

13092-0.17655Univerzitet u Beogradu YU

2546-0.17075ARIADNE NETWORK GR

3323-0.17075National Technical University of Athens GR

5470-0.17075Aristotle University of Thessaloniki GR

5489-0.17075T.E.I.of Thessaloniki GR

6744-0.17075Computer Technology Institute GR

6867-0.17075University of Crete GR

8248-0.17075Greek High-School Internet Network GR

8253-0.17075Democritus University of Thrace Network GR

8278-0.17075Technical University of Crete GR

8617-0.17075University of the Aegean GR

8618-0.17075Ionion University GR

8643-0.17075ATHENAnet GR

8700-0.17075T.E.I.OF LARISSA GR

8762-0.17075T.E.I of Crete GR

Note:The whole group contains several more ASes related mostly to academic institutions in Greece,Cyprus,and occasional ASes from other Balkan countries.There was a sharp jump(not indicated in the?gure for lack of space)after which the entries were clearly outside the Balkans.This group was found in the eigenvector corresponding to the2nd largest eigenvalue.

the hierarchy)appear in prominent positions when we examine the core of the topology.As we shall see below,such clusters do not appear when we examine the entire topology.

In Table III we give a group of ASes that belong to Greek academic institutions.This was taken from the eigenvector of N(SIM(A ))that corresponds to the2nd eigenvalue with value 0.9539.Notice that this cluster of rather small ASes(levels4 and5of the hierarchy)appears in prominent position when we examine the entire topology.

We should note that the three examples presented here are typical.We chose to include the particular examples wanting to give one cluster from each continent.C.Results speci?c to Geography

Is the Internet topology homogeneous across the entire globe? Do the same connectivity patterns apply everywhere?The?rst synthetic models of Internet topologies which emphasized the principle of preferential connectivity[18],[23]were implic-itly making such homogeneity assumptions.Recently,these assumptions have been challenged,most notably in[21],[39] who show strong correlation between the placement of ASes and routers with geography as well as economic development. We second and strengthen these?ndings,by observing that different geographic parts of the network exhibit different connectivity patterns.

We have used the data of[33]to assign ASes to continents. We constructed three graphs for the continents of North America(NA),Europe(EU)and Asia(AS)5.We included AS relationships,thus obtaining non-symmetric adjacency matrices A NA for North America,A EU for Europe and A AS for Asia.In Figure5we give the largest eigenvalues

of N(SIM(A NA)),N(SIM(A EU))and N(SIM(A AS)).We also give the plots for the spectrum of the correspond-ing cores N(SIM(Core(A NA))),N(SIM(Core(A EU)))and N(SIM(Core(A AS))).The point to notice is that,both in the entire topology and in the core,North America exhibits less clustering than Europe and Asia.This can be understood intuitively by thinking of the network in North America as being at a later evolutionary stage,and hence is more connected.

D.Spectrum Consistency over Time

Is the spectral behavior of the Internet topology consistent over time?See Figure6.We have used snapshots from[13]taken one year apart and found consistent behavior of the largest eigenvalues of N(A).This con?rms the intuitive belief that the spectrum is a robust characteristic of a topology.Figure6 refers to the entire AS topology without AS relationships.We have observed similar behavior in the evolution of the AS topology with AS relationships,as well as the core of the topology,and when restricted to speci?c continents.

E.Synthetic topologies

In Figure7we give the largest eigenvalues of the AS Internet topology,as well as similar graphs generated by Inet[18], Waxman,growth with preferential connectivity according to Barabasi-Albert and the improved GLP heuristics[5],[23] which explicitly tries to capture better clustering(all the above for the same number of nodes as the Internet topology),and the power law random graph(PLRG)model of[2](for the speci?c degree sequence of the Internet topology).We give the spectrum of both the entire AS topology and the core (recall that the core of synthetic topologies where there is no other indication of hierarchy is obtained by iterative pruning). For the entire Internet topology,all synthetic generators, except for the Power Law Random Graph(PLRG)[2],have 5It is possible that some ASes are present in more than one continents.We treated such ASes as belonging to only one continent.However,their number is very small,and the results are not affected.

Fig.5.The spectrum of different continents.The top graph is for the entire topology of each continent,while the bottom graph is for the core of the topology of each continent.

smaller eigenvalues.This means that they do not contain as strong clusters as the real Internet.This could have been expected since no synthetic generator attempts to capture such explicit notions as geography and business interests.But,why is PLRG an exception?Note that PLRG does not even generate a connected graph[2].So,the same random principles that generate several isolated connected components in the entire graph,generate several badly connected subgraphs within the giant connected component.

For the core of the topologies,the W AXMAN and BA models produce higher eigenvalues.We believe that this is a pathological byproduct that these topology generators do not attempt to simulate any notion of core.Therefore,the behavior of the spectrum after pruning small degree vertices is the same as the entire topology.the network[27],[32].Our approach is closer to the latter, and in?uenced from the proposal of[7],[10]that topology properties should be studied in connection to the functionality of the network.In particular,we shall study the correlation of the information retrieved from the eigenvectors of Section III to the performance of a primitive experiment that studies the “congestion”in the network.

For an undirected(without AS relationships)topology, suppose that we send one unit of traf?c along a minimum hop(shortest)path from each node to every other node6.This induces a stress for each link de?ned as the total number of paths going though the link.We study the maximum link stress,which can be thought of as an indicator of congestion. Intuitively,we expect that there is more traf?c between ASes that have geographic or business relationships.We use the following spectral-?ltering based heuristic to group ASes into clusters:

(a)If n is the size of the topology,consider theα·n,where α=.5in our experiments,largest eigenvalues of N(A),and the eigenvectors associated with each such eigenvalue.

6In case of many shortest paths,we pick one of them arbitrarily.

Fig.7.Spectrum of real and synthetic Internet topologies.The top graph corresponds to the entire topology.The bottom graph corresponds to the core.

(b)Consider the nodes H1and H2that are assigned the highest β·n positive and the highestβ·n negative weights in each such eigenvector.The parameterβis set to.25in our experiments.

(c)Each AS which appears in H1or H2for at least one examined eigenvector will be assigned to the cluster of the positive or negative end of the?rst eigenvector in whose H1 or H2it appeared.In this way we assign ASes to at most one cluster.

We say that a traf?c pattern is %intraclustered if each node sends %of its traf?c exclusively inside the cluster that it belongs,and1? %of its traf?c uniformly to all nodes(thus uniform traf?c is0%intraclustered).

We are interested in studying the change in the max link load as the traf?c shifts from uniform to intracluster(and, intercluster).It is reasonable to expect that,in general,topolo-gies with higher principal eigenvalues,and thus worse cuts(in the sense of(1)of Section II),should tend to exhibit worse

TABLE V

IN MAX LINK STRESS AND AVERAGE EXPECTED HOP DISTANCE,AS

THE TRAFFIC SHIFTS FROM UNIFORM TO INTRACLUSTERED.

Internet Internet Inet Inet

Avg.Avg.

Max Exp.Max Exp.

Link Hop Link Hop

Stress Dist Stress Dist

0%100.0% 3.3744100.0% 2.7499

20%91.5% 3.285597.7% 2.7151

40%83.0% 3.196595.4% 2.6802

60%74.4% 3.107693.1% 2.6454

80%65.9% 3.018790.8% 2.6106

100%57.4% 2.929788.5% 2.5757

Note:The same trend applies to the other synthetic topologies.

stress behavior.Thus,as we shift traf?c from uniform to (resp.intercluster),we expect the maximum link to drop(increase)signi?cantly,since we are increasing

decreasing)the traf?c that stays inside the cluster and (resp.increasing)the traf?c that crosses the bad cut.

the AS Internet topology is exhibiting sharper shift stress behavior than several synthetic topologies from (BA,GLP,Waxman[4],[5],[23],[37])Inet[18],and

[2].The results are given in Table IV.Assume for

that the traf?c is20%intraclustered.Then,the

link stress for the AS topology dropped to91.5% in uniform traf?c.For the same intracluster traf?c,the link stress in the topology generated by Inet dropped to Thus,the maximum link stress decreased by a factor

in the case of the AS topology and by2.3%in the of Inet.At the extreme of100%intraclustered traf?c the link stress in the Internet drops by more than40%,while synthetic topology the drop was less than23%,with of Waxman,in which case the drop was around 30%.

We therefore propose that the information retrieved from the eigenvectors associated with the largest eigenvalues may be suggestive of intracluster traf?c patterns.We propose to use the clusters suggested by these eigenvectors as one meaningful way to generate traf?c patterns that deviate from uniform traf?c.One additional remark is due.It may be thought that the decrease in link stress under intracluster traf?c patterns is a straightforward consequence of shorter min-hop paths that would be used in an intraclustered traf?c pattern,See Table V.For each node,de?ne its expected hop distance as the expected hop distance of the node from every other node under a speci?c traf?c pattern.Notice that both in the Internet and in the synthetic topology produced by Inet,the drop in the average expected hop distance is not nearly as striking as that of the max link stress.We therefore conclude that the drop in the link stress is a result of a better distribution of the weighed shortest paths rather than a mere decrease of their length.Thus the intracluster traf?c pattern is indeed non trivial. Similar observation apply to intercluster traf?c patterns.

TABLE IV

D ROP OF MAX LINK STRESS AS TH

E TRAFFIC SHIFTS FROM UNIFORM TO INTRA-CLUSTER AND INTER-CLUSTER.

A.Intra-cluster

Internet Inet PLRG GLP Waxman BA

0%100.0%100.0%100.0%100.0%100.0%100.0%

20%91.5%97.7%95.6%95.8%94.1%96.4%

40%83.0%95.4%91.2%91.6%88.2%92.9%

60%74.4%93.1%86.9%87.3%82.3%89.3%

80%65.9%90.8%82.5%83.1%76.3%85.8%

100%57.4%88.5%78.1%78.9%70.4%82.2%

B.Inter-cluster

Internet Inet PLRG GLP Waxman BA

0%100.0%100.0%100.0%100.0%100.0%100.0%

20%108.5%102.4%102.5%103.9%107.6%102.8%

40%116.9%104.7%105.1%107.8%116.1%104.7%

60%125.4%107.1%107.6%111.7%124.6%107.1%

80%133.8%109.5%110.1%115.6%113.2%109.5%

100%142.3%111.8%112.7%119.5%141.7%111.8%

Note:The AS Internet exhibits more drop than any synthetic topology(almost twice as much with the exception of the Waxman model).We note that these numbers refer to the core of the network.The behavior was similar when we did the same experiment in the whole network,and in each speci?c continent.

V.R ANKING BY THE F IRST E IGENVECTOR

The“signi?cance”of an AS,or its position in a hierarchy,is a subjective matter,in the sense that ASes are never explicitly or implicitly assigned such rankings.There is relatively good agreement about the“top”and“bottom”of a hierarchy.For example,an ISP that has only peers and no provider is almost surely very big,while an AS that has no customers or peers and only one or two providers is almost surely very small. In two separate efforts,[15]and[31]gave heuristics to assign hierarchical levels to ASes,after inferring AS relationships and taking into account several non-trivial further characteristics. In this Section we observe that a different heuristic,based on the weights assigned to the ASes by the?rst eigenvector of a suitably de?ned modi?cation of the directed AS graph(i.e., after AS relationships have been inferred),is highly correlated with the hierarchy of[31].

The proposed heuristic is an adaptation of the pagerank method used by Google to infer quality of Web pages.The analogy is natural.Both the directed AS topology and the WWW are directed graphs.In the WWW,a hyperlink pointing from a page i to a page j indicates an endorsement of importance from i to j.In the Internet,an edge pointing from a customer i to a provider j can be thought of as a similar endorsement of importance,while in peers the endorsement becomes mutual.

The ranking method is the following.Let A be the directed adjacency matrix.For each node i de?ne the outdegree of i as d out(i)=|{j:a ij=1}|.Now consider the stochastic matrix

P(A):p ij= =αd out(i)+1?αn if a ij=1

=1?αif a ij=0

The above stochastic matrix represents a random walk on the directed graph A ,where with probabilityαwe go to a provider or peer chosen uniformly at random,and with probability1?αwe jump to a uniformly random node from the set of all nodes(the latter step is a standard correction to avoid degeneracies pertaining to sinks).

Letπ(v)be the stationary probability of the stochastic matrix P(A).Google assigns to Web pages pagerank quality π(v).By analogy,we assign to each AS hierarchical weight π(v).In Figure8we compare the hierarchy of[31]to our hierarchical weightπ(v).We have usedα=.95;the results are similar for any.9≤α≤.99.To plot the graph,we have grouped the ASes by their level in the hierarchy.Then,we sort the ASes in each group by their weight inπ(v)and plot the weights in decreasing order.Observe that we use logarithmic scale for both axes.

There is notable correlation between the weights assigned to the ASes and their level in the hierarchy.Nodes assigned by[31]in high levels have higher values inπ(v).Also,the weights assigned to the ASes of a group are in general higher than the weights assigned to ASes that belong in groups of lower level.One noticeable exception is the weights assigned to levels4and5.ASes in these levels have very small degree and they cannot be easily separated by the page rank method. At?rst glance it seems that there is an“anomaly”in the?gure, since there are some ASes that are assigned larger weights than ASes which belong to higher levels.We argue that this could be a problem of the subjective nature of hierarchical assignment,and/or the heuristic used by[31]to assign ASes to levels.We will discuss two examples to make this point. The largest weights in levels2and3have a very high value which is comparable to the weights assigned to nodes in level1.These weights correspond to the ASes of Tiscali Intl Network(AS number3257)and of Abovenet(AS number 6461)respectively.We believe that they had to be assigned in the highest level.This is justi?ed by their degrees in the adjacency matrix,which are330and585respectively,and by

the reputation they have as big ISP providers.

We extend the above method to obtain an assignment of signi?cance to links.If n is the number of ASes and m is the number of links of the undirected AS topology,let N =n 2be the number of pairs of ASes and associate with each such pair a shortest path between their endpoints.We may now consider the m ×N traf?c matrix T ,where each row corresponds to a shortest path and there is a 1on the columns of the links used by the https://www.wendangku.net/doc/6511286355.html,ing the SVD method,which is a generalization of the decomposition into eigenvalues and eigenvectors for non-square matrices,we can compute the left eigenvector of T that corresponds to the largest eigenvalue.Just like pagerank,this eigenvector gives an order of importance to links.Links that get higher values are associated with links that accept more traf?c and thus are candidates to be places of congestion.Observe that this statement was made without making any assumption about the traf?c between any two ASes.

To ?nd the correlation between the importance assigned to links and the amount of traf?c they receive we did the following experiment 7.We assumed that between each pair of ASes there is some amount of traf?c ?owing drawn from a uniform distribution that takes values between 0and 2traf?c units 8.After performing shortest path routing and assigning loads to links,we have ordered the links by their load.We are interested to ?nd the relation between this ordering and the ordering given by the weights in the eigenvector.In Figure 9we depict this relationship.There is a point in (i,j )when a link is in i -th position sorted by the load and in j position sorted by the weight in the eigenvector.Indeed it is easy to observe that there is strong correlation between the importance of the link and the amount of traf?c it receives.The correlation coef?cient in this case is 0.8594indicating

7For

this example we have used an induced graph of the real topology

which includes all the ASes in levels 1and 2as assigned by [31].Memory and processing limitations did not allow us to work with bigger matrices.8Setting the traf?c to 1for each pair gave the same results.

eigenvector of the SVD of the traf?c matrix with the load of the link.Correlation coef?cient is 0.8594.

Fig.10.An example of a link cluster.

this strong correlation.

In addition,it is possible to use the left eigenvectors to identify clusters of related links that form a cut in the original adjacency matrix.The links in the cut carry traf?c between areas in the Internet that are not well connected and thus they are candidates to be points of congestion.As a simple example we give Figure 10,where we draw a cluster of links (cluster in the same sense as the clusters de?ned earlier for ASes)taken from the left eigenvector which corresponds to the second largest eigenvalue.Intuitively,we expect that indeed the trans-atlantic links to carry a lot of traf?c and thus be points of congestion as indicated.We have observed similar clusters using the other eigenvectors,and also in positions that seem intuitively natural (across Central and Eastern Europe,across the Paci?c,e.t.c).It is still an open question to us how the clusters observed in the AS topology related with the link clusters.

VI.S UMMARY

Spectral ?ltering is a well known information retrieval method.We studied the adaptation of spectral ?ltering in the AS Internet topology.We found that the information retrieved corresponds to groups of nodes with semantic proximities.We found that the clustering behavior varies in the core and in the

edge of the network,and across different geographic areas. We gave two applications of spectral?ltering.The?rst is to identify non-trivial intacluster and intercluster traf?c patterns. Such traf?c patterns affect the stress on elements of the network.The second application is an adaptation of Google’s PageRank to obtain an alternative detailed characterization of hierarchy.Our study proves that spectral?ltering methods can be successful in processing Internet topologies.

Beyond information retrieval,spectral methods have found great applicability in information compression,via the tech-nique of low rank approximations[3],[26],[28].Examining if such low rank approximations apply to the networking context (e.g.speed-up simulations)is a very important open question. We believe that our study is a?rst step in this direction. Finally we should mention that the?rst reference to the high end of the spectrum of Internet topologies is due to[11],and another interesting study can be found in[36]who discuss properties of the entire spectrum and relate them to certain structural properties of the Internet graph.

A CKNOWLEDGMENT

We thank Lixin Gao for providing her scripts,Lakshmi Sub-ramanian for providing his data.We thank Thomas Erlebach, Amin Saberi,and the anonymous referees for insightful sug-gestions on the presentation.

R EFERENCES

[1]S.Agarwal,L.Subramanian,J.Rexford,and R.Katz.Charac-

terizing the internet hierarchy from multiple vantage points web page.https://www.wendangku.net/doc/6511286355.html,/sagarwal/research/BGP-hierarchy/, July2002.

[2]William Aiello,Fan R.K.Chung,and Linyuan Lu.A random graph

model for massive graphs.In ACM Symposium on Theory of Computing, pages171–180,2000.

[3]Yossi Azar,Amos Fiat,Anna R.Karlin,Frank McSherry,and Jared Saia.

Spectral analysis of data.ACM Symposium on Theory of Computing, pages619–626,2001.

[4]Albert-L′a szl′o Barabasi and R′e ka Albert.Emergence of scaling in

random networks.Science,286:509–512,1999.

[5]Tian Bu and Don Towsley.On distinguishing between internet power

law topology https://www.wendangku.net/doc/6511286355.html,com2002.

[6]K.Calvert,M.Doar,and E.Zegura.Modeling internet topology.IEEE

Communications Magazine,June1997.

[7]J.M.Carlson and J.Doyle.Highly optimized tolerance:A mechanism

for powerlaws in design systems.Physics review E,60(2),pages1412–1427,1999.

[8]Chen Chang,Jamin Govindan,and Willinger Shenker.The origins of

power-laws in internet topologies https://www.wendangku.net/doc/6511286355.html,com2002.

[9]Fan R.K.Chung.Spectral graph theory.American Mathematical Society

Book Series,1997.

[10] A.Fabrikant,E.Koutsoupias,and C.H.Papadimitriou.Heuristically

optimized tradeoffs:A new paradigm for powerlaws in the internet.

ICALP2002,2002.

[11]M.Faloutsos,P.Faloutsos,and C.Faloutsos.On power-law relationships

of the internet topology.Proc.of ACM SIGCOMM,1999.

[12]I.J.Farkas,I.Der′e nyi,A.Barab′a si,and T.Vicsek.Spectra of real-world

graphs:Beyond the semi-circle law.e-print cond-mat/0102335,2001.

[13]National Laboratory for Applied Retwork Research.Route views

archive.https://www.wendangku.net/doc/6511286355.html,/Routing/rawdata.[14]Cooperative Association for Internet Data Analysis.Caida.

https://www.wendangku.net/doc/6511286355.html,.

[15]L.Gao.On inferring autonomous system relationships in the internet.

IEEE Glogal Internet,2000.

[16]G.H.Golub and C.F.Van Loan.Matrix computations.Johns Hopkins

University Press,1989.

[17]P.Husbands,H.Simon,and C.Ding.On the use of the singular value

decomposition for text retrieval.1st SIAM Computational Information Retrieval Workshop,2000.

[18] C.Jin,Q.Chen,and S.Jamin.Inet:Internet topology generator.(CSE-

TR-433-00),2000.

[19]K.I.Kahng and D.Kim.Spectra and eigenvectors of scale-free networks.

Physical Review E.,Vol64,2001.

[20]J.Kleinberg.Hubs,authorities,and communities on the www.ACM

Computing Surveys,31(4es):5,1999.

[21]Anukool Lakhina,John Byers,Mark Crovella,and Ibrahim Matta.On

the geographical location of internet resources.Boston University,2002.

[22]https://www.wendangku.net/doc/6511286355.html,binatorial problems and exercises.North Holland,

Amsterdam,1979.

[23] A.Medina,I.Matta,and J.Byers.On the origin of power-laws in

internet topologies.ACM Computer Communications Review,April 2000.

[24]M.Mihail and C.H.Papadimitriou.On the eigenvalue power law.

Random2002,2002.

[25]Lawrence Page,Sergey Brin,Rajeev Motwani,and Terry Winograd.The

pagerank citation ranking:Bringing order to the web.Stanford Digital Library Technologies Project,1998.

[26] C.H.Papadimitriou,P.Raghavan,H.Tamaki,and https://www.wendangku.net/doc/6511286355.html,tent se-

mantic indexing:A probabilistic analysis.Proc.Principles of Database Systems(PODS),1999.

[27]P.Radoslavov,H.Tangmunarunkit,H.Yu,https://www.wendangku.net/doc/6511286355.html,indan,S.Shenker,

and D.Estrin.On characterizing network topologies and an-alyzing their impact on protocol https://www.wendangku.net/doc/6511286355.html,C-CS-TR-00-731, https://www.wendangku.net/doc/6511286355.html,/radoslavov00characterizing.html,March2000.

[28]https://www.wendangku.net/doc/6511286355.html,rmation retrieval algorithms:A survey.Proc.8th SIAM

Symposium on Discrete Algorithms(SODA),1997.

[29] A.Sinclair.Algorithms for random generation and counting:A markov

chain approach.Springer-Verlag,1997.

[30]G.W.Stewart and J.Sun.Matrix perturbation theory.Academic Press,

1990.

[31]L.Subramanian,S.Agarwal,J.Rexford,and R.H.Katz.Characterizing

the internet hierarchy from multiple vantage points.In IEEE Infocom, June2002.

[32]H.Tangmunarunkit,https://www.wendangku.net/doc/6511286355.html,indan,S.Jamin,S.Shenker,and W.Will-

https://www.wendangku.net/doc/6511286355.html,work topology generators:Degree-based vs structural.Sig-comm2002.

[33]NetGeo https://www.wendangku.net/doc/6511286355.html,geo tool.https://www.wendangku.net/doc/6511286355.html,/tools/utilities/netgeo,

2002.

[34]https://www.wendangku.net/doc/6511286355.html,.Public route server and looking glass site list.

https://www.wendangku.net/doc/6511286355.html,.

[35]V.V.Vazirani.Approximation algorithms.Springer-Verlag,2001.

[36]Danica Vukadinovic,Polly Huang,and Thomas Erlebach.A

spectral analysis of the internet topology.Dimacs Workshop on Internet and WWW Measurement,Mapping,and Modeling,https://www.wendangku.net/doc/6511286355.html,/vukadinovic01spectral.html,2001.

[37] B.M.Waxman.Routing of multipoint connections.IEEE Jour.Selected

Areas in Communications(Special Issue:Broadband Packet Communi-cations),6(9):1617–1622,1988.

[38]J.H.Wilkinson.The algebraic eigenvalue problem.Oxford Science

Publications,1965.

[39]Soon-Hyung Yook,Hawoong Jeong,and Albert-Laszlo Barabasi.Mod-

eling the internet’s large-scale topology.https://www.wendangku.net/doc/6511286355.html,/abs/cond-mat/0107417,2002.

[40] E.Zegura,K.Calvert,and S.Bhattacharjee.How to model an

internetwork.In Proceedings of IEEE INFOCOM,1996.

[41] E.Zegura,K.Calvert,and M.Donahoo.A quantitative comparison of

graph-based models for Internet topology.Transactions on Networking, Vol5,No6,pages770–783,1997.

相关文档