Epidemic spreading in real networks: an eigenvalue viewpoint

How will a virus propagate in a real network? Does an epidemic threshold exist for a finite graph? How long does it take to disinfect a network given particular values of infection rate and virus death rate? We answer the first question by providing equations that accurately model virus propagation in any network including real and synthesized network graphs. We propose a general epidemic threshold condition that applies to arbitrary graphs: we prove that, under reasonable approximations, the epidemic threshold for a network is closely related to the largest eigenvalue of its adjacency matrix. Finally, for the last question, we show that infections tend to zero exponentially below the epidemic threshold. We show that our epidemic threshold model subsumes many known thresholds for special-case graphs (e.g., Erdos-Renyi, BA power-law, homogeneous); we show that the threshold tends to zero for infinite power-law graphs. We show that our threshold condition holds for arbitrary graphs.


Introduction
Computer viruses remain a significant threat to today's networks and systems. Existing defense mechanisms typically focus on local scanning of virus signatures. While these mechanisms can detect and prevent the spreading of known viruses, they do little for globally optimal defenses. The recent proliferation of malicious code that spreads with virus code exacerbates the problem [10,24,25]. From a network dependability standpoint, the propagation of malicious code represents a particular form of fault propagation, which may lead to the ultimate demise of the network (consider distributed denial-of-service attacks). With the exception of a few specialized modeling studies [7,8,16,19,26], much still remains unknown about the propagation characteristics of computer viruses and the factors that influence them.
In this paper, we investigate epidemiological modeling techniques to reason about computer viral propagation. Kephart and White [7,8] are among the first to propose epidemiology-based analytic models. Their studies, however, are based on topologies that do not represent modern networks. Staniford et al. [23] reported a study of the Code Red worm propagation, but did not attempt to create an analytic model. The more recent studies by Pastor-Satorras et al. [16,17,18,19,20] and Barabási et al. [2,4] focused on epidemic models for power-law networks.
This work aims to develop a general analytic model of virus propagation. Specifically, we are interested in models that capture the impact of the underlying topology but are not limited by it. We found that, contrary to prior beliefs, viral propagation is largely determined by intrinsic characteristics of the network. Our model holds for arbitrary graphs and renders surprisingly simple but accurate predictions.
The layout of this paper is as follows: section 2 gives a background review of previous models. In section 3, we describe our proposed model. We show that our model conforms better to simulation results than previous models over real networks. In section 4, we revisit the issue of epidemic threshold and present a new theory for arbitrary graphs-the epidemic threshold of a given network is related intrinsically to the first eigenvalue of its adjacency matrix. We summarize in section 6.

Earlier models and their limitations
The class of epidemiological models that is most widely used is the so-called homogeneous models [1,11]. A homogeneous model assumes that every individual has equal contact to everyone else in the population, and that the rate of infection is largely determined by the density of the infected population. Kephart and White adopted a modified homogeneous model in which the communication among individuals is modeled as a directed graph [7]: a directed edge from node i to node j denotes that i can directly infect j. A rate of infection, called the birth rate, β, is associated with each edge. A virus curing rate, δ, is associated with each infected node.
If we denote the infected population at time t as η t , a deterministic time evolution of η t in the Kephart-White model (hereafter referred to as the KW model) can be represented as where k is the average connectivity. The steady state solution for Equation where N is the total number of nodes. An important prediction of Equation 1 is the notion of epidemic threshold. An epidemic threshold, τ , is the critical β/δ ratio beyond which epidemics ensue. In a homogeneous or Erdös-Rényi network, the epidemic threshold is, where k is the average connectivity [7]. These earlier models provide a good approximation of virus propagation in networks where the contact among individuals is sufficiently homogeneous. However, there is overwhelming evidence that real networks (including social networks [21], router and AS networks [6], and Gnutella overlay graphs [22]) deviate from such homogeneity-they follow a power law structure instead. Computer viruses, therefore, are likely to propagate among nodes with a high variance in connectivity.
Pastor-Satorras and Vespignani studied epidemic spread for power-law networks where the connectivity distribution is characterized as P (k) = k −γ (P (k) is the probability that a node has k links) [14,16,18,19]. Power-law networks have a highly skewed connectivity distribution and for certain values of γ resemble the Internet topology [6]. Pastor-Satorras et al. developed an analytic model (we refer to their model as the SV model) for the Barabási-Albert (BA) power-law topology (γ = 3). Their steady state prediction is, where m is the minimum connection in the network. The SV model, however, depends critically on the assumption γ = 3, which does not hold for real networks [9,6]. This model yields less than accurate predictions for networks that deviate from the BA topology, as we will show later in the paper. Pastor-Satorras et al. [18] also proposed an epidemic threshold condition where k is the expected connectivity and k 2 signifies the connectivity divergence. Following [19], Boguñá and Satorras studied epidemic spreading in correlated networks where the connectivity of a node is related to the connectivity of its neighbors [3]. These correlated networks include Markovian networks where, in addition to P (k), a function P (k|k ′ ) determines the probability that a node of degree k is connected to a node of degree k ′ .
While some degree of correlations may exist in real networks, it is often difficult to characterize connectivity correlations with a simple P (k|k ′ ) function. Indeed, prior studies on real networks [6,15] have not found any conclusive evidence to support the type of correlation as defined in [3]. Hence, we will not discuss models for correlated networks further in this paper.
We present a new analytic model that does not assume any particular propagation topology. We will show later that our model subsumes previous models that are tailored to fit special-case graphs (homogeneous, BA power-law, etc.).

The proposed model
In this section, we describe a model that does not assume homogeneous connectivity or any particular topology. We assume a connected network G = (N, E), where N is the number of nodes in the network and E is the set of edges. We assume a universal infection rate β for each edge connected to an infected node, and a virus death rate δ for each infected node. Table 1 lists the symbols used. β Virus birth rate on a link connected to an infected node δ Virus curing rate on an infected node t Time stamp p i,t Probability that node i is infected at t ζ i,t Probability that node i does not receive infections from its neighbors at t η t Infected population at time t k Average degree of nodes in a network k 2 Connectivity divergence

Model
Our model assumes discrete time. During each time interval, an infected node i tries to infect its neighbors with probability β. At the same time, i may be cured with probability δ. We denote the probability that a node i is infected at time t as p i,t . We define ζ i,t , the probability that a node i will not receive infections from its neighbors at time t as, We assume that a node i is healthy at time t if • i was healthy before t and did not receive infections from its neighbors at t (defined by ζ i,t ) OR • i was infected before t, cured at t and did not receive infections from its neighbors (defined by ζ i,t ) OR • i was infected before t, received and ignored infections from its neighbors, and was subsequently cured at t Note that the third bullet above is due to potentially concurrent curing and infection events. We subsequently define the healthy probability of a node i at time t, 1 − p i,t , to be Note that for the last term on the right hand side of Equation 6 we assume that the probability that a curing event at node i takes place after infection from neighbors is roughly 50%. Given a network topology and particular values of β and δ, we can solve Equation 6 numerically and obtain the time evolution of infected population, η t ,

Experiments
In this section, we present a set of simulation results. The simulations are conducted to answer the question-how does our model perform in real, BA power law, and homogeneous graphs? We use a real network graph collected at the Oregon router views 1 . This dataset contains 31180 links among 10900 AS peers. All synthesized power-law graphs used in this study are generated using BRITE [12]. Unless otherwise specified, each simulation plot is averaged over 15 individual runs.
We begin each simulation with a set of randomly chosen infected nodes on a given network topology 2 . Simulation proceeds in steps of one time unit. During each step, an infected node attempts to infect each of its neighbors with probability β. In addition, every infected node is cured with probability δ. An infection attempt on an already infected node has no effect. Figure 1 shows the time evolution of η as predicted by our model (see Equation 6) on the 10900node Oregon AS graph, plotted against simulation results and the steady state prediction of the SV model in Equation 3 (Since the SV model does not estimate the transients, we plot the steady state only.) As shown, our model yields a strictly more precise result than the SV model. Figure 2 compares the predictions of our model against the SV model for Barabási-Albert networks (see Equation 3). The topology used in Figure 2 is a synthesized 1000-node BA network. Since the SV Both simulations were performed on an Oregon network graph, with k = 5.72 and β = 0.14. In both cases, our model conforms much closer to the simulation results than the SV model.

model (see Equation 3
) is specifically tailored for BA networks, we expect the comparison to be simply a sanity check. As shown, both models conform nicely to the simulation results, though our model appears to be slightly more precise.  Figure 3 shows simulation results of epidemic spreading on a synthesized 1000-node random network, plotted against the KW model [7] and our model. The network is constructed according to the Erdös-Rényi model [5]. Since an Erdös-Rényi network is sufficiently close to being homogeneous as far as epidemiological models are concerned, the results in Figure 3 suggest that our model is as pre-cise as the KW model, which is designed specifically for homogeneous networks. In one case where β is 0.2 and δ is 0.72, the simulated spreading appears to follow our prediction more closely than that of the KW model. The experiments we show here, conducted on a real network, a synthesized BA power-law network, and an Erdös-Rényi network, illustrate the predictive power of our model-as a general model, it subsumes prior models and produces predictions that equal or outperform predictions that target specific topologies.

Epidemic threshold and eigenvalues
As described previously, an epidemic threshold is a critical state beyond which infections become endemic. Predicting the epidemic threshold is an important part of an epidemiological model. The epidemic threshold of a graph depends fundamentally on the graph itself. The challenge therefore is to capture the essence of the graph in as few parameters as possible. We present one such model here that predicts the epidemic threshold with a single parameter-the largest eigenvalue of the adjacency matrix of the graph-for arbitrary graphs.
We note that previous models have derived threshold conditions for special-case graphs. For instance, the epidemic threshold for a homogeneous network is the inverse of the average connectivity, k . Similarly, the threshold for infinite power-law networks is zero. However, a unifying model for arbitrary, real graphs has not appeared in the literature. The closest model thus far is the one put forth by Pastor-Satorras et al. (see Equation 4). We show later that their model is not accurate for arbitrary graphs.
In this section, we describe a general theory for epidemic threshold that holds for arbitrary graphs. We observe that the epidemic threshold is a condition linking the virus' birth and curing rate to the adjacency matrix of the graph, such that an infection becomes an epidemic if the condition holds, and dies out if it does not. Our theory is surprisingly simple yet accurate at the same time. We show later in this section that this new threshold condition subsumes prior models for special-case graphs. Table 2 lists the symbols used in this section.

A
Adjacency matrix of the network trA The transpose of matrix A λ i,A The i-th largest eigenvalue of A u i,A Eigenvector of A corresponding to λ i,A S The 'system' matrix describing the equations of infection λ i,S The i-th largest eigenvalue of S

Table 2. Symbols for eigenvalue analysis
Next, we will show that our estimate for the epidemic threshold τ is where λ 1,A is the largest eigenvalue of the adjacency matrix A of the network.
Theorem 1 (Epidemic Threshold) If an epidemic dies out, then it is necessarily true that β δ < τ = 1 λ1,A , where β is the birth rate, δ is the curing rate and λ 1,A is the largest eigenvalue of the adjacency matrix A.
As we show in Lemma 1 in the Appendix, the matrices A and S have the same eigenvectors u i,S , and their eigenvalues, λ i,A and λ i,S , are closely related: Using the spectral decomposition, we can say Using this in Equation 13, Without loss of generality, order the eigenvalues such that λ 1,A ≥ λ 2,A . . .. For an infection to die off and not become an epidemic, the vector P t should go to zero for large t, which happens when ∀i, λ t i,S tends to 0. That implies λ 1,S < 1. So, which means that, τ = 1 λ1,A 2 Theorem 2 (Exponential Decay) When an epidemic is diminishing (therefore β/δ < 1 λ1,A ), the probability of infection decays exponentially over time.
Proof: We have: where C is a constant vector. Since the value of λ 1,S is less than 1 (because of the no-epidemic condition). the values of p i,t are decreasing exponentially over time. Corollary 1 When the network is below the epidemic threshold, the number of infected nodes decays exponentially over time.
Proof: Let n t denote the number of infected nodes at time t.
where C i are the individual elements of the matrix C in Equation 18 above. Because i C i is a constant and λ 1,S < 1 (from Theorem 1), we see that n t decays exponentially with time.

2.
The exponential decay in the number of infected nodes is shown clearly in Figure 4, where we plot the logarithm of the number of infected nodes, η t , versus t. Two plots are shown: One for the star topology, and one for the Oregon dataset. In both cases, we observe that for large values of time t, the plots become linear, implying that the number of infected nodes decays exponentially.

Discussion-generality of our threshold condition
We now turn to show that our threshold condition is general and holds for other graphs. In particular, we show that the threshold condition holds for a) homogeneous, b) star, c) infinite power-law, and d) finite power-law graphs. We do that with the following corollaries.

Corollary 2
The new threshold model holds for homogeneous or random Erdös-Rényi graphs.
Proof: As reported previously, the epidemic threshold in a homogeneous network or a random Erdös-Rényi graph is τ hom = 1/ k where k is the average connectivity [7]. It is easily shown that, in a homogeneous or random network, the largest eigenvalue of the adjacency matrix is k . Therefore, our model yields the same threshold condition as the homogeneous models [11]. Proof: In a star topology, we have two types of nodes, the center node and the satellite nodes. Suppose that we have d satellites, the first eigenvalue of the adjacency matrix, λ 1 , is √ d. The stability condition then becomes which means that δ = β * √ d to achieve stability, thus rendering τ = 1 √ d .
2 Figure 5 shows an infection spread over time in a 100-node star graph with β = 0.016. Given τ = 1/ √ 99, the critical δ on the threshold is 0.16. We plotted our propagation model as given by Equation 6 in Figure 5(b). As shown, the propagation model confirms our prediction for the critical δ. More specifically, the theoretical results rendered by the propagation model closely reflect the simulation when δ > 0.16. For δ < 0.16, there is no epidemic. For δ = 0.16, a very interesting setting appears.
For the case of δ = 0.16, our propagation model seems to show that the expected number of infected nodes η t drops approximately at the rate of t −1 , which is qualitatively different from the other two cases: for δ > 0.16, η t ≈ λ t 1 ; for δ < 0.16, η t stabilizes. This suggests a phase transition phenomenon.    Figure 6(d) depicts a further example for the star topology, plotting the number of infected nodes η 200 at time t=200 for several values of the β/δ ratio. We plot both theoretical (see Equation 6) and simulation results. We also show the two epidemic thresholds with vertical lines: Our threshold with "crosses" at β/δ = 1/λ 1,A = 0.1 and the SV threshold with "squares" at β/δ = 0.02. The simulation results indicate that our threshold is clearly in the correct region, while the SV threshold prediction is not accurate.
to [13]). Since d max ∝ ln(N ) and N is infinite, λ 1,A is infinite. Our epidemic threshold condition states that δ must be greater than β * λ 1,A in order for there not be any epidemic. Therefore, the epidemic threshold is effectively zero for infinite power-law networks. This result concurs with previous work, which finds that infinite power-law networks lack epidemic thresholds. We structured the experiment such that 5000 nodes are infected initially. Simulations proceed with β = 0.001 and δ ranging from 0.05 to 0.14. For the particular values of β and λ 1,A , our epidemic threshold model predicts a critical δ at 0.0587211, while the SV threshold prediction puts the critical δ at 0.2078. As shown in Figure 6(a), the simulation with δ = 0.05 reaches equilibrium while the run with δ = 0.07 approaches zero at approximately time-tick 600. The run with δ = 0.06 approaches zero steadily, but has yet to reach it at time-tick 1000. These results closely mirror our threshold prediction, which shows a critical δ at approximately 0.06. Figure 6(b) shows an alternate view of the experiment result, plotting the number of infected nodes η at time t=500 for several values of the β/δ ratio. We plot both theoretical (see Equation 6) and the simulation results. We also show the two epidemic thresholds with vertical lines: Our threshold with "crosses" at β/δ= 1/λ 1,A = 0.0167 and the SV threshold with "squares" at β/δ= 0.0048. Notice that our threshold is clearly in the correct region, while the SV threshold prediction is less precise.
It was brought to our attention that Boguñá et al. derived an epidemic threshold condition for correlated networks based on the largest eigenvalue of a specialized connectivity matrix, C [3]. Each entry C k,k ′ of C is defined by kP (k|k ′ ) where P (k|k ′ ) indicates the probability that a k-linked node is connected to a k ′ -linked node. In [3], they used a continuous-time model and arrived at the eigenvalue based threshold condition following a different line of reasoning. While the two results are similar for correlated networks, our threshold condition is more general.

Conclusions -contributions
How will a virus propagate in a real computer network? What is the epidemic threshold for a finite graph, if any? How long does it take for a viral outbreak to reach steady state? These questions have for decades intrigued researchers. In this paper we attempt to answer these questions by providing a new analytic model that accurately models the propagation of viruses on arbitrary graphs. The primary contributions of this paper are: • We propose a new model for virus propagation in networks (Equation 6), and show that our model is more precise and general than previous models. We demonstrate the accuracy of our model in both real and synthetic networks.
• We show that we can capture the viruspropagation properties of an arbitrary graph in a single parameter, namely the eigenvalue λ 1,A . We propose a precise epidemic threshold, τ = 1/λ 1,A , which holds irrespective of the network topology; an epidemic is prevented when δ > δ c = β * λ 1,A . We show that our epidemic threshold is more general and more precise than previous models for special-case graphs (e.g., Erdös-Rényi, homogeneous, BA power-law); we show that it tends to zero for infinite power-law graphs.
• We show that, below the epidemic threshold, the number of infected nodes in the network decays exponentially.
Future research directions abound, both for theoretical as well as experimental work. One could examine phase transition phenomena, when we are exactly on the epidemic threshold. Another promising direction is to enhance the model with a "vigilance" parameter to model environmental factors that affect viral propagations.

Acknowledgments
The authors wish to thank Dr. Benoit Morel, Dr. Anthony Brockwell, and Dr. Deborah Brandon for many insightful discussions on the subject. We  also like to thank the anonymous reviewers for their helpful comments.

Appendix
Lemma 1 (Eigenvalues of the system matrix) The i − th eigenvalue of S is of the form λ i,S = 1 − δ + βλ i,A , and the eigenvectors of S are the same as those of A.
Proof: Let u i,A be the eigenvector of A corresponding to eigenvalue λ i,A . Then, by definition, Au i,A = λ i,A u i,A (because A is symmetric in our case). Now, Thus, u i,A is also an eigenvector of S, and the corresponding eigenvalue is (1 − δ + βλ i,A ). 2