A note on asymptotic distributions in directed exponential random graph models with bi-degree sequences

ABSTRACT The asymptotic normality of a fixed number of the maximum likelihood estimators (MLEs) in directed exponential random graph models (ERGMs) with an increasing bi-degree sequence has been established recently. In this article, we further derive a central limit theorem for a linear combination of all the MLEs, whose dimension increases with the number of vertices. Simulation studies are provided to illustrate the asymptotic results.

The degree sequence is one of the most important graph statistics and has been extensively studied (Britton et al., 2006; Blitzstein and Diaconis, 2011; Bickel et al., 2011; Zhao et al., 2012; Hillar and Wibisono, 2013; Rinaldo et al., 2013). When the degree sequence fully captures the information of a graph, the Koopman-Pitman-Darmois theorem forces the probability distribution on graphs to take an exponential form, called the β-model for binary edges by Chatterjee et al. (2011) and the "maximum entropy model" for weighted edges by Hillar and Wibisono (2013). The asymptotic theory of the maximum likelihood estimators (MLEs) for these models has been derived recently (Chatterjee et al., 2011; Hillar and Wibisono, 2013; Yan and Xu, 2013; Yan et al., 2015). In the directed case, each vertex has one out-degree and one in-degree, jointly referred to as the bi-degree. Yan et al. (2016) gave a rigorous analysis of directed ERGMs with the bi-degree sequence as the exclusive sufficient statistic, establishing the uniform consistency of the MLE and the asymptotic normality of a fixed number of the MLEs when the edges take binary, continuous, and infinite discrete values.
In this article, building on the work of Yan et al. (2016), we further establish a central limit theorem for a linear combination of all the MLEs in directed ERGMs with an increasing bi-degree sequence. The previous results for a fixed number of the MLEs are a special case of this theorem.
For the remainder of this article, we proceed as follows. In Section 2, we present the main results. Simulation studies are given in Section 3. We make some discussion in Section 4. Since the proofs of Theorems 2 and 3 are similar to that of Theorem 1, we only give the proof of Theorem 1, in Section 5; the proofs of Theorems 2 and 3 are relegated to the Online Supplementary Material.

Main results
First, we give a brief description of directed ERGMs with the bi-degree sequence as the exclusive sufficient statistic. Consider a directed graph G on n ≥ 2 vertices labeled by 1, . . . , n. Let a_{i,j} ∈ Ω be the weight of the directed edge from i to j, where Ω ⊂ R is the set of all possible weight values, and let A = (a_{i,j}) be the adjacency matrix of G. We assume that there are no self-loops, i.e., a_{i,i} = 0. Let S be a σ-algebra over Ω and υ be a canonical σ-finite probability measure on (Ω, S). If Ω is discrete, then υ is the counting measure; if Ω is continuous, then υ is the Borel measure. Let υ^{n(n−1)} be the product measure on Ω^{n(n−1)}. Write d_i = Σ_{j≠i} a_{i,j} for the out-degree of vertex i and b_j = Σ_{i≠j} a_{i,j} for the in-degree of vertex j, so that (d_1, . . . , d_n, b_1, . . . , b_n) is the bi-degree sequence. The density or probability mass function of G with respect to υ^{n(n−1)} is the exponential family

p(A) = exp( Σ_{i=1}^n α_i d_i + Σ_{j=1}^n β_j b_j − Z(α, β) ),    (1)

where Z(α, β) is the log-partition function, and α = (α_1, α_2, . . . , α_n)^T and β = (β_1, β_2, . . . , β_n)^T are parameter vectors. This model can be viewed as a generalization of the p_1 model of Holland and Leinhardt (1981) to weighted edges, without the reciprocity term, and as a directed version of the β-model. Hereafter, we refer to model (1) as the p_0 model, where the subscript "0" indicates that it is simpler than the p_1 model.
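To fix ideas, the data-generating mechanism of the p_0 model with binary weights can be sketched in a few lines of code. The parameter values below are illustrative choices, not taken from the article.

```python
import numpy as np

# Sketch of the binary p0 model: edges a_{ij} are independent
# Bernoulli(e^{alpha_i + beta_j} / (1 + e^{alpha_i + beta_j})).
rng = np.random.default_rng(0)
n = 6
alpha = np.linspace(-0.5, 0.5, n)   # out-degree parameters (illustrative)
beta = np.linspace(0.5, -0.5, n)    # in-degree parameters (illustrative)

p = np.exp(alpha[:, None] + beta[None, :])
p = p / (1.0 + p)
np.fill_diagonal(p, 0.0)            # no self-loops: a_{ii} = 0

a = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(a, 0)

d = a.sum(axis=1)                   # out-degrees d_i = sum_{j != i} a_{ij}
b = a.sum(axis=0)                   # in-degrees  b_j = sum_{i != j} a_{ij}
print(d, b)
```

Note that the total out-degree always equals the total in-degree, which is why one component (β_n) must be fixed for identifiability.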
Let θ = (α_1, . . . , α_n, β_1, . . . , β_{n−1})^T and g = (d_1, . . . , d_n, b_1, . . . , b_{n−1})^T, where β_n = 0 is fixed for identifiability. Let θ̂ be the MLE of θ. Denote by V the Fisher information matrix of the parameter vector θ. To obtain the asymptotic distribution of a fixed number k of components of θ̂, Yan et al. (2016) proposed an approximate inverse of V with an approximately explicit expression. Here, the approximate inverse is used to represent the linear combination of all the MLEs as a function of the linear combination of all the bi-degrees with coefficients {λ_i}_{i=1}^n and {κ_j}_{j=1}^n. Details are given in Section 5. Consider the condition

Σ_{i=1}^∞ |λ_i| < ∞  and  Σ_{j=1}^∞ |κ_j| < ∞.    (2)

Condition (2) implies Σ_{i=1}^∞ λ_i² < ∞ and Σ_{j=1}^∞ κ_j² < ∞, which is a common requirement in the study of the asymptotic distribution of a weighted sum of independent random variables. The central limit theorems for the linear combination of all the MLEs in the cases of binary, continuous, and discrete weights are given in the following three subsections, respectively.

Binary weights
In this subsection, we assume that the edge weights take values in Ω = {0, 1}, υ is the counting measure, and a_{i,j}, 1 ≤ i ≠ j ≤ n, are mutually independent Bernoulli random variables with

P(a_{i,j} = 1) = e^{α_i+β_j} / (1 + e^{α_i+β_j}).

The log-partition function is Z(θ) = Σ_{i≠j} log(1 + e^{α_i+β_j}), and the likelihood equations are

d_i = Σ_{j≠i} e^{α̂_i+β̂_j} / (1 + e^{α̂_i+β̂_j}),  i = 1, . . . , n,
b_j = Σ_{i≠j} e^{α̂_i+β̂_j} / (1 + e^{α̂_i+β̂_j}),  j = 1, . . . , n − 1.

Yan et al. (2016) show that the diagonal elements of V are

v_{i,i} = Σ_{j≠i} e^{α_i+β_j} / (1 + e^{α_i+β_j})²,  i = 1, . . . , n,
v_{n+j,n+j} = Σ_{i≠j} e^{α_i+β_j} / (1 + e^{α_i+β_j})²,  j = 1, . . . , n − 1.

Let θ* denote the true parameter vector. The central limit theorem for the linear combination of all the estimators α̂ and β̂ is stated as follows; its proof is given in Section 5.2.
Theorem 1. Suppose that condition (2) holds and that the conditions of the corresponding central limit theorem of Yan et al. (2016) for binary weights are satisfied. Then

Σ_{i=1}^n λ_i v_{i,i}^{1/2} (α̂_i − α*_i) + Σ_{j=1}^{n−1} κ_j v_{n+j,n+j}^{1/2} (β̂_j − β*_j)

is asymptotically normal with mean 0 and variance

σ_1² = lim_{n→∞} ( Σ_{i=1}^n λ_i² + Σ_{j=1}^{n−1} κ_j² + (Σ_{i=1}^n λ_i H_i − Σ_{j=1}^{n−1} κ_j H_{n+j})² ),

where H_i = (v_{i,i}/v_{2n,2n})^{1/2}, H_{n+j} = (v_{n+j,n+j}/v_{2n,2n})^{1/2}, and the limit is assumed to exist.
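The likelihood equations above can be solved numerically. The sketch below uses a log-ratio fixed-point update, a standard device for β-type models (the article does not specify its solver), on a toy 4-vertex digraph whose symmetric structure makes the exact solution α_i = log 2, β_j = 0 easy to verify.

```python
import numpy as np

# Sketch of a solver for the binary-case likelihood equations
#   d_i = sum_{j != i} e^{alpha_i + beta_j} / (1 + e^{alpha_i + beta_j}),
#   b_j = sum_{i != j} e^{alpha_i + beta_j} / (1 + e^{alpha_i + beta_j}),
# with beta_n = 0 for identifiability.  The fixed-point update is one
# standard choice, not necessarily the algorithm used in the article.

def p0_mle_binary(a, iters=500):
    n = a.shape[0]
    d = a.sum(axis=1).astype(float)
    b = a.sum(axis=0).astype(float)
    alpha, beta = np.zeros(n), np.zeros(n)
    off = 1.0 - np.eye(n)

    def fitted(alpha, beta):
        e = np.exp(alpha[:, None] + beta[None, :])
        return off * e / (1.0 + e)        # expected edge indicators

    for _ in range(iters):
        alpha += np.log(d) - np.log(fitted(alpha, beta).sum(axis=1))
        beta += np.log(b) - np.log(fitted(alpha, beta).sum(axis=0))
        shift = beta[-1]                  # re-impose beta_n = 0 without
        alpha += shift                    # changing alpha_i + beta_j
        beta -= shift
    return alpha, beta

# Toy digraph: a_{ij} = 1 iff (j - i) mod 4 is 1 or 2, so every d_i = b_j = 2;
# by symmetry the exact MLE is alpha_i = log 2, beta_j = 0.
n = 4
a = np.array([[1 if (j - i) % n in (1, 2) else 0 for j in range(n)]
              for i in range(n)])
alpha, beta = p0_mle_binary(a)
e = np.exp(alpha[:, None] + beta[None, :])
mu = (1.0 - np.eye(n)) * e / (1.0 + e)
print(alpha.round(4), np.abs(mu.sum(axis=1) - a.sum(axis=1)).max())
```

On this toy graph the iteration converges geometrically; for real data the MLE may fail to exist when some degree sits on the boundary, as the simulations in Section 3 illustrate.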

Continuous weights
In the case of continuous weights, i.e., Ω = [0, ∞), υ is the Lebesgue measure, and a_{i,j}, 1 ≤ i ≠ j ≤ n, are mutually independent exponential random variables with density

f(a_{i,j}) = −(α_i + β_j) exp( (α_i + β_j) a_{i,j} ),

and the natural parameter space is {θ : α_i + β_j < 0 for all i ≠ j}. To follow the convention that the rate parameters are positive in exponential families, we apply the transformation θ̄ = −θ, ᾱ_i = −α_i, and β̄_j = −β_j. The corresponding natural parameter space then becomes {θ̄ : ᾱ_i + β̄_j > 0 for all i ≠ j}, under which a_{i,j} is exponential with rate ᾱ_i + β̄_j. Here, we denote by θ̄̂ the MLE of θ̄. The log-partition function is Z(θ̄) = −Σ_{i≠j} log(ᾱ_i + β̄_j), and the likelihood equations are

d_i = Σ_{j≠i} 1/(ᾱ̂_i + β̄̂_j),  i = 1, . . . , n,
b_j = Σ_{i≠j} 1/(ᾱ̂_i + β̄̂_j),  j = 1, . . . , n − 1.

Let θ̄* denote the true parameter vector. The central limit theorem for the linear combination of all the estimators ᾱ̂ and β̄̂ is stated as follows; its proof is given in the Online Supplementary Material.

Theorem 2. Suppose that condition (2) holds and that the conditions of the corresponding central limit theorem of Yan et al. (2016) for continuous weights are satisfied. Then

Σ_{i=1}^n λ_i v_{i,i}^{1/2} (ᾱ̂_i − ᾱ*_i) + Σ_{j=1}^{n−1} κ_j v_{n+j,n+j}^{1/2} (β̄̂_j − β̄*_j)

is asymptotically normal with mean 0 and variance

σ_2² = lim_{n→∞} ( Σ_{i=1}^n λ_i² + Σ_{j=1}^{n−1} κ_j² + (Σ_{i=1}^n λ_i H_i − Σ_{j=1}^{n−1} κ_j H_{n+j})² ),

where H_i = (v_{i,i}/v_{2n,2n})^{1/2}, H_{n+j} = (v_{n+j,n+j}/v_{2n,2n})^{1/2}, and the limit is assumed to exist.
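The likelihood equations in the continuous case equate the observed bi-degrees with sums of exponential means 1/(ᾱ_i + β̄_j). A small Monte Carlo sketch, with illustrative parameter values, checks this identity numerically.

```python
import numpy as np

# Continuous-weight setup: a_{ij} ~ Exponential with rate abar_i + bbar_j,
# so E d_i = sum_{j != i} 1/(abar_i + bbar_j), the right-hand side of the
# likelihood equations.  Parameter values are illustrative.
rng = np.random.default_rng(1)
n, reps = 10, 2000
abar = np.linspace(0.8, 1.2, n)
bbar = np.linspace(1.2, 0.8, n)
rate = abar[:, None] + bbar[None, :]

d_sum = np.zeros(n)
for _ in range(reps):
    a = rng.exponential(1.0 / rate)      # numpy parameterizes by the mean
    np.fill_diagonal(a, 0.0)             # no self-loops
    d_sum += a.sum(axis=1)

inv = 1.0 / rate
np.fill_diagonal(inv, 0.0)
expected = inv.sum(axis=1)               # sum_{j != i} 1/(abar_i + bbar_j)
print(np.abs(d_sum / reps - expected).max())
```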

Discrete weights
In the case of discrete weights, i.e., Ω = {0, 1, 2, . . .}, υ is the counting measure, and a_{i,j}, 1 ≤ i ≠ j ≤ n, are mutually independent geometric random variables with probability mass function

P(a_{i,j} = a) = (1 − e^{−(ᾱ_i+β̄_j)}) e^{−(ᾱ_i+β̄_j) a},  a = 0, 1, 2, . . . ,

where, as in the continuous case, ᾱ_i = −α_i and β̄_j = −β_j with ᾱ_i + β̄_j > 0. The log-partition function is Z(θ̄) = −Σ_{i≠j} log(1 − e^{−(ᾱ_i+β̄_j)}), and the likelihood equations are

d_i = Σ_{j≠i} 1/(e^{ᾱ̂_i+β̄̂_j} − 1),  i = 1, . . . , n,
b_j = Σ_{i≠j} 1/(e^{ᾱ̂_i+β̄̂_j} − 1),  j = 1, . . . , n − 1.

Yan et al. (2016) show that the diagonal elements of V are

v_{i,i} = Σ_{j≠i} e^{ᾱ_i+β̄_j}/(e^{ᾱ_i+β̄_j} − 1)²,  i = 1, . . . , n,
v_{n+j,n+j} = Σ_{i≠j} e^{ᾱ_i+β̄_j}/(e^{ᾱ_i+β̄_j} − 1)²,  j = 1, . . . , n − 1.

Let θ̄* be the true parameter vector. The central limit theorem for the linear combination of all the estimators ᾱ̂ and β̄̂ is stated as follows; its proof is given in the Online Supplementary Material.
Theorem 3. Suppose that condition (2) holds and that the conditions of the corresponding central limit theorem of Yan et al. (2016) for discrete weights are satisfied. Then

Σ_{i=1}^n λ_i v_{i,i}^{1/2} (ᾱ̂_i − ᾱ*_i) + Σ_{j=1}^{n−1} κ_j v_{n+j,n+j}^{1/2} (β̄̂_j − β̄*_j)

is asymptotically normal with mean 0 and variance

σ_3² = lim_{n→∞} ( Σ_{i=1}^n λ_i² + Σ_{j=1}^{n−1} κ_j² + (Σ_{i=1}^n λ_i H_i − Σ_{j=1}^{n−1} κ_j H_{n+j})² ),

where H_i = (v_{i,i}/v_{2n,2n})^{1/2} and H_{n+j} = (v_{n+j,n+j}/v_{2n,2n})^{1/2}.

Remark 1. By choosing special cases of the sequences {λ_i}_{i=1}^∞ and {κ_j}_{j=1}^∞, we can obtain the central limit theorem for any combination of a fixed number of MLEs, or for a linear combination of a growing number of the MLEs. For example, if we let (λ_{k+1}, λ_{k+2}, . . .) = (0, 0, . . .) for a fixed k, then the theorems cover the results of Yan et al. (2016). If we let all κ_j = 0, we obtain the central limit theorem for a linear combination of the estimators of the out-degree parameters, i.e., Σ_{i=1}^n λ_i v_{i,i}^{1/2} (ᾱ̂_i − ᾱ*_i).

Remark 2. We do not take a direct linear combination of θ̂, i.e., Σ_i λ_i (θ̂_i − θ*_i). By Yan et al. (2016), the asymptotic standard deviation of θ̂_i is of order 1/v_{i,i}^{1/2}, which may lead to different convergence rates for the θ̂_i. It is therefore natural to use a linear combination of their normalized versions, i.e., Σ_i λ_i v_{i,i}^{1/2} (θ̂_i − θ*_i).
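The geometric edge distribution of this subsection can also be checked by simulation: its mean is 1/(e^{ᾱ_i+β̄_j} − 1), the quantity appearing on the right-hand side of the likelihood equations. The parameter values below are illustrative.

```python
import numpy as np

# Discrete-weight setup: a_{ij} in {0, 1, 2, ...} with
# P(a_{ij} = a) = (1 - q) q^a, q = exp(-(abar_i + bbar_j)),
# so E a_{ij} = 1/(exp(abar_i + bbar_j) - 1).
rng = np.random.default_rng(2)
n, reps = 8, 4000
abar = np.full(n, 0.7)
bbar = np.full(n, 0.5)
s = abar[:, None] + bbar[None, :]        # abar_i + bbar_j
p_succ = 1.0 - np.exp(-s)                # success probability of the geometric

m = np.zeros((n, n))
for _ in range(reps):
    a = rng.geometric(p_succ) - 1        # shift numpy's support {1,2,...} to {0,1,...}
    np.fill_diagonal(a, 0)
    m += a
m /= reps

expected = 1.0 / (np.exp(s) - 1.0)
np.fill_diagonal(expected, 0.0)
print(np.abs(m - expected).max())
```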

Simulations
In this section, we assess the quality of the asymptotic approximation for the linear combination of all the MLEs given in Theorems 1-3 through numerical simulations. The parameter settings are similar to those of Yan et al. (2016). For the coefficient sequences, we set λ_i = i^{−2} and κ_j = j^{−2}, whose absolute sums converge. The parameter settings took a linear form. Specifically, for the case with binary weights, we set α*_{i+1} = iL/(n − 1) for i = 0, . . . , n − 1; for the case with discrete weights, we set ᾱ*_{i+1} = 0.2 + (n − 1 − i)L/(n − 1) for i = 0, . . . , n − 1. In both cases, we considered four values of L: 0, log(log n), (log n)^{1/2}, and log n. For the case with continuous weights, we considered L = 0, log(log n), log n, and n^{1/2}. For the parameter values of β̄, we let β̄*_i = ᾱ*_i, i = 1, . . . , n − 1, for simplicity, and β̄*_n = 0 by default. By Theorems 1-3, quantile-quantile (QQ) plots of the empirical quantiles of the normalized linear combinations against the standard normal quantiles were drawn to evaluate the asymptotic approximation. We ran 10,000 simulations for each scenario.
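The mechanism behind these simulations can be illustrated without the MLE step: the leading term of the statistic in Theorems 1-3 is a weighted sum of centered, normalized bi-degrees. The sketch below checks, for the binary case with L = 0 (all parameters zero, edge probability 1/2), that the weighted sum of normalized out-degrees with λ_i = i^{−2} has mean 0 and variance Σ λ_i². This is an illustration of the weighted-CLT mechanism, not a reproduction of the article's full simulation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 4000
lam = 1.0 / np.arange(1, n + 1) ** 2      # lambda_i = i^{-2}, as in Section 3
mu = (n - 1) * 0.5                        # E d_i when all parameters are 0
sd = np.sqrt((n - 1) * 0.25)              # sd(d_i) when all parameters are 0

t = np.empty(reps)
for r in range(reps):
    a = rng.random((n, n)) < 0.5          # binary p0 model at the zero point
    np.fill_diagonal(a, False)
    d = a.sum(axis=1)                     # out-degrees are mutually independent
    t[r] = np.sum(lam * (d - mu) / sd)

print(t.mean(), t.var(), (lam ** 2).sum())
```

Because the out-degrees use disjoint sets of edges, the summands are genuinely independent here, which is what makes the weighted central limit theorem applicable.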
We considered two values, 100 and 200, for the number of vertices n. When L = log n, the MLE failed to exist in 100% of the replications in the cases of binary and discrete weights; therefore, the QQ plots are not available in these cases. This corroborates the previous findings of Yan et al. (2016). On the other hand, when L = log(log n), the frequencies with which the MLE did not exist were 0.3% and 0.54% for n = 100 in the cases of binary and discrete weights, respectively. When L = (log n)^{1/2}, the frequencies with which the MLE did not exist were 15.86% and 1.02% for n = 100 and 200 in the case of binary weights, and 56.83% and 12.63% for n = 100 and 200 in the case of discrete weights. In the other settings, we did not observe nonexistence of the MLE.
The QQ plots for n = 100 and n = 200 are similar, so to save space we only show those for n = 200 in Figure 1. The horizontal and vertical axes are the theoretical and empirical quantiles, respectively, and the gray lines correspond to the reference line y = x. In the first and second rows of Figure 1, the sample quantiles agree with the theoretical ones very well. In the continuous-weights case, the plots for L = log(log n) and L = log n were similar, and only those for L = log(log n) are shown. In the third row, the approximation of asymptotic normality is also good when L = 0 and L = log(log n), while there are notable deviations when L = (log n)^{1/2}.

Discussion
In this article, we have derived the asymptotic normality of a linear combination of all the MLEs in directed ERGMs with an increasing bi-degree sequence. The results can be used to construct confidence intervals for all the parameters simultaneously. For example, an approximate 1 − α confidence interval for the linear combination of the parameters is obtained by centering at the corresponding linear combination of the MLEs and adding and subtracting z_{1−α/2} times its asymptotic standard deviation, where z_α is the α-quantile of the standard normal distribution and the asymptotic variance of the linear combination is given in Theorems 1, 2, and 3, respectively. The absolute sums of the coefficients of the linear combination are assumed to be bounded. To see whether this condition can be relaxed, we ran additional simulations with λ_i = i^{−1} and κ_j = j^{−1}, keeping the other settings of Section 3 unchanged. The squared sums of {1/i}_{i=1}^∞ and {1/j}_{j=1}^∞ converge, but their absolute sums diverge. The results (not shown) indicate that the approximation of asymptotic normality is still good. Therefore, it would be of interest to see whether condition (2) could be relaxed.
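As a concrete illustration of such an interval, suppose the linear combination of the normalized MLEs and its asymptotic standard deviation have been computed; the numbers below are hypothetical, chosen only to show the construction.

```python
from statistics import NormalDist

# Hedged sketch of an approximate 95% confidence interval for a linear
# combination with point estimate `est` and asymptotic standard deviation
# `sigma` (both hypothetical values, not taken from the article).
est, sigma = 0.42, 0.31
z = NormalDist().inv_cdf(0.975)          # upper 0.975 normal quantile, about 1.96
ci = (est - z * sigma, est + z * sigma)
print(ci)
```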
The conditions imposed on Q_n and q_n may not be the best possible. For instance, in the case of continuous weights, we assume Q_n/q_n = o(n^{1/50}/(log n)^{1/25}), while the QQ plots when L = log(log n) in Figure 1 are still good. This suggests that the condition on Q_n might be relaxed. A similar phenomenon can be observed in the other two cases. It should be noted that the asymptotic behavior of the MLEs depends not only on Q_n and q_n, but also on the configuration of all the parameters. It would be of interest to investigate this problem further.

Proof of theorems
In this section, we give proofs for the lemmas and theorems presented in Section 2.

Preliminaries
We start with some notation. For a vector x = (x_1, . . . , x_n)^T ∈ R^n, denote by ‖x‖_∞ = max_{1≤i≤n} |x_i| the ℓ_∞-norm of x. For an n × n matrix J = (J_{i,j}), let ‖J‖_∞ denote the matrix norm induced by the ℓ_∞-norm on vectors in R^n, i.e.,

‖J‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |J_{i,j}|.

We next introduce a class of matrices. Given two positive numbers m and M with M ≥ m > 0, we say the (2n − 1) × (2n − 1) symmetric matrix V = (v_{i,j}) belongs to the class L_n(m, M) if its off-diagonal entries vanish within the first n coordinates and within the last n − 1 coordinates, the cross entries v_{i,n+j} with i ≠ j lie between m and M (with v_{i,n+i} = 0), and each diagonal entry equals the sum of the remaining entries in its row, extended by the entries v_{i,2n} and v_{2n,2n} corresponding to the removed parameter β_n. Yan et al. (2016) proposed to approximate the inverse V^{−1} of V by the matrix S = (s_{i,j}) defined as

s_{i,j} = δ_{i,j}/v_{i,i} + (−1)^{1{i>n} + 1{j>n}}/v_{2n,2n},

where δ_{i,j} = 1 when i = j and δ_{i,j} = 0 when i ≠ j. We define another matrix norm ‖·‖ for a matrix A = (a_{i,j}) by ‖A‖ := max_{i,j} |a_{i,j}|. To quantify the accuracy of this approximation, Yan et al. (2016) proved the following proposition.
where c 1 is a constant that does not depend on M, m, and n.
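The accuracy of the approximate inverse can be probed numerically. The sketch below builds the Fisher information V of the binary p_0 model at the zero parameter point and compares V^{−1} with S of the form s_{i,j} = δ_{i,j}/v_{i,i} ± 1/v_{2n,2n} (plus sign when i and j index the same parameter block, minus sign otherwise). Since this form is reconstructed from Yan et al. (2016) rather than stated explicitly here, it should be checked against that paper.

```python
import numpy as np

n = 60
var = 0.25                               # Var(a_ij) when all parameters are 0
N = 2 * n - 1                            # dimension of theta (beta_n removed)

# Fisher information V of the binary p0 model at the zero parameter point.
V = np.zeros((N, N))
for i in range(n):
    for j in range(n - 1):
        if i != j:
            V[i, n + j] = V[n + j, i] = var   # Cov(d_i, b_j) = Var(a_ij)
for i in range(n):
    V[i, i] = (n - 1) * var              # Var(d_i) = sum_{j != i} Var(a_ij)
for j in range(n - 1):
    V[n + j, n + j] = (n - 1) * var      # Var(b_j)
v2n = (n - 1) * var                      # v_{2n,2n}, from the removed beta_n row

# Assumed approximate inverse S (sign convention reconstructed, see lead-in).
sign = np.ones((N, N))
sign[:n, n:] = -1.0
sign[n:, :n] = -1.0
S = sign / v2n + np.diag(1.0 / np.diag(V))

Vinv = np.linalg.inv(V)
err_S = np.abs(Vinv - S).max()
err_diag = np.abs(Vinv - np.diag(1.0 / np.diag(V))).max()
print(err_S, err_diag)                   # S should beat the plain diagonal inverse
```

If the reconstruction is right, the entrywise error of S is an order of magnitude smaller than that of the naive diagonal approximation, consistent with the proposition's bound.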
Before proving Theorem 1, we show two lemmas below.

Proof. Note that
Similar to Yan et al. (2016), we have Therefore, by Lemma 8 of Yan et al. (2016), we have Consequently,

The following lemma is due to Theorem 1 in Yan et al. (2016).
Lemma 2. Assume that θ* ∈ R^{2n−1} with ‖θ*‖_∞ ≤ τ log n, where 0 < τ < 1/24 is a constant, and that A ∼ P_{θ*}, where P_{θ*} denotes the probability distribution (1) on A under the parameter θ*. Then, as n goes to infinity, with probability approaching one, the MLE θ̂ exists and satisfies the uniform consistency bound of Theorem 1 in Yan et al. (2016). Furthermore, if the MLE exists, it is unique.