Network Functional Varying Coefficient Model

Abstract We consider functional responses with network dependence observed for each individual at irregular time points. To model both the interindividual dependence and the within-individual dynamic correlation, we propose a network functional varying coefficient (NFVC) model. The response of each individual is characterized by a linear combination of responses from its connected nodes and its exogenous covariates. All the model coefficients are allowed to be time dependent. The NFVC model adds to the richness of both the classical network autoregression model and functional regression models. To overcome the complexity caused by the network interdependence, we devise a special nonparametric least-squares-type estimator, which is feasible when the responses are observed at irregular time points for different individuals. The estimator takes advantage of the sparsity of the network structure to reduce the computational burden. To further conduct functional principal component analysis, a novel within-individual covariance function estimation method is proposed and studied. Theoretical properties of our estimators are analyzed using techniques related to empirical processes, nonparametric statistics, functional data analysis, and various concentration inequalities. We analyze a social network dataset to illustrate the power of the proposed procedure.


Introduction
Functional linear regression is an important statistical tool for exploring regression relationships of data in the form of functions. Since being introduced by Ramsay and Dalzell (1991), it has been further studied by many researchers (Faraway 1997; Chiou, Müller, and Wang 2003). In practice, the model has been successfully applied to a wide range of fields, which include gene expression analysis (Leng and Müller 2006), image data modeling (Zhu, Li, and Kong 2012), and longitudinal data analysis (Müller 2005; Huang et al. 2014).
The original form of the functional linear regression model assumes a linear relation between a functional response and exogenous covariates. Since the model was proposed, it has been extended in various ways. For instance, Yao, Müller, and Wang (2005b) and Ferraty, Van Keilegom, and Vieu (2012) incorporated functional covariates and investigated the corresponding asymptotic properties. Hall and Hosseini-Nasab (2006) studied the properties of functional principal component analysis. The additive structure and mixture components were considered by Müller and Yao (2008). However, none of these models accounts for network dependence among the functional curves. Addressing such dependence is timely and critically important, as an increasing amount of network data is emerging in numerous applications (Zhao, Levina, and Zhu 2012; Chen, Chen, and Xiao 2013; Sojourner 2013; Härdle, Wang, and Yu 2016; Zou et al. 2017).
To incorporate the network dependence structure into functional data modeling, we employ the idea of the network autoregression model. Consider a network with N nodes, indexed by i = 1, · · · , N. For the ith node, the response is recorded as Y_i(t) and the associated covariate vector is X_i. The network nodes are linked by the network relationships, which are recorded by an adjacency matrix A = (a_ij) ∈ R^{N×N}. Specifically, let a_ij = 1 if there is a link from the ith node to the jth node; otherwise, a_ij = 0. We model Y_i(t) by a linear combination of several factors. The first factor is X_i, which is consistent with the classical functional linear regression models. The second factor is the overall response from the node's connected friends, that is, Σ_j a_ij Y_j(t), which is referred to as the network autoregression term and quantifies the network influence. We would like to note that this term creates challenges in model estimation, since the responses of the focal node and its connected friends may not be observed at the same time point t. As a result, conventional estimation methods do not apply, and novel estimators need to be devised. In addition to these two important factors, to handle the special feature of functional data, we also account for the within-curve dependence using functional principal components.
The classic network autoregression model has been widely applied in the network regression literature. For instance, in social network analysis, researchers found that a user's posting behavior is positively related to his/her following friends' activities (Zhu et al. 2017; Huang et al. 2020). In financial risk management, there is evidence showing that a market disturbance spreads through stock networks (Acemoglu, Ozdaglar, and Tahbaz-Salehi 2015; Zou et al. 2017; Zhu et al. 2019). However, several important limitations are worth noting. First, the model coefficients are assumed to be fixed across time; therefore, the model cannot reflect time-varying regression patterns. Second, the model assumes the response Y_i(t) to be observed across all individuals at the same time point t; as a result, it does not allow for responses observed at irregular time points. Finally, the model does not consider potential within-curve time dependence of Y_i(t).
Therefore, we propose a network functional varying coefficient (NFVC) model, which simultaneously incorporates time-varying coefficients, irregular observation time points, and the network dependence structure. To perform estimation, a novel nonparametric conditional least-squares estimation (LSE) approach is studied, which is related to, but very different from, the LSE for the cross-sectional network autoregression model (Zhu et al. 2019). The estimator reduces the computational burden, especially for sparse networks. Subsequently, given the estimated model, functional principal component analysis (FPCA) is conducted to further account for the within-individual dependence. To this end, we devise a novel nonparametric method for estimating the covariance function under the network dependence structure, which is crucial for the FPCA procedure. The asymptotic properties are investigated for both the functional parameter estimation and the FPCA results, which prompts us to assemble tools related to empirical processes, nonparametric analysis, and various concentration inequalities on martingale sequences. Simulation studies demonstrate the satisfactory finite sample performance of our procedure. Last, we illustrate the proposed method with a real data example.
The rest of the article is organized as follows. We first describe the newly proposed model and define some notations in Section 2. We describe the parameter estimation procedure and the related theoretical results in Section 3. The estimation and the theory regarding the covariance function and functional principal component analysis are studied in Section 4. The simulation studies are illustrated in Section 5, and we present a real data example in Section 6. We conclude the paper in Section 7. All detailed theoretical proofs are provided in the supplementary material.

Main Model
We consider a network with N nodes over the time period 0 ≤ t ≤ 1. For each node i, we assume the observations are sampled from a functional trajectory Y_i(t). To describe the links between nodes, we use an adjacency matrix A = (a_ij) ∈ R^{N×N}, where a_ij = 1 if the ith node follows the jth node and a_ij = 0 otherwise. Following the convention, we let a_ii = 0. To capture the network dependence, we propose the NFVC model

Y_i(t) = X_i^⊤ β(t) + ρ(t) m_i^{-1} Σ_{j=1}^N a_ij Y_j(t) + η_i(t) + ε_i(t), 0 ≤ t ≤ 1. (1)

In Equation (1), the first term X_i^⊤β(t) describes the nodal effect of the node's own features X_i. Specifically, X_i ∈ R^q is an exogenous covariate vector and the associated regression parameter function β(t) ∈ R^q captures time-varying effects. The second term is the network effect, which reflects the influences of the connected nodes at the same time point. Here m_i = Σ_{j=1}^N a_ij is the out-degree of node i; hence, m_i^{-1} Σ_j a_ij Y_j(t) characterizes the average response of the ith node's connected friends, and the network connection strength is quantified by ρ(t). Last, η_i(t) + ε_i(t) is the innovation term, which is further decomposed into two parts: the first part η_i(t) captures the time dependence across 0 ≤ t ≤ 1, while the second part ε_i(t) is assumed to be iid. As a result, without the network effect, the NFVC model degenerates to a classic functional regression model (Chiou, Müller, and Wang 2003; Yao, Müller, and Wang 2005a; Li and Hsing 2010; Zhu, Li, and Kong 2012). For the ith individual, we observe the nodal covariate X_i and the corresponding responses at time points 0 ≤ t_i1 < t_i2 < · · · < t_{i n_i} ≤ 1. We treat X_i as non-stochastic; hence, we work in the fixed-design context. The model can be easily generalized to incorporate functional covariates X_i(t); see Section C.1 in the supplementary material. We also assume the network connections a_ij are observed for i, j = 1, . . . , N.
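As a quick illustration of the data-generating mechanism, the following minimal sketch simulates Equation (1) on a regular grid by solving (I − ρ(t)W)Y(t) = Xβ(t) + η(t) + ε(t) pointwise. The adjacency matrix, ρ(t), and β(t) below are toy choices (not the paper's settings), and η is set to zero for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q, n_grid = 50, 2, 101
t_grid = np.linspace(0.0, 1.0, n_grid)

# Toy adjacency matrix with random directed links; a_ii = 0 by convention.
A = (rng.random((N, N)) < 0.05).astype(float)
np.fill_diagonal(A, 0.0)
for i in np.where(A.sum(axis=1) == 0)[0]:
    A[i, (i + 1) % N] = 1.0                      # ensure every node follows someone
W = A / A.sum(axis=1, keepdims=True)             # row-normalized: W = diag(m_i^{-1}) A

X = rng.standard_normal((N, q))                  # exogenous nodal covariates
rho = lambda t: 0.3 + 0.2 * np.sin(2 * np.pi * t)        # sup_t |rho(t)| < 1
beta = lambda t: np.array([np.cos(2 * np.pi * t), 1.0 - t])

# Solve (I - rho(t) W) Y(t) = X beta(t) + eta(t) + eps(t) grid point by grid point.
eta = np.zeros((N, n_grid))                      # within-curve process, zero for brevity
eps = 0.2 * rng.standard_normal((N, n_grid))     # i.i.d. measurement error
Y = np.empty((N, n_grid))
for j, t in enumerate(t_grid):
    S_t = np.eye(N) - rho(t) * W
    Y[:, j] = np.linalg.solve(S_t, X @ beta(t) + eta[:, j] + eps[:, j])
```

Irregular per-node observation times can then be obtained by subsampling each row of Y at node-specific grid points.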
Without loss of generality, we assume X_i is standardized. Next, following the convention of functional regression (Yao, Müller, and Wang 2005a), we express η_i(t) as a linear combination of functional principal components. Specifically,

η_i(t) = Σ_{k=1}^∞ b_ik φ_k(t),

where the b_ik's are centered uncorrelated random variables with E(b_ik²) = λ_k, and the φ_k(t) are the corresponding orthonormal eigenfunctions. Define C(s, t) = cov{η_i(s), η_i(t)}; then we have C(s, t) = Σ_{k=1}^∞ λ_k φ_k(s)φ_k(t). Following the convention (Hall and Hosseini-Nasab 2006; Li and Hsing 2010), we assume the nonzero λ_k are distinct. In addition, η_i(·) and η_j(·) are independent of each other for i ≠ j. Last, we assume the ε_i(t) to be independent Gaussian processes with var{ε_i(t)} = σ²_ε.
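The Karhunen–Loève construction above can be sketched numerically. The two eigenfunctions and eigenvalues below are illustrative assumptions, chosen to be orthonormal on [0, 1]; they are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_grid = 400, 201
t = np.linspace(0.0, 1.0, n_grid)

# Two orthonormal eigenfunctions on [0, 1] (an illustrative choice).
phi1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
lam = np.array([1.0, 0.25])                      # distinct eigenvalues lambda_1 > lambda_2

# FPC scores b_ik: centered, uncorrelated, E(b_ik^2) = lambda_k.
b = rng.standard_normal((N, 2)) * np.sqrt(lam)

# Karhunen-Loeve representation eta_i(t) = sum_k b_ik phi_k(t).
eta = b[:, [0]] * phi1 + b[:, [1]] * phi2

# Implied covariance C(s, t) = sum_k lambda_k phi_k(s) phi_k(t).
C = lam[0] * np.outer(phi1, phi1) + lam[1] * np.outer(phi2, phi2)
```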

Notations
For convenience, we define the following notation. Let Y(t) = {Y_1(t), . . . , Y_N(t)}^⊤, η(t) = {η_1(t), . . . , η_N(t)}^⊤, E(t) = {ε_1(t), . . . , ε_N(t)}^⊤, X = (X_1, . . . , X_N)^⊤ ∈ R^{N×q}, and let W = diag(m_1^{-1}, . . . , m_N^{-1})A be the row-normalized adjacency matrix. Then Equation (1) can be expressed in matrix form as

Y(t) = ρ(t)W Y(t) + Xβ(t) + η(t) + E(t). (2)

Let S(t) = I_N − ρ(t)W and Y*(t) = S^{-1}(t)Xβ(t) + S^{-1}(t)η(t). Then Equation (2) further leads to S(t)Y(t) = Xβ(t) + η(t) + E(t) and

Y(t) = Y*(t) + S^{-1}(t)E(t). (3)

We collect all functional parameters by θ(t) = {ρ(t), β(t)^⊤}^⊤ ∈ R^{q+1}.

Remark 1. The model Equation (2) can be seen as a generalization of the autoregressive model in the time series literature to the network/spatial data setting. To guarantee the model stationarity, we typically require sup_t |ρ(t)| < 1 (Lee and Yu 2009; Yu and Lee 2010; Lee and Yu 2013). As a result, we can express the stationary solution of Equation (2) as Y(t) = S^{-1}(t){Xβ(t) + η(t) + E(t)} = Σ_{k=0}^∞ ρ^k(t)W^k{Xβ(t) + η(t) + E(t)}.

For an arbitrary matrix M = (m_ij) ∈ R^{p×p}, let λ_i(M) be the ith eigenvalue of M, where |λ_1(M)| ≥ |λ_2(M)| ≥ · · · ≥ |λ_p(M)|. In addition, let ‖M‖_F = tr(M^⊤M)^{1/2} be the Frobenius norm of the matrix. For any vector v ∈ R^p, let v_{−i} ∈ R^{p−1} be the vector excluding the ith element, and let e_i be the vector with its ith element equal to 1 and all others 0. For an arbitrary function f on [0, 1], let ‖f‖_∞ = sup_{t∈[0,1]} |f(t)| denote its supremum norm.

An LSE Method
We first consider a hypothetical situation where Y_i(t) is available at the same time t for all i. We can then rewrite Equation (2) as S(t)Y(t) = Xβ(t) + E*(t), where E*(t) = η(t) + E(t). We illustrate two methods for estimating θ(t) in this setting. The first is a quasi-maximum likelihood estimator (Lee 2004a), obtained by maximizing the quasi-Gaussian log-likelihood

ℓ{θ(t)} = log |det S(t)| − (1/2)‖S(t)Y(t) − Xβ(t)‖² (4)

with respect to θ(t). Although this estimator is statistically efficient and stable in many scenarios, it requires computing the determinant of S(t), which is an N × N matrix; hence the computation is expensive. A computationally more feasible alternative, proposed in Huang et al. (2019) and Zhu et al. (2020), is to minimize the least-squares-type objective

Q{θ(t)} = {S(t)Y(t) − Xβ(t)}^⊤ D(t){S(t)Y(t) − Xβ(t)} (5)

with respect to θ(t), where D(t) = [diag{M(t)}]^{-1} and M(t) = I_N + ρ²(t)W^⊤W. To understand Equation (5), one can verify that, under normality of E*(t), each summand of Equation (5) corresponds to the conditional likelihood of Y_i(t) given Y_{−i}(t), where Y_{−i}(t) is the vector Y(t) with its ith component excluded. Hence, we can view the minimizer of Equation (5) as a composite maximum likelihood estimator of θ(t), although its validity does not rely on the normality assumption.
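To make the least-squares objective concrete, here is a minimal sketch that evaluates a weighted objective of the form Q{θ(t)} = {S(t)Y(t) − Xβ(t)}^⊤D(t){S(t)Y(t) − Xβ(t)}, which is a plausible reading of Equation (5) given the stated definitions of D(t) and M(t); the network and parameter values are toy assumptions. On noiseless data the objective is minimized at the true ρ, and no determinant of S(t) is ever needed.

```python
import numpy as np

def lse_objective(rho_t, beta_t, Y_t, X, W):
    """Evaluate Q = (S Y - X beta)' D (S Y - X beta) at one time point,
    with S = I - rho W and D = [diag(I + rho^2 W'W)]^{-1} (assumed form)."""
    N = W.shape[0]
    S = np.eye(N) - rho_t * W
    d = 1.0 / (1.0 + rho_t ** 2 * (W * W).sum(axis=0))   # diagonal of D
    r = S @ Y_t - X @ beta_t
    return float(np.sum(d * r ** 2))

# Tiny example: profile the objective over candidate rho values.
rng = np.random.default_rng(2)
N, q = 40, 2
A = (rng.random((N, N)) < 0.1).astype(float)
np.fill_diagonal(A, 0.0)
for i in np.where(A.sum(axis=1) == 0)[0]:
    A[i, (i + 1) % N] = 1.0
W = A / A.sum(axis=1, keepdims=True)
X = rng.standard_normal((N, q))
beta_t = np.array([0.5, -1.0])
Y_t = np.linalg.solve(np.eye(N) - 0.4 * W, X @ beta_t)   # noiseless data, rho = 0.4

vals = [lse_objective(r, beta_t, Y_t, X, W) for r in (0.0, 0.2, 0.4, 0.6)]
```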
Remark 2. As noted in Huang et al. (2019) and Zhu et al. (2020), the main advantage of Equation (5) is that it involves only the first- and second-order friends in the computation. To see this, expand Equation (5): for each node i, Q{θ(t)} involves only two types of nodes, namely the first-order friends {j : w_ij ≠ 0 or w_ji ≠ 0} and the second-order friends {j : Σ_{j'=1}^N w_{j'i} w_{j'j} ≠ 0}. As a consequence, the computation is much less demanding, especially when the network is sparse. Here, "sparse" means that the network density, Σ_{i,j} a_ij/(N² − N), is small, which in general implies fewer first- and second-order friends. Due to this computational simplicity, Equation (5) forms the backbone of our proposed estimator.
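The sparsity claim can be checked numerically. Under the same assumed quadratic form for Equation (5), the matrix S^⊤(t)D(t)S(t) multiplying Y(t) has nonzero off-diagonal entries only at first- and second-order friends; the sketch below verifies this on a toy network (the specific form of the objective is an assumption, as above).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 30
A = (rng.random((N, N)) < 0.1).astype(float)
np.fill_diagonal(A, 0.0)
for i in np.where(A.sum(axis=1) == 0)[0]:
    A[i, (i + 1) % N] = 1.0
W = A / A.sum(axis=1, keepdims=True)

# Quadratic-form matrix of the objective in Y: S' D S (assumed form of Equation (5)).
rho = 0.4
S = np.eye(N) - rho * W
D = np.diag(1.0 / np.diag(np.eye(N) + rho ** 2 * W.T @ W))
H = S.T @ D @ S

# Support confined to first-order friends {j: w_ij != 0 or w_ji != 0}
# and second-order friends {j: sum_j' w_j'i w_j'j != 0}, plus the diagonal.
first = (W != 0) | (W.T != 0)
second = (W.T @ W) != 0
allowed = first | second | np.eye(N, dtype=bool)
density = A.sum() / (N * N - N)
```

For a sparse network, each row of H has few nonzero entries, so evaluating the objective scales with the number of first- and second-order friends rather than with N².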
Remark 3. In practice, the adjacency matrix A may vary with time t since the network connections may change over time, that is, A = A(t) and W = W(t). To handle this, we need to replace W with W(t) in the objective function Equation (5). The subsequent estimation will then be slightly more complex since we need to smooth W(t) together with the responses. We provide the details of the estimation in Section C.2 of the supplementary material.

Kernel-Based Least-Squares Estimator (KLSE)
Minimizing Equation (5) is of course not feasible because at any t, we do not necessarily observe the whole vector Y(t).
To overcome the hurdle that the observations are obtained at irregular time points for different individuals, we use the kernel idea to localize the individual terms in Equation (5). Specifically, let U_i{ρ(t)} ≡ S^⊤(t)D(t)S(t)e_i and ξ{θ(t)} ≡ S^⊤(t)D(t)Xβ(t). Then the objective function Q{θ(t)} can be rewritten as

Q{θ(t)} = Σ_{i=1}^N Y_i(t) U_i{ρ(t)}^⊤ Y(t) − 2 ξ{θ(t)}^⊤ Y(t) + β(t)^⊤ X^⊤ D(t) X β(t). (7)

Because Y_i(t) may not be available at an arbitrary time point t, a natural idea is to replace Y_i(t) by a kernel-weighted average of its observed neighbors. We now investigate the details of this approach. We first note that, by Equation (3) and the definition of Y*(t), the measurement error E(t) is uncorrelated with Y*(t). Consequently, replacing Y(t) in Equation (7) by the noise-free Y*(t) yields a denoised objective Q*{θ(t)}, which is a better target function to minimize than Q{θ(t)} itself. To approximate the terms of Q*{θ(t)} from the irregularly observed responses, we use the kernel approximations

Y_i*(s)Y_i*(t) ≈ [Σ_{k≠k'} K_h(t_ik − s)K_h(t_ik' − t)Y_i(t_ik)Y_i(t_ik')] / [Σ_{k≠k'} K_h(t_ik − s)K_h(t_ik' − t)] (8)

and

Y_i(t) ≈ ζ_{h,i}(t) ≡ [Σ_k K_h(t_ik − t)Y_i(t_ik)] / [Σ_k K_h(t_ik − t)], (9)

where K_h(u) = h^{-1}K(u/h) for a kernel function K with bandwidth h. Inserting these approximations into Equation (7), we obtain the kernel-smoothed objective function

Q̃{θ(t)} = Σ_{i=1}^N ζ_{h,i}(t) U_i{ρ(t)}^⊤ ζ_h(t) − 2 ξ{θ(t)}^⊤ ζ_h(t) + β(t)^⊤ X^⊤ D(t) X β(t), (10)

where ζ_h(t) = {ζ_{h,1}(t), . . . , ζ_{h,N}(t)}^⊤ and the within-node squared terms are handled by Equation (8). We define the minimizer of Equation (10), that is, θ̂(t) = arg min_{θ(t)} Q̃{θ(t)}, as the kernel-based least-squares estimator (KLSE).
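A minimal sketch of the kernel smoother ζ_{h,i}(t) for one irregularly observed node; the Gaussian kernel here is one concrete choice of K (the paper leaves K generic), and the observation times are simulated.

```python
import numpy as np

def zeta_h(t_obs, y_obs, t_eval, h):
    """Kernel-smoothed curve value:
    zeta_{h,i}(t) = sum_k K_h(t_ik - t) Y_i(t_ik) / sum_k K_h(t_ik - t),
    here with a Gaussian kernel K."""
    t_obs = np.asarray(t_obs)[None, :]               # (1, n_obs)
    t_eval = np.asarray(t_eval)[:, None]             # (m_eval, 1)
    K = np.exp(-0.5 * ((t_obs - t_eval) / h) ** 2)   # kernel weights (m_eval, n_obs)
    return (K * np.asarray(y_obs)[None, :]).sum(axis=1) / K.sum(axis=1)

# One node observed at irregular time points.
rng = np.random.default_rng(4)
t_i = np.sort(rng.random(200))
y_i = np.sin(2 * np.pi * t_i) + 0.05 * rng.standard_normal(200)
t_eval = np.linspace(0.2, 0.8, 13)
zeta = zeta_h(t_i, y_i, t_eval, h=0.05)
```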
Remark 4. Note that we purposely exclude the squared terms Y_i(t_ik)² in Equation (8), because such terms generate the error variance σ²_ε, which should be absent from the denoised version Q*{θ(t)}.

Asymptotic Theory
We now show that KLSE is consistent. We require a set of regularity conditions. The first two conditions are imposed on observations and functional coefficients. The third condition concerns the network structure. The last three conditions are related to the covariates and kernel estimation.
In addition, the supremum norms of these functions and of their first- and second-order derivatives are all bounded by a finite constant.

(C3.1) (Connectivity) Treat W as the transition probability matrix of a Markov chain whose state space is the set of all nodes in the network (i.e., {1, . . . , N}). Assume the Markov chain is irreducible and aperiodic. Define π = (π_1, . . . , π_N)^⊤ ∈ R^N as the stationary distribution of the Markov chain, so that π_i ≥ 0, Σ_i π_i = 1, and π^⊤W = π^⊤.

Condition (C1) ensures that the observed time points are sufficiently homogeneously spread out over the entire time domain instead of clumping in some regions. Condition (C2) requires the related functions to be bounded and sufficiently smooth. Condition (C3) puts various constraints on the network structure W, all of which are routinely required in the standard network literature (Zhu et al. 2017; Huang et al. 2020). Specifically, Condition (C3.1) requires the network to satisfy a certain degree of connectivity. This is a weak condition and can easily hold for networks in reality (Newman, Barabasi, and Watts 2011). Condition (C3.2) restricts the extent of heterogeneity of the nodes in the network. Specifically, ‖e_i^⊤W^k‖_1 can be treated as the kth-order degree of node i; hence, the condition restricts the "superstar" effects in the network. In addition, weak network dependence is guaranteed by restricting the largest absolute eigenvalue of W. Last, Condition (C3.3) requires certain convergence of the network structures, which is also assumed in the cross-sectional case. Condition (C3.3) is also verified numerically in Section C of the supplementary material.
Next, Condition (C4) can be seen as a law-of-large-numbers-type assumption on X (Lee 2004b; Zhu et al. 2020). Condition (C5) imposes requirements on the number of time points and the bandwidth. In general, a larger N results in a smaller h, which in turn requires a larger number of time points (i.e., a larger n_min) for valid estimation (Zhu, Li, and Kong 2012). As outlined in Zhang and Wang (2016), we consider here an ultra-dense functional data scheme. This ensures the parametric functional-data convergence rate and leads to negligible asymptotic bias. Conditions (C5) and (C3.2) together imply weak dependence both in time and across the network. Condition (C6) is a standard requirement on the kernel function. Last, Condition (C7) contains the identifiability requirement on the parameters, which restricts the multicollinearity of X*(t). In addition, the network effect is restricted within (−1, 1) to guarantee the stationarity of the process over the whole network (Liu 2014; Yang and Lee 2017; Zhu et al. 2020). Similar identifiability conditions are widely used in the literature (Yang and Lee 2017; Zhu et al. 2020).
With the above preparation, we can establish the consistency and asymptotic normality of the estimator θ̂(t).
The proof is given in Section A.4 in the supplementary material. By Theorem 1, the convergence rate N^{-1/2} is slower than the standard nonparametric rate (N n_min h)^{-1/2}, which is caused by the within-curve time dependence. Under weak dependence both in time and across the network, the result of Theorem 1 shows that the estimator attains the parametric rate and incurs no asymptotic bias, which agrees with the result under ultra-dense functional data discussed by Zhang and Wang (2016). In addition, the convergence rate here agrees with the theoretical result in the classical functional data case without a network structure (Li and Hsing 2010; Zhu, Li, and Kong 2012).

Functional Principal Component Analysis (FPCA)
After conducting parameter estimation, we proceed to perform the FPCA. The core part of FPCA is estimating the covariance function C(s, t), whose spectral decomposition is

C(s, t) = Σ_{k=1}^∞ λ_k φ_k(s) φ_k(t),

where λ_1 ≥ λ_2 ≥ · · · ≥ 0 are the ordered eigenvalues satisfying Σ_{k=1}^∞ λ_k < ∞, and the φ_k(t) are the corresponding orthonormal eigenfunctions. By the orthonormality of the eigenfunctions, we have ∫_0^1 C(s, t)φ_k(s) ds = λ_k φ_k(t). Using this relationship, once an estimate of the covariance function C(s, t) is obtained, the eigenvalues and the eigenfunctions can subsequently be estimated by discretizing the smoothed covariance function; see the discussions by Rice and Silverman (1991) and Capra and Müller (1997).
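The discretization step can be sketched as follows: approximate the integral operator by the covariance matrix evaluated on a grid times the grid spacing, and read off the eigenpairs. The single-component covariance below is a toy construction with a known answer.

```python
import numpy as np

# A covariance with one known component: C(s, t) = lambda_1 phi_1(s) phi_1(t).
n = 201
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]
phi1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
lam1 = 2.0
C = lam1 * np.outer(phi1, phi1)

# Discretize: eigenpairs of C * dt approximate (lambda_k, phi_k on the grid).
evals, evecs = np.linalg.eigh(C * dt)
order = np.argsort(evals)[::-1]
lam_hat = evals[order[0]]
phi_hat = evecs[:, order[0]] / np.sqrt(dt)       # rescale so that int phi^2 dt = 1
if phi_hat @ phi1 < 0:                           # eigenfunctions are defined up to sign
    phi_hat = -phi_hat
```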

Estimation of the Covariance Function
To estimate the covariance function C(s, t), a typical approach is to first estimate the individual functions η_i(t), denoted by η̂_i(t), and then estimate C(s, t) by Ĉ(s, t) = N^{-1} Σ_{i=1}^N η̂_i(s)η̂_i(t) (Yao, Müller, and Wang 2005a; Zhu, Li, and Kong 2012). However, this is infeasible in our setting, where network dependence exists and different nodes are observed at different time points. Specifically, to estimate η_i(t) at an observed time point t_ik, we would typically use

η̂_i(t_ik) = Y_i(t_ik) − ρ̂(t_ik) Σ_j w_ij Y_j(t_ik) − X_i^⊤ β̂(t_ik).

However, Y_j(t_ik) is generally unavailable because we allow irregular and node-specific observation times.
We thus estimate C(s, t) through an alternative approach.

Note that cov{E*(s), E*(t)} = cov{S(s)Y(s) − Xβ(s), S(t)Y(t) − Xβ(t)} = {C(s, t) + σ²_ε I(s = t)} I_N. Because our goal is to estimate C(s, t) without the additional error-variance term σ²_ε I(s = t), we take advantage of the uncorrelated-error property and directly exclude the terms that would generate the error variance σ²_ε. Because the summation in v_jk(s, t) does not include the case (j, j') = (k, k'), it eliminates σ²_ε explicitly. Then we estimate C(s, t) by the resulting smoothed estimator Ĉ(s, t). To establish the convergence of Ĉ(s, t), we first establish a uniform convergence result for θ̂(t), which strengthens the corresponding result in Theorem 1.
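The role of excluding same-time products can be seen in a toy sketch: raw cross-products of η_i + ε_i are unbiased for C(s, t) off the diagonal, but inflated by σ²_ε on it. This is an illustrative one-component model, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(5)
N, n = 4000, 21
t = np.linspace(0.0, 1.0, n)
phi = np.sqrt(2.0) * np.sin(2 * np.pi * t)
lam, sigma_eps = 1.0, 0.5

b = np.sqrt(lam) * rng.standard_normal((N, 1))
R = b * phi + sigma_eps * rng.standard_normal((N, n))   # eta_i(t) + eps_i(t) on a grid

# Raw cross-products: off-diagonal entries (s != t) are unbiased for C(s, t),
# while same-time products (s = t) are inflated by sigma_eps^2.
C_raw = (R.T @ R) / N
C_true = lam * np.outer(phi, phi)
```

Hence any estimator built from same-time products would be biased upward on the diagonal, which motivates dropping the (j, j') = (k, k') terms.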
The proof of Theorem 2 is given in Section A.5 in the supplementary material. Note that the convergence rate of sup_t ‖θ̂(t) − θ(t)‖ is slightly slower than that in Theorem 1, which agrees with the result in the independence case (Zhu, Li, and Kong 2012). Moreover, the convergence rate is also related to the network dependence: a higher extent of heterogeneity (i.e., a larger K_w) results in a slower convergence rate. Using Theorem 2, we can establish the uniform convergence of Ĉ(s, t).
The proof of Theorem 3 is given in Section A.6 in the supplementary material. Compared to the independent case (Li and Hsing 2010; Zhu, Li, and Kong 2012), the convergence rate of Ĉ(s, t) is also slower due to the network dependence effect. Because the network dependence propagates to all aspects of the proof, in combination with the time correlation within each node, Theorems 2 and 3 are technically challenging to establish. The proofs involve empirical-process treatments, nonparametric analysis, and various concentration inequalities on martingale sequences. In fact, a Bernstein-type inequality is further established under the network condition (C3) in Lemmas 6 and 10 in the supplementary material, which may be of independent interest.

Convergence of FPCA
Once the uniform convergence result is established for Ĉ(s, t), we proceed to analyze the convergence of FPCA by using Equation (13). Note that φ_j(t) and φ̂_j(t) are identifiable only up to a sign change. Following the convention (Hall and Hosseini-Nasab 2006), we allow φ_j(t) to take an arbitrary sign but choose the sign of φ̂_j such that ‖φ̂_j − φ_j‖ is minimized. We establish the convergence of the eigen-decomposition in Corollary 1.
Corollary 1. Assume Conditions (C1)-(C7) hold. Let J_0 be a positive constant and let λ_1 > λ_2 > · · · > λ_j > · · · . In addition, for 1 ≤ j < k ≤ J_0, assume min_{λ_j, λ_k > 0}(λ_j − λ_k) ≥ C > 0. Let δ_N be the same as in Theorem 3. Then, for any j (1 ≤ j ≤ J_0), |λ̂_j − λ_j| = O_p(δ_N) and ‖φ̂_j − φ_j‖ = O_p(δ_N). Compared to the conventional approaches, the FPC analysis here is conducted under the effects of both the exogenous covariates X and the network dependence. Similar to the conclusion in the independent case, the convergence rate is the same as that of the estimated covariance function in Equation (16).
To summarize, the theoretical contributions in this paper are mainly in three aspects. First, we establish the uniform convergence of the functional model parameter θ (t) under the network dependence structure. Second, we establish the uniform convergence of the covariance function by incorporating the network dependence measure. Third, we establish the convergence of eigenvalues and eigenfunctions, which further leads to the convergence of the functional principal component scores. As a result, this enables us to perform predictions, which are discussed in detail below.

Estimation of FPC Scores and Selection of Number of Eigenfunctions
Following the standard literature (Yao, Müller, and Wang 2005a), assume that the process η_i(t) can be well approximated by the first K eigenfunctions, that is, η_i(t) ≈ Σ_{k=1}^K b_ik φ_k(t). The eigenfunctions φ_k(t) are estimated from the eigen-decomposition of Ĉ(s, t), and the FPC scores b_ik are subsequently obtained.
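A sketch of score extraction and truncated reconstruction follows. For simplicity it uses numerical-integration scores rather than the conditional-expectation scores of Yao, Müller, and Wang (2005a), and it takes the eigenfunctions as known; all numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 201
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]

# Eigenfunctions on a grid (taken as known here, for illustration).
phi = np.column_stack([np.sqrt(2.0) * np.sin(2 * np.pi * t),
                       np.sqrt(2.0) * np.cos(2 * np.pi * t)])

b_true = np.array([1.3, -0.7])
eta_i = phi @ b_true                             # eta_i(t) = sum_k b_ik phi_k(t)
zeta_i = eta_i + 0.02 * rng.standard_normal(n)   # smoothed curve with small residual noise

# Numerical-integration scores b_ik ~= int zeta_i(t) phi_k(t) dt, and the
# truncated reconstruction eta_i^K(t) = sum_{k <= K} b_ik phi_k(t).
b_hat = (zeta_i @ phi) * dt
eta_K = phi @ b_hat
```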
To obtain the scores, we first set n grid points v_1 < v_2 < · · · < v_n, equally spaced within [0, 1]. Following Yao, Müller, and Wang (2005a), the FPC scores b_ik can be estimated by their conditional expectations given the smoothed responses, yielding b̂_ik. The trajectory η_i(t) that we aim to predict can be expressed through its best linear predictor η*_i(t), computed from the conditional expectation of η_i given the smoothed responses ζ_{h,i}(v_k) defined in Equation (9). As a result, we are able to predict η_i(t) using the first K eigenfunctions as η̂^K_i(t) = Σ_{k=1}^K b̂_ik φ̂_k(t). We establish the theoretical properties of b̂_ik and η̂^K_i(t) in Corollary 2.

Corollary 2. Assume Conditions (C1)-(C7) hold and n = O(1). Then b̂_ik converges to the best linear predictor of b_ik with probability tending to 1 as N → ∞. In addition, for all t ∈ (0, 1), η̂^K_i(t) converges to η*_i(t) in probability.

The proof of Corollary 2 is provided in Section A.8 in the supplementary material. By Corollary 2, the predictor η̂^K_i(t) behaves asymptotically as well as the best linear predictor η*_i(t). To select the number of eigenfunctions K, we can adopt AIC- and BIC-type criteria (Yao, Müller, and Wang 2005a). To this end, we write the profiled pseudo-Gaussian log-likelihood, given all estimated FPC scores b̂_ik, as ℓ(K), with ζ_h(t) = {ζ_{h,1}(t), . . . , ζ_{h,N}(t)}^⊤ and b̂_k = (b̂_1k, . . . , b̂_Nk)^⊤. Then the AIC and BIC objective functions are AIC(K) = −2ℓ(K) + 2K and BIC(K) = −2ℓ(K) + K C_N, where C_N is a function of N with C_N/N → 0 (Zhu, Zhu, and Feng 2010; Cai, Li, and Zhu 2020). In practice, we recommend choosing C_N = N^{0.8}, which works very well in our simulations. The number of eigenfunctions is then selected by minimizing the objective functions: K̂_AIC = arg min_K AIC(K) and K̂_BIC = arg min_K BIC(K). Importantly, during this process, the first additive term of ℓ(K) can be discarded, so we do not need to compute the determinant of the large matrix S(v_l).
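Given the profiled log-likelihood values ℓ(K), the selection step itself is mechanical. The sketch below uses hypothetical ℓ(K) values and the recommended C_N = N^0.8; it also illustrates the tendency of AIC to choose a larger K than BIC.

```python
import numpy as np

def choose_K(loglik, N, use_bic=True):
    """Select K by minimizing AIC(K) = -2 l(K) + 2K or
    BIC(K) = -2 l(K) + K * C_N, with C_N = N ** 0.8 as recommended."""
    loglik = np.asarray(loglik, dtype=float)
    K = np.arange(1, loglik.size + 1)
    penalty = K * (N ** 0.8) if use_bic else 2.0 * K
    return int(K[np.argmin(-2.0 * loglik + penalty)])

# Hypothetical profiled log-likelihood values l(1), ..., l(4).
loglik = [0.0, 50.0, 52.0, 52.5]
K_bic = choose_K(loglik, N=100, use_bic=True)
K_aic = choose_K(loglik, N=100, use_bic=False)
```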
Simulation Studies

We generate the responses as follows. First, we partition [0, 1] into equally spaced grid points V = {v_1, v_2, . . . , v_{n_max}} with v_1 = 0 and v_{n_max} = 1. On the grid, we generate the responses Y_i(v_j) for 1 ≤ i ≤ N and 1 ≤ j ≤ n_max using the NFVC model Equation (2). Specifically, the covariates X_i and the measurement errors ε_i(t) are independently sampled from N(0, I_2) and N(0, σ²), respectively, where we set σ = 0.2. Next, for the ith node, we randomly delete n*_i responses on the grid, where n*_i is uniformly sampled from {0, 1, . . . , 10}. As a result, the number of observed responses for each node grows at the same order as n_max. Last, we describe below the generating mechanisms of the two types of networks, together with an additional experiment studying the impact of network sparsity.
Scenario 1: (Stochastic Block Model) We first consider the stochastic block model, which is one of the most popular network structures in the literature (Wang and Wong 1987; Nowicki and Snijders 2001). To generate the model, we randomly assign a block label k (k = 1, 2, . . . , K) to each node, where K is set to 10. Let pr(a_ij = 1) = 0.9N^{-1} if i and j belong to the same block, and pr(a_ij = 1) = 0.3N^{-1} otherwise. Consequently, nodes with the same block labels are more likely to be connected.
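A minimal sketch of this generator:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K_blocks = 500, 10
labels = rng.integers(0, K_blocks, size=N)       # random block label per node

# Link probability 0.9/N within a block and 0.3/N between blocks.
same_block = labels[:, None] == labels[None, :]
p = np.where(same_block, 0.9 / N, 0.3 / N)
A = (rng.random((N, N)) < p).astype(int)
np.fill_diagonal(A, 0)                           # a_ii = 0 by convention
```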
Scenario 2: (Power-Law Distributed Network) We next consider the power-law distributed network, which models the situation where the majority of nodes have relatively few links while a small proportion (i.e., celebrities) possess a large number of links (Barabási and Albert 1999). The model assumes that the degrees follow a power-law distribution (Clauset, Shalizi, and Newman 2009). Specifically, we generate the in-degree d_i = Σ_j a_ji for node i from the discrete power-law distribution P(d_i = k) = ck^{-α}, where c is the normalizing constant and α = 2.5. Then, we randomly select d_i nodes as the followers of the ith node.
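A sketch of this generator, with the degree distribution truncated at N − 1 so that the follower sets are well defined:

```python
import numpy as np

rng = np.random.default_rng(8)
N, alpha = 500, 2.5

# Discrete power law P(d = k) = c k^{-alpha}, truncated at k <= N - 1.
k = np.arange(1, N)
p = k ** (-alpha)
p = p / p.sum()
in_degree = rng.choice(k, size=N, p=p)           # one in-degree per node

# For node i, sample in_degree[i] distinct followers among the other nodes.
A = np.zeros((N, N), dtype=int)
for i in range(N):
    others = np.delete(np.arange(N), i)
    followers = rng.choice(others, size=in_degree[i], replace=False)
    A[followers, i] = 1                          # a_ji = 1 means node j follows node i
```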

Performance Measurements and Simulation Results
To gauge a reliable evaluation of the simulation study, we repeat the experiments for 500 times. The network size is set to be N = 100, 200, and 500, with n max = 50, 100, and 200, respectively. For simplicity, we select the bandwidth using the rule of thumb method (Li and Racine 2007). To estimate the functions, we follow the standard functional data estimation procedure in Yao, Müller, and Wang (2005a).
We first evaluate the estimation accuracy of the functional coefficients in Scenarios 1 and 2. To measure the estimation accuracy, we consider the integrated mean squared error (IMSE) for the curves, which is defined as ∫_0^1 {θ̂_j(t) − θ_j(t)}² dt when estimating θ_j(t). We approximate the integral by discretizing the interval [0, 1] into 20 equally spaced intervals. Similarly, we define the IMSEs for the covariance function and the eigenfunctions, and the mean squared error (MSE) for the eigenvalues. We also use the supremum norm (SUP) to evaluate the performance of the functional estimators, that is, sup_t |θ̂_j(t) − θ_j(t)|, where we again discretize the interval [0, 1] into 20 equally spaced intervals to facilitate the computation. The SUPs for Ĉ(s, t) and φ̂_j(t) are defined similarly.
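The two accuracy measures can be sketched as follows, using the trapezoid rule on the 21 grid points bounding the 20 intervals; the example functions are arbitrary.

```python
import numpy as np

def imse_and_sup(theta_hat, theta_true, grid):
    """Discretized IMSE = int_0^1 {theta_hat(t) - theta(t)}^2 dt (trapezoid rule)
    and SUP = max_t |theta_hat(t) - theta(t)| over the same grid."""
    diff = theta_hat(grid) - theta_true(grid)
    return np.trapz(diff ** 2, grid), np.max(np.abs(diff))

grid = np.linspace(0.0, 1.0, 21)                 # 20 equally spaced intervals
imse, sup = imse_and_sup(lambda t: t ** 2 + 0.1, lambda t: t ** 2, grid)
```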
The simulation results are summarized in Table 3. As the network size N increases, the estimation accuracy improves. For instance, as N increases from 100 to 500, the IMSE for ρ̂(t) decreases from 1.31 × 10^{-3} to 0.026 × 10^{-3} for the power-law distributed network. The estimated parameter functions are given in Figures E.1 and E.2, and the estimated eigenfunctions are presented in Figure E.3, all of which are given in the supplementary material. Likewise, as the sample size increases, the estimated parameters move closer to the true values and the estimation becomes more stable, with narrower confidence bands. This corroborates the theoretical findings. To assess the computational performance, we consider Scenario 3, in which we vary (α, ν) from (2, 0.9) to (10, 0.5); accordingly, the network density increases from 0.018 to 0.384. As shown in Figure E.4, where we set N = 200 and n_max = 100, the computational time (measured in seconds) increases as the network becomes denser.
We further compare the prediction accuracy of the proposed method with the network autoregression (NAR) model in Zhu et al. (2017). Since the NAR model is a time-series model not built for functional data, we first smooth the observations for each subject. Subsequently, we split the time interval [0, 1] into 20 equal-length sub-intervals and choose the median point of each interval as the observation time. Then, we use the grids (1, 3, 5, . . . , 19) for training and test the predictive power on the grids (2, 4, . . . , 20) for both methods. We summarize the average prediction errors for the stochastic block network and the power-law distributed network in Table 1. The prediction error is calculated with the IMSE, and the BIC criterion is used for selecting the number of principal components. Compared to Zhu et al. (2017), the proposed NFVC model clearly achieves higher prediction accuracy. In addition, the prediction error of the NFVC model decreases as the number of observations increases.
Last, we investigate the performance of AIC and BIC in determining K in Table 2. As expected, AIC always chooses a larger number of eigenfunctions than BIC. Aiming for a parsimonious model, we recommend using BIC. We also perform a sensitivity analysis for different choices of K. Specifically, we evaluate the prediction errors of our method as K increases from 1 to 6, as demonstrated in Table 2. It can be seen clearly that the prediction error is not sensitive to the choice of K for K ≥ 2.

Social Network Data Analysis
We now use the tools developed so far to analyze the behavior of users on social media using a social network dataset. The data were collected from Sina Weibo, which is similar to Twitter and is the largest online social media platform in mainland China. Our dataset contains the posting behavior of N = 850 active users from January 2 to January 9, 2014; hence, the time period covers a week starting on a Thursday.
The response Y_i(t_ik) is the number of posts by the ith user in the kth hour. Y_i(t) reflects the activeness of the ith social network user, which is assumed to follow a smooth trajectory in time t. As an example, Figure E.5 in the supplementary material contains a snapshot of two Weibo posts from one user in the dataset. In addition, we select one user, "BreakingNews", and two of his following friends; all three users have backgrounds in social media and economics. The posting times of the three users, who tend to post more frequently during the daytime, are visualized in Figure 1. In addition, the active periods of the three users are similar, which indicates that users may share similar dynamic patterns with their connected friends. Next, the total number of posts by each user is illustrated in the histogram in the left panel of Figure 2, where the average number of posts per user is 146.7. We also plot the total number of Weibo posts by all 850 users within each hour in the right panel of Figure 2, which shows an obvious periodic pattern with roughly three peaks each day. In addition to the responses, we include two nodal covariates of the users. The first is gender, coded 1 for male users and 0 for female users; approximately 76% of the users are male. The second is the number of user self-created labels, which are short keywords defined by users themselves to describe their life status. We construct the adjacency matrix from the users' follower-followee relationships, that is, a_ij = 1 if the ith user follows the jth user on the Weibo platform. The network density is calculated as Σ_{i,j} a_ij/{N(N − 1)} = 0.047.

Table 3. The simulation results for the stochastic block network and the power-law distributed network, reporting the IMSE for ρ(t), β_1(t), β_2(t), φ_1(t), φ_2(t), and C(s, t), together with the MSE for λ_1 and λ_2, in the first line, and the corresponding SUP values in the second line. The columns of ρ(t), β_1(t), β_2(t), φ_1(t), φ_2(t) are the original results multiplied by 10^{-3}.
This implies that the network is highly sparse.
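The density computation above is straightforward to carry out on a sparse adjacency matrix. The following sketch uses a small hypothetical network (not the Weibo data) to illustrate the formula Σ_{i,j} a_ij / {N(N − 1)}, along with the row-normalized weight matrix W commonly used in network autoregression models; the sparse representation is what makes this feasible for a highly sparse network like the one here.

```python
import numpy as np
from scipy import sparse

# Hypothetical small example: follower-followee adjacency for N users,
# a[i, j] = 1 if user i follows user j (directed, no self-loops).
N = 5
rows = [0, 0, 1, 2, 3, 4]
cols = [1, 2, 0, 3, 4, 0]
A = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(N, N))

# Network density as defined in the text: sum_{i,j} a_ij / {N(N - 1)}.
density = A.sum() / (N * (N - 1))
print(round(density, 3))  # 6 edges / 20 ordered pairs = 0.3

# Row-normalized weight matrix W (each followee weighted equally), as is
# standard in network autoregression; rows with no followees are left at zero.
deg = np.asarray(A.sum(axis=1)).ravel()
W = sparse.diags(1.0 / np.maximum(deg, 1.0)) @ A
```

Storing A and W in compressed sparse row format means both the density and the network term W @ y cost O(number of edges) rather than O(N^2).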
We use the NFVC model to analyze the data. The estimated functional coefficients are given in Figure 3. First, we observe a daily periodic pattern in all estimates: the estimated functions reach peaks during the day and decrease at night, reflecting the human diurnal rhythm. Second, the estimated network effect ρ(t) is positive, which implies that a user's posting behavior is positively related to that of the friends he or she follows. The network effect appears much weaker on Saturday, possibly because many people devote Saturday to aspects of life other than social networking. Regarding the nodal covariates, β_1(t) suggests that male users tend to be less active than female users on weekends but more active on Thursday and Friday, with the two genders behaving more similarly on the remaining days. Finally, β_2(t) indicates that users with more self-created labels tend to be more active on weekdays but less active on weekends, possibly because such users have more active personalities combined with richer and more diversified personal lives on weekends.
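To fix ideas, the fitting step can be caricatured as follows. This is only a sketch on synthetic data under a common-grid simplification: at each grid point t we regress Y(·, t) on the network term W Y(·, t) and the nodal covariates to obtain rough pointwise estimates of ρ(t) and β(t). The paper's actual estimator is a nonparametric least-squares-type estimator with smoothing over irregular time points, which this toy version omits; all dimensions and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (not the Weibo data): N users on a common grid of T time
# points, p = 2 nodal covariates (think gender and number of self-created
# labels), and a sparse row-normalized weight matrix W.
N, T, p = 100, 24, 2
W = rng.random((N, N)) * (rng.random((N, N)) < 0.05)
np.fill_diagonal(W, 0.0)
W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
X = rng.normal(size=(N, p))
Y = rng.normal(size=(N, T))

# Naive pointwise least squares: at each t, regress Y(., t) on [W Y(., t), X].
# The first coefficient plays the role of rho(t), the rest of beta(t).
rho_hat = np.empty(T)
beta_hat = np.empty((T, p))
for t in range(T):
    Z = np.column_stack([W @ Y[:, t], X])
    coef, *_ = np.linalg.lstsq(Z, Y[:, t], rcond=None)
    rho_hat[t], beta_hat[t] = coef[0], coef[1:]
```

Smoothing rho_hat and beta_hat across t (e.g., by a kernel smoother) would then recover smooth trajectories like those displayed in Figure 3.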
We further conduct FPCA using the estimated C(s, t). The BIC criterion selects K = 3, and the estimated eigenfunctions are visualized in Figure 4. The first eigenfunction appears to reflect the general periodic dynamics of user activities, while the second and third eigenfunctions seem to characterize the movements during weekdays and weekends, respectively.
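Once a covariance estimate is in hand, extracting eigenfunctions reduces to a discretized integral eigenproblem. The sketch below substitutes a synthetic symmetric kernel for the paper's estimated C(s, t), solves C h φ = λ φ on an equally spaced grid, and rescales so each eigenfunction has unit L2 norm; the grid size and kernel are illustrative assumptions.

```python
import numpy as np

# Stand-in for the estimated covariance C(s, t) on an equally spaced grid;
# here a synthetic positive semidefinite kernel replaces the paper's estimate.
grid = np.linspace(0.0, 1.0, 50)
h = grid[1] - grid[0]
C_hat = np.minimum.outer(grid, grid) + 0.1 * np.exp(-np.subtract.outer(grid, grid) ** 2)
C_hat = (C_hat + C_hat.T) / 2.0  # enforce exact symmetry

# Discretized eigenproblem for the covariance operator: C h phi = lambda phi.
evals, evecs = np.linalg.eigh(C_hat * h)
order = np.argsort(evals)[::-1]
lambdas = evals[order]
phis = evecs[:, order] / np.sqrt(h)  # rescale so that integral of phi_k^2 is ~1

K = 3  # the data analysis selects K = 3 via BIC
top_lambdas, top_phis = lambdas[:K], phis[:, :K]
```

The columns of top_phis then play the role of the estimated eigenfunctions plotted in Figure 4, and top_lambdas are the corresponding eigenvalue estimates.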
Finally, we compare the prediction power of the proposed method with the NAR method of Zhu et al. (2017). Specifically, we divide the time interval into 40 sub-intervals of equal length and use the grids (1, 3, . . . , 39) for model training and (2, 4, . . . , 40) for model testing. The number of eigenfunctions is again selected by the BIC criterion, which leads to K = 3. The mean prediction errors, measured in IMSE, are 1.303 for the NAR model and 0.448 for the NFVC model. This indicates that, by accounting for the functional nature of user activeness, the proposed method captures the dynamic patterns in the network more accurately than the NAR model of Zhu et al. (2017).
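The alternating-grid evaluation above is easy to reproduce in outline. The sketch below uses placeholder arrays for the observed and model-predicted responses (the real ones would come from fitting the NFVC or NAR model on the training grids) and computes an IMSE-type average of squared errors over the held-out grids.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholders: observed and predicted responses for N users on G = 40 equal
# sub-intervals; in the actual analysis Y_pred comes from the fitted model.
N, G = 850, 40
Y_obs = rng.normal(size=(N, G))
Y_pred = Y_obs + 0.1 * rng.normal(size=(N, G))

train_idx = np.arange(0, G, 2)  # grids 1, 3, ..., 39 (0-based indices)
test_idx = np.arange(1, G, 2)   # grids 2, 4, ..., 40 (0-based indices)

# IMSE-type criterion: squared prediction error averaged over the test grids
# (a Riemann-sum approximation of the integrated squared error) and over users.
imse = np.mean((Y_obs[:, test_idx] - Y_pred[:, test_idx]) ** 2)
```

Running this pipeline once with the NFVC predictions and once with the NAR predictions yields the two IMSE values compared in the text.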

Conclusion
In this article, we propose the NFVC model for analyzing functional data with known network structure. Many extensions of our work are possible and can generate interesting topics for future research. For example, the theoretical discussion in this work only considers a dense sampling scheme for the time points; a natural extension is the sparse data setting (Yao, Müller, and Wang 2005a). In addition, the univariate response case can be extended to multivariate response analysis (Zhu, Li, and Kong 2012), so that network effects and regression relationships among several indices of interest can be modeled jointly. Next, in this work we treat W as static; as an interesting extension, one may consider dynamic network relationships, as discussed in Section C.2 of the supplementary material. Furthermore, time-varying exogenous variables could be employed to enrich the dynamic network structures. Last, we only consider one type of network relationship in the model, whereas in practice network users can be connected through more than one type of relationship. Consequently, it would be interesting to consider evaluating multiple network effects in the model.