Supplementary Appendices of “Spatial Modeling Approach for Dynamic Network Formation and Interactions”

We provide technical details and additional simulation and empirical results in these supplementary appendices. In particular, supplementary Appendix B discusses the identification of latent variables and their coefficients. Supplementary Appendix C outlines the detailed MCMC estimation algorithm for the proposed model. We explain the model selection criterion for the latent dimension in supplementary Appendix D. Empirical results from other network relationships and robustness checks of model specifications are reported in supplementary Appendix E. We study a counterfactual policy simulation to examine the multiplier effects from network interactions in supplementary Appendix F. The goodness-of-fit of our network formation model to the real data is examined in supplementary Appendix G, while supplementary Appendix H depicts the time evolution of unobserved latent variables. Additional empirical tables and figures are relegated to supplementary Appendix I. The MATLAB codes used in this study are available at https://sites.google.com/site/chihshenghsieh/research or from the author upon request.

Based on the above three equations, we can identify the coefficients in the degree heterogeneity component, $\xi_1^2$ and $\zeta_1^2$, and $(\delta_1^2 + \delta_2^2)$ as a sum. To separately identify $\delta_1^2$ and $\delta_2^2$, we need to look at higher-order moments of $\phi_{ijt}$. For instance, we may explore $\mathrm{Cov}(\phi_{ijt}^2, \phi_{ijt})$. As we have assumed the distributions of the $z_{i1t}$'s and $m_{i1t}$'s, moments such as $E(m_{i1t}^3)$ can be calculated, so we obtain another polynomial equation in $\delta_1^2$ and $\delta_2^2$. More polynomial equations can be derived from the fourth or higher-order moments of the $\phi_{ijt}$'s. These polynomial equations can identify $\delta_1^2$ and $\delta_2^2$ separately. After showing identification of the magnitudes of $\xi_1$, $\zeta_1$, $\delta_1$ and $\delta_2$, we follow the same argument as in the previous case to establish identification of their signs.
and some other higher order moments of ϕ ijt 's.

B.2 Identification of unknown parameters in the SC-SDPD model
In this appendix, we discuss the identification of unknown parameters in the SC-SDPD model. We first eliminate the group-time effects $\alpha_{gt}$ by a differencing approach. Let $J_g$ be the corresponding $(n_g - 1) \times n_g$ difference matrix. The variables $Y_{gt}$, $W_{gt} Y_{gt}$, $Y_{g,t-1}$, $W_{g,t-1} Y_{g,t-1}$, $X_{gt}$, $W_{gt} X_{gt}$, $\tau_g$, $H_{gt}$ and $V_{gt}$ are transformed to $J_g Y_{gt}$, $J_g W_{gt} Y_{gt}$, $J_g Y_{g,t-1}$, $J_g W_{g,t-1} Y_{g,t-1}$, $J_g X_{gt}$, $J_g W_{gt} X_{gt}$, $J_g \tau_g$, $J_g H_{gt}$ and $J_g V_{gt}$. The SC-SDPD model after the first differencing is
$$J_g Y_{gt} = \lambda J_g W_{gt} Y_{gt} + \rho J_g Y_{g,t-1} + \mu J_g W_{g,t-1} Y_{g,t-1} + J_g X_{gt} \beta_1 + J_g W_{gt} X_{gt} \beta_2 + J_g \tau_g + J_g H_{gt} \kappa + J_g V_{gt}.$$
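The differencing step can be illustrated with a minimal Python sketch. The explicit form of the difference matrix is not reproduced in this text, so the row convention used here (each row subtracts one individual's observation from the next one's) is an assumption, as are the function name `difference_matrix` and the toy numbers. The point shown is only that any such matrix annihilates a term that is constant within the group, such as the group-time effect.

```python
import numpy as np

def difference_matrix(n):
    """Return an (n-1) x n matrix whose k-th row is e_{k+1} - e_k.

    This row convention is an assumption; any matrix whose rows sum to
    zero removes a within-group constant in the same way.
    """
    J = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    J[idx, idx] = -1.0
    J[idx, idx + 1] = 1.0
    return J

n_g = 5
J_g = difference_matrix(n_g)

rng = np.random.default_rng(0)
signal = rng.normal(size=n_g)            # everything except the group-time effect
alpha_gt = 3.7                           # hypothetical group-time effect
Y_gt = signal + alpha_gt * np.ones(n_g)  # outcome including the common effect

# The common effect drops out after premultiplying by J_g.
diff_with_effect = J_g @ Y_gt
diff_without_effect = J_g @ signal
```

Because every row of `J_g` sums to zero, `diff_with_effect` and `diff_without_effect` coincide, which is why the transformed equation contains no group-time term.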
So all terms in Eq. (B.4) can be identified from the data. In particular, identification is possible provided that $P(W_{gt} \mid H_{gt}, H_{g,t-1}) = P(W_{gt} \mid H_{gt})$ and the parameters in $P(W_{gt} \mid H_{gt})$ are identified (estimated) from the network formation model, as discussed in Appendix B.1.
Dropping the subscript g means J t stacks observations across groups for t = 2, 3, · · · , T .
Denote $J = (J_2', J_3', \cdots, J_T')'$. The condition that $J'J$ has full rank will identify the parameters. When the $z_{igt}$'s and $m_{igt}$'s are not present in the network formation model, i.e., when $\delta_{p_1} = 0$ for $p_1 = 1, 2, \cdots, \bar{p}_1$, and $\xi_{p_2} = 0$ and $\zeta_{p_2} = 0$ for $p_2 = 1, 2, \cdots, \bar{p}_2$, the full rank condition would be violated because the conditional expectation of $H^*_{gt}$ simplifies to $E(H^*_{gt} \mid W_{gt}) = E(H^*_{gt}) = J_g E(H_{gt} - H_{g,t-1}) = 0$. In this case the $h_{igt}$'s only enter as $H_{gt}\kappa$ in the control function of the SC-SDPD model. Since $H_{gt} Q Q^{-1} \kappa = H_{gt} \kappa$ for $g = 1, 2, \cdots, G$ and any $\bar{p} \times \bar{p}$ nonsingular matrix $Q$, the $H_{gt}$'s and $\kappa$ are not separately identified. (It is possible that only the $Z_{gt}$'s or the $M_{gt}$'s, but not both, show up in the network formation model. Then the full rank condition would also be violated and we only have the identification problem for the $Z_{gt}$'s and $\kappa_1$, or the $M_{gt}$'s and $\kappa_2$. In this case, the Procrustes transformation algorithm in this appendix can still be applied to the $Z_{gt}$'s or the $M_{gt}$'s.)
Recall that, from the distributional assumptions in Eqs. (6) and (7) and the normalization that $\sigma_{z_0}^2$, $\sigma_{m_0}^2$, $\sigma_z^2$ and $\sigma_m^2$ all equal 1, we have $h_{igt} = (z_{igt}', m_{igt}')' \sim N_{\bar{p}}(0, t I_{\bar{p}})$ for $t = 1, 2, \cdots, T$. So the distribution of the rotated individual unobservable $Q' h_{igt}$ is $N_{\bar{p}}(0, t Q'Q)$. To keep the distribution invariant after rotation, we need $Q'Q = I_{\bar{p}}$, which restricts $Q$ to be orthogonal. This provides $\bar{p}(\bar{p}+1)/2$ identification conditions. To determine $Q$, an additional $\bar{p}(\bar{p}-1)/2$ conditions are needed. This is very similar to the "rotational indeterminacy" problem in common factor models. See, among others, Bai and Ng (2013), Bai and Wang (2015) and Aßmann et al. (2016) for more discussion.
In this subsection, we utilize the orthogonal Procrustean transformation in Aßmann et al. (2012, 2016) to impose extra identification conditions on the posterior draws of $h_{igt} = (z_{igt}', m_{igt}')'$. Let $\bar{n} = \sum_{g=1}^{G} n_g$, $H_t = (H_{1t}', H_{2t}', \cdots, H_{Gt}')'$ and $H = (H_1', H_2', \cdots, H_T')'$ be the $(\bar{n}T) \times \bar{p}$ matrix of individual unobservables across all groups and periods. Denote $H^{(s)}$ as the posterior draw of $H$ at iteration $s$. As suggested by Aßmann et al. (2012, 2016), the posterior sampler gives orthogonally mixing samples of the $H^{(s)}$'s. Without further restrictions, $H^{(s)}$ is subject to an unknown $\bar{p} \times \bar{p}$ orthogonal transformation, $H^{(s)} Q^{(s)}$.
So as long as we can pin down all the $Q^{(s)}$'s, identification is reached. Following the "ex-post" approach in Aßmann et al. (2012, 2016), we first determine the $\bar{p} \times \bar{p}$ orthogonal transformation matrices $Q^{(s)}$ at each iteration $s$, based upon a minimization criterion. Then we utilize $Q^{(s)}$ to transform the posterior draws of $H$ and $\kappa$. In the minimization problem, we need to determine a set of orthogonal matrices $Q^{(s)}$ and a fixed point $\bar{H}$ (not dependent on $s$). Given some initial value of $\bar{H}$, the minimizer is obtained iteratively by a two-step optimization. Note that the minimization problem gives the same solution if $\{Q^{(s)}\}_{s=1}^{S}$ and $\bar{H}$ are transformed by the same orthogonal matrix. So identification is achieved up to orientation. With the solution $Q^{(s)}$, the original posterior draws $H^{(s)}$ and $\kappa^{(s)}$ can be transformed into the "identified draws" $H^{(s)} Q^{(s)}$ and $Q^{(s)\prime} \kappa^{(s)}$. Below is the detailed algorithm.
Step 0: Set the initial values of the $H^{(s)}$'s to be their original MCMC draws at iteration $s$. Let the initial value of $\bar{H}$ be the last draw of the posterior sample of $H$.
Step 1: Conditional on $\bar{H}$ and $H^{(s)}$, solve the orthogonal Procrustes minimization problem for $Q^{(s)}$. The detailed derivation of the solution to this orthogonal Procrustes transformation problem can be found in Schönemann (1966) and Borg and Groenen (2005), among others.
Step 2: Conditional on $Q^{(s)}$ and $H^{(s)}$, update $\bar{H}$.
Note that Hoff et al. (2002) and Sewell and Chen (2015) also apply the Procrustes transformation to post-process the posterior draws of the latent positions $z_{it}$ because the Euclidean distance in their network formation model is invariant to rotation, reflection, and translation of the $z_{it}$'s. But the transformation algorithm they use is not iterative. They simply fix $\bar{H}$ at some meaningful initial value, such as the MLE, and perform a one-time Procrustes transformation.
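Steps 0 to 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation. Two assumptions are made because the corresponding formulas are not reproduced in this text: the loss in Step 1 is taken to be the Frobenius distance between $H^{(s)} Q^{(s)}$ and $\bar{H}$, whose orthogonal minimizer has the well-known SVD form of Schönemann (1966), and Step 2 is taken to update $\bar{H}$ as the average of the rotated draws. The function names and the simulated draws are hypothetical.

```python
import numpy as np

def procrustes_rotation(H_s, H_bar):
    """Orthogonal Q minimizing ||H_s @ Q - H_bar||_F (SVD solution)."""
    U, _, Vt = np.linalg.svd(H_s.T @ H_bar)
    return U @ Vt

def iterative_procrustes(draws, n_iter=100, tol=1e-12):
    """Alternate between rotations (Step 1) and the fixed point (Step 2)."""
    H_bar = draws[-1].copy()  # Step 0: start from the last posterior draw
    for _ in range(n_iter):
        Qs = [procrustes_rotation(H_s, H_bar) for H_s in draws]           # Step 1
        H_new = np.mean([H_s @ Q for H_s, Q in zip(draws, Qs)], axis=0)  # Step 2 (assumed average)
        converged = np.linalg.norm(H_new - H_bar) < tol
        H_bar = H_new
        if converged:
            break
    return Qs, H_bar

# Hypothetical "posterior draws": one latent configuration under random rotations.
rng = np.random.default_rng(0)
H_true = rng.normal(size=(30, 2))
kappa = np.array([0.5, 1.5])
draws = []
for _ in range(20):
    R, _ = np.linalg.qr(rng.normal(size=(2, 2)))  # random orthogonal matrix
    draws.append(H_true @ R)

Qs, H_bar = iterative_procrustes(draws)
aligned = [H_s @ Q for H_s, Q in zip(draws, Qs)]  # identified draws H^(s) Q^(s)
```

In this noise-free toy case all aligned draws coincide, and transforming the coefficient draw as `Q.T @ kappa` leaves the product of latent variables and coefficients unchanged, which mirrors the invariance $H Q Q^{-1} \kappa = H \kappa$ discussed above.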

C The MCMC algorithm
In this appendix we provide details of the MCMC algorithm for the SC-SDPD model with the general dynamic network formation model. The algorithm consists of sampling steps for the parameters in the network formation equation, namely, $\Gamma = (\gamma, \Phi)$ with $\Phi = (\delta, \xi, \zeta)$; the parameters in the SC-SDPD model, namely, $\theta = (\Psi, \beta, \kappa, \sigma_v^2)$ with $\Psi = (\lambda, \rho, \mu)$, the $\alpha_{gt}$'s and the $\tau_g$'s; and the latent variables $H_{gt} = (Z_{gt}, M_{gt})$. Recall the posterior distribution given in Eq. (14) of the main text, which is built on the likelihood function of $Y_{gt}$ and $W_{gt}$ for $g = 1, 2, \cdots, G$ and $t = 1, 2, \cdots, T$.
Below we list the set of conditional posterior distributions required in the MCMC sampler. To simplify notation, exogenous variables X gt 's, c igt 's and c ijgt 's, lagged dependent variable Y g,t−1 's and initial values W g0 and Y g0 are suppressed from the conditional set. For each step, the full conditional is conditioned on the rest of parameters and latent variables with the most updated values at the current iteration.
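The full conditional of $h_{igt}$ follows from Bayes' theorem. At the $q$-th iteration, we apply an M-H step to sample the $h_{igt}$'s.
The full conditional of $h_{igT}$ likewise follows from Bayes' theorem. At the $q$-th iteration, we apply an M-H step to sample $h_{igT}$.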
Note that as we need to ensure κ > 0, the posterior distribution of κ turns out to be a multivariate truncated normal.
Step 6: Draw a candidate $\tilde{\Psi}$ from the proposal distribution, where $c_\Psi$ is chosen by the user. Check whether $\tilde{\Psi}$ satisfies the stability condition implied by its prior. If not, redraw $\tilde{\Psi}$ until it meets that condition.
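The redraw logic of Step 6 can be illustrated with a short Python sketch. The proposal distribution and the exact stability condition are not reproduced in this text, so both are assumptions here: the candidate is drawn from a normal random walk around the current value with step size `c_psi`, and stability is checked with the hypothetical sufficient condition $|\lambda| + |\rho| + |\mu| < 1$. The function names are illustrative only, and the subsequent accept or reject decision of the M-H step is not shown.

```python
import numpy as np

def is_stable(psi):
    """Hypothetical stability check: |lambda| + |rho| + |mu| < 1 (an assumption)."""
    return np.sum(np.abs(psi)) < 1.0

def propose_psi(psi_current, c_psi, rng, max_tries=10_000):
    """Draw a random-walk candidate and redraw until it is stable.

    The normal random walk and the role of c_psi as a step size are
    assumptions made for illustration.
    """
    for _ in range(max_tries):
        candidate = psi_current + c_psi * rng.normal(size=psi_current.shape)
        if is_stable(candidate):
            return candidate
    raise RuntimeError("no stable candidate found; consider a smaller c_psi")

rng = np.random.default_rng(1)
psi_current = np.array([0.10, 0.30, 0.05])  # hypothetical (lambda, rho, mu)
psi_candidate = propose_psi(psi_current, c_psi=0.1, rng=rng)
```

A larger `c_psi` explores the parameter space faster but triggers more redraws near the boundary of the stable region, which is the trade-off the user-chosen tuning constant controls.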

D Derivation of the AICM
The conventional AIC (Akaike, 1973) is defined in terms of $\ell_{\max}$, the maximum log-likelihood, and $d$, the dimension of the parameters in the model. However, $\ell_{\max}$ is not directly observable in the Bayesian estimation approach because it may not be reached during the MCMC sampling procedure. Raftery et al. (2007) propose a posterior simulation-based analogue of the AIC, namely the AICM. Their key insight is that, given the MCMC draws from the posterior and supposing that the log-likelihoods $\{\ell_s : s = 1, \cdots, S\}$ corresponding to the MCMC draws are approximately independent, $\ell_{\max} - \ell_s$ asymptotically follows the distribution given in Eq. (D.8), where $\ell_{\max}$ is the maximum achievable log-likelihood and $d$ is the effective number of parameters. The asymptotic distribution in Eq. (D.8) follows when the amount of data underlying the likelihoods increases to infinity (Bickel and Ghosh, 1990; Dawid, 1991).
Note that the peer effects reported in Table 3 of the main text are estimated based on general friendships. Given that our data provide information on antipathetic relationships, study mates, and cram school mates, we also report the peer effects based on these other relationships. The results for antipathetic relationships show that peer effects on students' academic outcomes do not operate through such relationships, which serves as a placebo test supporting the significant peer effects among friends found in Table 3. When we focus on friends who study together, the peer effect obtained among study mates is slightly higher than the one in Table 3. By contrast, the estimated peer effect on academic performance decreases by 40% when focusing on friends who go to cram schools together. Our explanation for this finding is that students who go to cram schools are often the ones who fall behind in school learning and need additional tutoring. There is not much gain in one's own academic learning from interacting with poorly performing friends.
In the empirical results based on networks of other relationships, we can see that the peer effect estimated from study mates seems stronger than that from general friendships (comparing Table I.3 Column 3 with Table 3 Column (IV) in the main text). This specific friendship might capture more relevant peer groups. Thus, we further conduct the full model estimation with endogenous friendship formation to study peer effects from the study mates network. The full model estimates of the contemporary peer effect λ, the persistent effect ρ, and the temporal peer effect µ for the study mates network are reported in Table I.4. It shows a correction of the endogeneity bias after controlling for network formation. Similar to the results from friendship networks, the contemporary peer effect and the persistency effect are both significant in all specifications of latent dimensions ((I) to (III)). The contemporary peer effect (λ) in the study mates network is still stronger than that of friendship networks after correcting the endogeneity of network formation (comparing Table I.4 with Table 4 in the main text). Furthermore, the AICM selects the two-dimensional latent variables specification to be the best fit model for the study mates network data.
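The AICM comparison invoked above can be sketched in Python. The sketch follows the moment-based formulas of Raftery et al. (2007), under which the posterior log-likelihood draws have mean $\ell_{\max} - d/2$ and variance $d/2$, so that $\hat{d} = 2 s_\ell^2$, $\hat{\ell}_{\max} = \bar{\ell} + s_\ell^2$ and $\mathrm{AICM} = 2(\bar{\ell} - s_\ell^2)$. The sign convention (larger is better) is the one in Raftery et al. and may differ from the convention used for the reported tables; the function name `aicm` and the simulated log-likelihoods are hypothetical.

```python
import numpy as np

def aicm(loglik_draws):
    """Moment-based AICM from the log-likelihoods of posterior draws."""
    ll = np.asarray(loglik_draws, dtype=float)
    ll_bar = ll.mean()
    s2 = ll.var(ddof=1)
    return {
        "d_hat": 2.0 * s2,           # effective number of parameters
        "ll_max_hat": ll_bar + s2,   # estimate of the maximum log-likelihood
        "aicm": 2.0 * (ll_bar - s2),
    }

# Hypothetical check: simulate l_max - l_s ~ Gamma(d/2, 1) with d = 6.
rng = np.random.default_rng(2)
d_true, ll_max_true = 6, -1000.0
loglik = ll_max_true - rng.gamma(shape=d_true / 2, scale=1.0, size=50_000)
result = aicm(loglik)
```

With real output, `loglik_draws` would be the log-likelihood evaluated at each retained MCMC draw, computed once per candidate latent dimension, and the resulting AICM values would then be compared across specifications.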
We further conduct two robustness checks for our model specification. First, we consider the non-row-normalized $W_{gt}$ specification in our SC-SDPD model. For row-normalized $W_{gt}$, the coefficient λ can be interpreted as the local average effect, i.e., the influence of the average behavior of one's peers; while for non-row-normalized $W_{gt}$, λ can be interpreted as the local aggregate effect, i.e., the influence of the aggregate behavior of one's peers (Liu et al., 2014). The left panel of Table I.5 shows the estimates of the contemporary peer effect λ, the time persistence effect ρ, and the temporal peer effect µ under the non-row-normalized specification. Compared to the row-normalized version (see Table 4 in the main text), λ and ρ are both significant and positive under either specification. For the second check, we report the estimation results of λ, ρ and µ in the right panel of Table I.5. Compared to the original specification (see Table 4 in the main text), we conclude that there are no significant differences between the two specifications in estimating λ, ρ and µ. In particular, according to the AICM, the original latent specification with two dimensions is still selected as the best-fitting model for the data.
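The distinction between the local average and the local aggregate effect can be seen in a small Python example. The adjacency matrix and outcome values below are hypothetical, and the helper `row_normalize` is illustrative only.

```python
import numpy as np

# Hypothetical directed friendship nominations among three students.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
y = np.array([80.0, 70.0, 90.0])  # hypothetical academic outcomes

def row_normalize(A):
    """Divide each row by its outdegree; rows with no links stay zero."""
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

W_avg = row_normalize(A)

local_average = W_avg @ y   # mean outcome of nominated peers
local_aggregate = A @ y     # sum of outcomes of nominated peers
```

For the first student, who nominates two friends, the row-normalized term is the peers' mean outcome (80), while the non-row-normalized term is their total (160). The two specifications coincide only for students with exactly one nominated friend.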

F Multiplier Effects from Policy Intervention
The advantage of our structural model is that it enables us to explore the detailed channels in which networks and economic activities affect each other. This model likewise enables us to simulate and evaluate the policy impact on network formation and interaction of economic outcomes with higher accuracy. We follow the empirical framework of this study and analyze a policy scenario where government (or school) agencies provide financial assistance to students' families who experience financial difficulties. From the empirical results, we determine that the variable "family in financial difficulty" has significant negative effects on both students' friendship formation and academic performance.
Consequently, we expect that the financial relief policy (program) will help students improve their social networking and academic performance.
When studying the policy impact of relieving families' financial difficulties, we focus on the multiplier effect generated through network interactions. In particular, the multiplier effect on academic outcome implied by our dynamic model is different from that of the cross-sectional social interactions model. It is jointly determined by the contemporary and temporal peer effects, subject to endogenous network rewiring.
Note that the contemporary peer effect λ in the structural econometric model (8) in the main text is a function of both the social multiplier coefficient λ 1 and the social conformity coefficient λ 2 (see Appendix A Eq. (A.3)). Even though λ can be identified, λ 1 and λ 2 may not be separately identified (See also the discussion in Boucher and Fortin (2016)). However, when discussing the multiplier effects from policy intervention for our model, we are not aiming at determining the exact source of network interaction nor identifying the social multiplier. Instead, our focus is the multiplier effect that arises from the network interaction effect λ as a whole. As long as λ can be identified and estimated, the source of network interaction would be irrelevant.
We use the networks and academic outcomes observed in the last time period ($T$) as bases to analyze the marginal effect (ME) of the policy on the one-period out-of-sample expected network outdegree, $D_{g,T+1}$ (the network outdegree of individual $i$ is the number of friends nominated by $i$), and academic outcome, $Y_{g,T+1}$. We compute
$$\mathrm{ME}^{D}_{g,T+1} = E(D_{g,T+1} \mid W_{g,T}, Y_{g,T}, X^{\mathrm{treated}}_{g,T+1}, \Gamma) - E(D_{g,T+1} \mid W_{g,T}, Y_{g,T}, X^{\mathrm{untreated}}_{g,T+1}, \Gamma), \qquad \text{(F.11)}$$
$$\mathrm{ME}^{Y}_{g,T+1} = E(Y_{g,T+1} \mid W_{g,T}, Y_{g,T}, X^{\mathrm{treated}}_{g,T+1}, \theta) - E(Y_{g,T+1} \mid W_{g,T}, Y_{g,T}, X^{\mathrm{untreated}}_{g,T+1}, \theta), \qquad \text{(F.12)}$$
where $\mathrm{ME}^{D}_{g,T+1}$ is the vector of marginal effects on the network outdegrees and $\mathrm{ME}^{Y}_{g,T+1}$ is the vector of marginal effects on the academic outcome for school $g$; $X^{\mathrm{treated}}_{g,T+1}$ refers to the case where we turn all non-zero financial difficulty dummies to zero at $T+1$ and retain all other exogenous variables unchanged from $T$, and $X^{\mathrm{untreated}}_{g,T+1}$ refers to the case where we keep the financial difficulty dummy as well as all other exogenous variables at the same values as at $T$. We compute the marginal effect by the difference formulas in Eqs. (F.11) and (F.12) because the policy instrument is discrete. For continuous instruments, one can follow Lee and Yu (2012) to compute the marginal effect from the SDPD model by the space-time multiplier. We take the parameter estimates in Column (II) of Table 4 in the main text and compute the ME separately for each school. To focus on the policy effect, we assume that the group-time effects, the individual latent variables and the latent variables' coefficients are unchanged from $T$ to $T+1$. So the normalization constraints on the variances of those latent variables would not affect the interpretation of the marginal effects. For the marginal effect on academic outcome, we consider two scenarios: one fixes the network $W_{g,T+1}$ as $W_{g,T}$ and the other allows the network $W_{g,T+1}$ to be rewired.
Our network formation model determines the endogenous network rewiring for W g,T +1 . In the network rewiring scenario, we report the ME as the average value of Eq. (F.12), which is calculated from 1,000 simulated networks.
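For the fixed-network scenario, the difference in Eq. (F.12) has a simple closed form that a short Python sketch can illustrate. When $W_{g,T+1}$, the lagged terms, the group-time effects and the latent variables are all held fixed, only the exogenous regressors change between the treated and untreated cases, so the difference in expected outcomes equals $(I - \lambda W)^{-1}(\Delta X \beta_1 + W \Delta X \beta_2)$ with $\Delta X = X^{\mathrm{treated}} - X^{\mathrm{untreated}}$. All numbers below (network, coefficients, λ) are hypothetical and are not the estimates from Table 4; the function name is illustrative.

```python
import numpy as np

def me_fixed_network(W, dX, beta1, beta2, lam):
    """Marginal effect on outcomes when the network is held fixed.

    Solves (I - lam * W) ME = dX @ beta1 + W @ dX @ beta2.
    """
    n = W.shape[0]
    direct = dX @ beta1 + W @ dX @ beta2
    return np.linalg.solve(np.eye(n) - lam * W, direct)

# Hypothetical row-normalized network of four students on a line.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

# Student 0 is treated: the financial difficulty dummy goes from 1 to 0.
dX = np.array([[-1.0], [0.0], [0.0], [0.0]])
beta1 = np.array([-2.5])  # hypothetical own effect of financial difficulty
beta2 = np.array([0.0])   # hypothetical contextual effect

me_no_peer = me_fixed_network(W, dX, beta1, beta2, lam=0.0)
me_with_peer = me_fixed_network(W, dX, beta1, beta2, lam=0.2)
```

Without peer effects the gain is confined to the treated student and equals the own coefficient in absolute value. With a positive λ the treated student's gain exceeds that value and untreated students also gain, which is the multiplier pattern discussed in this appendix. The rewired-network scenario would instead average such differences over simulated networks, which is not shown here.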
We pool the ME values across all schools and plot their distributions for network outdegree and academic outcome in Figures I.2 and I.3, respectively. In both figures, Panel (a) focuses on the group of the 64 treated students, i.e., those whose families experience financial difficulties; Panel (b) focuses on the remaining untreated students; and Panel (c) combines the students of the two groups. For the network outdegree, Figure I.2 shows that the policy improves the students' social networking. Overall, each student gains an average of 0.66 additional outdegree links because of the policy. The policy effect on treated students is stronger. In particular, each treated student gains an average of 1.7 additional outdegree links and each untreated student gains 0.49 additional outdegree links.
For academic outcome, the multiplier effect of the policy intervention can be clearly identified in Panels (a) and (b) in Figure I.3. If no multiplier effects were present for the treated students, then the ME values in Panel (a) should concentrate at 2.504, which is the corresponding coefficient of the variable Lessmoney in Table 4. However, we observe a few ME values above 2.504. Similarly, if no multiplier effects were present for the untreated students, then all the ME values in Panel (b) should be zero. Instead, we observe many positive ME values for those who do not experience financial difficulty.
The comparison between the scenarios of the fixed and rewired networks reveals that network rewiring would change the distributions of ME for the treated and untreated students. In Panel (a) for the treated students, the mean values of ME change only slightly between the fixed and rewired network cases (3.04 and 3.06, respectively). However, the standard deviation decreases from 0.75 in the fixed network case to 0.50 in the rewired network case. A similar pattern can be observed in Panel (b) for the untreated students. Although the means are similar (0.44 and 0.46, respectively), the standard deviation declines substantially from the fixed network case (0.58) to the rewired network case (0.36). Thus, network rewiring results in a mean-preserving contraction in the distribution of ME for academic outcome.
We also conduct a policy impact analysis under the potentially misspecified SAR model in Column (II) of Table I

G Network Goodness-of-fit Examination
To investigate whether the network formation model that we propose fits the observed network data well, we follow Hunter et al. (2008) to conduct a model goodness-of-fit examination. For illustration, we select the observed networks of periods 2 to 4 from school 7 as the examination benchmarks. The results from other schools are generally the same and are available upon request. We pass over the network of period 1 because it is assumed to be exogenously given. For each period, we simulate 100 artificial networks from our network formation model with the parameter estimates reported in Column (II) of Table 4

H Time Evolution of Unobserved Latent Variables
When researchers study network formation with latent space models, they often intend to visualize the position of each node in the latent space. In a dynamic setting, one can further show the time-evolving trajectory of the node positions (Sewell and Chen, 2015). For the current empirical study, we choose one school (school 7) to conduct this visualization. In Table 4, not all the coefficients of the latent variables in the network formation model are significant. In particular, the estimate of $\zeta_1$ is insignificant. One may worry that a non-significant coefficient of the latent variables would cause an identification problem. Therefore, we apply the iterative Procrustes transformation (Aßmann et al., 2012, 2016) to the posterior draws of the $z_{igt}$'s and $m_{igt}$'s to secure their identification against the potential rotation problem. To be explicit, we take the posterior MCMC draws from the estimation results and apply the transformation to them. We similarly show the correspondence between the latent position of $m_{igt}$ and the number of friendship nominations that each student received (i.e., indegree) from the perspective of link receivers in Panels (i) to (l) of the third row. The plots show that the nodes that move to the top left corner of the space are in fact high-indegree students. This result is consistent with the negative estimate of $\zeta_1$ and the positive estimate of $\zeta_2$ in Table 4.
To analyze whether the non-significant $\zeta_1$ in the network formation model causes
Note: This Monte Carlo study contains 100 repetitions. The mean and standard deviation of the point estimates across repetitions are reported. Column "Full-D2" refers to the true model that generates the artificial data, which has latent variables in two dimensions. Column "Full-D1" refers to the model that has only a one-dimensional latent variable. Columns "Unobs. homo." and "Unobs. Deg." refer to the two encompassed network formation models with only unobserved homophily or unobserved degree heterogeneity. Column "SDPD w/ factor" refers to the SDPD model with the common factor structure where the number of common factors is set to 4. Column "SDPD" refers to the SDPD model which neglects endogenous network formation. Column "SAR" refers to the SAR model with individual and time effects.
Note: The dynamic network formation model has latent variables of one dimension in Column (I), two dimensions in Column (II), and three dimensions in Column (III). The asterisks *** (**, *) indicate that the 99% (95%, 90%) highest posterior density range does not cover zero. We perform the MCMC sampling for 100,000 iterations, discard the initial 20,000 iterations for burn-in, and compute the posterior means and standard deviations from the remaining draws as point estimates. The posterior standard deviations are reported in parentheses.
Note: We perform the MCMC sampling for 100,000 iterations with the initial 20,000 iterations discarded for burn-in, and compute the posterior means and standard deviations from the remaining draws as point estimates. The posterior standard deviations are reported in parentheses. The asterisks *** (**, *) indicate that the 99% (95%, 90%) highest posterior density range does not cover zero.
Note: The dynamic network formation model has latent variables of one dimension in Column (I), two dimensions in Column (II), and three dimensions in Column (III). Each model controls for own and contextual effects of exogenous regressors. We perform the MCMC sampling for 100,000 iterations with the initial 20,000 iterations discarded for burn-in, and compute the posterior means and standard deviations from the remaining draws as point estimates. The posterior standard deviations are reported in parentheses. The asterisks *** (**, *) indicate that the 99% (95%, 90%) highest posterior density range does not cover zero.
Note: The dynamic network formation model has latent variables of one dimension in Column (I), two dimensions in Column (II), and three dimensions in Column (III). All regressions include own and contextual effects of exogenous X. We perform the MCMC sampling for 100,000 iterations with the initial 20,000 iterations discarded for burn-in, and compute the posterior means and standard deviations from the remaining draws as point estimates. The posterior standard deviations are reported in parentheses. The asterisks *** (**, *) indicate that the 99% (95%, 90%) highest posterior density range does not cover zero.
Panels (a)-(d) draw $z_{igt}$ and the node size represents outdegree. Panels (e)-(h) draw $m_{igt}$ and the node size represents outdegree. Panels (i)-(l) draw $m_{igt}$ and the node size represents indegree.