Statistical Learning of Neuronal Functional Connectivity

Identifying the network structure of a neuron ensemble beyond the standard measure of pairwise correlations is critical for understanding how information is transferred within such a neural population. However, spike train data pose significant challenges to conventional statistical methods due not only to their complexity and massive scale, but also to their high dimensionality. In this article, we propose a novel “structural information enhanced” (SIE) regularization method for estimating the conditional intensities under the generalized linear model (GLM) framework to better capture the functional connectivity among neurons. We study the consistency of parameter estimation of the proposed method. A new “accelerated full gradient update” algorithm is developed to efficiently handle the complex penalty in the SIE-GLM for large sparse datasets such as spike train data. Simulation results indicate that our proposed method outperforms existing approaches. An application of the proposed method to a real spike train dataset, obtained from the prelimbic region of the prefrontal cortex of adult male rats performing a T-maze based delayed-alternation task of working memory, provides some insight into the neuronal network in that region.


INTRODUCTION
The capacity to simultaneously record spike trains of many neurons from behaving subjects has surpassed our ability to describe putative neural codes distributed across populations of neurons. Detecting the network structure or graph structure of a neuron ensemble beyond the standard measure of pairwise correlations is critical for understanding how information is transferred within such a neural population. In this application, the spike train data, which are typically collected as multivariate point processes, present statisticians with many exciting opportunities as well as significant challenges in data analysis and in interpretation of results, due not only to their complexity and massive scale, but also to their high dimensionality and nonstationary temporal/spatial dependence structure. First, neural spike trains of this type are generally nonstationary and exhibit a great amount of variability among repeated trials. Second, there is evidence showing that patterns of neural spike activity among neurons may vary with different experimental conditions. Third, the high-dimensional nature of the spike train data from these experiments makes the analysis extremely computationally intensive.

Existing Work
Several methods have been proposed to identify functional connections within an ensemble of neurons. For example, early nonparametric methods, such as the cross-correlogram (Perkel, Gerstein, and Moore 1967) and the joint peri-stimulus time histogram (Gerstein and Perkel 1969), provide useful insights and are still commonly used for analyzing interactions between neurons. However, these methods have some limitations. They focus on the pairwise relationship alone, ignoring both possible connections with other neurons in the ensemble and influences from external stimuli. Furthermore, correlation-based analysis is suitable for describing the linear aspect of connectivity, which might be inadequate for depicting neuronal spike activities. More recently, model-based approaches, in which a specific point process model is assumed (e.g., the inhomogeneous Poisson process or, more generally, the Cox point process; Cox and Isham 1980), have drawn a great deal of attention from researchers.
In the literature, the conventional GLM (McCullagh and Nelder 1989) has been widely used to analyze spike point processes, where the firing rate of a single neuron is modeled as a function of the spiking history of concurrently recorded ensemble neurons (Brillinger 1992). This GLM approach has been successfully applied to spike train data from many different types of experiments and has become a very powerful and efficient tool for neural encoding and decoding (Truccolo et al. 2005; Pillow et al. 2008). One significant advantage of the GLM approach is that the connections between every pair of neurons can be modeled as parameters of the GLM and analyzed simultaneously under a common framework (Okatan, Wilson, and Brown 2005). In this setting, the influence of one neuron on another is described by a set of temporal interactive parameters whose sign and magnitude indicate how neurons interact with each other. A more comprehensive review can be found in Brown, Kass, and Mitra (2004) and Truccolo (2010). We will focus mainly on the GLM framework in the rest of this article; nevertheless, it is worth noting that other statistical models, such as the Cox model (Berry et al. 2012) and the dynamical Bayesian network (Eldawlatly et al. 2010), are also useful tools for inferring the functional connections among ensemble neurons.
It is well known that the maximum likelihood estimator (MLE) is generally applicable for estimating model parameters in the GLM, but it is also known to be vulnerable to over-fitting, which frequently arises in the analysis of high-dimensional data. For a typical real neural spike train dataset, the spiking rate is often very low and the design matrix can be very sparse; for example, more than 90% of its entries may be zeros. In such cases, the MLE may not be reliable. Therefore, regularization techniques can be applied to obtain well-behaved solutions to over-parameterized estimation problems (Bickel and Li 2006). On the other hand, a sparse solution is desirable, since a neuron may not be connected to all other neurons in a large population and sparsity improves interpretability. Since the standard MLE does not automatically produce a sparse solution and the parameters estimated by MLE are almost surely nonzero, additional steps (e.g., multiple hypothesis testing procedures controlling either the family-wise error rate or the false discovery rate, as in Gerhard et al. (2011), Kim et al. (2011), and Berry et al. (2012)) are usually needed to select the significant parameters that correspond to truly functional connections among neurons.
For the above reasons, a class of nonsmooth penalties, including the L_1 penalty, or Lasso (Tibshirani 1996), and the SCAD (Fan 1997), has been introduced to achieve the goal of both obtaining a stable estimate and encouraging a sparse solution. The L_1 regularized GLM has been studied in the neuroscience literature (Kelly et al. 2010; Chen et al. 2011; Mishchenko, Vogelstein, and Paninski 2011; Zhao et al. 2012) due to its simplicity, easy implementation, and good performance. Furthermore, regularization or penalization can also be interpreted as imposing a prior on the parameters from a Bayesian perspective, and the regularized log-likelihood can be viewed as the log-posterior density of the parameters (Stevenson et al. 2009). Thus, the regularized maximum likelihood estimate is equivalent to the maximum a posteriori estimate from the Bayesian point of view.

Motivation for Our Work
However, to our knowledge, little published work on neural spike train data has fully considered the structural information of the parameter space, in which all parameters related to the interaction between one specific pair of neurons naturally form a group and the whole group could be better estimated together. Instead of treating all parameters individually, we wish to make decisions jointly with the grouping information and promote structured sparsity, that is, select an entire group of parameters, or make the inclusion of some parameters depend on the inclusion of others. Some recent works in the statistics literature have explored this direction through group or hierarchical norm penalties and demonstrated promising properties, particularly the group Lasso (Yuan and Lin 2006; Meier, Van De Geer, and Buhlmann 2008), the sparse group Lasso (Chatterjee et al. 2012; Simon et al. 2013), and others (Liu and Ye 2010).
In this article, we will study the spike train data by adopting the generalized linear model (GLM) framework from Brillinger (1992), which has shown the superiority in modeling the spiking activities of neurons (Truccolo et al. 2005;Devilbiss, Jenison, and Berridge 2012). However, we find that certain structural information of the parameter space in the GLM has not been fully used in the previous works. This motivates us to propose a "structural information enhanced" (SIE) regularization method for the spike train data to better investigate the functional connectivity within a neuronal network.
In this article, we propose a new "structured" regularization method for spike train data that incorporates the structural prior information into the modeling procedure and guides the selection of parameters according to the underlying structure of the parameter space. The main contributions of our work are as follows.
• We introduce the SIE-GLM, a group-structured regularized GLM, in the context of multiple spike train data and demonstrate that our method performs better than existing approaches on simulated spike train data.
• The proposed method is shown to be estimation consistent.
• A fast and efficient algorithm, called "accelerated full gradient update" (AFGU), is developed to handle the complex penalty in the SIE-GLM efficiently for large sparse datasets such as spike train data.
• The prefrontal cortex (PFC) plays an important role in cognitive and behavioral processes. An application of the proposed method to a real spike train dataset, obtained from the prelimbic region of the prefrontal cortex (plPFC) of adult male rats performing a T-maze based delayed-alternation task of working memory, provides some insight into the neuronal network in that region.
The rest of the article will proceed as follows. Section 2 introduces the proposed SIE-GLM in the context of multiple neural spike train data and proposes the AFGU algorithm for the group-structured regularization. Section 3 explores theoretical properties of the proposed method. Section 4 presents simulation results to assess our method. Section 5 describes a neural spike train dataset recorded from animals performing a cognitive task and illustrates the efficacy of our proposed method. Section 6 concludes with a brief discussion. Technical assumptions and detailed proofs are relegated to the appendix in the supplementary materials available online.

METHODOLOGICAL DEVELOPMENT
We start with a brief description of modeling neural spike train data in Section 2.1, followed by the proposed SIE-GLM modeling, parameter estimation, and implementation algorithm in Sections 2.2 and 2.3.

Background of Modeling Neural Spike Train Data
Consider an ensemble consisting of C neurons. The spike train data are usually collected as multivariate (i.e., C-variate) point processes, denoted by {u_{c,i} : i = 1, . . . , J_c}, c = 1, . . . , C, where J_c is the total number of spikes observed from neuron c during the experiment, {u_{c,i}}_{i=1}^{J_c} are the timestamps when neuron c spikes, and T is the time length of the experiment trial. Here, we take a discrete-time approach, where the whole experiment time period (0, T] is divided into n equally spaced time bins, τ_k ≡ (t_{k−1}, t_k], each of length δ = T/n, that is, t_k = kδ, k = 0, 1, . . . , n. Denote by N_c(τ_k) the number of spikes fired by neuron c within the kth time bin τ_k, and by N_{1:C}(τ_{0:k}) the spiking history of all neurons up to the time point t_k, where τ_{0:k} = (0, t_k]. We model the distribution of the Poisson-type variable N_c(τ_k) by means of the generalized linear model (GLM) for the conditional intensity. Here, the conditional intensity function of neuron c, denoted by λ_c(τ_k | N_{1:C}(τ_{0:(k−1)}), X(τ_k)), takes into account several inputs at time t_k or earlier and has the form

log{λ_c(τ_k | N_{1:C}(τ_{0:(k−1)}), X(τ_k))} = γ_{c;0} + Σ_{p=1}^{P} γ_{c;p} N_c(τ_{k−p}) + Σ_{i∈{1,...,C}\c} Σ_{q=1}^{Q} γ_{c,i;q} N_i(τ_{k−q}) + X(τ_k)^T β_c,  k = 1, . . . , n.  (2.1)

• exp(γ_{c;0}), with the intercept γ_{c;0}, represents the baseline firing rate of neuron c.
• Σ_{p=1}^{P} γ_{c;p} N_c(τ_{k−p}) is the effect of the intrinsic spiking history of neuron c, up to P bins into the past, with γ_{c;p} representing the autoregressive parameter at lag p.
• Σ_{q=1}^{Q} γ_{c,i;q} N_i(τ_{k−q}) describes the influence of neuron i on neuron c. Such influence is computed based on a spiking history window, up to Q bins into the past, where γ_{c,i;q} represents the interactive parameter at lag q. For example, if γ_{c,i;q} is positive, neuron i would functionally excite neuron c after a time lag of q bins. The larger the value of γ_{c,i;q}, the stronger the excitatory drive.
• In (2.1), X(τ_k) = (X_{τ_k,1}, . . . , X_{τ_k,M})^T is an M-dimensional vector representing some extrinsic covariates measured during the kth time bin, τ_k, while β_c = (β_{c,1}, . . . , β_{c,M})^T represents the vector of corresponding parameters. For example, X(τ_k) includes several status indicators related to the performance of the animals in a behavioral task for electrophysiological recordings (described in Section 5; see, e.g., Table S6 in the online supplementary materials).
Model (2.1) enables us to simultaneously handle several factors, including the spike history, neuronal network structure, and other covariates. Note that we distinguish the influence of each neuron's own spiking history from the influences coming from other neurons, and allow the lengths P and Q of the history windows to be different to achieve greater flexibility.
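To make the model concrete, the following minimal sketch (our illustration, not the authors' code; the function name `log_intensity` and the toy parameter layout are our own) evaluates the log conditional intensity in (2.1) from binned spike counts:

```python
import numpy as np

def log_intensity(counts, k, c, gamma0, gamma_self, gamma_cross, beta, X):
    """Log conditional intensity of neuron c in bin k under model (2.1).

    counts      : (C, n) array of binned spike counts N_i(tau_k)
    gamma_self  : (P,) autoregressive parameters gamma_{c;p}
    gamma_cross : dict {i: (Q,) array} of interactive parameters gamma_{c,i;q}
    beta, X     : (M,) covariate parameters and (n, M) covariate matrix
    """
    eta = gamma0
    # intrinsic history: sum_p gamma_{c;p} N_c(tau_{k-p})
    for p in range(1, len(gamma_self) + 1):
        if k - p >= 0:
            eta += gamma_self[p - 1] * counts[c, k - p]
    # interactions: sum_{i != c} sum_q gamma_{c,i;q} N_i(tau_{k-q})
    for i, g in gamma_cross.items():
        for q in range(1, len(g) + 1):
            if k - q >= 0:
                eta += g[q - 1] * counts[i, k - q]
    # extrinsic covariates X(tau_k)^T beta_c
    eta += X[k] @ beta
    return eta
```

The nested sums mirror the three history terms of (2.1) directly; a vectorized implementation would precompute lagged count matrices instead.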
Regarding the model assumption, we assume that the numbers of spikes randomly fired by different neurons are conditionally independent given the complete spiking history of the whole ensemble and the covariates X(τ_k); that is, N_i(τ_k) is independent of N_j(τ_k) given N_{1:C}(τ_{0:(k−1)}) and X(τ_k) for any 1 ≤ i, j ≤ C with i ≠ j and 1 ≤ k ≤ n. Under this GLM framework, the conditional likelihood of the observed spike trains is then written as

L(θ) = Π_{c=1}^{C} Π_{k=1}^{n} [λ_c(τ_k)δ]^{N_c(τ_k)} exp{−λ_c(τ_k)δ} / N_c(τ_k)!,

where λ_c(τ_k) abbreviates λ_c(τ_k | N_{1:C}(τ_{0:(k−1)}), X(τ_k)). The conventional MLEs of parameters in (2.1) can be obtained by minimizing the negative log-likelihood

ℓ(θ) = Σ_{c=1}^{C} ℓ_c(θ_c),  (2.2)

where ℓ_c(θ_c) = Σ_{k=1}^{n} {λ_c(τ_k)δ − N_c(τ_k) log λ_c(τ_k)}, up to an additive constant free of θ_c, is the negative log-likelihood of a single neuron c, and the vector θ_c = (γ_{c;0}, γ_{c;1}, . . . , γ_{c;P}, {γ_{c,i;q} : i ≠ c, q = 1, . . . , Q}, β_c^T)^T collects all parameters for neuron c, including the baseline firing rate γ_{c;0} as the intercept. We see that minimizing (2.2) is separable across neurons, and (2.2) can thus be equivalently solved by minimizing ℓ_c(θ_c) for each neuron c. Therefore, we will analyze each neuron individually in the rest of the article.
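The per-neuron negative log-likelihood under the binned-Poisson form can be computed, up to the constant log N! terms, as in this sketch (a simplified illustration; the helper `neg_loglik` is our own):

```python
import numpy as np

def neg_loglik(eta, counts_c, delta):
    """Negative log-likelihood for one neuron c, up to an additive
    constant (the log N! terms, which do not involve the parameters).

    eta      : (n,) array of log-intensities log lambda_c(tau_k)
    counts_c : (n,) observed counts N_c(tau_k)
    delta    : bin width
    """
    lam = np.exp(eta)
    # Poisson term per bin: lambda*delta - N*log(lambda*delta)
    return np.sum(lam * delta - counts_c * np.log(lam * delta))
```

Because the full negative log-likelihood is a sum of such per-neuron terms, each neuron can be fit independently, as noted above.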

Proposed SIE-GLM and Estimation
Regularization or penalization is a very useful technique for obtaining well-behaved solutions in large-dimensional problems with sparse signals. Given the high dimensionality and large portion of zero entries in spike train data, nonregularized estimation usually suffers from over-fitting and results in unstable solutions. Furthermore, the interactive parameters γ_{c,i;q} representing the functional connections between neurons are assumed to be sparse. However, the nonregularized estimator, that is, the conventional MLE, fails to supply a sparse estimate of those parameters.
Here, we wish to propose an appropriate regularization for estimating parameters in model (2.1). Among the numerous variable selection methods recently developed based on regularization, the Lasso using the L_1 penalty (Tibshirani 1996) is one of the most popular due to its simplicity and good performance. The general form of a Lasso solution is

θ̂ = argmin_θ { ℓ(θ) + η ‖θ̃‖_1 },

where ℓ(θ) is usually the negative log-likelihood function or another appropriate loss function, θ̃ is the vector of all parameters in θ excluding the intercept, η is a tuning parameter, and ‖·‖_1 denotes the L_1 norm of a vector, that is, the sum of the absolute values of all coordinates. Such L_1 regularized GLMs have been studied in recent work, including Kelly et al. (2010) and Zhao et al. (2012), and show many advantages and significant improvements.
However, the Lasso by design treats all parameters individually and is not able to incorporate the structural information inherent in a particular dataset. As mentioned earlier, for any two different neurons, say neuron c and neuron i, the parameters {γ_{c,i;q} : q = 1, . . . , Q} together describe one set of interactive parameters, which induces a natural "grouping" of the parameters in our problem. It is thus natural to design alternative regularization methods that can promote a structured sparse solution. Recently, Simon et al. (2013) proposed the sparse group lasso (SGL), which can perform the selection of parameters at both the group level and the individual level. The general form of an SGL solution is

θ̂ = argmin_θ { ℓ(θ) + (1 − α) η Σ_{g=1}^{G} √p_g ‖θ^{(g)}‖_2 + α η ‖θ‖_1 },  (2.3)

where θ is partitioned into G groups so that θ = (θ^{(1)T}, . . . , θ^{(G)T})^T, p_g is the number of parameters in θ^{(g)} corresponding to the gth group, and ‖·‖_2 denotes the L_2 norm of a vector. Here, both α ∈ [0, 1] and η > 0 are tuning parameters. The SGL penalty term is a convex combination of the Lasso and group Lasso penalties controlled by α. Smaller values of α promote sparsity at the group level, while larger values of α encourage individual sparsity. Motivated by the structured penalties mentioned above, we now formulate the regularized SIE-GLM in our setting as follows. Recall that the parameters in (2.1) are grouped in two types, one corresponding to the autoregressive parameters, denoted by

γ^{(c)} = (γ_{c;1}, . . . , γ_{c;P})^T,

and another corresponding to the interactive parameters, denoted by

γ^{(c,i)} = (γ_{c,i;1}, . . . , γ_{c,i;Q})^T,  i ∈ {1, . . . , C} \ c.  (2.4)

In the meantime, the parameters in β_c for the covariates X(τ_k) are not grouped together and are treated as stand-alone variables. Following (2.3), the SIE-GLM estimator is

θ̂_c = argmin_{θ_c} { ℓ_c(θ_c) + p_{η_c,α_c}(θ_c) },  (2.5)

where

p_{η_c,α_c}(θ_c) = (1 − α_c) η_c { √P ‖γ^{(c)}‖_2 + Σ_{i∈{1,...,C}\c} √Q ‖γ^{(c,i)}‖_2 } + α_c η_c ‖θ̃_c‖_1  (2.6)

denotes the structured penalty term. As a comparison, the L_1 penalty uses η_c ‖θ̃_c‖_1 and is a special case of (2.6) corresponding to α_c = 1. Note that the penalty terms can also be easily extended to adapt to more complex types of hierarchical structures; see, for example, Liu and Ye (2010).
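A penalty of this sparse-group form admits a closed-form proximal (shrinkage-thresholding) operator: soft-threshold each coordinate for the L_1 part, then shrink each group's L_2 norm. A sketch, assuming the standard SGL composition of the two operations (the helper `sie_prox` and its group encoding are our own, not the paper's notation):

```python
import numpy as np

def sie_prox(z, rho, eta, alpha, groups):
    """Proximal operator of a sparse-group penalty applied to z with
    step size rho; groups maps group names to index arrays.
    Ungrouped coordinates receive plain L1 soft-thresholding only.
    """
    # elementwise soft-threshold for the L1 part
    out = np.sign(z) * np.maximum(np.abs(z) - rho * alpha * eta, 0.0)
    # group-level shrinkage for the L2 part, weighted by sqrt(group size)
    for idx in groups.values():
        w = rho * (1 - alpha) * eta * np.sqrt(len(idx))
        nrm = np.linalg.norm(out[idx])
        out[idx] = 0.0 if nrm <= w else out[idx] * (1 - w / nrm)
    return out
```

A whole group is zeroed out when its post-threshold norm falls below the group weight, which is how the penalty promotes sparsity at the connection level.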

Tuning Parameter Selection.
We choose the tuning parameters η_c and α_c to minimize the BIC function

BIC(η_c, α_c) = 2 ℓ_c(θ̂_c) + log(n) · df(θ̂_c),  (2.7)

where df(θ̂_c) is the number of nonzero elements of θ̂_c. BIC is a bivariate function of (η_c, α_c), since both θ̂_c and df(θ̂_c) depend on η_c and α_c, that is, θ̂_c = θ̂_c(η_c, α_c). The minimizer is found by grid search. We select α_c ∈ {0.1, 0.3, 0.5, 0.7, 0.9}, and for each given α_c, the grid points for η_c are {η_max h^i : i = 0, 1, . . .}, where η_max > 0 and h ∈ (0, 1) are constants, with η_max depending on α_c. In practice, we may have to drop the small values of η_c in case the algorithm does not converge.
Note that AIC, another popular model selection criterion, is not suitable for our purpose, because AIC usually performs better for prediction but not as well for variable selection. Since the main objective of the study is to learn the functional connectivity between neurons, that is, the nonzero interactive parameters, BIC is more capable of selecting the truly nonzero parameters.
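The grid search over (η_c, α_c) can be sketched as follows (an illustration only; `fit` and `bic` stand in for a user-supplied solver and BIC evaluator, and a fixed η_max is assumed for simplicity, whereas in the paper η_max depends on α_c):

```python
import numpy as np

def select_tuning(fit, bic, alphas=(0.1, 0.3, 0.5, 0.7, 0.9),
                  eta_max=1.0, h=0.7, n_grid=20):
    """Grid search for (eta, alpha) minimizing BIC.

    fit(eta, alpha) -> fitted parameter estimate (user-supplied solver)
    bic(theta)      -> BIC value of a fitted estimate
    The eta grid is {eta_max * h**i : i = 0, ..., n_grid - 1}, h in (0, 1).
    """
    best = (np.inf, None, None)
    for alpha in alphas:
        for i in range(n_grid):
            eta = eta_max * h ** i
            val = bic(fit(eta, alpha))
            if val < best[0]:
                best = (val, eta, alpha)
    return best  # (best BIC, best eta, best alpha)
```

In practice, the loop over η would be run from largest to smallest with warm starts, since each solution is a good initializer for the next smaller η.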

Proposed AFGU Algorithm for the SIE-GLM
Recall that the Lasso and SGL can produce sparse solutions because their penalty terms are not differentiable at the point 0. This feature, on the other hand, also brings computational difficulty in solving the minimization problem. The standard convex optimization methods used to obtain the MLE, like the Newton-Raphson method, are not directly applicable to nonsmooth objective functions. There has been a tremendous amount of work on regularized optimization from both the statistics and computer science perspectives. Recently, the coordinate descent (CD) algorithm, rediscovered by Friedman et al. (2007), has gained considerable attention in regularized linear and logistic regression and was shown to have computational superiority. Friedman, Hastie, and Tibshirani (2010) and Simon et al. (2013) applied a similar idea and developed a block-wise coordinate descent (BCD) algorithm for the SGL.
However, we find that while the BCD algorithm is very fast and scales well in the linear regression model, it can be quite costly in the GLM when the sample size, that is, the number of bins in the spike train data, is very large. In our initial simulation study, the BCD algorithm could not handle well datasets of a size similar to the real spike train data.
We develop the AFGU algorithm motivated by previous work in Kim, Kim, and Kim (2008), Beck and Teboulle (2009), and Wright (2012). Our proposed AFGU algorithm is based on the full gradient of the log-likelihood function and a specific shrinkage-thresholding operator depending on the penalty function, combined with a Newton-based acceleration technique over the active parameters. Compared with BCD, the AFGU algorithm improves the performance in two ways.
First, we use the full gradient with respect to the parameter vector θ_c instead of either the coordinate-wise or block-wise gradient to avoid evaluating the gradient at each coordinate one at a time. Starting from the current estimate θ_c^{(old)}, the full gradient update obtains θ_c^{(new)}, which minimizes

(1/(2ρ)) ‖θ_c − (θ_c^{(old)} − ρ ∇ℓ_c(θ_c^{(old)}))‖_2^2 + p_{η_c,α_c}(θ_c),  (2.8)

where ∇ℓ_c(θ_c) denotes the gradient vector of ℓ_c(θ_c) with respect to θ_c, and ρ is a small positive constant controlling the step size of the gradient update, which is updated adaptively in the algorithm. The minimization in (2.8) is separable between groups, and the sub-problem for each group can be solved analytically as in Simon et al. (2013). Second, for each iteration, after the gradient update step in (2.8), a Newton-type acceleration step is performed on the reduced parameter space that consists only of the coordinates with nonzero parameters at the current estimate. Let A be the index set of the nonzero elements of θ_c^{(new)}. Denote by ∇_A L_c(θ_c) and ∇²_A L_c(θ_c) the reduced gradient vector and Hessian matrix of L_c(θ_c) with respect to θ_{c,A}, the sub-vector of θ_c with indices in A. The reduced-Newton update for θ_c^{(new)} is then simply

θ_{c,A}^{(new)} ← θ_{c,A}^{(new)} − {∇²_A L_c(θ_c^{(new)})}^{-1} ∇_A L_c(θ_c^{(new)}).  (2.9)

To estimate θ_c, we perform the updates in (2.8) and (2.9) iteratively until convergence. The acceleration step in (2.9) generally yields a vast performance improvement over both the full gradient update method in (2.8) and the BCD method, which use only first-order information, and the improvement is worth the cost of evaluating the reduced Hessian. When the true parameters are sparse, the reduced Hessian has a small dimension and its inverse can be easily computed. Therefore, the AFGU algorithm is expected to run faster and theoretically converges to the optimum point Q-quadratically, which is better than either the simple gradient or the coordinate-wise gradient method with a Q-linear convergence rate (Wright 2012).
Some fine adjustments are also made to efficiently handle the sparse neural data, which have a large portion of zero entries, and stabilize some matrix operations.
As mentioned earlier, Simon et al. (2013) used the BCD algorithm to solve the SGL problem for linear and logistic regressions. To estimate the parameters, they proposed an objective function similar to (2.8) [see eq. (10) therein] and minimized it iteratively in a group-wise way without the second acceleration step. In our work, the idea of the BCD algorithm is also applicable to problem (2.5): BCD minimizes (2.8) with respect to the parameters in one group at a time and cyclically iterates through the groups. In contrast, the AFGU algorithm is computationally more efficient than BCD in the sense that it updates all parameters by (2.8) simultaneously and uses the reduced-Newton update in (2.9).
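The two-step iteration of (2.8) and (2.9) can be sketched generically as follows (a simplified skeleton under our own simplifications: a fixed step size ρ rather than the adaptive one, and a Newton step on the smooth loss safeguarded by the penalized objective; all function names are hypothetical):

```python
import numpy as np

def afgu(grad, hess, obj, prox, theta0, rho=0.1, max_iter=200, tol=1e-8):
    """Skeleton of the AFGU-style iteration: a full-gradient proximal step
    followed by a reduced-Newton acceleration on the active set.

    grad, hess : gradient and Hessian of the smooth loss
    obj        : penalized objective, used only to safeguard the Newton step
    prox       : prox(z, rho) implementing the shrinkage-thresholding operator
    """
    theta = theta0.astype(float)
    for _ in range(max_iter):
        # full gradient step, then shrinkage-thresholding (cf. (2.8))
        new = prox(theta - rho * grad(theta), rho)
        # Newton update restricted to the active (nonzero) coordinates (cf. (2.9))
        A = np.flatnonzero(new)
        if A.size:
            step = np.linalg.solve(hess(new)[np.ix_(A, A)], grad(new)[A])
            cand = new.copy()
            cand[A] -= step
            if obj(cand) < obj(new):  # accept only if the objective improves
                new = cand
        if np.linalg.norm(new - theta) < tol:
            return new
        theta = new
    return theta
```

On a toy quadratic loss with an L_1 prox, this skeleton recovers the usual soft-thresholded solution; the safeguard keeps a Newton step on the smooth loss from undoing the shrinkage.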

THEORETICAL PROPERTIES
In this section, we provide theoretical properties of our proposed method. Here, we allow the total number C of neurons to diverge slowly with the length T of the experiment, or equivalently, with the number n of time bins when δ = T/n is fixed. Let θ*_c = (θ*_0, θ*_1, . . . , θ*_d)^T be the vector of unknown true parameters, where d = P + (C − 1)Q + M.
Theorem 1 guarantees the existence of a consistent local minimizer of (2.5), and such a minimizer is √(n/C)-consistent.
Theorem 1. Assume Conditions A1-A3 in the supplementary appendix (available online only). If η_c √n = O(1) and C⁴/n = o(1) as n → ∞, then there exists a local minimizer θ̂_c of (2.5) such that ‖θ̂_c − θ*_c‖ = O_P(√(C/n)).

We would like to point out that the proof of Theorem 1 uses the penalized Bregman divergence (BD) framework of Zhang, Jiang, and Chai (2010), but differs in two parts. First, we need to treat the BD more carefully, since the samples in Zhang, Jiang, and Chai (2010) are iid, whereas the samples in the spike train dataset, that is, {N_c(τ_k) : k = 1, . . . , n}, are not iid. We use the martingale central limit theorem (Theorem 7.4 in Durrett 2004) to establish the desired result in the proof. Second, the penalty terms used in our SIE-GLM are much more complex than the L_1 and other penalties in Zhang, Jiang, and Chai (2010), which are imposed on individual variables. Details can be found in the supplementary appendix available online.

SIMULATION STUDIES
To illustrate the application and performance of the proposed regularized SIE-GLM, we consider three types of network structures among neurons. The simple structure is addressed in Section 4.1, the complex structure is illustrated in Section 4.2, whereas the network structure mimicking the real data is presented in Section 4.3.
We compare the performance of the proposed AFGU algorithm for the SGL regularized method "SGL-P-Q" with two existing L_1 regularized methods, "L_1-short-P-Q" (Zhao et al. 2012) and "L_1-P-Q", in different settings with varying lengths P and Q of the history windows. Note that "L_1-short-P-Q" uses a different parameterization, in which Q is chosen to be small and the interactive parameters {γ_{c,i;q} : q = 1, . . . , Q} in (2.4) are simplified to a single parameter γ_{c,i;1}, that is, γ_{c,i;q} ≡ γ_{c,i;1} for q = 1, . . . , Q, so that the interaction term reduces to γ_{c,i;1} Σ_{q=1}^{Q} N_i(τ_{k−q}). The corresponding parameter vector is θ_c = (γ_{c;0}, γ_{c;1}, . . . , γ_{c;P}, {γ_{c,i;1} : i ≠ c}, β_c^T)^T. The "L_1-P-Q" method uses the parameterization as in (2.1) without simplification and imposes the L_1 penalty η_c ‖θ̃_c‖_1.

Simple Network
We first simulate an ensemble of 10 neurons with the network structure displayed in Figure S1 (in the online supplementary materials). There are 10 connections in the constructed network, of which seven have type-A parameters (thick red lines in Figure S1, available online) and three have type-B parameters (thin blue lines in Figure S1, available online). For each neuron, the baseline rate was set at γ_{c;0} = −3, which gives an average firing rate of exp(−3)/0.1 ≈ 0.5 Hz when the bin size is 0.1 sec. The lengths of the history windows of the autoregressive parameters and the interactive parameters are both set to 10 bins in the generating process. The autoregressive parameters are modeled by setting the first few parameters γ_{c;p} to be very negative, rising to slightly positive values before returning to zero, in a way similar to Truccolo et al. (2005); see the left panel of Figure S2 (available online). Two possible forms of interactive parameters between neurons are also illustrated in the middle and right panels of Figure S2 (available online). For illustrative simplicity, the covariate term X(τ_k)^T β_c is not included in the simulation.
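A generating process of this kind can be sketched as follows (our simplified illustration, not the authors' simulation code; `simulate_ensemble` and its argument layout are hypothetical, and the bin width δ is absorbed into the baseline for simplicity):

```python
import numpy as np

def simulate_ensemble(n, gamma0, gamma_self, W, rng):
    """Generate binned spike counts for C neurons, drawing
    N_c(tau_k) ~ Poisson(exp(eta)) with eta built as in model (2.1).

    gamma0     : (C,) baseline log-rates (e.g., -3 as in the simulation)
    gamma_self : (C, P) autoregressive parameters
    W          : (C, C, Q) interactive parameters, W[c, i, q-1] = gamma_{c,i;q}
    """
    C, P = gamma_self.shape
    Q = W.shape[2]
    counts = np.zeros((C, n), dtype=int)
    for k in range(n):
        for c in range(C):
            eta = gamma0[c]
            for p in range(1, min(P, k) + 1):        # own spiking history
                eta += gamma_self[c, p - 1] * counts[c, k - p]
            for i in range(C):                        # influence of other neurons
                if i != c:
                    for q in range(1, min(Q, k) + 1):
                        eta += W[c, i, q - 1] * counts[i, k - q]
            counts[c, k] = rng.poisson(np.exp(eta))
    return counts
```

With zero couplings and γ_{c;0} = −3, the average count per bin is about exp(−3) ≈ 0.05, matching the roughly 0.5 Hz baseline at 0.1 sec bins.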
We simulate spike trains with n = 10,000 and n = 15,000 bins, which correspond to T = 1000 sec and T = 1500 sec when δ is fixed at 0.1 sec. The following five methods are compared. The tuning parameters are selected by the BIC given in (2.7) for all methods. We also tried the SCAD penalty (Fan 1997), which performs similarly to the L_1 regularized methods (and thus is not shown here). In practice, to calculate the parameter estimators, the group structure is used in the algorithm for methods III-V below, but, for methods I and II, the structural information is not incorporated. While method IV uses the SGL penalty with the true length of the history window, methods III and V investigate the performance of the SGL regularized method with an inexact length of the history window, since we do not know how long the interaction between neurons would last in practice. For a better comparison, we also carry out the simulation under the setting that the interactive parameters are 25% stronger, that is, the magnitudes of the parameters are 25% larger than those given in Figure S2 (available online). We simulate spike trains with different lengths to see how the results change.
The results of all methods are summarized in Tables S1 and S2 (in the online supplementary materials), which also include comparisons with the method in Tandon and Ravikumar (2014) (abbreviated as the TR method). "Correct-All" is the number of connections correctly detected by a method where there is indeed a connection between two neurons in the true network. "detected-A" and "detected-B" are the numbers of detected connections that correspond to true connections of type A and type B, respectively. "Correct-NC" is the number of pairs correctly identified as having no connection where there is actually no connection. For each of the four metrics, a larger value means better performance. All values are averages across 100 simulation runs for each length of spike train and strength of parameters.
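The four summary metrics can be computed from boolean connection matrices as in this sketch (a hypothetical helper; the adjacency encoding and the exclusion of self-pairs are our own conventions):

```python
import numpy as np

def connection_metrics(true_adj, est_adj, type_a, type_b):
    """Summary counts in the style of the four metrics above.

    true_adj, est_adj : boolean (C, C) matrices; entry (c, i) flags a
        connection from neuron i to neuron c. type_a/type_b mark which
        true connections carry type-A or type-B parameters.
    """
    detected = true_adj & est_adj
    off_diag = ~np.eye(true_adj.shape[0], dtype=bool)  # ignore self-pairs
    correct_all = int(detected.sum())
    detected_a = int((detected & type_a).sum())
    detected_b = int((detected & type_b).sum())
    correct_nc = int((~true_adj & ~est_adj & off_diag).sum())
    return correct_all, detected_a, detected_b, correct_nc
```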
From Tables S1 and S2 (available online), all methods can successfully detect the sparse structure of the network; that is, most truly zero-valued parameters are estimated to be zero, reflecting very good levels of specificity. However, the sensitivity of detecting significant interactive parameters is not as good as the specificity. When the relative strength of the interactive parameters increases, the sensitivity improves. Such an effect is reasonable, since stronger signals result in bigger changes to the conditional intensity, which makes the interactions easier to detect. When we increase the length of the spike train to 15,000 bins, we see improved performance from all methods, which is expected since we simply have a larger dataset and more information to study the network structure of the neuron population. Among all methods, the SGL regularized method with the exact length of the parameters' history window has the best performance in finding the interactions between neurons. Interestingly, even when we misspecify the length of the history window of the interactive parameters in methods III and V, they still obtain comparable results. All three SGL regularized methods outperform the two L_1 regularized methods.

Complex Network
We simulate an ensemble of 60 neurons with the more complex network structure illustrated in Figure S3 (available online) and the corresponding connection matrix in Figure S4 (available online). There are 60 connections, among which 30 have type-A parameters and 30 have type-B parameters. The other settings are similar to the previous simulation, except that we simulate longer spike trains (using n = 20,000 and 25,000) to handle the larger network.
The results of this simulation are summarized in Table S3 (available online). Besides the patterns already discussed in Section 4.1, we can clearly see from Figure S5 (available online) that the SGL regularized methods perform considerably better than the two L_1 based methods. For example, method IV identifies on average 12 more true connections, with far fewer false discoveries, than method I when n = 20,000 and the relative strength is 125%. The estimated network structures from the SGL regularized method are already very close to the true simulation network in most runs.
For all methods, we also find that it is much harder to detect connections with type-B interactive parameters than those with type-A interactive parameters. Since type-A parameters are more excitatory (more positive interactive parameters in γ^{(c,i)}) and type-B parameters are more inhibitory (more negative interactive parameters in γ^{(c,i)}), it is likely that the initially low baseline firing rate makes it difficult to detect inhibitory influences given the relatively short length of the simulated spike train. Figure S6 (in the online supplementary materials) illustrates the estimated parameters of all type-A and type-B connections in a single simulation run by L_1-10-10 and SGL-10-10, in which thick circles connected with dashed lines represent the true values and each thin dashed line represents the estimated parameters for one true connection of type A or type B. (Since L_1-short treats the whole set of interactive parameters as one single parameter, it cannot provide the detailed structural form of the parameters.) We find that the L_1-10-10 method usually detects only the high peak at the beginning, while the SGL-10-10 method provides a more informative estimate of the parameters.

Simulated Network that Mimics the Real Data
To mimic the setting we encounter in the real experiments, we simulate spike train data based on the networks estimated from the real data. First, we obtain the estimated parameters and network structure by applying method SGL-30-10 to the neurophysiological data, and denote by β̂_c the estimate of β_c. For the simulation study (with P = 30 and Q = 10 in model (2.1)), except for β_c and γ_{c,i;q}, the true parameters as well as the network structure are set to the estimates acquired previously. The true value of β_c is 0.3 sign(β̂_c), and the values of γ_{c,i;q} are similar to those in Figure S2 (available online) but with relative strength 125%, where type-A corresponds to the positive values in the estimated network and type-B to the negative ones. The state indicators collected in the real experiments (given in Table S6, available online) are included as covariates in this simulation.
For illustrative purposes, we present only the simulation results using the estimated networks (given in Figure S9, available online) from Experiments 1 and 4. The comparison among the regularized methods is given in Table S4 (available online): all methods perform similarly in terms of correctly identifying true connections, while SGL-30-10 maintains a low number of false detections. In contrast, the two L1-based methods both produce many more false alarms, which is undesirable in real applications.

T-Maze Task of Working Memory
We aim to estimate the functional connectivity structure of neurons in the rat prelimbic region of the prefrontal cortex (plPFC) using the proposed SIE-GLM. Neural data were obtained from adult male Sprague-Dawley rats performing a T-maze based delayed-alternation task of working memory (Devilbiss and Waterhouse 2004; Devilbiss, Page, and Waterhouse 2006; Devilbiss, Jenison, and Berridge 2012). In brief, animals were trained to navigate down the runway of the T-maze and choose whichever of the two arms was opposite to the one previously visited for food rewards (chocolate chips, 0.8 g) delivered by the experimenter's hand. For each trial, the animal was placed in a start-box for a predetermined time (a delay) and released with the removal of a starting gate. The rat would navigate to the junction of the "T" and choose either the left or right arm, as seen in Figure S7 (available online). On a correct choice, the animal was rewarded and returned to the start-box, yielding a correct sequence of "left, right, left, right, . . .". On incorrect trials, the animal was removed from the incorrectly chosen arm and returned to the start-box without reward. Training continued in the T-maze task until performance accuracy reached an average of 90% over 10 trials (0 sec delay, one testing session per day). A restricted feeding schedule (16-20 g of standard chow) was used, with food quantities titrated for each animal to maintain motivation across each 40-trial session. Animals were then surgically implanted with recording electrodes and returned to ad lib feeding for the duration of recovery (7-10 days). Following recovery, restricted feeding was reinstated and training continued until performance was stable across 40 trials at 90%-100% accuracy with at least a 10 sec delay. Training/testing occurred at the same time each day. All procedures were in accordance with NIH guidelines and were approved by the University of Wisconsin Institutional Animal Care and Use Committee.

Neural Data Discrimination and Recording
Online Discrimination and Recording. On the day of a recording session, the animal was tethered to customized multichannel electrophysiological hardware (Plexon Inc., Dallas, TX). Neural activity was amplified, discriminated, time-stamped, and recorded from putative single units of the plPFC. The animal remained in its home cage, placed above the T-maze, during this discrimination phase to allow it to habituate to the tether and the recording arena. Template-matching algorithms were applied to the neural activity to preliminarily discriminate action potentials exhibiting a 3:1 signal-to-noise ratio. Following discrimination of plPFC units, the animals remained tethered to the recording hardware for the remainder of the day. During each testing session, videotape recordings were made of the entire experimental session, with a video counter timer providing time stamps (resolution = 0.0125 sec) synchronized to the multiunit recording systems.
Offline Unit Analysis. After each experimental session, preestablished offline criteria were used to verify that waveforms assigned to each online-discriminated unit originated from a single neuron. These previously described criteria were based on unit waveform properties and spike train discharge patterns, including: (i) variability of peak waveform voltage, (ii) variability of waveform slope(s) from peak to peak, (iii) separability of clusters of scattergram points from the waveform's first two principal components, and (iv) the spike train autocorrelogram. Neurons that did not meet these criteria were excluded from the study. Neuronal waveform shape, discharge pattern (interspike interval), and response properties were further examined to verify that neurons were not recorded across multiple recording sessions. Further analyses of data from neurons recorded over multiple session days were limited to the first recording session of that neuron. Finally, putative pyramidal projection neurons were identified by action potential shape. Essentially, the peak-to-peak (P-P) duration of waveforms from verified neurons was calculated, and neurons with P-P intervals greater than 200 μs were classified as "wide spike" (WS-type) neurons. Following each experimental session, the behavioral intervals were visually identified by the experimenter from time-stamped video recordings and manually entered into each neural recording data file. These event-states were defined by events occurring as the animal performed the T-maze task, including:
• placement of the rat into the start-box,
• removal of the start gate,
• the rat reaching the branch point of the "T",
• the rat entering one of two goal arms (choice),
• receipt of the food reward, and
• removal of the rat to begin another trial.
Each trial was further classified as correct or incorrect, as well as by the chosen spatial goal (i.e., left vs. right arm). Dividing the task into these behavioral intervals should not be interpreted as representing individual and discrete cognitive functions. Instead, these are convenient delineations of different behaviors and task events in which different PFC-dependent processes may be involved. Certainly, some PFC-dependent processes may span adjacent behavioral intervals, and, as in all delayed-response tasks, a clear temporal delineation of the onset and termination of different cognitive functions is difficult.

Neurophysiological Data Analysis
The dataset comprises eight experiments, each having 40 trials (40 turns). Basic statistics are provided in Table S5 (available online). Given the length of the spike trains and the relatively low average firing rate, we bin the spikes at 100 ms (0.1 sec), which gives about 10,000-20,000 bins for each experiment. We set P = 30, that is, a neuron's own spike history may affect its spontaneous firing rate for up to 3 sec, and Q = 10, that is, the influence of spikes from other neurons lasts up to 1 sec. Other choices of P and Q were also considered and give similar results. Finally, we code the current state of the experimental animal into eight indicator columns, listed in Table S6 (available online). Note that we model the parameters associated with states only as linear terms, which change the overall firing rate but not the connectivity network structure. We then applied the SGL-regularized SIE-GLM to the recorded neurophysiological data for each of the eight experiments.
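To make this preprocessing step concrete, the following is a minimal sketch (not the authors' Matlab code; all function and variable names are hypothetical) of binning spike times at 100 ms and building lagged count columns, from which the P own-history columns and Q cross-neuron columns of the design matrix can be assembled:

```python
import numpy as np

def bin_spikes(spike_times, t_end, bin_width=0.1):
    """Bin spike times (in seconds) into counts per 100 ms bin."""
    n_bins = int(np.ceil(t_end / bin_width))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, t_end))
    return counts

def lagged_design(counts, n_lags):
    """Design columns of past counts: column q holds the count q bins earlier."""
    n = len(counts)
    X = np.zeros((n, n_lags))
    for q in range(1, n_lags + 1):
        X[q:, q - 1] = counts[:n - q]
    return X

# Toy example: one neuron's spike times over 1 sec (10 bins of 0.1 sec);
# in the paper's setting one would use n_lags = P = 30 for own history
# and n_lags = Q = 10 for each other neuron.
spikes = [0.05, 0.12, 0.31, 0.33, 0.95]
y = bin_spikes(spikes, t_end=1.0)     # counts per bin
H = lagged_design(y, n_lags=3)        # first 3 own-history lag columns
```

Stacking such lag blocks for the neuron's own history, for each other neuron, and appending the eight state indicator columns yields the full covariate matrix for one target neuron.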
For the autoregressive parameters γ_{c;p}, many are estimated to be zero. Most of the identified nonzero parameters are positive, which indicates that there are some self-exciting processes over certain periods of time. Figure S8 (in the online supplementary materials) gives the estimated autoregressive parameters for selected neurons (Nos. 14, 19, and 73) in Experiment 4, which illustrate some typical forms of these parameters. The refractory period of these neurons could not be observed in this study, since the spike train bin width of 0.1 sec is well beyond the absolute refractory period of approximately 0.003 sec. Note the hilly forms observed for Neuron 14. This cyclic phenomenon is possibly caused by mutually excitatory connections between pairs of neurons, rather than solely by an individual self-exciting process. Such excitatory pairs or cliques can also be found in the network structures of other experiments.
Table S7 (in the online supplementary materials) summarizes the numbers of detected functional connections in two categories: excitatory and inhibitory. The type of a detected connection from neuron i to neuron c is determined by the sign of the sum of the estimated interactive parameters, sign(1ᵀγ(c,i)), where 1 is a vector of ones. We also compare the results with those from the two L1-based methods discussed in Section 4. Among the thousands of possible connection pairs, only a few are selected by the regularized methods. The short-term L1 method detects more connections than the other two methods, but based on the simulation results it likely includes several false discoveries. The long-term L1 method reports the smallest number of connections, and the SGL method should provide a good balance and better results, as in the simulation studies. For example, the estimated networks from Experiments 1 and 4 are given in Figure S9 (available online).
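The sign(1ᵀγ(c,i)) classification rule above takes only a few lines; this is a minimal illustration with made-up parameter values, not the authors' code:

```python
import numpy as np

def connection_type(gamma):
    """Classify a detected connection by the sign of the sum of its
    estimated interactive parameters, sign(1'gamma)."""
    s = float(np.sum(gamma))
    if s > 0:
        return "excitatory"
    elif s < 0:
        return "inhibitory"
    return "none"

# Hypothetical estimated gamma(c,i) over Q = 10 lags: mostly positive,
# so the summed effect is excitatory.
gamma_exc = np.array([0.8, 0.5, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, -0.1])
gamma_inh = -gamma_exc
kind = connection_type(gamma_exc)   # "excitatory"
```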
We also provide colored maps of the estimated interaction matrix in Figure S10 (available online), whose color at the ith row and jth column represents the influence of neuron j on neuron i. Red indicates an excitatory effect and blue an inhibitory effect of the trigger neuron on the target neuron, while the green background indicates no effect; the exact color is determined by the strength of the interaction, ‖γ(i,j)‖₂. (Experiments 5 and 8 are omitted from Figure S10, since too few connections are identified there, which may be due to the low baseline firing rates and short experimental times.) The detected connections among neurons share some common characteristics across experiments. For example, there are several excitatory pairs and cliques along the diagonal, which suggests that the neurons in the self-exciting groups tend to be physically close in the plPFC. Fewer inhibitory than excitatory connections are detected, which is plausible because only about 20% of cortical neurons are inhibitory interneurons. We note that the term "functional connection" here refers to a statistical dependency between spike trains (neurons) and does not necessarily imply the existence of an anatomical connection between the corresponding neurons (Okatan, Wilson, and Brown 2005); nevertheless, the observed results are still useful to guide further research.
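The interaction strength ‖γ(i,j)‖₂ that sets the color intensity can be assembled into a matrix as sketched below (hypothetical values and names; the actual plotting was done with the authors' own code):

```python
import numpy as np

def strength_matrix(gamma_dict, n_neurons):
    """Entry (i, j) is the L2 norm of the interactive parameter vector
    for the influence of neuron j on neuron i (0 if no connection)."""
    S = np.zeros((n_neurons, n_neurons))
    for (i, j), gamma in gamma_dict.items():
        S[i, j] = np.linalg.norm(gamma)
    return S

# Two hypothetical detected connections among 3 neurons:
gamma_dict = {(0, 1): np.array([3.0, 4.0]),    # neuron 1 -> neuron 0
              (2, 0): np.array([-1.0, 0.0])}   # neuron 0 -> neuron 2
S = strength_matrix(gamma_dict, 3)
# S[0, 1] == 5.0 and S[2, 0] == 1.0; all other entries are 0
```

Signing each entry by sign(1ᵀγ(i,j)) and mapping positive/negative/zero to red/blue/green would reproduce a colored map of the kind shown in Figure S10.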
Table S8 (in the online supplementary materials) summarizes the estimated parameters for the eight state indicator covariates described in Table S6 (available online). Many of these parameters are estimated to be insignificant, which suggests that spiking activity is generally similar across the different states of the T-maze trials. Among the eight state covariates, the "Delay", "Run", and "Decision" covariates show somewhat stronger differences between the numbers of neurons with positive and negative estimated parameters. For the "Delay" variable, there were fewer neurons with positive than with negative estimated parameters in Experiments 3-6, likely reflecting that delay-related neurons represent approximately 20% of PFC neurons. On the other hand, the large number of neurons with positive estimated parameters for the "Run" and "Decision" variables suggests that a large percentage of neurons coded decision-making processes and the behavioral response.

DISCUSSION
When analyzing simultaneously recorded spike trains, it is desirable to have a unified model that includes several factors, such as the autoregressive effect within each neuron, the functional connections between pairs of neurons, and the experimental state indicators. In this work, we use the regularized SIE-GLM with Poisson-type neuronal responses to achieve this goal. The SIE-GLM framework can accommodate the different types of factors that affect neuronal activity; meanwhile, an appropriate penalty forces the parameters of insignificant factors to be exactly zero, so that the functional connections between neurons can be inferred and separated from noise.
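To illustrate the structure of such a model, the following is a generic schematic of a Poisson-GLM log-intensity for neuron c that combines a baseline, own spike history (P lags), cross-neuron effects (Q lags), and state indicators. This is a sketch of the model class only, not the exact form of model (2.1) or of the SIE penalty given in the paper, and all parameter values and names are hypothetical:

```python
import numpy as np

def log_intensity(t, beta0, gamma_self, gamma_cross, alpha, counts, states, c):
    """Log conditional intensity of neuron c in time bin t:
    baseline + own-history terms + cross-neuron terms + state effects."""
    eta = beta0
    for p in range(1, len(gamma_self) + 1):      # autoregressive (self) lags
        if t - p >= 0:
            eta += gamma_self[p - 1] * counts[c, t - p]
    for i, g in gamma_cross.items():             # interactive lags from neuron i
        for q in range(1, len(g) + 1):
            if t - q >= 0:
                eta += g[q - 1] * counts[i, t - q]
    eta += alpha @ states[:, t]                  # state indicator effects
    return eta

# Tiny hypothetical example: 2 neurons, 5 bins, 1 state covariate
counts = np.array([[0, 1, 0, 2, 0],
                   [1, 0, 1, 0, 0]])
states = np.ones((1, 5))
eta = log_intensity(t=3, beta0=-1.0, gamma_self=np.array([0.5, 0.2]),
                    gamma_cross={1: np.array([0.3])}, alpha=np.array([0.1]),
                    counts=counts, states=states, c=0)
rate = np.exp(eta)   # expected spike count of neuron 0 in bin t
```

Under sparsity-inducing regularization, most entries of the cross-neuron vectors would be shrunk exactly to zero, and the surviving nonzero vectors define the estimated functional connections.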
Our approach is based on the belief that, among the huge number of possible connections, only a very small portion are truly significant, so that a sparse network can be constructed from the connections estimated by the regularized method. Compared with previous work (e.g., Zhao et al. 2012), we further propose the use of structured regularization, which provides more flexibility to incorporate structural prior information about the parameter space and improves performance. We give more rigorous theoretical results that account for the dependence within the spike trains. The simulation results show that the SGL-regularized method can indeed improve the sensitivity for detecting true connections without sacrificing specificity. The proposed AFGU algorithm is computationally very efficient under the sparse network assumption.
We then applied the proposed method to real spike train recordings from neurons in the prelimbic region of the prefrontal cortex (PFC) of adult male rats. Some interesting findings about the ensemble of neurons include the following.
• More excitatory connections are detected than inhibitory connections; the network connectivity is suggestive of a hub-based network architecture.
• Several distinct neural cliques could be identified that included delay-related and behavioral response-related PFC functions.
Although a statistically significant functional connection does not imply a synaptic connection, it provides useful information to guide further research on the details of the interactions within the neuronal network. In summary, our proposed method is applicable to a broad range of simultaneously recorded spike trains and is expected to perform better than existing methods.

SUPPLEMENTARY MATERIALS
Online appendix: The appendix collects notations in Appendix A, detailed derivations of Theorem 1 in Appendix B, and Figures S1-S10 and Tables S1-S8 in Appendix C. (Technometrics ZCGGDZ online appendix.pdf, pdf file)
Package for Matlab codes and real data: The Matlab script files, a readme file, and the real data used for the simulation studies and real data analysis in the article are given in the zipped file. (Technometrics ZCGGDZ online Matlab codes.zip, zipped file)