Estimation of Multiple-Regime Threshold Autoregressive Models With Structural Breaks

The threshold autoregressive (TAR) model is a class of nonlinear time series models that have been widely used in many areas. Due to its nonlinear nature, one major difficulty in fitting a TAR model is the estimation of the thresholds. As a first contribution, this article develops an automatic procedure to estimate the number and values of the thresholds, as well as the corresponding AR order and parameter values in each regime. These parameter estimates are defined as the minimizers of an objective function derived from the minimum description length (MDL) principle. A genetic algorithm (GA) is constructed to efficiently solve the associated minimization problem. The second contribution of this article is the extension of this framework to piecewise TAR modeling; that is, the time series is partitioned into different segments for which each segment can be adequately modeled by a TAR model, while models from adjacent segments are different. For such piecewise TAR modeling, a procedure is developed to estimate the number and locations of the breakpoints, together with all other parameters in each segment. Desirable theoretical results are derived to lend support to the proposed methodology. Simulation experiments and an application to an U.S. GNP data are used to illustrate the empirical performances of the methodology. Supplementary materials for this article are available online.


INTRODUCTION
The (r + 1)-regime threshold autoregressive (TAR) model, defined by where X (j ) t−1 = (1, X t−1 , . . . , X t−p j ), asserts that the autoregressive structure of the process {X t } is governed by the range of values of X t−d . The integer-valued quantity d is known as the delay parameter and the parameters −∞ = θ 0 < θ 1 < · · · < θ r < θ r+1 = ∞ are called the thresholds. These thresholds partition the space of X t−d into r + 1 regimes, in which the jth regime follows an AR model with order p j , coefficient parameter j and white-noise variance σ 2 j . Since its initial proposal by Tong (1978), the TAR model has attracted enormous attention from the nonlinear time series literature. It has also been applied to study real data behaviors in many areas, especially econometrics and finance. Excellent surveys on the TAR models can be found in Tong (1990Tong ( , 2011. For parameter estimation, Chan (1993) considered the least square estimation for a two-regime TAR model and established its large sample theory. Li and Ling (2012) extended the theory to a multiple-regime TAR models. In particular, it is shown Chun Yip Yau, Department of Statistics, Chinese University of Hong Kong, Shatin, N.T., Hong Kong (E-mail: cyyau@sta.cuhk.edu.hk). Supported in part by HKSAR-RGC Grants CUHK405012, 405113, and Direct Grant CUHK2060445. Chong Man Tang, Department of Statistics, Chinese University of Hong Kong, Shatin, N.T., Hong Kong (E-mail: s1010097819@sta.cuhk.edu.hk). Thomas C. M. Lee, Department of Statistics, University of California at Davis,Davis,CA 95616,. Supported in part by the National Science Foundation under Grants 1007520, 1209226 and 1209232. The authors are most grateful to the reviewers, the associate editor and the editor, Professor Xuming He, for their most constructive and helpful comments, which led to a much improved version of the article.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/jasa. that, when the number of thresholds is known, each estimated threshold is √ n-consistent and converges weakly to the smallest minimizer of a two-sided compound Poisson process. Also, the estimated thresholds are asymptotically independent. Moreover, if the AR order is known, the AR parameter estimates in each regime are √ n-consistent and asymptotically normal. Despite the well developed theoretical background of the estimation theory, the estimation procedure of TAR models demands a high computational cost due to the irregular nature of the threshold parameters; for example, see Li and Ling (2012). In particular, for a (r + 1)-regime TAR model, locating the global minimum of the least squares criterion requires a multi-parameter grid search over all possible values of the r threshold parameters, which is typically computational infeasible. To circumvent this difficulty, Tsay (1989) transformed (1) into a change-point model and proposes a graphical approach to help visually determine the number and values of the thresholds. Coakley, Fuertes, and Pérez (2003) used similar techniques to develop an estimation approach that is based on QR factorizations of matrices. When the number of thresholds r is unknown, Gonzalo and Pitarakis (2002) suggested a sequential estimation procedure for choosing r, under the assumption that all σ j 's are equal. We are not aware of any results for more general models. In particular, when the number of thresholds and the order of the AR model in each regime are unknown and the σ j 's are not assumed equal, no practical estimation method appears available.
In this article, motivated by the close connection between TAR model and the structural break model suggested by Tsay (1989), we propose an automatic procedure to estimate the number and values of thresholds, as well as to perform model selection in each regime. The procedure is based on an objective function derived from the minimum description length (MDL) principle, for which the best fitting TAR model is defined as its minimizer. A genetic algorithm (GA) is developed to provide an efficient computational method for corresponding minimization problem. We shall establish the consistency of our MDL estimates for the number and locations of the thresholds, the delay parameter and the AR orders in each regime. Moreover, the convergence rate of the thresholds and the AR coefficients are established.
For many real life time series data, especially when they are long, the stationarity assumption is often violated. One approach to handle this issue is to perform piecewise stationary modeling via structural break detection. For example, Davis et al. (2006) considered piecewise modeling of AR processes, and their method is generalized to piecewise modeling of GARCH and stochastic volatility models in Davis, Lee, and Rodriguez-Yam (2008). Lavielle and Ludena (2000) studied structural breaks in spectral distribution in stationary time series, and Aue et al. (2009) considered break detections in multivariate time series. For TAR models, Berkes et al. (2011) and Zhu and Ling (2012) developed tests for the presence of structural breaks and demonstrated their significance in time series of soybean and stock prices. However, there seems to be no structural break estimation method available for TAR models. As a second contribution of this article, we further extend the MDL methodology to estimating structural breaks in TAR models. In particular, we provide a computationally efficient procedure for estimating all the unknown parameters in the model, including the number and locations of the structural breaks. We also provide desirable theoretical results backing up our procedure.
The rest of this article is organized as follows. In Section 2, we first derive an MDL objective function for parameter estimation in TAR models, we then develop a GA to conduct the corresponding optimization, and finally present theoretical results that lend support to our methodology. In Section 3, estimation for TAR models with structural breaks will be discussed. Simulation studies and an empirical example are given in Sections 4 and 5, respectively. Concluding remarks are provided in Section 6, while technical details and additional material are delayed to the appendix and a supplementary file.

ESTIMATION FOR TAR MODELS
This section considers the problem of estimating the number and values of the thresholds, and also the AR order and parameters in each regime in TAR model (1). We first apply the MDL principle to derive an objective function for which the best fitting TAR model is defined as its minimizer. Then, we construct a GA for finding this minimizer. We also provide desirable theoretical properties to support our method.

MDL for TAR Models
The MDL principle was developed by Rissanen (1989Rissanen ( , 2007 as a general methodology for model selection. It defines the best fitting model as the one that enables the best compression of the available data. There are different versions of MDL, and we shall use the so-called "two-part MDL." See Hansen and Yu (2001) and Lee (2001) for basic introductions.
Let M be the class of TAR models satisfying (1). Any model from this class is denoted by F ∈ M. In two-part MDL the data {X t } are decomposed into two components: a fitted modelF and the corresponding residualsê. Notice that onceF andê are known, the data {X t } can be completely retrieved. Therefore, to store {X t }, one could instead storeF andê, and the hope is that storingF andê is cheaper, memory-wise, than storing {X t }. This leads to the following fundamental expression in two-part MDL: where CL(z) is a generic notation to denote the codelength for z, typically measured in bits. Equation (2) merely states that the codelength of the data CL({X t }) is the sum of the codelength of the fitted model CL(F) and the codelength of the residuals given the fitted model CL(ê|F). Note that CL(ê|F) can be interpreted as a measure of lack of fit while CL(F) can be regarded as a penalty for model complexity. The MDL principle defines that the best model is the one that minimizes CL({X t }).
To proceed, we first derive an expression for CL(F). Since a TAR model is completely specified by r, θ j 's, σ 2 j 's, p j 's, and j 's, CL(F) can be decomposed into CL(F) = CL(r) + CL(θ 1 , θ 2 , . . . , θ r ) + CL(p 1 , . . . , p r+1 ) where j := (σ 2 j , j ) = (σ 2 j , φ j 0 , φ j 1 , . . . , φ jp j ) is the parameter vector in the jth regime. In general, to encode an arbitrary integer I, approximately log 2 (I ) bits are required. Thus CL(r) = log 2 (r) and CL(p j ) = log 2 (p j ). To calculate CL( j ), the following result of Rissanen (1989) could be used: 1 2 log 2 (N ) bits are required to encode a maximum likelihood estimate of a real-valued parameter computed from N observations. Let n j be the number of observations in the jth regime. Then, each of the p j + 2 parameters of j can be viewed as the Gaussian likelihood estimate computed from n j observations, giving CL( j ) = (p j + 2) log 2 (n j )/2; see (5) below. Since each threshold θ j divides the domain of {X t−d } into regimes, they can be represented by the order statistics of the series {X t−d }. As a result, θ j can be encoded by the maximum of the n j observations in the jth regime. As the maximum value of a set of real number is the maximum likelihood estimate of the upper bound of a uniform distribution, CL(θ 1 , θ 2 , . . . , θ r ) = r j =1 log 2 (n j )/2. Putting these together, we obtain Next, we derive an expression for CL(ê|F). It was shown by Rissanen (1989) that the codelength ofê is well approximated by the negative log-likelihood of the fitted modelF. Given the thresholds θ j 's and the order of autoregressive models p j 's, the log-likelihood of the data can be approximated by the conditional log-likelihood l(θ 1 , . . . , θ r , 1 , . . . , r+1 ; x) t } are the observations in the jth regime, sorted in an ascending of X t−d and Y (j ) t−1 is a vector containing the p j previous observations of Y (j ) t . For example, if X a < X b are the two smallest observations greater than θ j −1 , then Y (1, X b+d−1 , . . . , X b+d−p j ). For simplicity, we take X t = 0 for t ≤ 0. It can be checked readily that minimizing the function inside the summation in (5) gives n j (log(2πσ 2 j ) + 1), wherê n is the least square estimator in Li and Ling (2012). Thus, the codelength of the residuals given the fitted model is Combining (2), (4), and (6), we obtain MDL(r, θ 1 , . . . , θ r , p 1 , . . . , p r+1 ) := CL({X t }) The best TAR model is then chosen by minimizing MDL(r, θ 1 , . . . , θ r , p 1 , . . . , p r+1 ) with respect to r, θ j 's, and p j 's. For simplicity we have assumed that d is known. In general, the estimation procedure can be repeated for various d to select the best model.

Minimization of (7) using GAs
Given the complexity of the TAR model (1), practical minimization of the MDL criterion (7) is difficult. Given the successes of Rodriguez-Yam (2006, 2008); Lee (2001);and Lu, Lund, and Lee (2010), we develop a GA to minimize (7) efficiently. However, we note that our GA has some critical differences from those developed in these references.
The basic idea of GAs can be described as follows. A first set, or sometimes known as population, of candidate solutions to the optimization problem is generated. These possible solutions, known as chromosomes, are typically presented in vector forms and are free to "evolve" in the following way. Parent chromosomes are randomly chosen from the initial population with probability inversely proportional to the ranks of their objective criterion values. Then, offsprings are produced by mixing two parent chromosomes through a crossover operation, so that the good features from the parents may be combined. Offsprings could also be produced from a mutation operation from a single parent chromosome so that all possible solutions may be explored. With crossover and mutation, a second-generation of offsprings is produced. This process is repeated for a number of generations until the an individual chromosome is found to be minimizing the objective function. Details of our implementation of the GA for minimizing (1) is given below.

Chromosome Representation.
When using GAs, a chromosome should carry complete information about the model under consideration. In the current context, this corresponds to the number of threshold r, the threshold values θ j 's and the AR orders p j 's. Notice that as long as these quantities are specified, one could uniquely calculate the maximum likelihood estimates of other model parameters, so these other model parameters do not need to be present in the chromosome. In our implementation a chromosome is a vector δ = [r, p 1 , (R 1 , p 2 ), . . . , (R r , p r+1 )] of length 2(r + 1), where Note that the number of observations in the ith regime is n i = R i+1 − R i + 1. We shall impose a minimum span constrain where R i+1 − R i > n A for some integer n A , so that we can avoid having too few observations in any regime. This constrain was also used by Rodriguez-Yam (2006, 2008) and Lu, Lund, and Lee (2010). However, we note that the chromosome representation adopted here is fundamentally different than those in Rodriguez-Yam (2006, 2008) and Lu, Lund, and Lee (2010): in their work the length of a chromosome is always fixed to be n, while ours is of variable length. Consequently, the method for initializing the first generation, and the definitions for crossover and mutation are necessarily different in our implementation.

Initialization of the First Generation.
First the number of threshold r is generated from the Poisson distribution with mean equals 2. Then, R 1 to R r are sampled from 1, . . . , n − d with uniform probability. Next, we find the set of R i 's that violates the minimum span condition with n A = 20. Delete one at a time randomly until the condition is satisfied. Finally, for j = 1, . . . , r + 1, the AR order p j for the jth regime is generated uniformly from the integers {0, 1, . . . , P 0 }. We use P 0 = 12 to allow sufficient flexibility and to capture possible seasonal effects.

Propagation of New Generations.
Once an initial population of random chromosomes are obtained, new chromosomes can then be produced by the crossover and the mutation operations. In our implementation, the probability for conducting a crossover operation is set to be π C = 0.9.
During crossover, the first step is to randomly select two parent chromosomes from the current population of chromosomes. These parents were selected with probabilities inversely proportional to their ranks sorted by their MDL values. Thus, chromosomes with smaller MDL will have a higher chance of being selected. From these two parents, a new offspring is produced as follows. First, the offspring's AR order of the first segment is chosen from one of the parents with equal probabilities. Then, we combine and sort the parents' threshold values and select each threshold and its associated AR order with probability 0.5. For example, consider two parents [2, 1, (105, 2), (333, 0)] and [3, 0, (212, 4), (349, 2), (788, 1)]. We first select the order of the first segment randomly form the set (1, 0), say giving 0. Then, the set of thresholds from the parents are combined and sorted to be (105,212,333,349,788). Next, each of the threshold value is selected with probability 0.5, say yielding (105,333,788). Then, the offspring chromosome is constructed as [3, 0, (105, 2), (333, 0), (788, 1)].
The goal of the mutation operation is to prevent the GA being trapped in local optima. It is done by introducing randomness to one parent to form one offspring. In our implementation, we first draw a parent from the current population of chromosome. Then, we generate a second random parent using the method described in Section 2.2.2. Finally, an offspring is produced by applying crossover to these two parents. To allow for a higher degree of mutation, the probabilities of selecting a threshold are, respectively, π P = 0.3 for the first parent and 1 − π P = 0.7 for the second random parent. After an offspring is produced, for each regime, with probability π N = 0.3 we randomly draw an AR order to replace the existing one. Of course, at the end of each crossover and mutation operation, the procedure that ensures the minimum span condition will be performed.
To guarantee the monotonicity of the algorithm, an additional step, the elitist step, is performed. That is, the worst chromosome of the next generation is replaced with the best chromosome of the current generation.

Migration.
To gain computational efficiency from parallel computing, we implement the island model that can speed up the convergence rate and to reduce the chance of converging to suboptimal solutions (e.g., Alba andTroya 1999, 2002). The basic idea is, rather than running GA with one single population, the island model simultaneously runs N I GAs in N I different subpopulations, or islands. The distinct feature is to first set up a migration policy and then the chromosomes are allowed to migrate periodically among the islands using this policy. Here, the following migration policy is used: at the end of every M i generations, the worst M N chromosomes from the jth island are replaced by the best M N chromosomes from the (j − 1)th island, j = 2, . . . , N I . For j = 1, the worst chromosomes are replaced from the best chromosomes from the N I th island. In our simulations, we used N I = 50, M i = 5, M N = 2, and a subpopulation size of 100.

Declaration of Convergence.
At the end of each migration, the overall best chromosome (i.e., the chromosome with the smallest MDL) is noted. This best chromosome is taken as the final solution to our optimization problem if it remains the same for 10 consecutive migrations, or if the total number of migrations exceeds 20. This convergence criterion works really well in all our numerical experiments; that is, different runs of the algorithm often converge to the same solution for the same dataset. Similar convergence criteria have been used by Rodriguez-Yam (2006, 2008).
Finally, we point out that further computational savings can be achieved by the following. In the GA, once the thresholds are given, there is no need to check the indicator function I (θ j −1 < X t−d < θ j ) for each t to categorize the observations into regimes. Recall that the thresholds are specified by (R 1 , . . . , R r ), the order of the X t−d 's. Thus, the observations in the jth regimes are those X t s with X t−d ranking between R j −1 + 1 and R j . Therefore, to categorize the observations into regimes, we only need to sort {X t−d } and arrange the corresponding values of X t and X t in the order of {X t−d }, see Tsay (1989). This greatly reduces the computational burden.

Theoretical Properties
In this section, we establish the consistency of the estimates defined by the MDL criterion (7). Let p = max j ∈{1,2,...,r+1} p j be the maximum AR order of the TAR model among all regimes and x n = (x n , x n−1 , . . . , x n−p+1 ) . By construction, {x n } is a Markov chain. Denote the l-step transition probability of {x n } by P l (x, A), where x ∈ p and A is a Borel set. We impose the following assumptions.

Assumption 1.
(a) The Markov chain {x n } admits a unique invariant measure π (·) such that for some K > 0, ρ ∈ [0, 1), for positive integer n and any x ∈ p , P n (x, ·) − π (·) ≤ K(1 + |x|ρ n ), where · the total variation norm and | · | is the Euclidean norm. (b) The error e t is absolutely continuous with a uniformly continuous and positive p.
The autoregressive function is discontinuous in the sense that for each j = 1, . . . , r + 1, there exist some z * = (1, z p−1 , z p−2 , . . . , z 0 ) such that ( j − j −1 ) z * = 0. (e) The number of thresholds r is unknown but has an upper bound R. The distance between any two thresholds is lower bounded by some θ > 0.
Assumption 1(a) to 1(d) are similar to that in Chan (1993) and Li and Ling (2012). Assumption 1(e) is an additional assumption required for consistent estimation for the number of thresholds. The proof of the following theorem can be found in the appendix. ). Let the corresponding estimators that minimize the MDL criterion (7) subjected to Assumption 1(e) ber n ,d n ,θ n ,p n, andˆ n , respectively. If Assumption 1 holds, then (r n ,d n ,θ n ,p n ,ˆ n ) P → (r 0 , d 0 , θ 0 , p 0 , 0 ) .
The above theoretical results are derived under the assumption that for all j = 1, . . . , r + 1, the AR model parameter j = (σ 2 j , j ) in the jth regime belongs to some compact spacē j ∈ p j +2 . These parameter constraints can be achieved by restricting the optimization in (7) in such a way that the characteristic roots of the resulting AR polynomial are all greater than 1 + c and the noise variance lies in the interval [c, 1/c], where c a small constant. In this way, explosive AR models with near unit roots are excluded. We note that our GA does not impose these constraints when optimizing (7). Nevertheless, as c can be arbitrarily small, say 10 −5 , it does not affect the results in most practical applications.

ESTIMATION OF STRUCTURAL BREAKS IN TAR MODELS
In this section, we extend the above methodology to the estimation of TAR models with (unknown number of) structural breaks. To be precise, we consider the fitting of the following model, which can be seen as several TAR models concatenated together at different break points: Here, m is the number of structural breaks, 0 = τ 0 < τ 1 < · · · < τ m < τ m+1 = n + 1 are the locations of the structural breaks that partition the time series into segments of TAR models and −∞ = θ i0 < θ i1 < · · · < θ i,r i < θ i,r i +1 = ∞ are the thresholds that divides the ith segment into r i + 1 regimes with delay parameter d j . Also, X t−1 = (1, X t−1 , . . . , X t−p ij ), e t is iid with zero mean, unit variance and independent of the past observations X t−1 , X t−2 , . . .. Finally, the parameters ij and σ 2 ij are, respectively, the AR coefficients and white-noise variance for the AR(p ij ) model in the jth regime of the ith stationary segment.
Given an observed series {X i } n i=1 , our goal is to, simultaneously, estimate the number of structural breaks m, the structural breaks locations τ 1 , . . . , τ m , the numbers of thresholds {r i } m+1 i=1 and the corresponding threshold values (θ i1 , . . . , θ ir i ) in each segment, and the underlying AR order p ij in the jth regime of the ith segment. In the following, we first derive an MDL criterion for this problem, and then develop a GA for its minimization. We also establish consistency of our parameter estimates.

MDL for TAR with Structural Breaks
Denote the TAR model for the ith segment by F i ∈ M, where M is the class of all TAR models. Note that, once the structural break locations (τ 1 , . . . , τ m ) are known, each of the m + 1 segments follows a standard TAR model as in (1). Therefore, to encode model (8), we need to encode the number of structural breaks m, the locations of structural breaks (τ 1 , . . . , τ m ), and the model information in each segments. Specifically, the total code length required is Recall that approximately log 2 I bits are required to encode an integer I, we have CL(m) = log 2 (m) and CL(τ 1 , . . . , τ m ) = m i=1 log 2 (τ i ). Moreover, CL({X t } τ i−1 ≤t<τ i ) is given by the MDL in (7) with threshold parameters θ i1 < · · · < θ i,r i , AR orders (p i1 , . . . , p i,r i +1 ) and number of observations (n i1 , . . . , n i,r i +1 ) in the threshold regimes. Thus, the explicit expression of the MDL criterion for TAR with structural breaks is given by MDL(m, r 1 , . . . , r m+1 , τ 1 , . . . , τ m , θ 11 , . . . , θ m+1,r m+1 , p 11 , . . . , p m+1,r m+1 +1 ) In (10), we have ignored the constant n/2 which does not affect the optimization. As before, the best fitting model is chosen as the one that minimizes (10) with respect to m, r i 's, τ i 's, θ ij 's, and p ij 's.

Optimization Using GA
This subsection provides the implementation details of the GA tailored for minimizing (10). We first remark that the GAs developed by Rodriguez-Yam (2006, 2008) for break point detection cannot be readily adopted here. It is because in Rodriguez-Yam (2006, 2008), a chromosome only needs to record the structural break locations and the order of the model in each segment, as parameters in each segment can be estimated via maximum likelihood once the structural breaks are given. For model (8), however, the estimation in each segment involves the fitting of a standard TAR model (1) which requires one execution of the GA proposed in Section 2.2, hence imposes huge computational burden. To overcome this issue, one possibility is to let the chromosomes contain both the structural break and threshold regime information. The details are as follows.

Chromosome Representation.
The chromosomes now should carry complete information about the structural breaks, threshold values, and the AR orders within each regime of each segment. We suggest using chromosomes of the form η = (m, δ 1 , (τ 1 , δ 2 ), . . . , (τ m , δ m+1 )), where δ j is the "TAR chromosome" given in Section 2.2 that specifies a TAR model (1).

Initialization of First Generation.
First, the number of structural breaks m is generated from the Poisson distribution with mean 2. Then, the locations of structural breaks are obtained by sampling m elements from the discrete uniform distribution on {1, . . . , n}, where n is the total length of the time series. Analog to the GA previously developed Section 2.2, a "minimum span" constrain has to be imposed to ensure sufficient number of observations in each segment. Here, we use the condition τ i+1 − τ i > n B with n B = 40. Again, we find the set of τ i s that violates the minimum span condition and delete one at a time randomly until the condition is satisfied. Once the locations of breaks are specified, the TAR chromosomes for each segment are generated from the procedure discussed in Section 2.2.

Propagation of New
Generations. Similar to Section 2.2, a new generation of chromosomes are produced by crossover and mutation with probability π C = 0.9 and 1 − π C = 0.1, respectively.
During crossover, two parent chromosomes are chosen from the current population in a similar fashion as before. From these two parents, we combine and sort their break locations and select each break with probability 0.5. This process is similar to combining the thresholds values in the GA for TAR models. Once the set of offspring breakpoints is obtained, we then assign each breakpoint with one TAR chromosome. In contrast to the GA in Section 2.2 where the AR order is inherited from the parent's threshold value, we employ an extra crossover step to obtain a new TAR chromosome: at each breakpoint in the offspring, we have two TAR chromosomes from the two parents correspond to the TAR model when the break occurs. Then, a crossover step is conducted between these two TAR chromosomes to give a new TAR chromosome. For example, given two parents η a = [3, δ a 1 , (95, δ a 2 ), (343, δ a 3 ), (900, δ a 4 )] and η b = [1, δ b 1 , (208, δ b 2 )], the sorted breakpoints set is (95,208,343,900). Suppose that after selection the locations of breaks are (95,208,343). Then, crossover operations will be conducted between the four pairs (δ a 1 , δ b 1 ), (δ a 2 , δ b 1 ), (δ a 2 , δ b 2 ), and (δ a 3 , δ b 2 ), giving four offspring, say δ o j , j = 1, . . . , 4, respectively. The offspring chromosome is then given by In the mutation operation, we follow the idea in Section 2.2: a new random parent is generated to crossover with a parent selected from the current population. As similar to before, the probability of selecting a breakpoint is π P = 0.3 for the selected parent and 1 − π P = 0.7 for the new parent. Once the breakpoints are selected, an mutation operation will be conducted for each associated TAR chromosome as discussed in Section 2.2.
Of course, at the end of each crossover and mutation operation, the procedure that ensures the minimum span condition is performed. Also, the elitist step is performed in the same fashion as before to guarantee monotonicity of the algorithm.

Migration and Declaration of Convergence.
The same island model, migration procedure and convergence criterion described in Section 2.2 are adopted here.
We close this section with the following remark. When both structural breaks and thresholds are present, the sorting of variables becomes more cumbersome. In this case, we associate each element in {X t−d } with a time index and a rank index. Note that the ranks R i 's in the TAR chromosomes are ordered according to all observations {X t } t=1,...,n . Suppose that a regime of a segment is specified by τ l < τ u and R l < R u . We first extract the group of ranks with time indexes in [τ l , τ u ), then within that group we extract the observations with ranks between R l and R u . Finally, the maximum likelihood estimates of the AR model for that regime can be computed from the extracted observations.

Theoretical Properties
Here, we establish the consistency of our parameter estimates defined by the MDL criterion (10). Similar to Davis and Yau (2013), we require the following extra assumption to ensure consistency estimation of the number and locations of structural breaks.
Assumption 2. The location of the ith structural break satisfies lim n→∞ τ i n = λ i ∈ (0, 1) for i = 1, . . . , m, where λ 1 < · · · < λ m . If Assumption 2 holds and Assumption 1 holds for each of the stationary segments, then (m n ,λ n ,r n ,d n ,θ n ,p n ,ˆ n ) The proof is delayed to Section S.2 of the supplementary file.

SIMULATION EXPERIMENTS
In this section, we evaluate the empirical performance of the proposed methodology via three simulation experiments. Additional simulations are provided in Section S.1 of the supplementary file.

TAR Model Without Break
In this first example a series is of length 2000 is generated from the four-regime TAR model where e t iid ∼ N (0, 1) and I (a,b) = I (a < x t−1 ≤ b). Figure 1 shows a realization of model (11). We applied the methodology in Section 2 to this realization and obtained three thresholds located atθ 1 = −0.8008,θ 2 = −0.3011, andθ 3 = 0.5013. Also, the proposed procedure correctly identified the AR orders (p 1 ,p 2 ,p 3 ,p 4 ) = (2, 1, 4, 3) for this realization.
To study the accuracy, we applied the same procedure to 500 realizations of the process (11). Table 1 lists the percentages of the estimated number of regimes. Note that the number of thresholds are correctly identified in all the 500 realizations. Table 1 also reports the sample means and standard errors of the three estimated thresholds. All estimated thresholds have small bias and standard errors. Table 2 summarizes the relative frequencies of the estimated AR orders in the four regimes. Of the 500 realizations, 94% of them correctly identified the model; that is, detected three thresholds with AR orders equal one in all regimes. From Figure 1. A realization generated from the TAR model (11). Tables 1 and 2, one can see that the proposed procedure performs very well for the TAR process (11), especially in locating the thresholds.

TAR Model With Structural Breaks
In the second example, we focus on a more complicated model with structural breaks and more regimes in the TAR models within the segments. The model is where e t iid ∼ N (0, 1) and I (a,b) = I (a < x t−1 ≤ b). A realization of (12) is plotted in Figure 2. Again, 500 realizations from the process (12) are simulated and estimated. Table 3 summarizes the results of the structural break estimation, including the number of breaks and the sample mean and standard deviation of the estimated breakpoints when the number of breaks was correctly specified. The number of breakpoints were correctly identified in all of the 500 realizations.
Finally, we investigate the results of threshold estimation for the cases where the number of breakpoints were correctly identified. Table 4 summarizes the estimation results for the TAR models in each of the three segments. Despite the complexity of the model, of the 500 realizations, 98% have simultaneously identified the correct numbers of structural breaks and the correct numbers of thresholds in all segments.

Nonlinearity and Nonstationarity
In this last example, we illustrate the behavior of the MDL model selection procedure on nonlinear and nonstationary time series. Consider the following two-regime TAR model  where e t iid ∼ N (0, 1). Figure 3(a) shows a realization generated from this model. Notice that this realization could be easily misinterpreted as generated from a nonstationary model with two breakpoints, although it was originated from the stationary pure threshold model (13). For this realization, the methodology in Section 2 detected the true two-regime TAR model. This is reasonable because a two-regime model requires coding one threshold and two AR models, which is simpler than the structural break model that requires coding two breakpoints and three AR models.
Consider another realization of model (13) displayed in Figure 3(b). It can be interpreted as a two-regime TAR model or a structural break model with one breakpoint. Recall that in the derivation of the MDL criterion, the codelength required to store a breakpoint is log 2 (τ i ), while the codelength required to store a threshold is 1 2 log 2 (n ij ), where n ij is the sample size of the corresponding regime. Note that τ i = n ij for this realization. Consequently, the penalty associated with a TAR model is less than the penalty associated with the structural break model. In other words, the MDL procedure favors a stationary threshold model than a nonstationary structural break model, which seems to be an attractive feature.
To study the accuracy, we applied the same procedure to 500 realizations of the process (13) with n = 1000. In all the 500 realizations, no break point was detected. Table 5 lists the percentages of the estimated number of regimes.
Note that 24.4% of the 500 realizations were misclassified as no threshold. These correspond to those cases where no regime switch was present in the entire realization (e.g., see Figure 3(c)). The same experiment was then repeated with sample size increased to n = 2000. With a larger n, the chance for regime-switching to occur within a realization increases (e.g., see Figure 3(d)), and therefore it can be seen from Table 5 that the percentage of correct model identification was increased substantially.  We also studied the pure structural break model We applied the MDL procedure to 500 realizations of this process with n = 1000 and the results are reported in Table 6. Although the realizations of models (13) and (14) could be very similar (e.g., see Figures 3(a) and 3(e)), the first and the third segment in (14) follow different models and thus they cannot be grouped into the same regime as in (13). It can be seen from Table 6 that the MDL procedure correctly identified the model 90% of the time. In conclusion, the MDL procedure performed very well in distinguishing between structural break models and threshold models with few regime switches. Figure 3. Realizations generated from the TAR model (13) and structural break model (14). In this section, we use the structural break detection procedure developed in Section 3 to study the growth rate of the quarterly U.S. real GNP data. This dataset consists of 261 observations covering the period 1947-2012, and was investigated by Li and Ling (2012) using a three-regime TAR model.
Let y 1 , . . . , y 261 denote the observed real GDP. The growth rate series is defined as The time series plot of the growth rate series {x t } is given in Figure 4. It can be seen that the series before and after index 150 clearly behave differently, which strongly suggests the presence of structural breaks in the time series.
The proposed procedure in Section 3 was applied to this dataset, and three structural breaks were found:τ 1 = 56,τ 2 = 91,τ 3 = 150. Moreover, the fitted results suggest that the first, second, and fourth segments are pure AR models, while the third segment has a thresholds atθ = 1.93. The details are given in Table 7.
In Li and Ling (2012), this dataset is regarded as a threeregime TAR model without structural breaks. The fitting of their model and the MDL TAR model fitting with breaks are compared in Figure 5. The estimated breakpoints from the MDL procedure appear to partition the time series into segments with similar behaviors. From the lower half of Figure 5, one can see that the residuals from both models are similar. In fact, both sets of residuals passed the Ljung-Box test statistic with p-values approximately equal 0.96, indicating that both models fit the data adequately. The MDL value corresponds to the three-regime model of Li and Ling (2012) is 295.74. In each regime of this TAR model, a high order AR model (up to order 10) is required to model the data, giving the total number of parameters in the model as 31. On the other hand, the MDL value of our estimated model is 274.13, with a total number of parameters of 21. Moreover, from Table 7, one can see that simple AR models are sufficient for the data. Thus, given the lower MDL value and few number of parameters, it appears that a TAR model with structural breaks is a better parsimonious alternative to describe the GNP data than a pure TAR model.
The estimated structural breaks corresponds to year 1960, 1969, and 1983, respectively. Looking into the history, these three breaks could be associated with substantial changes in the U. S. economy. For examples, between 1960S. economy. For examples, between -1969, the Vietnam war broke out and there were complex cultural and political disorder across the globe. In the 1980s, there were great changes in business and economic situations as the production industries migrated to newly industrializing economies.

CONCLUDING REMARKS
This article considered the problems of parameter estimation and structural break detection for TAR models. In particular, a new methodology that combines the MDL principle and GAs is proposed to handle these tasks. Desirable consistency properties of the parameter estimates are established. The good finite sample performances of the methodology is illustrated with simulation experiments and an application to a U.S. GNP dataset.
The next proposition shows that the estimated AR coefficients using a proportion of data inside a segment still converge to the true value.
Finally, the following proposition shows that each threshold can be identified by a threshold estimate around its n −1 neighborhood. Thus, each estimated regime must be nested in one of the true regime asymptotically. Also, the estimated AR coefficients converge to the true values at a n −1/2 rate.