A Novel Estimation Method in Generalized Single Index Models

Abstract The single index and generalized single index models have been demonstrated to be powerful tools for studying nonlinear interaction effects of variables in the low-dimensional case. In this article, we propose a new estimation approach for generalized single index models with the link function ψ known but the function g unknown. Specifically, we first obtain a consistent estimator of the regression function by using a local linear smoother, and then estimate the parametric components by treating the transformed estimated regression function as a continuous response. The resulting estimators of θ are asymptotically normal. The proposed procedure can substantially overcome convergence problems encountered in generalized linear models with discrete response variables when data sparseness or misspecification occurs. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze a financial dataset from a peer-to-peer lending platform in China as an illustration.


Introduction
Generalized linear models (GLMs) are the most popular and useful tools to investigate the relationship between a discrete response variable and covariates (Cox 1958; Hausman and McFadden 1984; Cox and Snell 1989; McCullagh and Nelder 1989; Klein and Spady 1993), with the conditional expectation being a monotonic function of a linear combination of covariates. Maximum likelihood is generally applied to fit the GLM and to make inference in the literature. The associated algorithm has been well developed and implemented in commonly used software such as Matlab, R, S-Plus, and SAS. As is well established, the maximum likelihood estimators (MLE) of the parameters are asymptotically normal and efficient in the large-sample sense, and the numerical performance of the algorithm is promising (McCullagh and Nelder 1989; Venables and Ripley 2002) when the model structure is correctly specified.
However, a disadvantage of GLM regression is its lack of flexibility. In many applications, the assumption of a linear dependency may not be accurate enough to reflect the changing patterns of the covariates. As a remedy, generalized additive logistic regression has been widely adopted and applied in exploratory analysis of discrete data (Hastie and Tibshirani 1986). The linear combination is generalized to a sum of nonparametric functions of the components. However, such a generalization still has its own limitation because of its relatively special structure. For instance, it can be used only for the additive case and is unable to reflect interactions of two or more variables, which we may encounter in the analysis of complex data.
Single index models (SiM), another attempt to avoid misspecification of GLMs and to gain dimension reduction, have attracted great attention for estimating a conditional mean function because they relax restrictive assumptions imposed on parametric models of conditional mean functions such as generalized linear models (Ichimura 1993; Härdle, Hall, and Ichimura 1993; Cui, Härdle, and Zhu 2011), and therefore gain more flexibility (Bai and Rao 1991). An advantage of the SiM and their various extensions over additive models is that they can take interactions of multiple variables into account, which are frequently encountered in the analysis of complex biomedical data such as gene regulatory networks, while still enjoying dimension reduction. There are various estimation procedures for single-index models. See Horowitz (2009) for a comprehensive survey and various applications of single-index models.
CONTACT: Hua Liang, hliang@gwu.edu, Department of Statistics, George Washington University, Washington, DC. Supplementary materials for this article are available online at www.tandfonline.com/UBES.
Working on single-index models with either a continuous or discrete response, the estimation of the index parameter generally consists of two steps: (i) given a parameter value, obtain an estimator of the nonparametric component using some smoothing method; and (ii) substitute the estimator from step (i) for the nonparametric function in a likelihood or quasilikelihood function to obtain an estimator of the index parameter; see, for instance, Ichimura's (1993) semiparametric nonlinear least-squares estimation or Klein and Spady's (1993) maximum likelihood method. This is essentially similar to the profile likelihood principle (Severini and Wong 1992; Severini and Staniswalis 1994) or the backfitting principle (Härdle et al. 2004). These methods inherit the advantages and disadvantages of maximum likelihood methods under certain conditions. For example, the estimators may be heavily biased for small samples and can be sensitive to the choice of starting values. The optimality properties may not apply for small samples. Furthermore, when the model is either poorly specified or behaves badly due to data sparseness in one or another population, convergence problems may appear. In addition, the Newton-Raphson algorithm is generally adopted for implementation, but it is possible for a given estimate to overshoot the true root in such a way that subsequent iterations enter a repeating cycle that never converges. These potential concerns become more severe for binary data because the maximization of the objective function needs extensive iterations.
We integrate the profile likelihood principle and minimum average variance estimation (MAVE, Xia et al. 2002) to obtain estimators of θ0, and call the estimation approach PLMAVE. As pointed out before, such integration has been widely applied in the semiparametric model literature (Carroll et al. 1997; Härdle et al. 2004; Yu, Wu, and Zhang 2017; Wang and Cao 2018; Cai and Wang 2019). Differing from the existing procedures, we first obtain a consistent estimator of the conditional expectation E(Y | θ⊤X = θ⊤x) := p(θ⊤x) for given θ by using a local linear smoother, say p̂(θ⊤x). We then apply MAVE to estimate the parametric components by treating ψ−1(p̂(θ⊤Xi)) as our continuous response. So we work on p̂(θ0⊤Xi) instead of Yi with the MAVE principle. This synthesization avoids using the quasilikelihood as the objective function. We therefore have closed forms of the estimators, and the implementation is substantially simplified; the MAVE idea can be applied directly. This synthesization can also improve numerical performance under potential data sparseness in one or another population, because p̂(θ0⊤Xi) takes more distinct values than Yi does (for binary output, Yi takes only the values 0 and 1), and it makes the calculation of PLMAVE more stable even if data are sparse in a population.
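The synthesis step just described can be sketched in a few lines. The following is an illustrative Python sketch, not the paper's exact implementation: the function names, the quartic kernel, and the clipping safeguard before applying the logit inverse link are our own choices.

```python
import numpy as np

def local_linear(u, u_grid, y, h):
    """Local linear estimate of E[y | u] at each point of u_grid."""
    est = np.empty(len(u_grid))
    for j, u0 in enumerate(u_grid):
        d = u - u0
        w = np.maximum(1 - (d / h) ** 2, 0.0) ** 2        # quartic kernel weights
        s1, s2 = np.sum(w * d), np.sum(w * d ** 2)
        wt = w * (s2 - d * s1)                            # local linear weights
        est[j] = np.sum(wt * y) / np.sum(wt)
    return est

def synthesize_response(X, y, theta, h, logit_link=True):
    """Form pseudo-responses psi^{-1}(p_hat(theta'X_i)).  The clipping is an
    illustrative numerical safeguard, not part of the formal procedure."""
    u = X @ theta
    p_hat = local_linear(u, u, y, h)
    p_hat = np.clip(p_hat, 1e-3, 1 - 1e-3)
    return np.log(p_hat / (1 - p_hat)) if logit_link else p_hat
```

The pseudo-responses returned here play the role of the continuous "data" on which the MAVE principle is then applied.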
The rest of the article is organized as follows. Section 2 describes the estimation procedures. Section 3 states the main results. Section 4 presents the simulation results. Section 5 illustrates the utility of the proposed method by analyzing a real dataset from an online peer-to-peer lending platform in China. Technical details are included in the Appendices and the supplementary material.

Estimation Procedure
To guarantee the identifiability of the index parameter θ0, it is generally assumed that the first element of θ0 is positive and ||θ0|| = 1 (Yu and Ruppert 2002; Xia and Härdle 2006; Liang et al. 2010; Li, Wang, and Carroll 2010), where || · || is the Euclidean norm. However, this assumption may cause problems when the first element of θ0 is small (close to zero). Its estimated values may be small but nonzero, which results in an uncertain sign and causes the estimated direction to be the inverse of the true direction of θ0. To avoid this, we assume throughout the article that the element of θ0 with the largest absolute value is positive and ||θ0|| = 1.
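This identifiability convention amounts to a simple normalization, which can be sketched as follows (the function name normalize_index is ours):

```python
import numpy as np

def normalize_index(theta):
    """Scale theta to unit Euclidean norm and fix its sign so that the
    element with the largest absolute value is positive, matching the
    identifiability convention adopted in the article."""
    theta = np.asarray(theta, dtype=float)
    theta = theta / np.linalg.norm(theta)
    if theta[np.argmax(np.abs(theta))] < 0:
        theta = -theta
    return theta
```

Note that anchoring the sign on the largest-magnitude element, rather than the first element, avoids the sign ambiguity when the first element is near zero.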
We now describe the PLMAVE estimation in a three-stage procedure.
The first stage (to obtain an initial estimator of θ 0 ) Consider multivariate local linear regression in the following nonparametric model.
Let Fξ|η(y|x) be the conditional probability distribution function of the random vector ξ given η = x. Naturally, we want to use all the information of X that contributes to Y to estimate parameters. In a linear space, we look for θ0 such that Y ⊥⊥ X | θ0⊤X, that is, Y and X are conditionally independent given θ0⊤X, which we suppose throughout the article. From the result of Cook (1998), Y ⊥⊥ X | θ0⊤X is equivalent to FY|X = FY|θ0⊤X, and the following proposition holds.
Proposition 1. Suppose that X and Y belong to a probability space (Ω, F, P). For any nonzero vector, that is, θ0 is common for all x. θ0 should be estimated with the aim of minimizing the approximation errors over all possible Xj's. Therefore, we suggest estimating θ0 by minimizing with respect to αj and bj to estimate p(x) and ∂p(x)/∂x, respectively, where ρ(f̂X(Xj)) is a trimming function introduced for technical purposes to handle the notorious boundary points, and ρ(·) is a bounded function that has bounded continuous second-order derivatives on R and satisfies conditions involving a constant s0. In practice, s0 can be very small, and thus no observations are trimmed off. Let S(θ0) be the linear space spanned by θ0; hereafter A⊗2 = AA⊤ for any matrix or vector A, and fX is the density function of X. It is easy to see that Σ0 = E{ρ(fX(X))(p′(θ0⊤X))²} θ0θ0⊤ ∈ S(θ0). So θ0 is an eigenvector of Σ0, which has only one nonzero eigenvalue. Later we shall prove that Σn converges to Σ0 (Lemma A.4). This fact motivates us to estimate θ0 by an eigenvector of Σn. Let λ̂ and θ̂ini be the largest nonzero eigenvalue of Σn and the associated standardized eigenvector; that is, θ̂ini = λ̂−1 Σn θ̂ini, which should converge to λ−1 Σ0 θ0, where λ is the nonzero eigenvalue of Σ0.
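A simplified sketch of this first stage: estimate the gradient of the regression function at each observation by local linear fitting, average the outer products of the gradients, and take the leading eigenvector. In the sketch below, the Gaussian kernel, the small ridge term, and the omission of the trimming function ρ are our own illustrative simplifications of the estimator Σn.

```python
import numpy as np

def initial_direction(X, y, h):
    """Sketch of the first stage: average the outer products of local linear
    gradient estimates b_j and take the leading eigenvector of the result."""
    n, p = X.shape
    Sigma_n = np.zeros((p, p))
    for j in range(n):
        D = X - X[j]                                       # deviations from X_j
        w = np.exp(-np.sum(D ** 2, axis=1) / (2 * h ** 2)) # Gaussian kernel
        W = np.diag(w)
        Z = np.hstack([np.ones((n, 1)), D])                # local linear design
        beta = np.linalg.solve(Z.T @ W @ Z + 1e-8 * np.eye(p + 1),
                               Z.T @ W @ y)
        b = beta[1:]                                       # gradient estimate at X_j
        Sigma_n += np.outer(b, b) / n
    vals, vecs = np.linalg.eigh(Sigma_n)
    theta = vecs[:, -1]                                    # leading eigenvector
    return theta / np.linalg.norm(theta)
```

Because every estimated gradient points (approximately) along θ0, the averaged outer-product matrix is close to rank one, and its leading eigenvector recovers the index direction.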
The second stage (to obtain estimators of g(θ0⊤x) and g′(θ0⊤x)) After obtaining an initial estimator θ̂ini of θ0 and confirming that θ̂ini → θ0 (see Lemma A.4), we approximate p(θ0⊤x) by the displayed kernel estimator, where H is a density function and θ̂ is an estimator of θ0 based on the samples, for example, θ̂ini obtained in Stage 1.
Unlike the MLE principle, we estimate the unknown θ0, g(θ0⊤x), and g′(θ0⊤x) by using the "data" {(Xi, Zθi)}, i = 1, . . . , n, to obtain consistent estimators. A natural idea is to solve the above problem based on the following regression model.
(2.5) However, the random sequence {Zθi}, i = 1, . . . , n, is not independent, and neither are the random errors {εi}, i = 1, . . . , n. In addition, the direct use of (2.5) cannot gain dimension reduction. To overcome these difficulties, we explore the following alternative. In the proof of Lemma A.2 below, we will see that, under some regularity conditions, p̂(θ̂⊤x) → p(θ0⊤x) for any x ∈ Rp a.s. because θ̂ → θ0, and then εi = o_a.s.(1). Therefore, (2.5) can be rewritten with a sequence of iid random errors.
These arguments indicate that, except for an infinitesimal information loss, model (2.5) can be approximated by the single-index model (2.6), whose errors can be approximated by a sequence of iid random variables up to an infinitesimal term.
Based on model (2.6), for given θ̂, we propose to estimate g(θ0⊤Xj) by minimizing (2.7), where f̂θ⊤X is the estimated density function of θ0⊤X. Again, we use ρ̂θj in (2.7) to avoid f̂θ⊤X(x) being too small as n goes to infinity. Taking θ̂ = θ̂ini, the estimators of g(θ0⊤x) and g′(θ0⊤x) can be written as in (2.8). The third stage (to update estimators of θ0) This estimation procedure is similar to that of the minimum average (conditional) variance estimation method (Xia et al. 2002). Since θ0, g(θ0⊤x), and g′(θ0⊤x) are unknown, we replace them by their estimates obtained in Stage 2 and consider the minimization problem (2.9). The minimization problem (2.9) can be solved by fixing (âθ(Xj), d̂θ(Xj)), j = 1, . . . , n. Technically, PLMAVE can be obtained by solving three quadratic programming problems that have simple analytic forms; that is, by implementing the following estimating steps.
Let θ (t+1) be the final value of θ (t,τ ) . If the convergence criterion is met, stop; otherwise set t := t + 1 and go to Step 1.
In our numerical experiments, we used the quartic kernel K(u) = 15/16 (1 − u²)² I(|u| < 1) and the following method to choose the bandwidth h. The bandwidth h affects both the bias and the variance of the estimate, so there is a trade-off between the two: bias correction requires a relatively small bandwidth, whereas variance reduction needs a large one. In principle, bandwidth selection is data-driven, and traditional methods such as cross-validation may be applied to select a proper bandwidth h based on the available data. However, as pointed out in Fan, Heckman, and Wand (1995), this approach could perform poorly in some settings, producing a large amount of sample variation; hence, it is not regarded as a sensible bandwidth selection rule for practical use. Instead, the plug-in method may be a promising candidate for bandwidth selection. Fan, Heckman, and Wand (1995) discussed this approach to handle local polynomial regression in the framework of generalized linear models. Ruppert, Sheather, and Wand (1995) explored this method of bandwidth selection for local least squares regression.
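For concreteness, the kernel and a simple rule-of-thumb version of the h = C n^{-1/5} bandwidth can be coded as below. The constant C here is tied to the spread of the index values, an illustrative stand-in for the plug-in constant, which the text notes has a complex form.

```python
import numpy as np

def quartic_kernel(u):
    """K(u) = 15/16 (1 - u^2)^2 on |u| < 1, the kernel used in the paper."""
    return np.where(np.abs(u) < 1, 15.0 / 16.0 * (1 - u ** 2) ** 2, 0.0)

def rule_of_thumb_bandwidth(u, c=1.0):
    """h = C * n^{-1/5}; here C = c * std(u) is an illustrative choice,
    not the paper's plug-in constant."""
    n = len(u)
    return c * np.std(u) * n ** (-1.0 / 5.0)
```

The n^{-1/5} rate is the usual optimal order for local linear smoothing, balancing squared bias (order h^4) against variance (order 1/(nh)).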
Along the same line, we may derive an optimal bandwidth based on the asymptotic weighted mean integrated squared error (AMISE) of ĝ(u). An optimal bandwidth is given by hopt = C × n^{−1/5}, where C assumes a complex form depending on several other estimates.
Remark 1. Because we use the Euclidean distance to measure the error, the sign of the parameter estimates is taken into account, and the distance between the estimated value and the true parameter can reflect the accuracy of estimation. This is different from what was used by Xia (2006), who assessed the angle between the directions of the estimated vector and the true parameter vector. Such a measurement may raise identifiability concerns when the true direction is positive while the estimated direction is negative; specifically, the two could differ by 180° in direction.

Asymptotic Properties
In this section, we state the conditions and present the main results with discussions. In what follows, let fξ(·) be the density function of the random vector ξ, and fξ1|ξ2(·) the conditional density function of the random vector ξ1 given ξ2. The matrix norm is denoted by || · ||; that is, ||A|| = {tr(AA⊤)}^{1/2} for an arbitrary matrix A, which reduces to the Euclidean norm if A is a vector. DX ⊂ Rp is an interior support of the random vector X; that is, for any v ∈ DX, there exists δ > 0 such that inf{y: ||y−v|| < δ} fX(y) > 0. For any random matrices An(z) and an(z), We need the following conditions for our main results.

Condition 1. K(v) is a density function that can be represented as a function of v⊤v for any v ∈ Rp, and its Fourier transform is absolutely integrable. H(t) has support V(a), can be represented as a function of t², and satisfies ∫ H(u)u² du = 1. K(y) and H(t) satisfy the Lipschitz condition on V0(a0) and V(a), respectively.
Condition 2. g(v) has four bounded continuous derivatives. ψ(t) is a bounded and strictly monotone function with four bounded continuous derivatives. The inverse function ψ −1 (t) is bounded and satisfies the Lipschitz condition.
Condition 3. DX is a compact set in Rp, E||X||^10 < ∞, and fX(x) has bounded and continuous partial derivatives up to the third order on DX.

Condition 4. [Conditional Mean] The conditional expectations have bounded and continuous third-order derivatives with respect to v on R.

Condition 5. [Bandwidths for consistency] Bandwidths
To establish the asymptotic theory of parameter estimation, smoothness of the kernel functions K(·) and H(·) and of ψ(·) and g(·) is required, as listed in Conditions 1 and 2. The strict monotonicity assumption on ψ(·) in Condition 2 ensures the unique existence of the link function. Assumptions on the density function of the covariate vector X, on the existence of moments and conditional moments, and on their smoothness play a key role in establishing the asymptotic theory; these are given in Conditions 3 and 4. Condition 5 gives a bandwidth condition for the validity of the asymptotic theory and guarantees that bt is optimal. These assumptions are generally imposed in the related literature, such as Xia (2006).
Theorem 1. Suppose that Conditions 1-5 are satisfied and θ0θ0⊤θ0 is invertible. Let the bandwidth and the estimator of θ0 in the final iteration be b̂ and θ̂0, respectively. Choose the bandwidth bt according to δθ(t) in the tth iteration, t ≥ 1. Then the final estimator θ̂0 is consistent and has the representation (3.1). The first term b̂⁴ + δ1b b̂ + δ²1b/b̂ in (3.1) indicates the order of the bias, and the second term stands for the variance. Taking δ1b = (log n/n)^{1/2} and b̂ = n^{−1/5}, we can easily verify the rate. Similar results have been established for the estimation of dimension reduction directions (Theorem 3.1, Xia 2007).
In principle, we can establish confidence intervals or make inferences for the index parameter θ based on the asymptotic results of Theorem 1. Taking this route means that we need to estimate the quantities appearing in the asymptotic representation, which contain several unknown terms to be estimated. Given these complexities, it is better and easier in practice to use the bootstrap to construct confidence intervals or make inferences (Liang et al. 2004).
It is worth mentioning that we can also establish the asymptotic normality of ĝ(u), as follows.
Proposition 2. Under the conditions of Theorem 1, we have
Simulation Studies

To assess estimation accuracy and efficiency, we define our estimated error error0 as displayed below, where θ̂jk is the jth estimated value of the kth element of θ0. We also calculate the error error1 defined by Xia et al. (2002) and Xia and Härdle (2006), which measures the discrepancy between the true parameter and the estimated values; there, θ̂j and θ0 are standardized, that is, ||θ0|| = 1 and ||θ̂j|| = 1.
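Under a plausible reading of the two measures (the exact display of error0 is garbled in this copy; error1 is the sin² of the angle between the unit-norm vectors, following Xia et al. 2002), they can be computed as follows:

```python
import numpy as np

def error_0(theta_hats, theta_0):
    """Average Euclidean distance between the estimates and the truth; a
    plausible reading of error_0, whose display is garbled in this copy."""
    theta_hats = np.atleast_2d(theta_hats)
    return np.mean(np.linalg.norm(theta_hats - theta_0, axis=1))

def error_1(theta_hat, theta_0):
    """sin^2 of the angle between the two vectors after standardization:
    1 - (theta_hat' theta_0)^2."""
    theta_hat = theta_hat / np.linalg.norm(theta_hat)
    theta_0 = theta_0 / np.linalg.norm(theta_0)
    return 1.0 - float(theta_hat @ theta_0) ** 2
```

Note that a sign-flipped estimate gives error1 = 0 but a large error0, which is exactly the identifiability point raised in Remark 1.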
Example 1. We generated data from each of the following models, where x1 and x2 are independent and uniformly distributed. A model similar to (4.1) and (4.2) with the identity link function has been investigated by Xia and Härdle (2006) and Härdle, Hall, and Ichimura (1993). For that setting, Xia and Härdle claimed that their MAVE approach performs best. It is, therefore, of interest to compare PLMAVE with MAVE and with logistic and probit regression. We also compare PLMAVE with the quasi-likelihood-based polynomial spline smoothing method (Spline) proposed by Wang and Cao (2018), which has been claimed to be asymptotically efficient, numerically stable, and computationally expedient. In implementing PLMAVE, we chose the logit as our link function. It is noteworthy that the parameter vector components are equal, whereas the current models have more complex link functions. For the Spline method, we select the knots using Wang and Cao's formula (5.3). Table 1 presents the results for models (4.1) and (4.2). These results show that PLMAVE yields the most accurate estimates, and the errors become smaller as the sample size gets larger. Compared to PLMAVE, the estimated errors based on logistic and probit regression are huge. This may not be surprising because both logistic and probit regression naturally treat the systematic part as a linear combination of covariates, while the models have a quadratic form. The results based on MAVE are also substantially inferior to those based on PLMAVE. Part of the reason is that MAVE can estimate the direction of the parameter vector but ignores its sign; that is, MAVE treats two vectors with opposite directions equally. Consequently, the average of the estimated values of an individual component, which may be positive or negative, becomes small and results in big errors, even though the right direction has been caught. Therefore, it is not fair to compare PLMAVE and MAVE on this measure in general.
We will not present the results of error 0 based on MAVE in the following numerical experiments for this reason.
The small-sample performance of the Spline method is also inferior to that of PLMAVE, but its large-sample performance is outstanding and comparable with that of PLMAVE. It is worth mentioning that the error1 values for the four methods are very small and very close. However, the estimates obtained via logit and probit regression may not converge to the true parameter values, because error1 measures only the angle between two vectors, which may be parallel but far apart. Additionally, error1 = sin²(θ̂, θ0) may be large, yet error0 may be smaller. Such occurrences can also be observed in Tables 2 and 3. Therefore, even though θ̂ and θ0 tend to be in the same direction, it is difficult, if not impossible, to assess how close the two vectors are using error1. The following experiments show similar findings. Given the estimated curve p̂(θ̂⊤x), we can also compare it with the true curve p(θ0⊤x). Such a comparison is presented in Figure 1. The estimated curves based on logistic and probit regression deviate largely from the true curve, which is not surprising given the misspecification problem. In regard to MAVE, the estimated curve of the nonparametric function is close to the true curve when the estimated direction agrees with the truth; in contrast, the estimated curve can be reversed when the estimated direction is opposite to the truth. See Figure 1(a, c, e, and f). Overall, PLMAVE is superior to its competitors in all configurations in the estimation of the parameters and nonparametric functions.
Example 2. We generated data from the following single-index models.
In this example, we set the first element of the parameter vector to be zero on purpose. Since its estimated values can be positive or negative, under the assumption that the first element of θ0 is positive (Yu and Ruppert 2002; Xia and Härdle 2006; Liang et al. 2010; Li, Wang, and Carroll 2010), there is no way to classify the direction when the estimate of the first element is negative; even if the estimate is positive, it is possible for it to point in the opposite direction. PLMAVE avoids such uncertainty. Table 2 presents the estimated results. The numerical performance of PLMAVE is still very promising. Even for a very small sample size such as n = 50, its estimated error reaches 0.0254. In contrast, the performance of logistic regression for (4.3) and loglinear regression for (4.4) is poor; although the errors decrease rapidly and monotonically as n increases, their performance is still much worse than that of PLMAVE. The Spline method is comparable with PLMAVE except for convergence problems when the sample size is 50. The superiority of PLMAVE is further confirmed by the results for the estimators of the nonparametric function presented in Figure 2, from which the same conclusions and observations as for Figure 1 can be drawn.
Example 3. We generated data from the following generalized linear models. In this example, the fourth element of the parameter vector is zero. The models are traditional generalized linear models. However, the designed setting brings rare events; for example, the samples with y = 1 from the logit regression model (4.5) constitute less than 11%. Table 3 presents the estimated results. Again, PLMAVE consistently outperforms its competitors with significant superiority for small, moderate, and large sample sizes alike, except for the Spline method at large sample sizes. It is worth mentioning that logistic and loglinear regression perform terribly for small sample sizes. Based on our explorations, such a deficit is mainly due to convergence problems. These findings further demonstrate the deficiency of the MLE for GLMs with rare events, as pointed out in the Introduction and noted in the literature.
Similar to Figures 1 and 2, the fitted curve based on MAVE has large fluctuations and may be opposite to the direction of the true curve. When logistic and loglinear regression are used to estimate the parameters of models (4.5) and (4.6), respectively, the estimated curves are supposed to be close to the true curves, since there is no misspecification concern. This is confirmed in Figure 3(b, c, and f). However, when the sample size is small (n = 50), the corresponding estimated curves based on the true models are still so far from the true curves that it is difficult to display them simultaneously with the other curves in Figure 3(a, d, and e). On the other hand, the estimated curve via PLMAVE is close to the true curve even for n = 50, though its estimated curves for n = 500 are not as close to the true curve as those based on logistic or loglinear regression.
A referee suggested comparing the proposed method with existing estimation methods for generalized partially linear single-index models. At the referee's request, we conducted a simulation study based on model (4.8), which was studied by Wang and Cao (2018). A similar model with the identity link function has been explored in the literature (Carroll et al. 1997; Xia and Härdle 2006). We present the results as follows (with additional results for the bias, standard error, and mean squared error of the parameter estimates in the supplementary materials). Model (4.8) was analyzed by Wang and Cao (2018). We added model (4.7) here because the focus of our article is on model (4.7) rather than model (4.8), which contains an additional "linear term" β⊤z and should be called a generalized partially linear single-index model. Table 4 suggests that error0 and error1 based on PLMAVE are smaller than those based on MAVE and Spline for n = 200, and comparable for n = 500, 1000. The performance of PLMAVE is still promising in generalized partially linear single-index models, followed by Spline and MAVE.

A Real Data Example
This section gives an empirical application of the proposed model and method by analyzing a dataset from the peer-to-peer lending platforms of China (https://www.wdzj.com; National Internet Finance Association of China, https://www.p2peye.com). Online peer-to-peer (P2P) lending originated in the United Kingdom; it enables individuals to borrow and lend money directly from one another and has drastically changed how individual lenders and borrowers meet and transact. Driven by the global wave of financial innovation, many P2P platforms have emerged in China since 2007 and have attracted many investors and borrowers. With the rapid growth in the number of P2P platforms in China and the increasing scale of transactions, the risks of default and bankruptcy have also been increasing. From 2007 to the end of 2018, 5409 of 6430 Chinese P2P platforms became bankrupt because investors could not withdraw cash, or the platforms' bosses absconded with the money, or they were forced to shut down by the government, which has caused tremendous social problems, such as vicious mass incidents and a seriously damaged financial environment. It is therefore crucial to evaluate the bankruptcy risk of P2P platforms. The existing literature has mainly studied the information asymmetry between borrowers and lenders (Lin, Prabhala, and Viswanathan 2013; Liu, Brass, and Lu 2015; Jiang, Ho, and Yan 2018). To date, little attention has been paid to the bankruptcy risk of P2P platforms in China.
This section examines a dataset with 1985 samples collected from the Chinese P2P market by applying PLMAVE, the spline method (Wang and Cao 2018), and logistic and probit regression. We are interested in the relationship between bankruptcy risk and specific covariates. Let Y be the indicator of bankruptcy (Y = 1) or nonbankruptcy (Y = 0), x1 the loan interest rate, x2 the registered capital of a platform, x3 the operating time before a platform goes bankrupt, and x4 the running time of a P2P platform. The upper panel of Table 5 presents the estimated results, where θ̂ contains the estimates of the unknown parameter θ0 = (θ01, . . . , θ04)⊤. To further investigate whether these factors are significant, we apply the bootstrap principle to test the hypotheses H0: θ0j = 0 versus Ha: θ0j ≠ 0, j = 1, . . . , 4. (5.1) The basic idea works as follows. Take m = 1000 bootstrap iid samples. From the rth resampled dataset, the unknown parameter θ0 can be estimated using logit regression, probit regression, and PLMAVE, denoted by θ̂*r for 1 ≤ r ≤ m. Calculate θ̂*r − θ̂, 1 ≤ r ≤ m, and obtain the order statistics θ̂*(r) − θ̂, 1 ≤ r ≤ m. We can then construct bias-corrected bootstrap confidence intervals based on MacKinnon (2010) and Theorem 1. That is, we obtain a bias-corrected estimate of θ by replacing θ̂ by 2θ̂ − θ̄*, where θ̄* is the sample mean of the bootstrap-based θ̂*r. In this correction, we use the difference between θ̄* and θ̂ to estimate the bias, and then we eliminate the bias by subtracting the estimated bias.
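The bias correction and an accompanying interval can be sketched as follows. The basic (reflected-percentile) interval shown is one common way to pair an interval with the 2θ̂ − θ̄* correction, not necessarily the exact construction used in the article.

```python
import numpy as np

def bias_corrected_ci(theta_hat, theta_boot, alpha=0.05):
    """Bias-corrected bootstrap estimate and basic bootstrap interval.
    theta_boot is an (m, p) array of bootstrap estimates theta*_r."""
    theta_bar = theta_boot.mean(axis=0)
    # bias estimate: mean(theta*) - theta_hat; subtracting it yields
    # the bias-corrected estimate 2*theta_hat - mean(theta*)
    theta_bc = 2.0 * theta_hat - theta_bar
    # basic (reflected-percentile) interval, centered at the corrected value
    lo = 2.0 * theta_hat - np.quantile(theta_boot, 1 - alpha / 2, axis=0)
    hi = 2.0 * theta_hat - np.quantile(theta_boot, alpha / 2, axis=0)
    return theta_bc, lo, hi
```

A coefficient θ0j would then be declared significant at level α when the interval (lo[j], hi[j]) excludes zero.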
To explore whether any particular observations significantly affect the proposed method, we also randomly select 1785 observations in each resampling process to obtain a bootstrap-based θ̂*r, again with bootstrap sample size m = 1000. The parameter estimates and the associated significance via bootstrap confidence intervals are presented in the lower panel of Table 5.
The results in Table 5 indicate that the estimated coefficients based on PLMAVE under different configurations (different link functions and differently drawn samples) and on Wang and Cao's (2018) spline method have the same signs and similar values, while there are remarkable differences among the estimated coefficients based on logit and probit regression across configurations. Table 5 shows that the estimated coefficients based on PLMAVE with the three different link functions and on Wang and Cao's (2018) spline method are significant at the 1% level for all cases (n = 1985, 1785). These results indicate that the interest rate and the operating time before the platform goes bankrupt are positively correlated with the bankruptcy risk and are also the most critical factors affecting platform bankruptcy. Both the registered capital and the running time of the P2P platform are negatively correlated with the risk, and the impact of the registered capital on the bankruptcy risk is weak.
On the other hand, the estimated coefficients based on logistic and probit regression show that none of the four factors is significant. Further exploration reveals that more than 99% of the estimates of P(Yi = 1 | θ0⊤Xi), 1 ≤ i ≤ n, are 1 or 0 in the logistic and probit regressions; that is, more than 99% of the estimates of the conditional probability p̂(θ̂0⊤Xi) cluster around 1 or 0. This indicates the appearance of sparseness and may cause convergence problems. Furthermore, observing the estimated conditional probability curves based on PLMAVE presented in Figure 4 (middle and right panels), we notice that the three curves give a consistent, monotone nonlinear pattern across the different link functions. These observations may indicate that logit or probit regression is not a proper choice for analyzing this dataset.
We now evaluate the prediction performance of the proposed PLMAVE, the spline-based method of Wang and Cao (2018), and logistic regression. We randomly sample 1785 observations without replacement and repeat this 1000 times. For each replication, we use these samples as the training dataset and the remaining samples as the testing dataset. We apply PLMAVE with the logit link function to the training dataset to build a model, which is then used to predict on the testing dataset. We calculate the true negative rate (TNR), the true positive rate (TPR), the receiver operating characteristic (ROC) curve, which depicts the TPR against the false positive rate (FPR) at various threshold settings, and the area under the ROC curve (AUC).
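These metrics can be computed directly from the predicted probabilities. In the sketch below (function names are ours), the AUC uses the rank-based Mann-Whitney formula, which is equivalent to integrating the empirical ROC curve:

```python
import numpy as np

def tnr_tpr(y_true, p_hat, threshold=0.5):
    """True negative and true positive rates at a given probability threshold."""
    y_pred = (p_hat >= threshold).astype(int)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    return tnr, tpr

def auc(y_true, p_hat):
    """Area under the ROC curve via the Mann-Whitney formula: the
    probability that a random positive case scores higher than a random
    negative case, counting ties as 1/2."""
    pos, neg = p_hat[y_true == 1], p_hat[y_true == 0]
    diffs = pos[:, None] - neg[None, :]
    return float(np.mean((diffs > 0) + 0.5 * (diffs == 0)))
```

Averaging these quantities over the 1000 train/test splits yields the summary values reported in Table 6.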
The average values based on 1000 repetitions are presented in Table 6. The results indicate that the performance of PLMAVE (logit link function) and the Spline method is comparable and far superior to that of logistic regression. The TNR based on logistic regression is only 0.2185, though its TPR reaches 0.9507. This further demonstrates the advantage of PLMAVE.

Discussions
In this article, we study generalized single-index models in a very general setting. We integrate the profile likelihood principle and MAVE to estimate the index parameters as well as the unknown function embedded in the link function after we create synthesized "data." The proposed estimators of the index parameters are asymptotically normal. Both simulation studies and empirical data analysis show that the proposed estimation methods work well for finite sample sizes. It is worth emphasizing that the proposed procedure can be easily implemented and remedies the poor performance of the MLE when data are sparse in one or more populations. This advantage is worth advocating and promoting, since convergence warning messages are not uncommon when fitting GLM regression in well-developed software such as R, SAS, and Matlab. The proposed method may provide a valuable alternative in such situations. The PLMAVE procedure can be applied to the logit and probit link functions, as well as to other cumulative distribution functions such as the exponential, F, Chi-squared, and lognormal distribution functions. The theoretical results should be able to be established under similar assumptions. Guo, Zhou, and Ma (2021) recently proposed the following semiparametric logistic single-index coefficient model to estimate heterogeneous treatment effects and select individualized treatments: logit{P(Y = 1 | X, Z)} = g_1(β_1^⊤X) + Z g_2(β_2^⊤X), where g_1(·) and g_2(·) are two unknown functions, X stands for the baseline covariates, and Z = 0, 1 denotes a binary treatment. The proposed method can be further applied to their model to estimate β_1, β_2, g_1(·), and g_2(·).
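The nonparametric building block used throughout, the local linear smoother, can be sketched in a few lines (an illustrative Gaussian kernel and bandwidth; the paper's actual kernel and bandwidth choices are governed by its stated conditions):

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of E[Y | X = x0]: weighted least squares
    on the design (1, x - x0) with kernel weights centered at x0."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0]                                  # intercept = fitted value at x0

# Synthetic illustration: recover sin(2*pi*x) from noisy observations.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 500)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=500)
est = local_linear(0.25, x, y, h=0.05)
print(est)   # true value is sin(pi/2) = 1
```

The intercept of the locally weighted fit serves as the function estimate, while the slope coefficient estimates the derivative, which is what the MAVE machinery exploits.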
It would appear possible, in principle at least, to extend the proposed method to quantile regression for the studied generalized single-index models along the lines of the major work on the single-index model by Ma and He (2016). Our approach may be modified by taking the following two steps: (i) keeping the first and second stages the same to obtain an initial estimator of (β_1, β_2) and estimators of g_1, g_2 and their derivatives; (ii) replacing the weighted least squares by the quantile loss in the third stage for quantile regression. There are closed-form solutions in (i) but not in (ii), which would appear to be the major difficulty in extending our approach. Following Ma and He (2016), we would expect to obtain results similar to theirs. The detailed investigation of these issues is interesting, but beyond the scope of this article.
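The substitution in step (ii) replaces a squared-error criterion with the quantile check loss; a small sketch (synthetic data) illustrates the defining property that the check-loss minimizer is the τth sample quantile:

```python
import numpy as np

def check_loss(u, tau):
    """Quantile (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(2)
y = rng.normal(size=2001)

# Minimize the total check loss over a grid of candidate constants c.
tau = 0.5
grid = np.linspace(-3.0, 3.0, 1201)
losses = [np.sum(check_loss(y - c, tau)) for c in grid]
c_star = grid[np.argmin(losses)]
print(c_star, np.quantile(y, tau))   # the two values should nearly coincide
```

Unlike least squares, this objective is piecewise linear, so no closed-form solution exists and iterative optimization is required, which is the difficulty noted above for step (ii).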
The proposed method may be extended to generalized partially linear single-index models (Carroll et al. 1997) without considerable additional difficulties. Recently, Cai and Wang (2019) and Chen et al. (2015) proposed a smoothed GEE in PLSiM for sparse and dense longitudinal data. Incorporating the proposed method into a longitudinal or functional structure would be interesting, though such incorporation may need extra effort.

A.1. Preliminary Lemmas
We present several preliminary lemmas first, whose proofs are included in the supplementary materials. Lemmas A.1-A.6 will be used to prove the remaining lemmas and the main results. Lemmas A.7 and A.8 are used to prove Theorem 1.
Because the proof of the main results is lengthy, we divide the proof into several steps. The key idea is to decompose θ − θ 0 into two terms, which will be handled in Lemmas A.7 and A.8, respectively.
Lemma A.1 establishes bounds for the biases and weighted errors of multivariate kernel regression under various settings, which appear in the subsequent lemmas and the proof of Theorem 1. Lemma A.2 plays a similar role to Lemma A.1 but focuses on the univariate kernel function. Lemma A.3 establishes the bias of the nonparametric function estimation in the initial stage. This lemma is used to prove Lemma A.4, which gives the convergence rates of the estimators of the largest eigenvalue and its eigenvectors. Lemmas A.5 and A.6, built on Lemmas A.2 and A.4, are used to prove Lemmas A.7 and A.8. In the proofs of Lemmas A.1 and A.2, the settings can be formulated as specific empirical processes indexed by various function classes, to which the exponential probability inequality can be applied, though these formulations are not trivial. Therefore, the inequality presented in Appendix A.3 is the cornerstone of the theoretical derivation.
Lemma A.1. Suppose that K(·) satisfies Condition 1, f_X has bounded, continuous partial derivatives up to second order, and D_X is compact. If h_n ≥ cn^{-2/5} and h_n = o(1), then there exists a positive constant L such that , where δ_ph is defined in Condition 5.
It is worth mentioning that, in the iterative calculation from Step 1 to Step 6, the estimators θ_n of θ_0 are functions of the samples. For simplicity, we simply write them as θ. The proofs will frequently apply an exponential inequality (see Appendix A.3) to function classes of the involved empirical processes, which we introduce now. Write v_{iθxb} = θ^⊤X_{ix}/b and v_{uθxb} = θ^⊤(u − x)/b, where u, x, θ ∈ R^p and A is a set.

Lemma A.2. Suppose that Conditions 2 and 3 hold, and H(u) satisfies Condition 1. If b_n > n^{-2/5}, b_n = o(1) as n → ∞, and δ_θn = O(b_n), then .

Write (X_{1x}/h, . . . , X_{nx}/h), I_n = (1, . . . , 1) ∈ R^n, 1 ≤ i, j ≤ n. It is easy to obtain the solutions of (2.2), which can be written as

Lemma A.4. Suppose that λ_n is the largest eigenvalue of the matrix in (2.3) and η_n is the associated eigenvector of λ_n with ||η_n|| = 1. Choose θ_n = η_n^{1st}. If the conditions of Lemma A.3 are satisfied, then , where r_ph = h^2 + δ_ph and η_n^{1st} are defined as before.
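The eigenvector choice in Lemma A.4 amounts to extracting the normalized leading eigenvector of a symmetric matrix; the computation can be sketched as follows (a generic symmetric positive semidefinite matrix stands in for the matrix in (2.3)):

```python
import numpy as np

# Generic stand-in for the matrix in (2.3): any symmetric PSD matrix works.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
M = A @ A.T                                  # symmetric positive semidefinite

eigvals, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
lam_max = eigvals[-1]                        # largest eigenvalue
eta = eigvecs[:, -1]                         # associated unit eigenvector
if eta[0] < 0:                               # fix the sign for identifiability
    eta = -eta
print(lam_max, np.linalg.norm(eta))
```

`numpy.linalg.eigh` already returns eigenvectors of unit norm, so the normalization ||η_n|| = 1 in the lemma is automatic; only a sign convention needs fixing.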
Lemma A.7. If the conditions of Lemma A.6 are satisfied, and B_θ0 is invertible, then

Lemma A.8. If the conditions of Lemma A.7 are satisfied, then , where ψ_nθ0 and Σ_0 are defined in Theorem 1.

Supplementary Materials
The supplemental material contains the proofs of Lemmas A.1-A.8 and additional simulation studies in Section 4.