Corporate Probability of Default: A Single-Index Hazard Model Approach

Abstract Corporate probability of default (PD) prediction is vitally important for risk management and asset pricing. In search of accurate PD prediction, we propose a flexible yet easy-to-interpret default-prediction single-index hazard model (DSI). By applying it to a comprehensive U.S. corporate bankruptcy database we constructed, we discover an interesting V-shaped relationship, indicating a violation of the common linear hazard specification. Most importantly, the single-index hazard model passes the Hosmer-Lemeshow goodness-of-fit calibration test while neither does a state-of-the-art linear hazard model in finance nor a parametric class of Box-Cox transformation survival models. In an economic value analysis, we find that this may translate to as much as three times of profit compared to the linear hazard model. In model estimation, we adopt a penalized-spline approximation for the unknown function and propose an efficient algorithm. With a diverging number of spline knots, we establish consistency and asymptotic theories for the penalized-spline likelihood estimators. Furthermore, we reexamine the distress risk anomaly, that is, higher financially distressed stocks deliver anomalously lower excess returns. Based on the PDs from the proposed single-index hazard model, we find that the distress risk anomaly has weakened or even disappeared during the extended period.


Introduction
The corporate probability of default (PD) plays a crucial role in risk management and asset pricing (Altman 1968;Shumway 2001;Vassalou and Xing 2004;Campbell, Hilscher, and Szilagyi 2008;Ding et al. 2012;Tian, Yu, and Guo 2015).It is often a key metric for credit rating agencies, such as Moody's and Standard & Poor's, to deliver risk assessment for investors.The PDs are also tied to credit spread and loan rate calculations for corporate bonds.In response to 2008 financial crisis, the Basel Committee on Banking Supervision developed an international regulatory framework for banks, Basel III, under which the PD is an essential input to calculate banking's capital and liquidity.Indeed, the PD is a core risk parameter to conduct stress testing and comprehensive capital analysis and review (CCAR), overseen by the Federal Reserve regularly.A dramatic increase in corporate bankruptcies has already been witnessed since the beginning of the global COVID-19 pandemic in 2020.In the United States alone, there are more than 340 companies filed for bankruptcy including big names such as Hertz and J.C. Penney, citing financial distresses due to the COVID-19 (Scigliuzzo et al. 2020).The magnitude of bankruptcies, in terms of assets, has far surpassed the year of 2008, suggesting an unprecedented financial distress (Shen 2020).In the face of such large levels of financial stress, closely monitored stress testing and CCAR attain a vital importance that goes beyond the needs to meet the area to the left of the shaded window, which corresponds to the observations whose PDs fall below the 85 percentile based on CHS model.As the percentiles get larger, the three models deliver noticeably different PDs.In the shaded window, which corresponds to 85-99 percentiles in the PDs predicted by CHS, the predicted PDs from our proposed DSI model are the largest, followed by those from the DTM and then from CHS.To the right of the shaded window is the most interesting top one percentile of predicted PDs, where the order is completely reversed and the discrepancy becomes even more substantial.These interesting findings may suggest that for a majority of small predicted PDs at a given time point, the cash reserves based on the predicted PDs from CHS are similar to those from DSI and DTM.But for the highest PD estimates from the top one percentile, the cash reserves from the CHS model would potentially be calculated overly conservatively compared to those from the other two models.
For corporate bankruptcy prediction, a desirable model is usually evaluated in two dimensions: discrimination and calibration performance.While discrimination assesses models' ability to discriminate two dichotomous events, calibration evaluates the agreement between the predicted probabilities and the actual proportions of the event occurrence.Given the crucial role of PD estimates in capital requirement calculation under BASEL III, a model with good calibration performance is essential.In this work, we adopt Hosmer-Lemeshow (HL) goodness-of-fit test (Hosmer Jr, Lemeshow, and Sturdivant 2013) to assess models' calibration power.To the best of our knowledge, from our limited empirical analysis, we find that our proposed default-prediction single-index hazard model is the only model in the literature that passes the HL test for both in-sample and out-of-sample predictions across multiple time periods regardless of whether the years of 2008 financial crisis are included or not. 3 Our empirical results also demonstrate a superior discriminatory power of the proposed DSI model.
Additionally, we assess the economic value of different PD prediction models through a business lending practice, where different lenders use different PD prediction models for credit scoring and calculate credit spread for each borrower.Companies with the highest default risk are rejected.For the remaining companies, lenders offer competitive price based on their PD prediction model, and borrowers choose the lender with the best offer.Under a fixed loan market, profit of each lender can be calculated given prespecified values of loss given default (LGD) and credit spread of the highest quality loan, hence, the economic value of different PD prediction models.Our empirical results show that the lender who adopts the proposed DSI consistently generates the highest profit across out-of-sample periods, where the profit may be as much as three times comparing to the lender using CHS model.
The discrepancy observed from Figure 1 may be partly due to the potential violation of the linear assumption in the widely-regarded state-of-the-art CHS or Shumway's linear hazard model in finance.Nielsen et al. (1998) noted that the covariate effect is rarely specified precisely by a parametric model in many applications.In fact, based on the annual data of 17,862 publicly traded firms in the United States from 1980 to 2016, our empirical analysis finds strong evidence suggesting that the linear specification may be severely violated.More specifically, a V-shaped functional relationship (see Figure 2 and more details in empirical results) is unveiled by applying our proposed DSI model that has the following form, log h(t j |x i,t j−1 ) where α j is the baseline hazard at time point t j and η(•) is a flexible univariate function.With a penalized spline estimation, this flexible function turns out to be the aforementioned V shape for the corporate bankruptcy database we construct.The linear projection β T x i,t j−1 is the so-called single index, which maps x i,t j−1 , company i's financial characteristics in previous year t j−1 , to a univariate index.The hazard function, h(t j |x i,t j−1 ) is interpreted as the probability of default of company i in year t j given its financial characteristics observed in the previous year, that is, Pr(T = t j |T ≥ t j , X = x i,t j−1 ).  1) is that the flexible univariate function η(•) can capture potential nonlinear relationship, while the single-index coefficients β preserves some model interpretability.A nice by-product is that the single index may be itself of interest.In the bankruptcy application, it may yield an interesting "financial default index" for practitioners.To estimate model (1), we adopt a penalized-spline approximation for its computational stability (Yu and Ruppert 2002;Ruppert, Wand, and Carroll 2003;and references therein).Penalized splines can be viewed as a generalization of smoothing splines, and the penalty function helps to prevent overfitting.Based on a recent theoretical development by Huang and Su (2021), we establish a stochastic bound for the estimated function η together with the parametric components with a diverging number of knots.Asymptotic normality is shown for the single-index coefficients with the optimal √ n order.To the best of our knowledge, this is the first work that establishes the asymptotic results for penalized-spline likelihood estimators with a diverging number of knots under the semiparametric single-index hazard model framework.
The proposed default-prediction single-index hazard model for corporate PD prediction can be viewed as a discrete counterpart of the well-studied continuous single-index hazard models in survival analysis.To mention a few, Huang and Liu (2006) developed the single-index proportional hazards model using polynomial splines at pre-specified fixed knots.Lu et al. (2006) studied the partially linear single-index proportional hazard model using local linear fit adopting an algorithm similar to Carroll et al. (1997).Wang (2004) allowed some covariates to be time-dependent with local partial likelihood estimation along with some missing data imputation.
However, models studied in aforementioned works and other continuous survival literature cannot be directly applied to model corporate default probability.This is mainly because the use of calendar time in our study, while a common time origin is applied to all individuals under the framework of continuous survival analysis.In corporate default prediction problem, it is necessary to use actual calendar time because companies with the same financial characteristics still have different probability of default at different calendar time due to varying macroeconomic conditions (Ding et al. 2012).Therefore, companies entering the database at different time (depending on their initial public offering (IPO) schedule) should not share the common starting point.With the calendar time, the discretetime hazards model enjoys the "memoryless" feature that the conditional PD only depends on the latest observation, rather than the entire trajectory of covariate vectors in continuous survival models.In fact, the likelihood function of continuous survival models would be ill-defined for corporate bankruptcy studies as the cumulative conditional hazard integrates the entire trajectory of the time-varying covariates x(t), many of which are clearly not available for the bankruptcy data.
We summarize our contributions as following.First, our work contributes to the literature by unveiling an interesting V-shaped functional relationship (see Figure 2).Such a nonmonotone V shape contradicts to the common linearity assumption in the widely used CHS model or Shumway's model.We propose a default-prediction single-index hazard model with a flexible function η(•) along with a nice by-product of "financial default index" and apply it to a comprehensive corporate bankruptcy database we construct.
Second, we empirically show that the proposed DSI model achieves superior prediction performance in both calibration and discriminatory power, compared to benchmark models.
To the best of our knowledge, for the U.S. publicly traded companies the proposed DSI model is the only model that passes the Hosmer-Lemeshow goodness-of-fit test (Hosmer Jr, Lemeshow, and Sturdivant 2013) for both in-sample and outof-sample prediction with various prediction periods, including 2008 financial crises.This suggests that the proposed model is able to recover the PDs more accurately, which may possibly lead to important implications in economical capital reserve calculation in risk management.In addition to the calibration test, the proposed default-prediction single-index hazard model also yields superior prediction accuracy with respect to discrimination.We further assess the economic value of different PD prediction model through a lending practice under a fixed loan market.The lender who adopts the proposed DSI model for credit scoring consistently generates the highest profit throughout the entire out-of-sample prediction period.The highest economic value generated by DSI model again provides evidence of its superior performance in PD prediction.
Third, we revisit a vital asset pricing implication by applying our proposed default-prediction single-index hazard model to an extended sample period including 2008 financial crisis.The probability of default has also been directly linked to the asset pricing literature.Fama and French (1996) conjecture that the investors demand a positive premium to bear the distress risk.Instead, we find that the return premium associated with higher distress risk is negative.Specifically, the firms with higher distress risk earn lower excess return after controlling the common three factors of Fama and French (1996) or five-factors of Fama and French (2015) for asset pricing.This finding is inconsistent with the original conjecture by Fama and French (1996) but consistent with Campbell, Hilscher, and Szilagyi (2008), Ding et al. (2012), andGao, Parsons, andShen (2018).However, unlike documented in these studies, our study shows that the negative distress return anomaly, that is, higher financially distressed stocks deliver anomalously significant lower excess returns, has weakened and even disappeared in this extended sample period including 2008 financial crisis.
Last but not least, our work contributes to the statistical and econometric literature by establishing the asymptotic results for the penalized-spline likelihood estimators of the semiparametric single-index hazard models.The rest of this article is organized as following: Section 2 describes the corporate bankruptcy database we have constructed and used for our empirical study.Section 3 introduces more details of the default-prediction single-index hazard model along with an estimation algorithm and large sample property.The empirical results are elaborated in Section 4, followed by a simulation study mimicking the real data in Section 5.An asset pricing implication is investigated in Section 6.We conclude the article in Section 7. Online supplementary materials include additional tables and empirical results, technical details, and proofs of theorems.

Data
In this article, we construct a bankruptcy database consisting of all the U.S. publicly traded firms listed on the New York Stock Exchange, American Stock Exchange, and NASDAQ from 1980  (WRDS).In particular, we merge a company's quarterly updated accounting information from the COMPUSTAT database with its monthly trading data from the CRSP database.We carefully align the company's fiscal year to the calendar year and also lag all the annual accounting information by four months to ensure that the accounting information is available to the market at the time of prediction (Shumway 2001;Chava and Jarrow 2004).As a result, our constructed database consists of 189,037 firm-year observations and 1589 bankruptcy events during the sample period.A detailed frequency table is shown in Table S1 of supplementary materials.
For the firm-specific covariates, we follow the formation of Campbell, Hilscher, and Szilagyi (2008) and use the marketvalued total asset (MTA) rather than the book value to construct eight exploratory variables, LTMTA, NIMTA, CASHMTA, MBE, RSIZE, EXRET, SIGMA, and PRICE.In specific, LTMTA is defined as total liability over market-valued total assets.NIMTA is the ratio of net income to market valued total assets.CASHMTA is constructed by dividing cash and short-term assets by the market value of total assets.We obtain RSIZE as the logarithm of firm's relative size to the S&P 500 index value and EXRET as the annual log excess return against the S&P 500 index return.SIGMA is the volatility of firm stock returns in the past 3 months.Following Campbell, Hilscher, and Szilagyi (2008), we take the log of stock price as the PRICE variable and also adopt the 10% of the difference between market and book equity to the book value of total assets to construct the market-to-book ratio, MBE.In addition, we also winsorize all predictors at the 1st and 99th percentile in order to reduce the influences by extremely values.Table 1 provides the detailed summary statistics.One observation from Table 1 is that the bankruptcy group has quite different financial characteristics 4 Due to substantial delays of bankruptcy disputes as well as some delays in default status updates in COMPUSTAT database, following the literature, we end the sampling period of this study in the year 2016 to avoid inaccurate records.
from the nonbankruptcy group.The former tends to have higher debt and liabilities relative to their assets, smaller size in terms of their asset values and market capitalization, weaker profitability, and lower realized stock returns than that of the latter.It is also clear that bankruptcy firms are usually more volatile.The average market return volatility for the bankruptcy group is 1.170, while the volatility is only 0.607 for the nonbankruptcy group.The bankruptcy group also has a lower average trading log-price of 0.467, comparing to 2.268 for the nonbankruptcy group.

Default-Prediction Single-index Hazard Model
Let n be the total number of firms and x i,t j be the d-dimensional time-dependent covariate vector denoting company i's specific financial characteristics observed at time t, where t = t 1 , . . ., t j , . . ., t J are the J fixed discrete observation time in the whole sample period.For our annual data, these are end of year observations.Let A i denote the starting time and D i denote the end time that company i, i = 1, 2, . . ., n, is first and last observed in the database, respectively, during the sample period.Denote δ i the censoring indicator, where δ i = 1 if the ith company files bankruptcy at t = D i during the sample period; and δ i = 0 otherwise.A company may enter the database at different time depending on their initial public offering (IPO) schedule.A company may also exit the database at different time due to its bankruptcy status as well as other delisting reasons.In particular, for healthy companies whose IPO dates are prior to the sample period, their starting date A i = t 1 will be the same as our starting year 1980 in the sample period.For companies with IPO dates later than 1981, then their starting date A i is their first public trading year.For example, the initial public offering year of Amazon is 1997 at a price of $18 per share.The starting date for "Amazon" is the year of 1997, where A i > t 1 .This kind of different starting time in bankruptcy prediction is very different from the traditional survival analysis.On the other hand, the end time D i is subject to right censoring at the end of the sample period.If a company files for bankruptcy after the end of sampling period, then D i = t J .A healthy company may also exit the database through other delisting reasons such as merger and acquisition.For example, Bank One corporation merged with JPMorgan Chase & Co. in 2004.Here for Bank One, the end year D i = 2004 but the censoring indicator δ i = 0.A healthy company such as Bank One can exit from the database earlier than the end of sample period, through, for example, merger or acquisition, where D i < t J but δ i = 0.
Let T be the random variable of the calendar year when bankruptcy is filed.As companies do not share the same starting point, T is different from the survival time in the traditional survival analysis.Let or simply p i,t j = h(t j |x i,t j−1 ) be the conditional probability that company i files for bankruptcy at time t j given it has survived past time t j−1 .Our single-index hazard model (1) for corporate PD prediction can be rewritten as For model identifiability, we follow Yu and Ruppert (2002) to impose the constraints that the single-index parameter β 2 = 1 and the first element β 1 > 0.
The log-likelihood function of model ( 2) takes form where δ i,j = δ i I{D i = t j }.The mathematical derivation of (3) can be found in supplementary materials (Section B).The memoryless property of model ( 1) or (2) can also be seen from such mathematical derivation.One of the keys in estimating model ( 2) is to obtain a consistent estimate of the flexible function η(

•).
In what follows, we discuss penalized spline estimation in detail and establish its asymptotic properties.

Penalized Spline Estimation
Penalized splines or P-splines can be considered as a generalization of smoothing splines allowing a flexible choice of knots and penalty (see Ruppert, Wand, and Carroll 2003 for a review).Yu and Ruppert (2002) showed that P-splines approach to single-index models is advantageous over other approaches such as local methods (Carroll et al. 1997).Unlike careful knot-placement in regression splines, an appealing feature of Pspines estimation is that the smoothness is tuned by a single penalty parameter λ as in smoothing splines (e.g., Huang and Liu 2006).We illustrate P-splines using the truncated power basis for simplicity.Other bases such as B-splines can be easily adopted.The truncated power basis is defined as , where q is the polynomial degree, and v 1 , . . ., v K are K interior knots.The truncation function and 0 otherwise.Popular ways to place spline knots are equally-spaced or equally at sample quantiles.Any function η(u) with q − 1 continuous derivatives can be approximated by where γ = (γ 1 , . . ., γ q+K ) is the q + K-dimensional spline coefficient column vector.Note that the intercept term 1 in the spline basis B(u) as well as the corresponding constant spline coefficient term γ 0 are omitted due to the baseline constant α.
Assume that η(u) is defined on the interval [a, b], then the penalized-spline log-likelihood can be written as where 2 du is a general form of the penalty term, η (m) is the mth derivative of η for m ≥ 2, and λ ≥ 0 is a roughness penalty parameter that can be chosen by generalized cross-validation.For m = 2, the penalty can be expressed as [η (u)] 2 du = γ T Pγ , where P is symmetric and positive semidefinite.

Asymptotic Theories
To present asymptotic theories, we first introduce a few notations.Let ζ be the unconstrained parameter vector after reparameterization of β and J(ζ ) = ∂β/∂ζ the Jacobian transformation matrix.Denote J(ζ 0 ) by J 0 , where ζ 0 is the true value of ζ .For ease of notation, we replace t j with j and let is the negative log-likelihood for individual i at time j.We write the number of splines basis q+K simply as K for simplicity since K diverges with n.Without loss of generality, we consider interval [0, 1] for [a, b].Further, we define the "projection" We also need the following regularity conditions to establish our results. and are positive-definite matrices, where a ⊗2 = aa T for any (column) vector a. (A5) The splines degree q satisfies q ≥ max{p − 1, m}.
Remark 1. Assumption (A1) specifies the true model.For assumption (A2), boundedness of predictors is usually assumed for spline-based estimation, since spline functions are typically constructed on a finite closed interval.(A3) assumes the smoothness of the true function.The matrices in (A4) are related to the asymptotic covariance matrix of the index parameter.Some projection similar to that defined in ( 5) is often used in separametric models, which represents the effect of the nonparametric part on the parametric part.If J = 1, η takes the form of conditional expectation as in Liang et al. (2010).For the general case, there seems to be no simple characterization of η.
The following result shows consistency of penalized-spline likelihood estimators of the unknown function η, the baselines α, and single-index coefficients β.
Theorem 1.Under assumptions (A1)-(A5), and that 1), there exists a local maximizer of the penalized-spline (log-)likelihood (4) that satisfies Remark 2. The first two terms in the rate correspond to squared bias and the third is the variance term.To see that the rate obtained here can produce the optimal rate, we can choose K , which is referred to as "heavy penalization." The result below establishes asymptotic normality for the single-index coefficients β.
Proofs of both Theorems 1 and 2 are given in supplementary materials.

Algorithm
We propose an efficient iterative algorithm to maximize the penalized-spline log-likelihood function (4).The solution can be decomposed to two components: the single-index coefficients β and the rest coefficients γ α = (α, γ ).They can be estimated iteratively, or through the profile likelihood estimation (Liang et al. 2010;Yu, Wu, and Zhang 2017) using standard nonlinear optimization software such as nlm.However, large nonlinear optimization may be computationally expensive and unstable.Instead, we advocate an iterative algorithm, where β and γ α are estimated iteratively.Specifically, given the single-index coefficients β, the P-spline estimates, γ α,λ , can be obtained straightforwardly as γ α B α ( βT x i,t j−1 ) is linear with respect to coefficients γ α .Existing tools such as gam function in the R package mgcv can be readily used for the P-spline estimation.To estimate the single-index coefficient β, we apply the following linear approximation to the unknown function η, so that We further approximate η(β an estimated β for β 0 along with the spline approximation.Consequently, the highly nonlinear problem of maximizing (4) turns into a linear problem facilitating an efficient algorithm summarized below.
1. Obtain an initial estimate for the single-index coefficient β(0) , so that the problem reduces to a univariate smoothing problem.We maximize the penalized-spline loglikelihood objective function (4) with respect to γ α , and obtain the P-spline estimates γ α,λ .3. With the linear approximation (6), we update the singleindex coefficient β by maximizing the log-likelihood function (4) with respect to β. Reparameterize β such that β 2 = 1 and sgn( β1 ) = 1. 4. Repeat Steps 2 and 3 until all parameter estimates converge.

Connection to Existing Bankruptcy Prediction Models
Our default-prediction single-index models (DSI) shows great promises especially in calibration performance as demonstrated in the next section.It is the first model passing the Hosmer-Lemeshow test including 2008 financial crisis period on a comprehensive U.S. corporate bankruptcy database, to the best of out knowledge.Furthermore, it is flexible and semiparametric, encompassing the state-of-the-art CHS model for bankruptcy prediction.
Our approach is also partly motivated by the class of discrete transformation models (DTM, Ding et al. 2012), which is a parametric class of transformation survival models that yield better performance comparing to CHS model.In particular, DTM is derived by applying a class of monotonic transformation functions G(•), inverses of logarithm and Box-Cox transformations, to −log[S(t j |x t j )/S(t j−1 |x t j−1 )], where S(t j |x t j ) is the conditional survival function.That is, , ( 7) where S 0 (t j ) is the baseline survival function.The DTM model is motivated by its continuous counterpart (Zeng and Lin 2006), which has the following transformation relationship, where (t|x t ) = t 0 λ 0 (s) exp(β T x(s))ds is the cumulative conditional hazard function.It is important to note that ( 7) and ( 8) are fundamentally different as (7) essentially applies the transformation function G(•) to the difference of cumulative hazard, while (8) applies G(•) to the cumulative hazard and then takes derivative.The continuous survival model would not be applicable for corporate default prediction due to the use of calendar time, that companies do not share the common origin time, and that the conditional cumulative hazard function (t|x t ) would be ill-defined.Similarly, the continuous counterpart of our single-index hazard model (Wang 2004;Huang and Liu 2006;Lu et al. 2006) is not suitable for the bankruptcy prediction application.
We can rewrite the DTM model of transformation relationship (7) as where G * (u) = G −1 (log(1 + u)) and α * j is the transformed baseline that only depends on time.Interestingly, we further find that if G(•) in Equation ( 7) is chosen to be the inverse of a family of logarithm transformation considered by Ding et al. (2012) and Zeng and Lin (2006), which is defined as which is the same Box-Cox transformation considered by Zeng and Lin (2006) and Chen, Jin, and Ying (2002).
Our proposed DSI model (1) can be rewritten as p i,t j 1 − p i,t j = exp α * j + η(β Note that both DTM (model ( 9)) and our proposed DSI (model ( 10)) encompass the state-of-the-art CHS model for corporate bankruptcy prediction.However, DTM adopts a class of monotonic parametric transformations, Box-Cox transformations, G * (•), on the exponential term exp(α * j + β T x t j ), while DSI applies a flexible nonparametric function η(•) to the linear projection or single index β T x t j .Indeed, we discover an interesting V-shaped relationship, which is discussed next.

Empirical Results
We report empirical results of our proposed default-prediction single-index hazard model (DSI) on the bankruptcy data described in Section 2. In particular, we compare our DSI with popular benchmark models, namely, the state-of-the-art CHS (Campbell, Hilscher, and Szilagyi 2008) model in finance and the optimal discrete transformation survival model (DTM) (Ding et al. 2012).We document the estimation results based on the full sample data from 1980 to 2016.We then examine models' out-of-sample prediction performance through an expanding window.A robustness check over various prediction periods is conducted and included in online supplementary materials.

Estimation Results and Assessment of PD Prediction Accuracy
After fitting the default-prediction single-index hazard model on the bankruptcy database we built, an interesting nonlinear relationship is unveiled from the estimated unknown flexible function η.We visualize a V shape in Figure 2, where the shaded interval is the 95% confidence band.It is clear that the relationship depicted in Figure 2 is nonlinear, indicating a severe violation to the common assumption of linearity adopted by popular models such as CHS (Campbell, Hilscher, and Szilagyi 2008).This interesting finding may also provide some initial empirical evidence to the so-called "financial frictions, " where a firm with an extremely low index value may suffer from a similar risk of bankruptcy as a firm with an extremely high index value due to reasons like a restricted credit supply (Giordani et al. 2014).
In the binary prediction problem, the probabilistic prediction accuracy is commonly assessed in two dimensions: calibration and discrimination.A model with good calibration performance is crucial for PD estimation, especially in economic environments, like the one the world is currently in, where closely-monitored stress testing and CCA are vitally important.However, in bankruptcy literature, formal calibration testing is rarely performed.Table 2. In-sample and out-of-sample prediction assessment based on the p-value of Hosmer-Lemeshow (H-L) goodness-of-fit χ 2 -test, AUC, Pseudo-R 2 , and the decile ranking table for the proposed default-prediction single-index hazard model (DSI) for corporate bankruptcy prediction; a state-of-the-art bankruptcy prediction model in finance, CHS of Campbell, Hilscher, and Szilagyi (2008); and an optimal discrete transformation survival model (DTM) of Ding et al. (2012).
Panel A. In-sample (1980A. In-sample ( -2016) ) Panel operating characteristic (ROC) curve, known as AUC, where AUC = 0.5 for a completely random guess and AUC = 1 for a perfect discrimination.A higher AUC value indicates better discrimination.
In addition, decile tables and pseudo-R 2 are commonlyused metrics for assessing model performance in the corporate bankruptcy prediction literature.The decile table summarizes the proportion of observed events in cumulative bins that are constructed by categorizing the sorted predicted probabilities from the largest to smallest.For example, the cumulative bins can be 90%-100%, 80%-100%,…, 0%-100% of predicted PDs, and for each bin, the proportion of the bankruptcies out of total number of bankruptcies is calculated.Therefore, the calculated proportions monotonically increase across the cumulative bins, and it equals to 1 for the bin of 0%-100%.A larger number in the top bins is desirable.Pseudo-R 2 is defined as 1 − logL 1 /logL 0 , where L 1 and L 0 are the likelihood from fitted model and null model, respectively.
In Panel A of Table 2, we summarize the model estimation results on the full sample period.Importantly, we observe that our proposed default-prediction single-index hazard model is able to pass the Hosmer-Lemeshow goodness-of-fit test with a large p-value of 0.665.On the other hand, the Hosmer-Lemeshow test rejects both benchmark models, implying that their estimated PDs are clearly not well-calibrated.Furthermore, in the cumulative decile ranking table, DSI consistently shows a higher proportion of bankruptcy firms in the top bins than the other two models.For example, in the top 10% bin, 65.2% bankruptcy firms are captured by DSI, while CHS and DTM capture 63.6% and 64.0%, respectively.The reported pseudo-R 2 and AUC deliver a similar message, showing an enhanced insample performance of the DSI model over the CHS and DTM models.Taken together, we conclude that the proposed DSI model has substantially improved the accuracy for corporate bankruptcy prediction, especially in terms of the calibration performance for model fitting.
We further cross-compare the DSI model with the CHS model by investigating the two tail ends from each model.It is interesting to note that there are 14 bankrupted companies with their predicted PDs ranked in the top 10 percentile from DSI but ranked in the lowest or safest 10 percentile from CHS.However, we find none if the opposite filter is applied, that is, the lowest 10 percentile from DSI but highest 10 percentile from CHS.
This fact again echoes with the nonlinear V-shaped relationship discovered in Figure 2.

Out-of-Sample Prediction
We now shift our attention to the out-of-sample performance.To evaluate models' out-of-sample performance, we adopt an expanding window approach.Specifically, we start with training data from 1980 to 2005 and make predictions for the following year of 2006.We then expand our training data by one year (i.e., from 1980 to 2006) and make predictions for the year of 2007.We keep expanding this training sample window until the last year in our sampling period is used as the testing data.
We summarize the out-of-sample prediction results in Panel B of Table 2. Consistent with our discussion in the previous section, our proposed default-prediction single-index hazard model tends to outperform the other two benchmark models in all the commonly-adopted metrics in the bankruptcy prediction literature.For example, the DSI passed the Hosmer-Lemeshow calibration test with a p-value of 0.281 whereas neither CHS (Campbell, Hilscher, and Szilagyi 2008) nor the DTM (Ding et al. 2012) passed the test.Furthermore, in the top 10% bin, we observe nearly 10% more (73.0%vs. 64.5% vs. 65.2%)bankruptcy firms captured correctly by DSI, comparing to that of CHS model and DTM model.Consistent improvements in pseudo-R 2 and AUC demonstrate the advantage of our proposed DSI in providing better prediction performance.A robustness check across different periods are conducted, and the results are summarized in Table S2 of supplementary materials.
In summary, from our limited empirical analysis, an interesting nonlinear, nonmonotonic, V-shaped relationship is unveiled by our proposed DSI hazard model, while current most popular models in finance, based on the linearity assumption, would not have the ability to capture.Most importantly, to the best of knowledge, DSI is perhaps the only model that can pass the Hosmer-Lemeshow goodness-of-fit test and provide substantially better-calibrated PD estimates both in-sample and outof-sample.This may have potentially important implications in practice.As shown in Figure 1, the DSI, CHS, and DTM models yield similar predicted PDs for the majority of the lower percentiles in the CHS predicted PDs.However, the three predicted PD curves are noticeably different as the predicted PDs get larger.In the top one percentile of CHS predicted PDs, instead of Market share is the proportion of borrowers obligated to a lender out of all firms that are not rejected by all three lenders.Share of defaulters is the percentage of defaulted firms to which the loan is granted out of the total number of default.Average credit spread is the credit spread averaged over all firms obligated to a lender.Revenue = Market size ($100B) × Market share × Average credit spread, Loss = Market size ($100B) × Share of defaulters × Prior probability of default × LGD, where the prior probability of default is the sample failure rate in the same prediction year, and Profit = Revenue − Loss.
lining up along the 45-degree CHS line, the predicted PDs from the DSI model fall considerably under the CHS prediction curve with the DTM predictions in between.For the CHS predicted PDs between 85 and 99 percentiles in the shaded window, the order is completely reversed where the predicted PDs from the DSI model generally falls slightly above the 45-degree CHS line.These interesting findings may suggest that the cash reserves based on majority small predicted PDs from CHS are similar as those from the default-prediction single-index hazard model and the transformation survival model.But for higher predicted PDs, the cash reserves based on the CHS model may be possibly optimistic or overly conservative, especially for exceptionally high PDs.

Economic Value
A direct application of a PD prediction model is that it can be used as a credit scoring model for pricing during the lending practice.Specifically, in the loan market, different banks may adopt different credit scoring models to assess loans to individual firms, after which the banks make decisions of either reject or lend money with certain price.If not rejected, the borrowers then choose the lender who offers the lowest price (credit spread), which is mainly determined by the predicted value of borrower's default probability.Here we adopt the following expression of credit spread derived by Blöchlinger and Leippold (2006).
where LGD is loss in loan value given default, which is often prespecified; k is also a prespecified value that is the credit spread for the highest quality loan.Following a similar setup of Agarwal and Taffler (2008), we assume a 100 billion (USD) loan market and three lenders with each one using a different PD prediction model, namely DSI, CHS, and DTM.We refit all three models by excluding financial firms as potential lenders and use the expanding window approach to predict PD for each year from 2006 to 2016.All three lenders reject the loans in the highest 5% PD category.If at least two lenders offer the same price to a loan according to (11), the borrower will randomly choose one.
Table 3 shows the empirical results of economic value for each lender under the described lending practice.The results are averaged over years, where the left panel aggregates the entire prediction period (2006-2016) which includes 2008 financial crisis, while the aggregation period of the right panel (2011)(2012)(2013)(2014)(2015)(2016) excludes the financial crisis.It is clear to see that for both scenarios, the lender who adopts the proposed defaultprediction single-index model receives the highest profit and largest market share.The profits based on DSI and DTM are substantially larger than CHS, providing another possible evidence that the linearity assumption in CHS may be inadequate to characterize the actual relationship.The average credit spread is 44 base points for DSI in both panels, slightly higher than DTM but much lower than CHS.Credit spread relies on the predicted PD, and a higher credit spread leads to a higher revenue but lower market share.Thus, the predicted PD by DSI well balances such a tradeoff, leading to the highest profit.The results shown in Table 3 provide another empirical evidence that the proposed DSI model is more preferred than the other two benchmarking models in a lending practice.Additionally, we notice that the share of defaulters for DSI is higher than the other two.This does not contradict to the decile rankings in Table 2 as the latter is evaluated for each model independently while the share of defaulters is based on a shared market where each company is obligated only to one lender.

Simulation Study
We further conduct a simulation study, mimicking the real bankruptcy process.We simulate bankruptcy data largely based on the distribution of the real bankruptcy data.Specifically, at each time point j = 1, . . ., 36, we generate N j firms, where N j ∼ Poisson(N * j ), and N * j is a proportion of the number of new firms entered at time j in the real data. 5For each simulated firm that enters at time j, the covariates (bankruptcy predictors) are simulated from multivariate normal distribution, using the same mean and variance-covariance matrix as the real bankruptcy data.The probability of default of company i is calculated as 1/(1+exp(− αj −η( βT x ij ))), where η(u) = 5.5u+1.3u 2 −1.8u 3 is adopted to mimic the V shape shown in Figure 2. The timevarying baseline parameters are obtained from αj = log( pj where pj is the overall default rate at time j observed from our real bankruptcy data.The binary bankruptcy indicator is simulated based on the Bernoulli distribution with calculated Table 4. Mean squared error (MSE), bias, and standard error (S.E.) of coefficient estimates based on the simulated data, mimicking the real bankruptcy data, for the proposed default-prediction single-index hazard model (DSI); a state-of-the-art bankruptcy prediction model in finance, CHS of Campbell, Hilscher, and Szilagyi (2008); and an optimal discrete transformation survival model (DTM) of Ding et al. (2012) probability of default in previous step.Firms entered at time j stays until default happens.We generate 500 bankruptcy datasets and apply the three modeling approaches, the default-prediction single-index hazard model (DSI), the state-of-the-art discrete linear hazard model CHS, and the "optimal" discrete transformation survival model (DTM), on each simulated data.Table 4 compares the mean squared error, bias and standard error of coefficient estimates across the three models based on the 500 simulated datasets.It is clear that the DSI model provides the least MSE and bias in coefficient estimates.Table 5 further shows several metrics in models' performance assessment.Specifically, based on the 500 replicates, we calculate (i) the mean of absolute deviance of the estimated PD, that is, | p − p 0 |, where p 0 is the true PD and p is the estimated PD; (ii) the mean of AUC; (iii) the mean of pseudo-R 2 ; and (iv) the Hosmer-Lemeshow calibration test rejection rate.The advantage of adopting the proposed DSI model is noticeable.Specifically, the PD estimates from DSI model provide the minimum deviations from the true PDs.While neither CHS nor DTM model passes the Hosmer-Lemeshow test among any of the 500 simulation runs, only about 4.6% of the simulation runs reject our proposed DSI model, suggesting a strong calibration power of the DSI model in this simulation setting.In addition, we note that DSI model consistently provides the highest AUC values and pseudo-R 2 among the three models.Overall, through this limited simulation study mimicking the real bankruptcy process, we show that the proposed DSI model for corporate bankruptcy is able to provide superior calibration and discrimination performance in predicting the probability of default.

Asset Pricing
An important application of predicted PDs is to be used as the default risk for constructing investment portfolios.As the conjecture in Fama and French (1996), investors may expect a positive association between the expected return and default risk when holding stocks.Such a positive relationship has been confirmed by a number of studies (Vassalou and Xing 2004;Chava and Jarrow 2004;Aretz, Florackis, and Kostakis 2018).In contrast, some empirical studies document a negative relationship that holding stocks with high default probabilities harvests anomalous low returns (Dichev 1998;Griffin and Lemmon 2002;Campbell, Hilscher, and Szilagyi 2008;Da and Gao 2010;George and Hwang 2010;Gao, Parsons, and Shen 2018).
This puzzling default risk anomaly has attracted vital attention due to its challenges in both practice and theory.We revisit the puzzle by using the predicted PDs from our proposed DSI model in the extended sample period including 2008 financial crisis.In particular, we obtain firms' one-year-ahead predicted PDs as described in Section 4 using the expanding window approach starting from 1980 to 1982, and construct 10 decileportfolios by sorting the predicted PDs.These 10 portfolios are updated each year accordingly.We also construct three long-short (buy-sell) portfolios, LS9010, LS9505, and LS9901, to investigate the risk-return anomaly.That is, we suppose investor to hold the stocks associated with highest 10 (5, or 1) percent default risk in long position and those with lowest 10 (5, or 1) percent in short position for a number of years (1983 to 2016 for our study).If the positive relation between risk and return holds, such long-short portfolios would be expected to gain positive excess returns, which are the difference between stock and the S&P 500 index return.
Table 6 reports the asset pricing results.In Panel A, we show that the long-short portfolios LS9010, LS9505, and LS9901 yield monthly average excess returns of −0.17%, −0.27%, and −1.39%, respectively.The most extreme portfolio has significant excess returns with t-statistics of −2.12.This observation implies a weak anomaly, that is, the distressed stocks with high predicted PDs from our DSI model yield negative returns.
In addition to the mean excess return, we also attempt to offer some insights to an important theoretical asset-pricing question: "How can factor models explain the stock return?"Specifically, we regress each portfolio's monthly excess return on the factors and report the estimated intercepts (alphas).If a factor model can explain stocks' excess return, the estimated alpha is expected to be zero.We apply both Fama-French three-factor model (Fama and French 1996) and the five-factor model (Fama and French 2015), which are among the most agreed factor models.The three-factor model includes the market factor (Market), the size factor (Small Minus Big, or SMB), and the ).These factors are market factor (Market), size factor (SMB), value factor (HML), profitability factor (RMW), and investment factor (CMA).We sort all stocks based on the one-year ahead expanding window prediction of bankruptcy from our default-prediction single-index model and divide the stocks into 10 portfolios based on deciles.For example, 0 to 10th percentile is denoted as "0010" and 90th to 100th percentile is "9000." The long-short portfolios LS9010 or LS9505 go long with the 10% or 5% riskiest stocks and short the 10% or 5% safest stocks.The results for mean excess returns, alphas of two models are reported in Panel A. In Panel B, we show the Fama-French three-factor regression coefficients.We also report the corresponding values of t-statistics in parentheses.
value factor (High Minus Low, or HML), while the five-factor model has two additional factors: profitability factor (Robust Minus Weak, or RMW), and investment factor (Conservative Minus Aggressive, or CMA).
As shown in Panel A, the three-factor model cannot fully explain the abnormal negative excess return, as the estimated alphas are significant for the long-short portfolios LS9010 (−0.78%),LS9505 (−0.86%), and LS9901 (−1.77%).However, the five-factor model can explain the excess return with insignificant alphas.These findings imply that the distress anomaly as evidenced by negative alphas may still exist but has clearly weakened, especially under the five-factor model, in the extended sample period including 2008 financial crisis comparing to the earlier sample periods considered in Campbell, Hilscher, and Szilagyi (2008) or Ding et al. (2012).Panel B shows the factor loadings (estimated coefficients) for the three-factor models.The values are qualitatively consistent with those in the existing literature (Campbell, Hilscher, and Szilagyi 2008;Ding et al. 2012;Hou, Xue, and Zhang 2015).
We also report the portfolio characteristics in Panel C. The average relative size (RSIZE) of portfolios implies that the size decreases along with the increase of default risk.On the other hand, the firms in the two tails of predicted PDs are associated with a high market-to-book equity ratio (MBE).These two pieces of evidence are consistent with Campbell, Hilscher, and Szilagyi (2008) and Ding et al. (2012).In addition, one referee made an interesting suggestion that constructing a new distress factor using our calibrated default probability may also have important asset pricing implications.This is an interesting question to explore further in the future. 6n summary, our findings based on the period of 1983-2016 imply that the default risk anomaly has weakened or even disappeared using the predicted PDs from the proposed DSI model.The stocks with higher default risk no longer earn strong anomalously lower excess returns than those firms with lower default risk after adjusting for the risk factors.

Conclusion
In this article, we develop a flexible default-prediction singleindex hazard model (DSI) for corporate PD prediction, and asymptotic properties have been established for a penalizedspline likelihood estimation.Applying the proposed DSI model to a comprehensive corporate bankruptcy database we build, we have a number of interesting findings.First, we discover a V-shaped relationship of the systematic component with the "financial default" single index.This is in stark contrast to the popular linear assumption in the state-of-the-art bankruptcy prediction model CHS (Campbell, Hilscher, and Szilagyi 2008), indeed the Cox discrete hazard model (Cox 1972).The uncovered V shape suggests that the linearity assumption may be severely violated.Second and most importantly, the proposed DSI model passes the Hosmer-Lemeshow goodness-offit calibration tests while neither does CHS nor an optimal transformation survival model (DTM).In our empirical study, we observe that majority small predicted PDs from the three models are close to each other.However, for higher predicted PDs, the three models yield noticeably different predictions.These findings may have important implications in practice.For example, the cash or capital reserve calculations based on the DMH factor helps but still cannot fully explain the profitability anomalies.A more in-depth study is needed to explore the new distress factor and potential implications.
CHS predicted PDs may be optimistic or conservative, especially for extremely high PDs.Third, we examine the important asset pricing implication based on the predicted PDs from the proposed model.We find that the negative distress anomaly, that the higher is the distress risk the lower excess return even after controlling important factors, has weakened and even disappeared during the extended period including 2008 financial crisis.
Probability of default prediction has many applications in a variety of fields beyond corporate bankruptcy, such as credit card, commercial and residential loans, corporate bonds, mortgage borrowing, and foreclosure process.It is our hope that the developed flexible default-prediction single-index hazard model may be potentially adopted in these fields to deliver accurate prediction and high impact in practice.

Figure 1 .
Figure 1.Predicted PDs from a state-of-the-art bankruptcy prediction model in finance, CHS of Campbell, Hilscher, and Szilagyi (2008) (diagonal line); an optimal discrete transformation survival model (DTM) of Ding et al. (2012); and the proposed default-prediction single-index hazard model (DSI) for corporate bankruptcy prediction.
often referred to as "light penalization" where the complexity is mainly controlled by choosing the optimal number of knots.On the other hand, assuming m = p, choosing

Figure 2 .
Figure 2.Estimated unknown flexible function η(•) from the proposed defaultprediction single-index hazard model.The range of horizontal axis is the estimated "single-index" based on the full sample period.

Table 1 .
Summary statistics for bankruptcy predictors.
The in-sample model fitting is based on the full sample period from 1980 to 2016, and the out-of-sample prediction is based on expanding window predictions.

Table 3 .
Comparison of economic value between different PD prediction models.Left panel shows the results averaged over 2006 to 2016 while the right panel is aggregated from 2011 to 2016.

Table 6 .
Returns on bankruptcy risk-sorted portfolios.This table reports the average value-weighted excess returns and its regression of two models: Fama-French three-factor model (Market, SMB, HML), and Fama-French five-factor model (Market, SMB, HML, RMW, CMA