Modeling and predicting IBNR reserve: extended chain ladder and heteroscedastic regression analysis

This work deals with two methodologies for predicting incurred but not reported (IBNR) actuarial reserves. The first is the traditional chain ladder, which is extended to deal with the calendar year IBNR reserve. The second is based on heteroscedastic regression models, which are suitable for dealing with the tail effect of the runoff triangle and for forecasting calendar year IBNR reserves as well. Theoretical results regarding closed expressions for IBNR predictors and mean squared errors are established; for the second methodology, a Monte Carlo study is designed and implemented to assess the finite sample performance of feasible mean squared error formulae. Finally, the methods are applied to two real data sets. The main conclusions are: (i) considering tail effects does not create theoretical or computational problems; and (ii) both methodologies are suitable as a basis for designing software for IBNR reserve prediction.


Introduction
Insurance companies are divided into two groups: life insurance companies, which sell life insurance, annuities and pension products; and non-life insurance companies, which sell the remaining types of insurance products. Essentially, non-life insurance policies (also referred to as property and casualty insurance or general insurance, cf. [16]) are contracts between the insurer and the insured, covering several lines of business (e.g. motor/car insurance, marine insurance and property insurance). These contracts establish that the insurer receives a fixed amount of money (the premium) for providing financial coverage against the random occurrence of well-specified events (cf. [23]). The right of the insured to such financial coverage constitutes a claim. A typical claim process consists of the following stages: an occurrence (the event), the act of reporting to the insurer, payments to the insured (settlement) and the closing of the claim. Occasionally, the claim can be reopened for further payment(s) and a final closing (see [16]). These stages are depicted in Figure 1.
Usually, there is a time interval (delay or lag) between the occurrence date and the reporting date, during which the insurer, even though liable for the claim amount, is unaware of the claim's existence. During this period, the claim is said to be incurred but not reported (IBNR). For this reason, it is important for non-life insurance companies (insurance companies, hereafter) to make reserves; specifically, in this case, IBNR reserves (cf. [1,5,10-12,16]). The latter are loosely defined as the economic-technical system under which an insurance company protects itself against the risks it has assumed regarding future accidents suffered by its clients. Since such IBNR accidents only become known in the future, efforts at a given present time must be directed towards the prediction (or estimation, or even forecast; these terms shall be used interchangeably from now on) of the corresponding reserves, along with a measure of uncertainty, generally the mean squared error.
This article focuses on improving two well-established approaches for predicting IBNR reserves. Both methods will be capable of forecasting the calendar year IBNR reserve, an amount of prime importance to any insurance company, to be defined later in this paper. The first approach is the traditional chain ladder under the stochastic assumptions considered by Mack [13]. This method is extended in the sense that expressions for the predictor of the aforementioned calendar year IBNR reserve and for the corresponding mean squared error are derived. The second approach consists of linear regression analysis under ordinary and weighted least squares, both of which assume heteroscedastic errors. For this second approach, theoretical and feasible expressions for the mean squared errors of the IBNR reserve prediction are derived. These feasible formulae, which account for the estimation of unknown regression coefficients and error variances, are evaluated as regards their asymptotic properties. Additionally, a Monte Carlo exercise is provided in order to assess whether these asymptotic properties guarantee, for instance, that confidence intervals for the IBNR reserve, based on finite and frequently small runoff triangles, are adequate. The second approach also recognizes parcels of the IBNR reserve related to accidents that might occur and/or be reported in a more remote future. These parcels are termed the tail effect (or, simply, the tail) in the actuarial literature.
The rest of the article is organized as follows. Section 2 offers, without claiming exhaustiveness, a review of the literature on different methods for estimating IBNR reserves. Section 3 sets the notation and definitions to be used elsewhere in the text; it discusses the runoff triangle for arranging IBNR data and three types of IBNR reserve (including the calendar year reserve). Section 4 is dedicated to the chain ladder and the corresponding expressions of estimators for several IBNR reserve types and associated mean squared errors. Section 5 adapts much of the theory of linear regression analysis to the runoff triangle framework and, under this scenario, offers developments for modeling and estimating IBNR reserves with tail effects. Section 6 is devoted to showing the results of the two methods developed in the former sections, using two runoff triangles previously tackled in the literature. Section 7 presents a final discussion concerning theory and practical guidelines, and suggests some themes for future research. Finally, brief information is given regarding the supplemental online material, which includes additional details on the results of Section 6 and appendices containing the proofs of the theoretical results.

Basic definitions
Generally, IBNR data are arranged in the runoff triangle (cf. [1,9,10,18]), which is depicted in Table 1 under the usual double-index notation. The first index is associated with the rows and indicates the accident years or years of origin. The second index, corresponding to the columns, gives the observed delay from the accident occurrence until the payment; such delays are commonly termed the development years. In summary, a given value $C_{ij}$ in the runoff triangle represents the payment for an accident that occurred at time $i$ and was reported at time $i + j$, the latter being the actual time instant under the usual notion of time. Therefore, rigorously speaking, the data in the runoff triangle are time-ordered by the diagonals. When $j = 0$ (see the first column in Table 1), there has been no delay. It is worth mentioning that, even though the expression 'year' is being used here, the data could have any frequency: annual, quarterly, and so on. The choice depends on the market in which the insurance company operates and on the type of insurance contract being considered.
The prime objective of any method for IBNR reserve prediction is to 'fill' (that is, to estimate) the lower part of the runoff triangle (see Table 2). The values in this lower part are the unobserved reserves corresponding to the IBNR accidents. Given some specific year (recall: 'year' means the frequency unit, whatever it is in a given runoff triangle), the corresponding amount to be paid is the sum of the values on the corresponding diagonal.

Tail effect
If the prediction for a specific year is based solely on the data displayed in Tables 1 and 2, the insurance company might underestimate the true value of the IBNR reserve by neglecting the possibility that some values correspond to accidents that occurred in a given year $i$ and are reported at least $J$ years ahead. Consequently, the insurance company would be, to some extent, taking the risk of insolvency. This motivates the extrapolation/estimation of values corresponding to columns beyond the last column of the runoff triangle; these values are the tail effect or, simply, the tail (cf. [4,20]), and are illustrated in Table 3. The total reserve that the insurance company is supposed to make for the year $J + 1$ is obtained by summing up the estimated values of the corresponding diagonal. Besides, this paper considers an extension of the tail effect to rows beyond the actual year $J$, which is also portrayed in Table 3. The extension of the tail effect proposed here, and therefore of the concept of the tail effect itself, has not been discussed in any of the cited references. Actually, the values of this row-wise tail effect correspond to 'future' IBNR data and are related to reserves for accidents that will occur in $J + 1, J + 2, \ldots$ and for some reason will not be immediately reported to the insurance company (see the discussion of this point in Section 1). Evidently, the estimation of these values is in the interest of the insurance company, because they are part of the total reserve for future time instants.

Table 3. The runoff triangle with both the original column-wise and the extended row-wise tail effects.

Accumulated triangles
Until now, the values $C_{ij}$ in the runoff triangle represent the reserve corresponding to accidents that occurred in year $i$ and were reported with a delay of $j$ years. These same data can be arranged in an alternative fashion by accumulating the columns, as depicted in Table 4. The values $D_{ij}$ represent the IBNR data, formerly displayed in Tables 1 and 2, aggregated by columns; that is, $D_{ij} = \sum_{k=0}^{j} C_{ik}$. Kremer [12], Mack [14] and Taylor [16] make use of this arrangement for implementing the chain ladder (see Section 4).

Table 4. The accumulated runoff triangle under the usual double-index notation.
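The accumulation $D_{ij} = \sum_{k=0}^{j} C_{ik}$ can be sketched in a few lines of Python. This is only an illustration: the function name `accumulate_triangle`, the ragged list-of-lists layout (one list per accident year, holding only the observed cells) and the toy figures are assumptions made here.

```python
from itertools import accumulate


def accumulate_triangle(C):
    """Return D with D[i][j] = C[i][0] + ... + C[i][j] for every observed cell.

    C is a ragged list of lists: row r holds the observed incremental values
    of one accident year, ordered by development year j = 0, 1, ...
    """
    return [list(accumulate(row)) for row in C]


# Toy incremental triangle with made-up numbers (3 accident years).
C = [[10, 5, 2],
     [12, 6],
     [11]]
D = accumulate_triangle(C)
```

The first row of `D` is then the running sum of the first row of `C`, and shorter rows are accumulated only over the cells that are actually observed.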

Different types of IBNR reserve
In this paper, three types of IBNR reserve are addressed: the partial, the total and the calendar year. The first type is considered by almost all the literature cited previously (cf. [1,3,5,12,13,15,16,19]). Given some row $i$ of the runoff triangle, the corresponding partial IBNR reserve is associated with the accidents that occurred in that year of origin. The second type is considered, for instance, by England and Verrall [5] and Atherino et al. [1], and represents the reserve corresponding to the whole lower part of the triangle. Finally, the third type is defined as the reserve corresponding to the actual time instant $t = i + j$, for $t = J + 1, J + 2, \ldots$, and is therefore related to a given diagonal of the lower part of the triangle. As formerly discussed, the expression 'calendar year' might represent any time unit.
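To make the three definitions concrete, the sketch below computes them from a completed square in which the lower part has already been filled by some prediction method. The function name `ibnr_reserves`, the data layout and the numbers are hypothetical; tail effects are left out, and cells with $i + j \le J$ are taken as observed, following the indexing $i = 1, \ldots, J$ and $j = 0, \ldots, J - 1$ used in the text.

```python
def ibnr_reserves(full, J):
    """Partial, total and calendar year reserves from a completed J x J square.

    full[i - 1][j] holds C_ij for accident year i = 1..J and delay j = 0..J-1.
    Cells with i + j <= J are observed; cells with i + j > J are predictions.
    """
    partial = {}   # accident year i -> partial reserve (row sum of lower part)
    calendar = {}  # calendar year t = i + j -> reserve (diagonal sum)
    for i in range(1, J + 1):
        for j in range(J):
            t = i + j
            if t > J:  # lower (unobserved) part of the triangle
                value = full[i - 1][j]
                partial[i] = partial.get(i, 0.0) + value
                calendar[t] = calendar.get(t, 0.0) + value
    total = sum(partial.values())  # whole lower part
    return partial, total, calendar


# Toy 3 x 3 example with made-up numbers.
full = [[10, 5, 2],
        [12, 6, 3],
        [11, 7, 4]]
partial, total, calendar = ibnr_reserves(full, 3)
```

The partial reserves sum rows of the lower part, the calendar year reserves sum its diagonals, and both add up to the same total.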

The traditional chain ladder
To formally proceed with the theory behind the chain ladder, some notation and basic definitions must be set:
(1) Let $X$ be an $n \times 1$ random vector defined on a probability space $(\Omega, \mathcal{F}, P)$, and consider some nonempty class of subsets of $\mathbb{R}^n$. The inverse image of this class under $X$ is the class of events
(2) Let $Y_i$ be the random vector of observations from row $i$, $i = 1, 2, \ldots, J$, of the runoff triangle described in Tables 1 and 2. The dimension of $Y_i$ is denoted by $n_i$ and the σ-field generated by
(3) The Borel field of $\mathbb{R}^n$ (which is the σ-field generated by the open sets of $\mathbb{R}^n$) is denoted by $\mathcal{B}_n$. The Cartesian product between the Borel fields of $\mathbb{R}^n$ and $\mathbb{R}^m$ is denoted by $\mathcal{B}_n \times \mathcal{B}_m$. Some care is needed here not to interpret the latter as the product σ-field, which is actually given by $\sigma(\mathcal{B}_n \times \mathcal{B}_m)$.
(4) A nonempty class of subsets of any nonempty set is a π-system whenever it is closed under finite intersections.
(5) Let $X$ and $Y$ be two random variables with finite second moments, defined on the same probability space. Let ℘ be a sub σ-field with respect to which $Y$ is measurable. If $X$ is supposed to be 'estimated' by $Y$, the conditional mean squared error corresponding to $Y$,
Now, the hypotheses considered by Mack [13] (hereafter, the Mack hypotheses) are stated below, in order to supply the chain ladder with a theoretical foundation. The reader is referred to the notation already used in Table 4:
The chain ladder is rather simple to implement, as the following expressions of its IBNR reserve estimators reveal: where
The conditional mean squared errors corresponding to the estimators $\hat{D}^{(i)}$, $i = 1, \ldots, J$, of the partial IBNR reserves and $\hat{D}^{(T)}$ of the total IBNR reserve, given the σ-field of Definition 2 (see also Definition 5 above), have been obtained by Mack (1993) from M1, M2 and M3. Their expressions are
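As an illustration of how the chain ladder operates on the accumulated triangle of Table 4, the sketch below uses the usual volume-weighted development factors (ratios of column sums) to complete each row. It is a minimal textbook version written for this purpose, so its notation may differ from the expressions of this section; the function name, the ragged list layout and the toy figures are assumptions, and the mean squared error formulae are not covered.

```python
def chain_ladder(D):
    """Complete an accumulated runoff triangle with the chain ladder.

    D is a ragged list: row r (0-based) holds the J - r observed cumulative
    values of one accident year. Returns the development factors, the
    completed rows, the partial reserves and the total reserve.
    """
    J = len(D)
    factors = []
    for j in range(J - 1):
        # Rows in which both development years j and j + 1 are observed.
        rows = [r for r in range(J) if len(D[r]) > j + 1]
        num = sum(D[r][j + 1] for r in rows)
        den = sum(D[r][j] for r in rows)
        factors.append(num / den)

    completed = []
    for row in D:
        full = list(row)
        # Project forward from the last observed development year.
        for j in range(len(row) - 1, J - 1):
            full.append(full[-1] * factors[j])
        completed.append(full)

    # Partial reserve: projected ultimate minus last observed cumulative value.
    partial = [completed[r][-1] - D[r][-1] for r in range(J)]
    return factors, completed, partial, sum(partial)


# Toy accumulated triangle with made-up numbers.
D = [[100.0, 150.0, 180.0],
     [100.0, 150.0],
     [100.0]]
factors, completed, partial, total = chain_ladder(D)
```

In this toy case the factors are 1.5 and 1.2, so every accident year is projected to an ultimate value of about 180, and the oldest year needs no reserve.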

Extended chain ladder: predicting the calendar year reserve
Using the Mack hypotheses, it is possible to extend the chain ladder to obtain a predictor for the calendar year IBNR reserve and a corresponding conditional mean squared error given the σ-field of Definition 2. The result below, whose proof is given in Appendix 1 as supplemental online material, is crucial to what follows:
From Lemma 1, it is possible to establish the following result (the proof is in Appendix 2 as supplemental online material), which provides the essentials for estimating the calendar year IBNR reserve.
Theorem 1. Under the double-index notation of the runoff triangle (cf. Tables 1 and 2) and under the Mack hypotheses, it follows that:
Formulae (i) and (ii) in Equation (7) have a special feature as compared with the traditional chain ladder for partial and total IBNR reserves: formula (i) is a conditional expectation given the σ-field of Definition 2, which is the unbiased estimator of the calendar year IBNR reserve with minimum unconditional mean squared error; theoretically, the latter can be obtained by taking the expectation of formula (ii).

Initial settings
An alternative frequently addressed in the IBNR literature is to suppose that the data arranged in the runoff triangle have been generated by the following linear regression model:
$Y = X\beta + \varepsilon$, (8)
where $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ or, to be even more general, $\mathrm{Var}(\varepsilon)$ is an arbitrary covariance matrix. In model (8), the $X$ matrix represents dummy variables corresponding to row and column effects; recall Table 1. Clearly, if $\mathrm{Var}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, the model takes into account the possibility of heteroscedastic errors and, if $\mathrm{Var}(\varepsilon)$ is non-diagonal, some sort of correlation structure is allowed as well.
According to Section 3.4, there are at least three types of IBNR reserve. This suggests a new notation for merging the runoff structure and the usual linear regression theory. Defining the index set $\{t : Y_t \text{ is not a missing value}\}$, it becomes possible to express the partial IBNR reserve corresponding to the year of origin $i$, $i = 1, 2, \ldots, J$, as
Analogously, the total IBNR reserve is given by
The calendar year IBNR reserve could also have its own corresponding expression, like those in Equations (9) and (10).
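The dummy-variable structure of the $X$ matrix can be sketched as follows. The parametrization used here (an intercept, row dummies for accident years $2, \ldots, J$ and column dummies for delays $1, \ldots, J - 1$, with the first row and column as baseline) is an assumption chosen to avoid collinearity and is not necessarily the one adopted in the paper; the function name `design_matrix` is likewise hypothetical.

```python
def design_matrix(J):
    """Dummy design matrix for the observed cells of a J x J runoff triangle.

    Observed cells are those with i + j <= J (i = 1..J, j = 0..J-1), listed
    row by row. Each row of X is [1, row dummies i=2..J, column dummies j=1..J-1].
    """
    cells = [(i, j) for i in range(1, J + 1) for j in range(J) if i + j <= J]
    X = []
    for i, j in cells:
        row = [1.0]                                                 # intercept
        row += [1.0 if i == k else 0.0 for k in range(2, J + 1)]    # row effect
        row += [1.0 if j == k else 0.0 for k in range(1, J)]        # column effect
        X.append(row)
    return cells, X


cells, X = design_matrix(3)
```

For $J = 3$ this gives six observed cells and five coefficients; the row of `X` for the cell $(i, j) = (2, 1)$ switches on the intercept, the dummy of accident year 2 and the dummy of delay 1.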

IBNR reserve least squares predictors: theoretical expressions
As will now be discussed, for each type of IBNR reserve, ordinary least squares (OLS) provides unbiased forecasts whenever model (8) adequately fits the data in the runoff triangle; this basic hypothesis is assumed for the rest of this section. However, in cases where $\mathrm{Var}(\varepsilon) \neq \sigma^2 I$, this approach will not yield minimum variance estimators; besides, the corresponding expressions for the variance matrix must be modified for inferential purposes (cf. [7, Chapter 5]). For such general error structures, best linear unbiased estimators (BLUE) for IBNR reserves are obtained from generalized least squares (GLS) (cf. [7, Chapters 5 and 6]). Therefore, general expressions for the IBNR reserve estimator and its corresponding mean squared error must be derived. Naturally, these expressions take specific forms suitable for each type of IBNR reserve. The next theorem solves the task for quite general situations. Appendix 3, available as supplemental online material, contains the proof.
Theorem 2. Consider the following linear regression model: where 1I = [I 1 , I 2 , . . . ,
To adapt Theorem 2 to the runoff triangle, proceed as follows: the random vector $Y_O$ contains the observed IBNR data (that is, the runoff triangle itself), whereas $Y_A$ represents the absent (or missing) values that constitute the IBNR reserve to be estimated (that is, the lower part of the triangle, possibly including tail effects). Notice that $Y_S$ does not necessarily involve all the absent values; instead, it is defined to accommodate the $C_{ij}$ corresponding to the type of IBNR reserve to be estimated, whether partial, total, calendar year, or any other.
The particular but extremely important cases regarding uncorrelated error terms are covered below.

Corollary 1 (a) Under the same definitions and notations of Theorem 2 and considering
Although GLS estimators are more efficient than the OLS ones, the latter can also accommodate possible heteroscedasticity and correlation structures, and have the advantage of being simpler when the formulae discussed so far are applied in real situations. Indeed, for the case of uncorrelated errors, the GLS approach requires the estimation of $\mathrm{Var}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ throughout its theoretical expressions, whereas the OLS approach depends much less on these 'plug-in' procedures; therefore, one can suspect that the latter is more robust to small sample problems. This last point is addressed in more detail in Section 5.3.
Firstly, note that, for a general covariance matrix $\mathrm{Var}(\varepsilon)$,
where $\hat{\beta}$ is the usual OLS estimator. Theorem 3, whose proof is analogous to that of Theorem 2, offers closed expressions for the IBNR reserve OLS estimator and its corresponding mean squared error for a quite general $\mathrm{Var}(\varepsilon)$. The case of uncorrelated errors is considered in Corollary 2.
Theorem 3 Consider the following linear regression model:

IBNR reserve least squares estimators: feasible expressions
Equations (11)-(13), (15) and (16), albeit closed expressions, are not yet very useful, since they depend upon knowledge of the error covariance matrix, a quantity that in general must be estimated. Therefore, in order to make these formulae adequate for practical purposes, one should replace $\mathrm{Var}(\varepsilon)$ (or, to be more precise, the covariance matrices of the coefficient estimators) by appropriate sample counterparts. This is the plug-in procedure mentioned in Section 5.2 when the GLS and OLS estimators were briefly compared. The theoretical mean squared errors in Equation (11) are now made feasible by being estimated as
Notice that the IBNR reserve $Y_S$ is being estimated by a feasible GLS estimator given by
The expression in Equation (15) is, in turn, estimated by
From empirical evidence, part of which is shared with the reader in Section 6, when the residuals are heteroscedastic, the main source of such heterogeneity is generally the column effect. This finding can be taken as an important stylized fact when planning software for IBNR reserve estimation. Specifically, residual analyses with real runoff triangles commonly reveal that observations associated with the initial columns (payments occurring closer to the accident years) tend to have larger variability. In view of this, and assuming uncorrelated error terms, the error variances are set equal within a given column and, consequently, the estimator of this common variance, which is used for the IBNR reserve feasible GLS estimator and its mean squared error, is the sample mean of the squared OLS residuals from that same column. On the other hand, feasible expressions for the mean squared errors associated with the IBNR reserve OLS estimator (Equation (18) simplifies in much the same way) are entirely based on the White covariance matrix estimator, which need not assume knowledge of the heteroscedastic structure (cf. [6,7,21]).
From now on, the aim is to establish large-sample properties of the proposed IBNR reserve feasible least squares estimators. Consider the following hypotheses, always bearing in mind the notation and definitions used in Theorems 2 and 3:
(H3) There exist $\delta > 0$ and a finite positive constant that bounds $E(|\varepsilon_i^2|^{1+\delta})$ from above.
From the runoff triangle standpoint, hypothesis H1 says that the error variances are possibly different between columns and necessarily equal within a given column. Since the number of columns is less than or equal to the number of rows, it is possible to express all the different error variances using the partition suggested in H1: even when $n_O \to \infty$ (the number of observations in the runoff triangle increases), the partition remains finite and adequately describes the variance structure of all the observations. As regards hypothesis H2, it stresses that the elements of $X_O$ cannot diverge; for the runoff triangle, this is trivially verified, since all the explanatory variables considered are dummies. This very '0-1' structure of the regressors also justifies item (i) of this hypothesis. Item (ii) is demonstrated by White [21]. Another important point is that, should H2 be assumed in the runoff triangle framework, the row effect must not require additional dummies as the accident years grow in time; in other words, the number of columns of $X_O$ does not increase with $n_O$. Finally, hypothesis H3 means that the tails of the error probability distributions are not too 'heavy', even for large $n_O$.
The next theorem gives sufficient conditions for the desirable $n_O$-asymptotic equivalence between the theoretical and feasible expressions for the IBNR reserve least squares predictors and their corresponding mean squared errors. The proof is given in Appendix 4 as supplemental online material. To simplify notation, some obvious dependencies on $n_O$ are omitted.
Theorem 4. Consider the conditions assumed in Corollaries 1(a) and 2 along with hypotheses H1, H2 and H3. Then, if $\hat{\sigma}_i^2 = (1/m_k) \sum_{l \in I_k} e_l^2$ whenever $\sigma_i^2$ equals the common variance of the group $I_k$ for some $k$, where $e_l = Y_l - x_l'\hat{\beta}$, $l \in I_k$, it follows that:
Adding the hypothesis of normally distributed errors $\varepsilon$, it also follows that
Items (a), (b) and (c) of Theorem 4 imply that the feasible mean squared errors from GLS and OLS regression analyses that assume heteroscedastic errors are 'consistent': they behave asymptotically in much the same way as their theoretical counterparts. For the OLS case, the White covariance matrix estimator $(X_O' X_O)^{-1} \left( \sum_{l=1}^{n_O} e_l^2 x_l x_l' \right) (X_O' X_O)^{-1}$ (cf. [21], [7, Chapter 6], and [6, Chapter 10]) plays its role in the stochastic convergence achieved in item (c). Likewise, items (d) and (e) make it possible to construct asymptotic prediction intervals for any type of IBNR reserve, provided that proper residual analyses give no strong evidence against normality. Once again, the White matrix does its job in item (e), along with the Central Limit Theorem and the Slutsky Theorem (see details in Appendix 4).
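The column-wise variance estimator described above (the sample mean of the squared OLS residuals within each development year) can be sketched as follows. The function name `column_variances` and the pairing of each residual with its cell index $(i, j)$ are assumptions made for illustration; the residuals themselves are supposed to come from a previous OLS fit, which is not shown.

```python
def column_variances(cells, residuals):
    """Estimate one error variance per column of the runoff triangle.

    cells     : list of (i, j) index pairs of the observed cells
    residuals : OLS residuals, in the same order as `cells`
    Returns a dict mapping the development year j to the mean of the
    squared residuals observed in that column.
    """
    sums, counts = {}, {}
    for (i, j), e in zip(cells, residuals):
        sums[j] = sums.get(j, 0.0) + e * e
        counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sums}


# Made-up residuals for three observed cells.
variances = column_variances([(1, 0), (1, 1), (2, 0)], [1.0, 2.0, -3.0])
```

The resulting variances would then fill the diagonal of the feasible error covariance matrix, with every observation in column $j$ receiving the same value.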
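As a small illustration of how the asymptotic normality results could be used in practice, the sketch below builds a symmetric prediction interval from a reserve estimate and its feasible mean squared error. It assumes that the standard normal approximation is adequate; the function name `prediction_interval` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(reserve_estimate, feasible_mse, level=0.95):
    """Asymptotic prediction interval: estimate +/- z * sqrt(feasible MSE)."""
    # Standard normal quantile for a two-sided interval of the given level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(feasible_mse)
    return reserve_estimate - half_width, reserve_estimate + half_width


# Made-up reserve estimate and feasible mean squared error.
lower, upper = prediction_interval(1000.0, 400.0)
```

With a feasible mean squared error of 400 (a root mean squared error of 20), the 95% interval extends roughly 39.2 units on each side of the estimate.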

Simulation results
The results collected in Theorem 4 are purely asymptotic and, therefore, give no guarantee that the feasible expressions for IBNR reserve least squares prediction will work properly with real runoff triangles. In fact, as Table 1 suggests, the last columns of any given runoff triangle generally contain only a few data, and when the number of rows equals the number of columns (one could think of this case as the rule rather than the exception), there is necessarily only one datum in the very last column. As might be argued at first glance, these data set limitations, inherent to the problem at hand, may compromise the estimation of the variances of the last columns. However, the really relevant question here is: can this local lack of statistical information contaminate the global estimation of the theoretical mean squared errors by their feasible expressions in Equations (17) and (18)? In order to address this question and to evaluate the overall finite sample performance of at least some items of Theorem 4, specifically those related to the GLS regression analysis, we have designed and implemented a Monte Carlo experiment using four different possible models with Gaussian errors. The focus was on understanding how crucial the magnitude of the variances of the last columns is to two different and relevant points: (1) the ability of the final feasible mean squared error suggested in Theorem 4 items (a) and (b) (see the final formula for $\mathrm{MSE}(\hat{Y}_S^{FGLS})$ in Theorem 4 item (d)) to estimate the real mean squared error given in Equation (12) of Corollary 1; (2) the quality of the standard normal approximation for the distribution of the standardized IBNR reserve, suggested in Theorem 4 item (d).
We attempt to answer these questions for runoff triangles with 10 rows and 10 columns (that is, $J = 10$ in Tables 1 and 2), the same dimension as the real triangles considered in the applications of this paper (Section 6). For each of the four sets of parameters shown in Table 5 (each set defines a specific heteroscedastic linear regression model), we have generated 1000 independent triangles and, with these, calculated analytical and graphical performance measures. The results are displayed in Tables 6 and 7 and in Figure 2.
Regarding the performance of the feasible formula for the mean squared error in Theorem 4 items (a) and (b), one can note that the Monte Carlo simulation results for sets 1 and 2, whose last variances are much lower than the first, are significantly better than those for sets 3 and 4. Indeed, from the information in Table 6, it is clear that, for sets 1 and 2, the average of the 1000 feasible mean squared errors is closer to the corresponding true mean squared error and, in terms of mean absolute percent error, the 1000 Monte Carlo values tend to deviate much less from the corresponding true mean squared error. Now, the usefulness of the asymptotic distribution in Theorem 4 item (d) can be grasped by looking at Table 7 and Figure 2. Although the results suggest that the probability distribution of the standardized total IBNR reserves associated with Gaussian runoff triangles with $J = 10$ tends to have thicker tails than the standard normal distribution, the problem seems to be much less severe for models with small last variances, like those defined by sets 1 and 2.
Fortunately, much of the empirical evidence from runoff triangles considered in the literature, including those studied in Section 6 of this paper (see footnote 5 of Table 5), reveals that the values in the last columns tend to be considerably smaller in magnitude than the remaining ones; virtually the same can be said when comparing variances. Therefore, at least regarding total IBNR reserve estimation, the Monte Carlo simulation results suggest that Theorem 4 is reliable, and therefore useful, for runoff triangles having at least 10 rows and at least 10 columns.
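The two summary measures mentioned for the feasible mean squared errors (their Monte Carlo average and the mean absolute percent error with respect to the true value) can be computed as in the sketch below. The exact definitions behind Table 6 are not spelled out in the text, so the formula for the mean absolute percent error, the function name and the toy values are assumptions.

```python
from statistics import mean


def summarize_feasible_mse(feasible_mses, true_mse):
    """Summarize Monte Carlo replicates of a feasible mean squared error.

    Returns the average of the replicates and their mean absolute percent
    error (in %) relative to the true mean squared error.
    """
    average = mean(feasible_mses)
    mape = 100.0 * mean(abs(m - true_mse) / true_mse for m in feasible_mses)
    return average, mape


# Three made-up replicates around a true value of 100.
average, mape = summarize_feasible_mse([90.0, 110.0, 100.0], 100.0)
```

In the experiment described above, `feasible_mses` would hold the 1000 values obtained from the simulated triangles of a given parameter set.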

Applications
In this section, the methods previously developed are applied to two real runoff triangles. Essential results are given, along with the corresponding analyses and interpretations. All the computational implementations have been done in the R language (www.r-project.org), using a Core 2 Duo processor with 2.0 GHz and 3.0 GB RAM.

First example: T1 triangle
The first runoff triangle considered here (T1 triangle, from now on) had been previously studied by Taylor and Ashe [17] and Verrall [18]. It is presented in Table 8. The data unit is millions of dollars.
The application of the chain ladder consists basically on direct use of expressions discussed in Section 4. The resulting goodness-of-fit statistics are disposed in the last column of Table 9. The in-sample measures do not consider the very first column of the T1 triangle in their computations, as the chain ladder does not allow in-sample estimation of the corresponding data, and the out-ofsample measures use the diagonal corresponding to the last calendar year of the triangle, except for its extremes C 10,0 = 344014 and C 1,9 = 67948.
As regards the regression analysis, the very first step is to estimate model (9) using OLS. Since there is no additional clue about the column and row effects in the tail, it has been supposed that the unobserved values to be estimated on the first year of the tail on the left of the T1 triangle has the same column effect of the last development year right before this 'eastern' tail; and, for the value of the first accident year with no delays (j = 0) of the tail on the bottom of the T1 triangle, the row effect of the last accident year is assumed. The remaining cells of the tail effects (see Table 3) are assumed null. Breush-Pagan heteroscedasticity tests (cf. [22,Chapter 8] and [7,Chapter 6]) have been applied using column aggregations as explanatory variables. As can be seen in the third column of Table 9, the null hypothesis of homocedasticity is being rejected for at least one of the tests at a 5% significance level. Therefore, the heteroscedastic model suggested in Corollaries 1(a) and 2 was estimated by feasible GLS. The feasible covariance matrix was diagonal with elements given by the mean of the squared residuals coming from a former OLS estimation, adapting the formula forσ 2 i in Theorem 4 to the runoff triangle framework. This model fitted the data very adequately, the second column of Table 9 has information that supports model basic assumptions: (i) the Breush-Pagan tests and mean and variance for the standardized residuals indicate that residual variance estimation has been appropriately done; and (ii) three normality tests provide no significant evidence against the normality assumption. One can also see that both OLS and feasible GLS estimations are at least slightly superior to the chain ladder in terms of IBNR estimation (see the first six lines of Table 9). Results of F tests show that both column and row effects are statistically significant even at a 1% significance level. 
Additionally, Akaike and Bayesian information criteria suggest superiority of the heteroscedastic model as compared with the homocedastic one. Table 10 offers complimentary information regarding possible serial dependence/correlation under four different data ordering of the runoff triangle: row-wise, column-wise, diagonal-wise 'from the bottom to the top' (diagonal 1, hereafter), and diagonal-wise 'from the top to the bottom' (diagonal 2, hereafter). The table reports Ljung-Box tests for the original scale standardized residuals and for their squares. As can be seen, there is no statistical evidence from the data suggesting serial dependence under the first three ordering here considered; and, albeit the Ljung-Box test for residuals in their original scale is slightly significant at a 5% significance level under the diagonal 2 ordering, this might have been a result from the very dependence of the residuals rather than a symptom of possible misbehavior of the actual unobserved error terms.
Finally, Tables 11 and 12 show the predicted IBNR reserves using the chain ladder and the least squares expressions of Section 5, respectively. The first point worth noticing is that the mean squared errors corresponding to the OLS estimation and using the White matrix (cf. Theorem 4(e)) are not always larger than their counterparts corresponding to the feasible GLS estimation (cf. Theorem 4(d)), as one can easily see by comparing the third and fifth columns of Table 12. Even though contradicting the fact that, theoretically, the GLS are minimum variance IBNR estimators, the natural explanation is that the feasible expressions for such accuracy measures are subject to sample variation. To alleviate such undesirable finite sample effects, Greene [6,Chapter 11] and Jonhston and DiNardo [7,Chapter 6] suggest corrections for the White matrix and might provide more confidence to the mean squared error calculations. Each of these corrections displayed in the last three columns of Table 12 simply consists of multiplying the White matrix by some inflating constant. For instance, the seventh column of Table 12 gives the feasible OLS mean squared errors using a White matrix previously multiplied by (n/n − p), where n is the number of observation in the runoff triangle and p is the number of coefficients used in the model. By their turn, the last two columns used h ii = x i (X X ) −1 x i and h ii 2 as inflating constants, respectively.
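For reference, the finite-sample corrections of the White matrix most commonly found in the econometrics literature can be written as below. The labels HC0 to HC3 and the exact placement of $h_{ii}$ follow the usual textbook convention and are assumptions here; they may differ in detail from the constants actually used for Table 12.

```latex
\begin{aligned}
\hat{V}_{HC0} &= (X'X)^{-1}\Big(\sum_{i=1}^{n} e_i^2\, x_i x_i'\Big)(X'X)^{-1},\\
\hat{V}_{HC1} &= \frac{n}{n-p}\,\hat{V}_{HC0},\\
\hat{V}_{HC2} &= (X'X)^{-1}\Big(\sum_{i=1}^{n} \frac{e_i^2}{1-h_{ii}}\, x_i x_i'\Big)(X'X)^{-1},\\
\hat{V}_{HC3} &= (X'X)^{-1}\Big(\sum_{i=1}^{n} \frac{e_i^2}{(1-h_{ii})^2}\, x_i x_i'\Big)(X'X)^{-1},
\qquad h_{ii} = x_i'(X'X)^{-1}x_i .
\end{aligned}
```

Since $0 \le h_{ii} < 1$ for observations that do not determine their own fitted value, each correction inflates the squared residuals, and therefore the resulting mean squared errors, relative to the uncorrected White matrix.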
As a second point, the reader should note that the column corresponding to the mean squared error of OLS estimation without any heteroscedasticity correction (fifth column of Table 12) is clearly wrong and should be neither interpreted nor used; recall that the homoscedastic model has already been rejected by the data.
As regards the IBNR reserves themselves, observe that the estimates from OLS and feasible GLS are both smaller than those obtained by the chain ladder (compare the second column of Table 11 with the second and third columns of Table 12). Since the previous analysis indicates moderate evidence against the chain ladder modeling framework as the 'true' IBNR data generator, in a hypothetical situation where the chain ladder had been used for estimating IBNR reserves with the T1 triangle, the insurance company would probably have overcharged its clients as a consequence of overestimating the IBNR reserves, which might have made the company less competitive in the insurance market.

Second example: AFG triangle
The second runoff triangle to be studied shall be termed the AFG triangle. It has been exhaustively revisited in the literature (cf. [1,5,8,13,14]). The data are presented in Table 13 and the unit is thousands of dollars. The negative value $C_{2,6} = -103$, certainly strange at first glance, might have resulted from profits of the insurance company, which means that, instead of paying for the accident, the insurer might have earned 103 thousand dollars (for a richer discussion on this point, see [5]). In what follows, this value will be treated as missing in the chain ladder and regression analysis calculations.

Table 14 summarizes the results from each of the methods considered in this article. From the regression analysis standpoint, the row effect is not statistically relevant even at the 10% significance level (see the F test results in the fourth column of Table 14) and, therefore, it has been dropped from the model. Note also that the chain ladder proved to be quite inferior in terms of in-sample goodness of fit, as the first three rows of Table 14 clearly reveal a worse performance for this method. This should be taken as strong evidence against the Mack hypotheses for this particular data set. Finally, the heteroscedastic model without the row effect regression terms proved to be the superior approach for adequately describing the data, since: (i) it has the best out-of-sample predictive capability; (ii) it yields excellent standardized residuals (cf. the heteroscedasticity and normality tests, and the sample mean and variance); and (iii) it presents the best Akaike and Bayesian information criteria.
Looking at Table 15, the interpretation of which is entirely analogous to that of Table 10, one can hardly argue in favor of serial dependence for the standardized residuals from the best of the estimated models (the heteroscedastic regression model without the row effect). Lastly, Table 16 shows the estimated IBNR reserves and the square roots of the theoretical mean squared errors. One can observe that the chain ladder gives the lowest estimates of the IBNR total and calendar year reserves. Since this method has already been judged inadequate for the data set at hand, an insurance company that used it to constitute IBNR reserves might have run a serious risk of insolvency, as the reserve asset allocation would have been considerably underestimated.

Conclusion
This paper has attempted to discuss and improve the technology previously available for modeling and predicting IBNR data arranged in runoff triangles. Among the issues addressed were the estimation of the IBNR calendar year reserve and the incorporation of tail effects in the final IBNR reserve estimates. These required the derivation of new theoretical results in stochastic chain ladder theory and in linear regression analysis. To be specific:
(1) As regards the chain ladder framework, expressions for the estimator of the IBNR calendar year reserve and for the corresponding mean squared error have been obtained. These have an additional feature as compared with former chain ladder estimators for the total and partial IBNR reserves: for the calendar year reserve, the new chain ladder estimator is the conditional expectation given all the IBNR data available in the runoff triangle, provided that the Mack hypotheses fit the IBNR data being analyzed; therefore, it is unbiased for estimating that specific IBNR reserve and has minimum unconditional mean squared error.
(2) In the scope of linear regression, quite general expressions for the mean squared errors corresponding to IBNR reserve estimation have been derived, including feasible formulae for models with or without heteroscedastic errors. The asymptotic equivalence (in terms of convergence in probability) between the original and feasible expressions has also been derived, discussed and evaluated by a Monte Carlo simulation.
(3) Both methodologies have been tested with two real data sets. For both runoff triangles, the linear regression approach proved superior to the chain ladder for estimating IBNR reserves, as the empirical measures of performance clearly revealed. Although far from sufficient to make one discard the chain ladder as an IBNR estimator in more general situations, these results suggest that, if the chain ladder is the sole method used for IBNR reserve estimation and the reserve is underestimated, the insurance company might be running a serious risk of insolvency. On the other hand, overestimation of IBNR reserves implies that clients pay for overpriced insurance contracts, a probable remedial measure adopted by the insurance company in order to increase its reserve asset allocation.
The closed-form expressions obtained in this paper should make possible the development of software for modeling and estimating IBNR reserves. It is worth mentioning that the tail effect, at least as regards its use for one development year beyond the last column of the runoff triangle, does not bring theoretical or computational difficulties. Also, the fact that the heteroscedastic behavior usually has the column effect as its main source would certainly be an aid in planning such software.
The paper closes by suggesting two specific themes for future research. The first is the derivation of feasible expressions for the mean squared errors of the IBNR reserve estimator, under linear regression modeling, that allow a correlation structure for the error term; there was at least mild evidence supporting this possibility in the first of the two data sets used in this paper. The second is the extension of the tail effect to more than one development year beyond the last column, using information from the insurance company about the types of accident and insurance contract being considered.