Optimal calibrated weights while minimizing a variance function

Abstract This study considers the estimation of the population mean through the calibration technique. We propose a new multivariate calibrated estimator of the mean in stratified sampling that employs $g$ auxiliary variables. In place of the chi-square distance function, we introduce a new variance function of the study variable, under the assumption that the population variance of the study variable is known from prior knowledge or a past study, as in the case of Neyman allocation. Simulation and numerical studies show that the resulting estimators are considerably more efficient than the usual stratified mean estimator and the combined ratio and regression estimators.


Introduction
Calibration is a conventional technique for improving survey estimates by using supplementary information in the form of auxiliary variables. It modifies the original design weights to increase the precision of the survey estimates. A distance function, which measures how far the new weights move from the design weights, is minimized subject to a set of calibration constraints; the resulting weighting system yields a new estimator, and different distance functions generate a family of calibration estimators. The new calibrated weights are as close as possible to the design weights while respecting the constraints. These constraints require that the weighted estimate of each auxiliary variable equal the known population parameter of that variable, i.e., the calibrated weights must reproduce the known quantities exactly when applied to each auxiliary variable. Many researchers use this consistency check on a weighting system.
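As a reference point for the chi-square distance that this paper replaces, the following is a minimal sketch of classical linear (chi-square distance) calibration in the style of Deville and Särndal (1992), for a single auxiliary variable with a known total. The function name `chi_square_calibration` and its arguments (design weights `d`, auxiliary values `x`, tuning constants `q`, known total `X_total`) are illustrative assumptions, not notation from this paper.

```python
def chi_square_calibration(d, x, q, X_total):
    """Linear (chi-square distance) calibration on one known auxiliary total.

    Minimizes sum((w_i - d_i)**2 / (d_i * q_i)) subject to
    sum(w_i * x_i) == X_total. Setting the derivative of the Lagrangian
    to zero gives w_i = d_i * (1 + q_i * x_i * lam).
    """
    # Gap between the known total and its design-weighted estimate
    gap = X_total - sum(di * xi for di, xi in zip(d, x))
    # Term obtained by substituting w_i back into the constraint
    denom = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    lam = gap / denom
    return [di * (1.0 + qi * xi * lam) for di, qi, xi in zip(d, q, x)]


if __name__ == "__main__":
    d = [10.0, 10.0, 10.0]   # hypothetical design weights
    x = [1.0, 2.0, 3.0]      # hypothetical auxiliary values in the sample
    w = chi_square_calibration(d, x, [1.0, 1.0, 1.0], X_total=66.0)
    print([round(wi, 4) for wi in w])
    print(sum(wi * xi for wi, xi in zip(w, x)))  # reproduces the known total
```

When the design-weighted estimate already equals the known total, the gap is zero and the calibrated weights coincide with the design weights, which is the "as close as possible" property described above.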
The most familiar class of calibration methods is regression estimation. Under a regression super-population model, Cochran (1940) developed a ratio estimator for the case where the auxiliary variable and the study variable are correlated. Deville and Särndal (1992) were the first to explore the calibration technique in the theory of survey sampling, and many investigators have since extended the field using different calibration weights. Singh (1998) introduced the calibration approach for estimating the variance of the combined generalized regression (GREG) estimator. Alam, Singh, and Shabbir (2020) used non-linear calibration constraints to estimate the population mean in simple random sampling, stratified sampling, and probability proportional to size sampling. Kim, Sungur, and Heo (2007) proposed various calibration approaches using ratio-type estimators in stratified sampling. Koyuncu and Kadilar (2016) and Nidhi, Singh, and Singh (2017) developed different estimators for the stratified sampling design using calibration methodology. Because auxiliary information plays an important role in obtaining precise survey estimates, Alam and Shabbir (2020) introduced the double use of auxiliary information for estimating the population mean in stratified sampling. Rao, Khan, and Khan (2012) introduced multivariate calibration estimation with two auxiliary variables in stratified sampling, and Rao, Khan, and Singh (2018) derived a multivariate calibration estimator using known mean and variance information from multiple auxiliary variables. In stratified sampling, Koyuncu and Kadilar (2014) established different constraints on the auxiliary variable, and Koyuncu (2018) proposed an estimator of the population mean under a ranked set sampling design.
In this study, we minimize a variance function, which is known because the stratum variances are assumed known, as the distance measure. We propose a multivariate calibrated estimator based on multi-auxiliary calibration constraints when the population variance $\sigma^2_{yh}$ of the study variable is known. Simulation and empirical studies are carried out to support the investigation.
Consider a finite population $U = (U_1, U_2, \ldots, U_i, \ldots, U_N)$. For the $i$th unit, let $y_i$ and $x_{ij}$ be the values of the study variable $Y$ and the auxiliary variable $X_j$ $(j = 1, 2, \ldots, g)$. In stratified sampling, the population is divided into non-overlapping strata $h = 1, 2, \ldots, K$ such that $\sum_{h=1}^{K} N_h = N$. We draw a sample of size $n_h$ from each stratum such that $\sum_{h=1}^{K} n_h = n$.
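The stratified setup just described can be sketched in a few lines of Python. This is a minimal illustration assuming the usual stratum weights $W_h = N_h/N$ and a simple random sample with replacement (SRSWR) of size $n_h$ within each stratum, the design used later in the empirical section. The function names and the toy population are hypothetical.

```python
import random
from statistics import mean


def stratum_weights(N_h):
    """Stratum weights W_h = N_h / N."""
    N = sum(N_h)
    return [Nh / N for Nh in N_h]


def draw_srswr(strata, n_h, rng):
    """Draw an SRSWR sample of size n_h[h] from the units of stratum h."""
    return [rng.choices(units, k=nh) for units, nh in zip(strata, n_h)]


def stratified_mean(samples, N_h):
    """Ordinary stratified mean estimator: sum_h W_h * ybar_h."""
    W = stratum_weights(N_h)
    return sum(Wh * mean(s) for Wh, s in zip(W, samples))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical two-stratum population of y-values
    strata = [[2.0, 4.0, 6.0, 8.0], [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]]
    N_h = [len(s) for s in strata]
    sample = draw_srswr(strata, [2, 3], rng)
    print(stratified_mean(sample, N_h))
```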

Proposed multivariate calibrated estimator in stratified sampling
The ordinary stratified mean estimator is

$$\bar y_{st} = \sum_{h=1}^{K} W_h \bar y_h, \tag{2.1}$$

where $W_h = N_h/N$ and $\bar y_h$ is the sample mean of the study variable in stratum $h$. The new calibrated estimator is

$$\bar y_{\alpha} = \sum_{h=1}^{K} W^*_{h(\alpha)} \bar y_h, \tag{2.2}$$

where $W^*_{h(\alpha)}$ are the new calibrated weights obtained from the $g$ auxiliary variables $X_j$ $(j = 1, 2, \ldots, g)$ by minimizing the newly proposed variance function $D_\alpha$ of Equation (2.3). Here $Q_h$ are suitably chosen weights at the estimation stage, and $\sigma^2_{yh}$ is known from a past survey. Because our main interest is in minimizing variance, we develop this function. The variance function in Equation (2.3) is minimized subject to the $g + 1$ calibration restrictions of Equations (2.4)-(2.7), namely that the calibrated weights sum to the design weights, $\sum_{h=1}^{K} W^*_{h(\alpha)} = \sum_{h=1}^{K} W_h$, and that they reproduce each known population mean, $\sum_{h=1}^{K} W^*_{h(\alpha)} \bar x_{jh} = \bar X_j$ for $j = 1, 2, \ldots, g$. The Lagrange function is formulated with Lagrange multipliers $\lambda_{i\alpha}$ $(i = 0, 1, \ldots, g)$. Setting $\partial D_\alpha / \partial W^*_{h(\alpha)} = 0$ gives Equation (2.9). Substituting Equation (2.9) into Equations (2.4)-(2.7) yields a system of linear equations, which we solve for the Lagrange multipliers $\lambda_{0\alpha}, \lambda_{1\alpha}, \ldots, \lambda_{g\alpha}$. Substituting these multipliers into Equation (2.9) gives the optimal multivariate calibrated weights $W^*_{h(\alpha)}$ and, finally, the estimator of the population mean in Equation (2.2). For simplicity, we consider a few special cases in the following subsections:
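Equation (2.3) is not reproduced in this text. As a minimal sketch, assume the variance function has the form $D_\alpha = \sum_h W^{*2}_{h(\alpha)} \sigma^2_{yh}/(n_h Q_h)$. The stationarity condition then gives weights $W^*_{h(\alpha)} = \theta_h(\lambda_{0\alpha} + \sum_j \lambda_{j\alpha} \bar x_{jh})$ with $\theta_h = n_h Q_h/\sigma^2_{yh}$ (a factor of 1/2 absorbed into the multipliers), and the $g + 1$ constraints become a $(g+1)\times(g+1)$ linear system in the multipliers. The function names, the assumed form of $D_\alpha$, and the pure-Python elimination routine are illustrative choices, not the authors' code.

```python
def solve_linear(A, b):
    """Solve A @ lam = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: need K >= g + 1 independent strata")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    lam = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = M[r][n] - sum(M[r][c] * lam[c] for c in range(r + 1, n))
        lam[r] = s / M[r][r]
    return lam


def calibrated_weights(W, theta, xbar, Xbar):
    """Weights W*_h = theta_h * (lam_0 + sum_j lam_j * xbar_jh).

    W     : design weights W_h (the weights must sum to sum(W))
    theta : theta_h = n_h * Q_h / sigma2_yh (assumed form, see lead-in)
    xbar  : xbar[j][h], sample mean of auxiliary X_j in stratum h
    Xbar  : Xbar[j], known population mean of X_j
    """
    K = len(W)
    z = [[1.0] * K] + [list(row) for row in xbar]       # row 0: constant term
    m = len(z)                                          # g + 1 constraints
    A = [[sum(theta[h] * z[a][h] * z[b][h] for h in range(K)) for b in range(m)]
         for a in range(m)]
    rhs = [sum(W)] + list(Xbar)
    lam = solve_linear(A, rhs)
    return [theta[h] * sum(lam[a] * z[a][h] for a in range(m)) for h in range(K)]


def calibrated_mean(weights, ybar):
    """Calibrated estimator: sum_h W*_h * ybar_h."""
    return sum(w * y for w, y in zip(weights, ybar))


if __name__ == "__main__":
    W = [0.2, 0.3, 0.5]                 # hypothetical stratum weights
    theta = [1.0, 2.0, 1.5]             # hypothetical theta_h values
    xbar = [[10.0, 12.0, 15.0]]         # g = 1 auxiliary variable
    w = calibrated_weights(W, theta, xbar, [13.0])
    print(w, calibrated_mean(w, [21.0, 24.0, 31.0]))
```

The system is solvable only when there are at least $g + 1$ strata and the vectors $(1, \bar x_{1h}, \ldots, \bar x_{gh})$ are linearly independent across strata; otherwise the elimination reports a singular system.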

Combined ratio type estimator in stratified sampling
If we take only one auxiliary variable, i.e., $g = 1$, as a calibration constraint and ignore the fundamental constraint that the optimal calibrated weights sum to the ordinary design weights, the new multivariate calibrated estimator in Equation (2.2) can be rewritten as $\bar y_{11} = \sum_{h=1}^{K} W^*_{h(11)} \bar y_h$ (Equation (2.14)), where $W^*_{h(11)}$ are new weights produced by minimizing the known variance function of Equation (2.3), rewritten as Equation (2.15) with $\alpha = 11$, i.e., the first estimator (case I). Here $Q_h$ are suitably chosen weights that yield different types of estimators. The variance function in Equation (2.15) is minimized subject to the calibration constraint $\sum_{h=1}^{K} W^*_{h(11)} \bar x_h = \bar X$, and the Lagrange multiplier function is defined accordingly. Substituting the optimum calibrated weights of Equation (2.20) into Equation (2.14) gives our first calibrated estimator of the population mean, which is a combined ratio-type estimator of the population mean $\bar Y$ in the presence of one auxiliary variable.
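Under the same assumed form of the variance function (an assumption, since Equation (2.3) is not shown here), dropping the sum-of-weights constraint leaves the single condition $\sum_h W^*_{h(11)} \bar x_h = \bar X$, so $W^*_{h(11)} = \lambda\,\theta_h \bar x_h$ with $\lambda = \bar X / \sum_h \theta_h \bar x_h^2$. The sketch below computes these weights and checks one illustrative choice, $\theta_h = W_h/\bar x_h$, under which the calibrated estimator coincides with the classical combined ratio estimator $\bar X\,\bar y_{st}/\bar x_{st}$. Function names and data are hypothetical.

```python
def ratio_type_weights(theta, xbar, Xbar):
    """Weights minimizing sum(W*^2 / theta) subject to sum(W* * xbar) == Xbar."""
    lam = Xbar / sum(t * x * x for t, x in zip(theta, xbar))
    return [lam * t * x for t, x in zip(theta, xbar)]


def weighted_mean(weights, values):
    """sum_h weights_h * values_h."""
    return sum(w * v for w, v in zip(weights, values))


def combined_ratio(W, xbar, ybar, Xbar):
    """Classical combined ratio estimator: Xbar * ybar_st / xbar_st."""
    return Xbar * weighted_mean(W, ybar) / weighted_mean(W, xbar)


if __name__ == "__main__":
    W = [0.25, 0.35, 0.40]              # hypothetical stratum weights
    xbar = [10.0, 12.0, 15.0]
    ybar = [20.0, 25.0, 33.0]
    Xbar = 12.5
    theta = [Wh / x for Wh, x in zip(W, xbar)]   # illustrative theta_h choice
    w = ratio_type_weights(theta, xbar, Xbar)
    print(weighted_mean(w, ybar), combined_ratio(W, xbar, ybar, Xbar))
```

With $\theta_h = W_h/\bar x_h$ the weights reduce to $W_h \bar X/\bar x_{st}$, which is why the two printed values agree; other choices of $Q_h$ give other ratio-type estimators.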

Combined regression type-I estimator in stratified sampling
If we reinstate the fundamental calibration constraint $\sum_{h=1}^{K} W^*_{h(\alpha)} = \sum_{h=1}^{K} W_h$ and take $g = 1$, i.e., a single auxiliary variable, we obtain the combined regression type-I estimator. The second calibrated estimator of the mean is defined as $\bar y_{12} = \sum_{h=1}^{K} W^*_{h(12)} \bar y_h$, where $W^*_{h(12)}$ are modified weights produced by minimizing the known variance function $D_{12}$, and $Q_h$ are appropriately selected weights. $D_{12}$ is minimized subject to two linear constraints, Equations (2.24) and (2.25), which fix the sum of the weights at $\sum_{h=1}^{K} W_h$ and calibrate on $\bar X$, and the Lagrange function is formed accordingly. Differentiating $D_{12}$ with respect to $W^*_{h(12)}$ and equating to zero gives the new calibrated weights in Equation (2.27). Substituting Equation (2.27) into Equations (2.24) and (2.25), respectively, gives Equations (2.28) and (2.29), a system of two linear equations that can be written in matrix form as Equation (2.30). Solving this system by Cramer's rule gives $\lambda_{0,12}$ and $\lambda_{1,12}$. Substituting these values into Equation (2.27) gives the optimum calibrated weights $W^*_{h(12)}$ of Equation (2.33). Finally, the optimum calibrated weights of Equation (2.33) produce the second calibrated estimator, Equation (2.36).
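Under the same assumed form of the variance function, the weights are $W^*_{h(12)} = \theta_h(\lambda_{0,12} + \lambda_{1,12}\bar x_h)$, and the two constraints give a $2\times 2$ system that Cramer's rule solves in closed form. The sketch below implements that step and, when $\sum_h W_h = 1$, checks it against the equivalent regression form $\tilde y + \hat\beta(\bar X - \tilde x)$, where $\tilde y$ and $\tilde x$ are $\theta$-weighted means of the stratum means and $\hat\beta = \sum_h\theta_h(\bar x_h-\tilde x)(\bar y_h - \tilde y)/\sum_h \theta_h(\bar x_h-\tilde x)^2$. The names and this algebraic rearrangement are illustrative, not a transcription of Equation (2.36).

```python
def regression_type1_weights(W, theta, xbar, Xbar):
    """Solve the 2x2 Lagrange system by Cramer's rule and return W*_h(12)."""
    S0 = sum(theta)
    S1 = sum(t * x for t, x in zip(theta, xbar))
    S2 = sum(t * x * x for t, x in zip(theta, xbar))
    c0 = sum(W)                     # right-hand side of sum-of-weights constraint
    det = S0 * S2 - S1 * S1         # determinant of [[S0, S1], [S1, S2]]
    lam0 = (c0 * S2 - Xbar * S1) / det
    lam1 = (S0 * Xbar - S1 * c0) / det
    return [t * (lam0 + lam1 * x) for t, x in zip(theta, xbar)]


def regression_form(theta, xbar, ybar, Xbar):
    """Equivalent GREG-style form of the estimator, valid when sum(W) == 1."""
    T = sum(theta)
    x_t = sum(t * x for t, x in zip(theta, xbar)) / T
    y_t = sum(t * y for t, y in zip(theta, ybar)) / T
    sxy = sum(t * (x - x_t) * (y - y_t) for t, x, y in zip(theta, xbar, ybar))
    sxx = sum(t * (x - x_t) ** 2 for t, x in zip(theta, xbar))
    return y_t + (sxy / sxx) * (Xbar - x_t)


if __name__ == "__main__":
    W = [0.2, 0.3, 0.5]                 # hypothetical stratum weights
    theta = [1.0, 2.0, 1.5]
    xbar = [10.0, 12.0, 15.0]
    ybar = [21.0, 24.0, 31.0]
    w = regression_type1_weights(W, theta, xbar, 13.0)
    est = sum(wi * yi for wi, yi in zip(w, ybar))
    print(est, regression_form(theta, xbar, ybar, 13.0))
```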

Combined regression type-II estimator in stratified sampling
From the proposed multivariate calibration system, taking $g = 2$, i.e., two auxiliary variables, gives the regression type-II estimator, so the third calibrated estimator of the mean is defined as $\bar y_{13} = \sum_{h=1}^{K} W^*_{h(13)} \bar y_h$ (Equation (2.37)). The optimum calibrated weights are produced by minimizing the function $D_{13}$ subject to the three constraints of Equations (2.4)-(2.6), i.e., the sum-of-weights constraint and one constraint for each of $X_1$ and $X_2$, and the Lagrange function is written accordingly. Differentiating $D_{13}$ with respect to $W^*_{h(13)}$ and equating to zero gives Equation (2.42). For $\alpha = 13$, the system of Equations (2.10)-(2.12) takes the form of Equation (2.43). Solving Equation (2.43) gives the optimum Lagrange multipliers; substituting them into Equation (2.42) gives the calibrated weights of Equation (2.47). Substituting Equation (2.47) back into Equation (2.37) gives our third calibrated estimator of the mean, which is the combined regression type-II estimator.
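With $g = 2$ the same assumed weight form, $W^*_{h(13)} = \theta_h(\lambda_{0,13} + \lambda_{1,13}\bar x_{1h} + \lambda_{2,13}\bar x_{2h})$, gives a $3\times 3$ system in the multipliers. The self-contained sketch below builds that system from hypothetical stratum means and solves it with Cramer's rule, mirroring the type-I case; for larger $g$ a general linear solver would be preferable. The function names, the assumed variance-function form, and the data are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def regression_type2_weights(W, theta, x1, x2, X1, X2):
    """Calibrated weights W*_h(13) for two auxiliary variables (Cramer's rule)."""
    K = len(W)
    z = [[1.0] * K, list(x1), list(x2)]          # constant, X_1, X_2 columns
    A = [[sum(theta[h] * z[a][h] * z[b][h] for h in range(K)) for b in range(3)]
         for a in range(3)]
    rhs = [sum(W), X1, X2]
    D = det3(A)
    lam = []
    for col in range(3):
        # Replace column `col` of A by the right-hand side
        Ac = [[rhs[r] if c == col else A[r][c] for c in range(3)] for r in range(3)]
        lam.append(det3(Ac) / D)
    return [theta[h] * (lam[0] + lam[1] * x1[h] + lam[2] * x2[h]) for h in range(K)]


if __name__ == "__main__":
    W = [0.2, 0.3, 0.25, 0.25]          # hypothetical: K = 4 strata
    theta = [1.0, 1.5, 2.0, 1.2]
    x1 = [5.0, 7.0, 6.0, 9.0]
    x2 = [2.0, 3.0, 5.0, 4.0]
    ybar = [12.0, 15.0, 16.0, 19.0]
    w = regression_type2_weights(W, theta, x1, x2, 6.8, 3.6)
    print(w, sum(wi * yi for wi, yi in zip(w, ybar)))
```

The determinant is non-zero only when there are at least three strata and the vectors $(1, \bar x_{1h}, \bar x_{2h})$ are linearly independent across strata.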

Simulation study
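A simulation study is carried out to examine the efficiency of the proposed multivariate calibration estimators relative to existing estimators in the literature. The proposed estimators $\bar y_\alpha$ are compared with the ordinary stratified mean, ratio, and regression estimators. The efficiency of these estimators is assessed through the percent relative bias (PRB) and the mean square error (MSE), defined as

$$\mathrm{PRB}(\bar y_\alpha) = \frac{1}{R}\sum_{r=1}^{R} \frac{\bar y_{\alpha,r} - \bar Y}{\bar Y} \times 100, \quad \alpha = 0, 1, 2, 3, 4, 11, 12, 13, \tag{3.1}$$

and

$$\mathrm{MSE}(\bar y_\alpha) = \frac{1}{R}\sum_{r=1}^{R} \left(\bar y_{\alpha,r} - \bar Y\right)^2,$$

where $\bar y_{\alpha,r}$ is the value of the estimator in the $r$th of $R$ replicated samples. The details of the simulation study for the estimators are as follows.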
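The two criteria above can be computed from the $R$ replicated estimates as in the following minimal sketch. The percentage relative efficiency (PRE) reported in the results is not defined in this text; the sketch assumes the common definition $\mathrm{PRE} = \mathrm{MSE}(\text{baseline})/\mathrm{MSE}(\text{proposed}) \times 100$. Function names are hypothetical.

```python
def prb(estimates, Ybar):
    """Percent relative bias: mean of (estimate - Ybar) / Ybar, times 100."""
    return sum((e - Ybar) / Ybar for e in estimates) / len(estimates) * 100.0


def mse(estimates, Ybar):
    """Empirical mean square error over the R replicates."""
    return sum((e - Ybar) ** 2 for e in estimates) / len(estimates)


def pre(base_estimates, new_estimates, Ybar):
    """Assumed PRE: MSE(baseline) / MSE(proposed) * 100."""
    return mse(base_estimates, Ybar) / mse(new_estimates, Ybar) * 100.0


if __name__ == "__main__":
    Ybar = 10.0
    usual = [8.0, 12.0, 9.0, 11.0]       # hypothetical replicated estimates
    proposed = [9.0, 11.0, 9.5, 10.5]
    print(prb(proposed, Ybar), mse(proposed, Ybar), pre(usual, proposed, Ybar))
```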

Proposed combined regression type I estimator:
For the proposed combined regression type I estimator, we again generated $N = 1200$ observations in three strata of sizes $N_h = 300, 400, 500$ $(h = 1, 2, 3)$, respectively.
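The generating model for the $N = 1200$ simulated observations is not given in this text, so the following is only a minimal sketch of the Monte Carlo loop under a hypothetical model: in stratum $h$, $x \sim \mathrm{Gamma}(2 + h, 5)$ and $y = 10 + 5h + 1.5x + \varepsilon$ with $\varepsilon \sim N(0, 4^2)$. It draws SRSWR samples with a hypothetical proportional allocation $n_h = 30, 40, 50$ (10% of the population) and compares the empirical MSE of the usual stratified mean with that of the combined ratio estimator. All names, distributions, and coefficients are assumptions.

```python
import random


def make_population(N_h, rng):
    """Hypothetical stratified population of (y, x) pairs."""
    strata = []
    for h, Nh in enumerate(N_h):
        units = []
        for _ in range(Nh):
            x = rng.gammavariate(2.0 + h, 5.0)
            y = 10.0 + 5.0 * h + 1.5 * x + rng.gauss(0.0, 4.0)
            units.append((y, x))
        strata.append(units)
    return strata


def simulate(N_h=(300, 400, 500), n_h=(30, 40, 50), R=500, seed=2024):
    """Return empirical MSEs of the usual stratified mean and combined ratio."""
    rng = random.Random(seed)
    strata = make_population(N_h, rng)
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]
    Ybar = sum(y for s in strata for y, _ in s) / N
    Xbar = sum(x for s in strata for _, x in s) / N
    est_mean, est_ratio = [], []
    for _ in range(R):
        ybar_h, xbar_h = [], []
        for units, nh in zip(strata, n_h):
            sample = rng.choices(units, k=nh)        # SRSWR within stratum
            ybar_h.append(sum(y for y, _ in sample) / nh)
            xbar_h.append(sum(x for _, x in sample) / nh)
        y_st = sum(w * y for w, y in zip(W, ybar_h))
        x_st = sum(w * x for w, x in zip(W, xbar_h))
        est_mean.append(y_st)
        est_ratio.append(Xbar * y_st / x_st)          # combined ratio estimator

    def mse(est):
        return sum((e - Ybar) ** 2 for e in est) / R

    return mse(est_mean), mse(est_ratio)


if __name__ == "__main__":
    print(simulate())
```

The calibrated estimators $\bar y_{11}$, $\bar y_{12}$, and $\bar y_{13}$ would be added inside the same replication loop, computed from the same stratum means.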

Empirical evidence
To measure the effectiveness of the proposed calibrated estimators relative to the combined ratio, combined regression, and usual mean estimators in stratified sampling, we use two data sets. For the ratio and combined regression type I estimators, we use data on the amount of apple production in 854 villages of Turkey from Kadilar and Cingi (2003). The six regions (Marmara, Aegean, Mediterranean, Central Anatolia, Black Sea, and East and Southeast Anatolia) are taken as strata, with stratum sizes $N_h = 106, 106, 94, 171, 204, 173$, respectively. The study variable is the amount of apple production and the auxiliary variable is the number of apple trees. To investigate various situations, we apply the Box and Cox (1964) transformation to both variables of the population: $Y_i = \frac{(\text{amount of apple production})^t - 1}{t}$ and $X_i = \frac{(\text{number of apple trees})^t - 1}{t}$ for different choices of $t$. For $0.02 \le t \le 0.05$, from the population of size $N = 854$, we select 200 random samples of sizes $75 \le n \le 100$ using SRSWR.
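The Box-Cox transformation applied to both variables can be sketched as follows; the $t = 0$ branch uses the usual logarithmic limit, and for $t > 0$ (such as the values $0.02 \le t \le 0.05$ used here) zero values are also defined. The function name and the sample values are hypothetical.

```python
import math


def box_cox(v, t):
    """Box-Cox transform (v**t - 1) / t, with the log limit at t == 0."""
    if v < 0 or (v == 0 and t <= 0):
        raise ValueError("value outside the domain of the Box-Cox transform")
    if t == 0:
        return math.log(v)
    return (v ** t - 1.0) / t


if __name__ == "__main__":
    trees = [0.0, 1.0, 150.0, 2400.0]      # hypothetical tree counts
    for t in (0.02, 0.05):
        print(t, [round(box_cox(v, t), 4) for v in trees])
```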
For the combined regression type II estimator, we use the Abalone data from Asim et al. (2002), with $N = 4177$ instances divided into three strata according to the sex of the abalone (male, female, and infant), with stratum sizes $N_h = 1528, 1307, 1342$, respectively. $Y$ is the whole weight (grams) of the abalone, $X_1$ is the shell weight (grams), measured after being dried, and $X_2$ is the shucked weight (grams), which is the weight of the meat. $R = 200$ simple random samples with replacement, of sizes from 3% to 20% of the population, are drawn.

Theoretical mean square error
We can write the combined ratio-type estimator $\bar y_{c11}$ as in Equation (5.1), and the combined regression type-I estimator $\bar y_{c12}$ analogously. For simplicity, let $\theta_h = n_h Q_h / \sigma^2_{yh}$ and $\sum_{h=1}^{K} W_h = 1$. We define the following terms for deriving the mean square errors of the estimators. In a similar way, one can derive the mean square error of the estimator $\bar y_{c13}$.

Results and conclusions
In this investigation, we proposed three estimators, namely the combined ratio-type estimator, the combined regression type I estimator, and the combined regression type II estimator, and compared them with the usual stratified mean estimator, the combined ratio estimator, and the combined regression estimators using one and two auxiliary variables. Table 1 shows the results for simulated data for the proposed ratio-type estimator. The maximum efficiency gain of the proposed estimator is 34.574% relative to the usual mean estimator and 5.431% relative to the usual combined ratio estimator. The relative bias of the proposed estimator decreases as the sample size increases from $n = 60$ to 150, and the same pattern is observed for the usual combined ratio estimator. Table 2 presents the results for the proposed combined regression type I estimator for simulated data, with sample sizes of 8% to 12% of the population size. The proposed estimator attains a maximum efficiency of 245.152% relative to the usual stratified mean estimator for $n = 115$, a 5.879% gain in efficiency relative to the usual combined ratio estimator for $n = 111$, and a 4.717% gain relative to the usual combined regression estimator for a sample of 9% of the population size. The relative biases are negligible and show a random pattern due to sampling variation. Table 3 reports the precision of the proposed combined regression type II estimator relative to the usual stratified mean, usual combined ratio, and usual combined regression estimators using two auxiliary variables. The biases show a random pattern and are negligible for all four estimators. The maximum PRE is 159.380% for $n = 200$ relative to the usual mean estimator, whereas the minimum gain attained by the proposed regression type II estimator is 9.385% relative to the usual combined regression estimator.
Relative to the usual combined ratio estimator, the proposed estimator is 33.472% more efficient for $n = 80$. Table 4 reports the results for the real data from six regions of Turkey after the Box-Cox transformation. All relative biases are negligible and merely fluctuate. The maximum percentage relative efficiency is 276.747% relative to the usual mean estimator, and the minimum gain in efficiency is 2.393% relative to the usual combined ratio estimator, for $t = 0.05$ and $n = 100$.
The maximum gain achieved relative to the combined ratio estimator is 85.846% with $t = 0.02$ for $n = 75$. Overall, the proposed combined ratio estimator outperforms the usual mean and usual combined ratio estimators in stratified sampling. Table 5 reports the biases and efficiency of the proposed regression type I estimator for the real data for values of the transformation parameter $t$ from 0.02 to 0.05. The PRE of the proposed estimator ranges from 319.586% to 406.626% relative to the usual stratified mean estimator, from 149.088% to 257.438% relative to the usual combined ratio estimator, and from 108.427% to 237.253% relative to the usual combined regression estimator with one auxiliary variable. Table 6 shows the efficiency of the proposed combined regression type II estimator for the real data set. Again, the biases are small and unremarkable, while the maximum efficiency of the proposed estimator is 1029.688% for $n = 460$ relative to the usual mean estimator, 1018.336% relative to the combined ratio estimator, and 717.202% relative to the combined regression estimator using two auxiliary variables. The overall minimum gain in efficiency of the proposed type II estimator is 1.743% relative to the usual combined regression estimator.
Finally, the calibration methodology is an effective weighting technique for estimating the mean in stratified sampling. The efficiency of the three proposed estimators can be observed from both the simulated and the empirical evidence. All three estimators are more efficient than the usual mean, combined ratio, and combined regression estimators when multivariate auxiliary information is available in stratified sampling. To the best of our knowledge, we are the first to report biases in calibration estimation of the mean in stratified sampling. The biases are negligible, as they merely fluctuate as the sample size increases. Introducing a new distance-type variance function is a promising idea, and it can be extended to the case where the population variance is unknown in all strata in stratified sampling. Overall, the proposed estimators show substantial gains in efficiency.

Table 4. Performance of the proposed combined ratio type estimator with associated usual mean and usual combined ratio estimator for single auxiliary variable based on real data.

Table 6. Performance of the proposed combined regression type II estimator in regard to the usual mean, combined ratio and combined regression estimator for two auxiliary variables based on real data. Columns: $n$, PRB($\bar y_0$), PRB($\bar y_3$), PRB($\bar y_4$), PRB($\bar y_{13}$), PRE($\bar y_0$), PRE($\bar y_3$), PRE($\bar y_4$).