A generalized class of factor-type exponential imputation techniques for the population mean using a simulation approach

This article introduces an efficient generalized class of factor-type exponential imputation techniques, together with their corresponding estimators, that use auxiliary information. Generalized ratio, product and dual to ratio type exponential estimators are special cases of the suggested imputation techniques. Expressions for the biases and mean squared errors (MSEs) are derived to the first order of approximation. The proposed imputation techniques can be viewed as efficient extensions of the work of Singh and Horn [Compromised imputation in survey sampling. Metrika. 2000;51(3):267–276. doi: 10.1007/s001840000054], Singh and Deo [Imputation by power transformation. Statist Papers. 2003;44(4):555–579. doi: 10.1007/BF02926010], Toutenburg and Srivastava [Amputation versus imputation of missing values through ratio method in sample surveys. Statist Papers. 2008;49(2):237–247. doi: 10.1007/s00362-006-0009-4], Kadilar and Cingi [Estimators for the population mean in the case of missing data. Commun Stat Theory Methods. 2008;37(14):2226–2236. doi: 10.1080/03610920701855020], Singh [A new method of imputation in survey sampling. Statistics. 2009;43(5):499–511. doi: 10.1080/02331880802605114], Gira [Estimation of population mean with a new imputation methods. Appl Math Sci. 2015;9(34):1663–1672] and Singh et al. [An improved alternative method of imputation for missing data in survey sampling. J Stat Appl Probab. 2022;11(2):535–543. doi: 10.18576/jsap]. The proposed estimators are compared with these estimators as well as with the mean, ratio and regression imputation techniques. A numerical illustration and a simulation study, using real and simulated data sets, show that the suggested estimators are the most efficient of the estimators considered.


Introduction
Non-response is a significant issue in sample surveys when analysing survey data. Survey researchers make every effort to obtain a response from every member of the selected sample, but eventually these efforts must stop and data analysis must begin. At that point there will almost always be some units for which no data have been gathered. Non-response can have a variety of causes and patterns, and it is widely acknowledged that, if adequate information about the nature of non-response in the population is not available, inferences about population parameters may be incorrect. Imputation techniques are used to reduce the negative consequences of non-response in sample surveys. Imputation, the process of filling in missing data, is one way of adapting standard statistical analyses to incomplete data; it is typically used when missing values in a survey must be replaced by substitute values.
To deal with missing values, Kalton, Kasprzyk and Santos [1] and Sande [2] presented imputation techniques that structurally complete an incomplete dataset and make its analysis straightforward. If an auxiliary variable is available in the sampling frame, it can be used in the imputation strategy; for example, Lee, Rancourt and Sarndal [3] used relevant information from one or more auxiliary variables to impute data. Singh and Horn [4] proposed a compromise imputation strategy. Ahamed et al. [5] proposed a number of new imputation-based estimators that use data on an auxiliary variate, and Singh et al. [6] proposed an estimator of the population mean that uses imputation based on auxiliary information, evaluating it by comparison with both the mean and the ratio methods of imputation.
The main goal of this work is to suggest a class of imputation techniques and the corresponding estimators. Rubin [7] introduced three conditions concerning missing observations in survey sampling, namely Missing At Random (MAR), Observed At Random (OAR) and Parameter Distinctness (PD). The difference between MAR and Missing Completely At Random (MCAR) was discussed by Heitjan and Basu [8]. Data are MCAR if the probability that a value is missing is unrelated to any variable in the dataset, observed or unobserved. Because MCAR data allow unbiased analysis, the MCAR mechanism is assumed throughout this work. Various authors, including Diana and Perri [9], Bhushan and Pandey [10], Prasad [11], Audu et al. [12], Singh et al. [13] and others, have worked under MCAR and suggested imputation methods for efficient estimation of the population mean of the study variable when observations are missing.
In recent years there has been a surge of research on new imputation approaches for handling missing data in surveys and statistical analysis. A significant body of work by researchers such as Singh et al. [14], Audu et al. [15], Bhushan et al. [16], Singh et al. [17], Shahzad et al. [18], Singh and Usman [19], Bhushan et al. [20], Prasad and Yadav [21], Singh et al. [22], Bhushan et al. [23], and Bhushan and Kumar [24] has contributed to the development of robust and efficient estimators for missing data under various sampling scenarios. These studies provide insights and approaches that increase the accuracy and reliability of statistical estimates when missing data are a prevalent obstacle in practical survey research.
In this study, three different classes of estimators are used to estimate the population mean of the study character under imputation. These estimators make use of an auxiliary variate whose values are available for all units in the population. A design-based approach is used to compare the suggested strategies with existing ones, and empirical and simulation studies are conducted using real data sets and data simulated from real-data parameters.
This study presents a new and comprehensive class of factor-type exponential imputation techniques, together with their corresponding estimators, that make use of auxiliary information. To the best of our knowledge, no prior study of this kind has been carried out. Generalized ratio, product and dual to ratio type exponential estimators arise as special cases of these methods. Expressions for the biases and mean squared errors are derived to the first order of approximation. The proposed approaches build on existing research and, in an evaluation using both real and simulated data sets, perform better than the existing estimators considered. The study thus contributes improved strategies for handling missing data.
The rest of the manuscript is organized as follows. Section 2 reviews existing imputation techniques together with their point estimators and MSEs. Section 3 presents the proposed factor-type exponential imputation techniques for various combinations of auxiliary information. Section 4 derives expressions for the bias and MSE of the suggested imputation techniques and discusses their particular cases. Sections 5 and 6 present numerical and simulation studies using real and simulated data sets. Section 7 analyses the tables, and Section 8 concludes.

Existing imputation techniques
Let $Z = \{Z_1, Z_2, \ldots, Z_N\}$ be a finite population of size $N$ and let $v$ be the character under study. It is assumed that information on the auxiliary character $u$, including its known population mean $\bar U$, is available at the start of the survey. A simple random sample without replacement (SRSWOR) $s$ of size $n$ is drawn from the population. Let $r$ be the number of responding units among the $n$ sampled units, let $R$ denote the set of responding units and $\bar R$ the set of non-responding units. For each unit $i \in R$ the value $v_i$ is observed, whereas for each unit $i \in \bar R$ the value is missing and an imputed value $\tilde v_i$ is derived. We assume that imputation is carried out with the aid of a quantitative auxiliary variate $u$ whose value $u_i$ is known for every sampled unit $i \in s = R \cup \bar R$. The data after imputation are

$$v_{.i} = \begin{cases} v_i, & i \in R, \\ \tilde v_i, & i \in \bar R, \end{cases} \tag{1}$$

where $\tilde v_i$ is the imputed value for the $i$th non-responding unit. Using these data, the general point estimator under an imputation technique takes the form

$$t = \frac{1}{n}\sum_{i \in s} v_{.i} = \frac{1}{n}\Big[\sum_{i \in R} v_i + \sum_{i \in \bar R} \tilde v_i\Big]. \tag{2}$$

Some classical imputation techniques that are available and commonly used are as follows:
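As a minimal sketch of the general estimator (2), assuming the sampled study values are held in a NumPy array with non-respondents stored as NaN, the function below fills the non-responding units through a caller-supplied imputation rule and averages over all n sampled units. The names `imputed_mean` and `impute` are hypothetical illustrations, not part of the paper.

```python
import numpy as np

def imputed_mean(v, responded, impute):
    """Estimator (2): mean over all n sampled units of the completed data v_.i.

    v         : study values; entries for non-respondents are ignored (e.g. NaN)
    responded : boolean mask, True for units in R
    impute    : hypothetical callback returning imputed values for units not in R
    """
    v_dot = np.asarray(v, dtype=float).copy()
    responded = np.asarray(responded, dtype=bool)
    # Units in R keep their observed values; the remaining units get imputed values.
    v_dot[~responded] = impute(responded)
    return v_dot.mean()

# Toy sample: n = 5 units, the last two did not respond.
v = np.array([10.0, 12.0, 14.0, np.nan, np.nan])
responded = ~np.isnan(v)
# Mean imputation: each missing value is replaced by the respondent mean.
t = imputed_mean(v, responded, lambda resp: np.full((~resp).sum(), v[resp].mean()))
print(t)  # 12.0
```

Each classical technique below differs only in the value it imputes for the non-responding units, so a different `impute` callback yields a different estimator.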

Mean imputation techniques
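In this technique, the missing values are replaced by the mean of the responding units, so the data after imputation have the structure

$$v_{.i} = \begin{cases} v_i, & i \in R, \\ \bar v_r, & i \in \bar R. \end{cases}$$

The resultant point estimator of $\bar V$ is

$$T_1 = \bar v_r, \qquad \bar v_r = \frac{1}{r}\sum_{i \in R} v_i,$$

and the variance of the response sample mean $\bar v_r$ is

$$V(T_1) = \Big(\frac{1}{r} - \frac{1}{N}\Big) S_v^2 = \Big(\frac{1}{r} - \frac{1}{N}\Big) \bar V^2 C_v^2,$$

where $\bar V$ is the population mean and $C_v = S_v/\bar V$ is the coefficient of variation of the study variable $v$.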
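A minimal sketch of the mean-imputation estimator and its variance, assuming SRSWOR with MCAR response so that the r respondents behave as an SRSWOR of size r from the N population units. The function names and all numerical inputs are hypothetical.

```python
import numpy as np

def mean_imputation_estimator(v_resp):
    """T_1: the mean of the responding values (mean imputation)."""
    return float(np.mean(v_resp))

def var_mean_imputation(r, N, Vbar, Cv):
    """V(T_1) = (1/r - 1/N) * Vbar^2 * Cv^2 under SRSWOR with MCAR response."""
    return (1.0 / r - 1.0 / N) * Vbar**2 * Cv**2

# Point estimate from a small hypothetical set of respondent values.
t1 = mean_imputation_estimator([38.0, 41.5, 40.5])
# Variance for a hypothetical design: r = 50, N = 200, Vbar = 40, Cv = 0.25.
var_t1 = var_mean_imputation(50, 200, 40.0, 0.25)
```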

Ratio imputation techniques
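Following the notation of Lee et al. [25], in the case of single-value imputation, if the $i$th unit requires imputation, the value $b u_i$ is imputed, where $b = \bar v_r/\bar u_r$. The study variate after imputation becomes

$$v_{.i} = \begin{cases} v_i, & i \in R, \\ b\,u_i, & i \in \bar R. \end{cases}$$

Under the ratio method of imputation, the point estimator of the population mean is

$$T_2 = \bar v_r \frac{\bar u_n}{\bar u_r},$$

where $\bar u_r = \frac{1}{r}\sum_{i \in R} u_i$ and $\bar u_n = \frac{1}{n}\sum_{i \in s} u_i$.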
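The sketch below, on a hypothetical toy sample, checks numerically that averaging the ratio-imputed data reproduces $\bar v_r \bar u_n/\bar u_r$: the non-respondents contribute $b(n\bar u_n - r\bar u_r)$ in total, which cancels the respondents' $r\bar v_r$ term. The function name is illustrative only.

```python
import numpy as np

def ratio_imputation(v, u, responded):
    """Ratio imputation: impute b * u_i for each non-respondent, b = vbar_r / ubar_r.

    Returns the completed data v_.i and the point estimator (their mean).
    u must be known for every sampled unit, respondent or not.
    """
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    resp = np.asarray(responded, dtype=bool)
    b = v[resp].mean() / u[resp].mean()
    v_dot = np.where(resp, v, b * u)   # observed values kept, b * u_i imputed elsewhere
    return v_dot, v_dot.mean()

# Toy sample (hypothetical): n = 5, the last two units did not respond.
u = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
v = np.array([3.0, 5.0, 7.0, np.nan, np.nan])
v_dot, t2 = ratio_imputation(v, u, ~np.isnan(v))
# The mean of the completed data equals vbar_r * ubar_n / ubar_r = 5 * 6 / 4.
```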

Regression imputation techniques
Significant progress has been made in sample survey data imputation, particularly in the regression method of imputation. Noteworthy contributions include the alternative estimator proposed by Singh and Horn [26] in 1998 and Singh's [27] foundational book 'Advanced Sampling Theory with Applications' (2003); these works serve as the basis for the regression-based imputation method. Singh and Deo [28] proposed a method of imputation based on power transformation, which was further developed by Rueda, Gonzalez and Arcos [29], Rueda and Gonzalez [30], Singh [6], and Singh, Maurya, Khetan and Kadilar [31]. The field was further advanced by Prasad's [32] ratio exponential type estimators with imputation and by the use of higher-order moments by Mohamed, Sedory and Singh [33]. This series of improvements shows how data imputation in sample surveys has evolved over time. The regression method of imputation is defined by

$$v_{.i} = \begin{cases} v_i, & i \in R, \\ \bar v_r + b(u_i - \bar u_r), & i \in \bar R, \end{cases}$$

where $b = s_{vu(r)}/s^2_{u(r)}$ is the regression coefficient of $v$ on $u$ computed from the responding units. Using this method, the resultant point estimator is

$$T_3 = \bar v_r + b(\bar u_n - \bar u_r).$$

The MSE of this estimator, to the first order of approximation, is

$$\mathrm{MSE}(T_3) = \Big[\Big(\frac{1}{r} - \frac{1}{N}\Big) - \Big(\frac{1}{r} - \frac{1}{n}\Big)\rho_{vu}^2\Big] \bar V^2 C_v^2.$$
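A minimal sketch of regression imputation, assuming the imputed value $\bar v_r + b(u_i - \bar u_r)$ for each non-respondent with $b$ the least-squares slope from the respondents, together with the first-order MSE $[(1/r - 1/N) - (1/r - 1/n)\rho_{vu}^2]\bar V^2 C_v^2$. The toy data and function names are hypothetical.

```python
import numpy as np

def regression_imputation(v, u, responded):
    """Regression imputation for one sample; returns (T_3, b).

    b = s_vu(r) / s_u^2(r) is the least-squares slope from the responding units;
    each non-respondent receives vbar_r + b * (u_i - ubar_r).
    """
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    resp = np.asarray(responded, dtype=bool)
    vr, ur = v[resp].mean(), u[resp].mean()
    b = np.cov(u[resp], v[resp], ddof=1)[0, 1] / np.var(u[resp], ddof=1)
    v_dot = np.where(resp, v, vr + b * (u - ur))
    return v_dot.mean(), b   # the mean equals vr + b * (ubar_n - ur)

def mse_regression(r, n, N, Vbar, Cv, rho):
    """First-order MSE of T_3: [(1/r - 1/N) - (1/r - 1/n) * rho^2] * Vbar^2 * Cv^2."""
    return ((1 / r - 1 / N) - (1 / r - 1 / n) * rho**2) * Vbar**2 * Cv**2

u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
v = np.array([2.1, 3.9, 6.2, 7.8, np.nan, np.nan])
t3, b = regression_imputation(v, u, ~np.isnan(v))
```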

Singh and Horn imputation techniques
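Singh and Horn [4] proposed the compromised imputation method, in which the study character after imputation takes the form

$$v_{.i} = \begin{cases} \alpha \dfrac{n}{r} v_i + (1 - \alpha) b\, u_i, & i \in R, \\ (1 - \alpha) b\, u_i, & i \in \bar R, \end{cases}$$

where $b = \bar v_r/\bar u_r$ and $\alpha$ is a suitably chosen constant such that the mean squared error of the resultant estimator is minimum. In this method, imputed values for the responding units are used in addition to those for the non-responding units. The point estimator of the population mean under this imputation method becomes

$$T_4 = \alpha \bar v_r + (1 - \alpha)\bar v_r \frac{\bar u_n}{\bar u_r},$$

with first-order MSE

$$\mathrm{MSE}(T_4) = \bar V^2\Big[\Big(\frac{1}{r} - \frac{1}{N}\Big)C_v^2 + \Big(\frac{1}{r} - \frac{1}{n}\Big)\big(\beta_1^2 C_u^2 - 2\beta_1 \rho_{vu} C_v C_u\big)\Big],$$

where $\beta_1 = 1 - \alpha$ is a constant. The optimum value of $\beta_1$ is $\rho_{vu} C_v/C_u$. Substituting this optimum value into the MSE expression, we obtain the minimum MSE of $T_4$:

$$\mathrm{MSE}_{\min}(T_4) = \Big[\Big(\frac{1}{r} - \frac{1}{N}\Big) - \Big(\frac{1}{r} - \frac{1}{n}\Big)\rho_{vu}^2\Big]\bar V^2 C_v^2.$$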
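A minimal sketch of the compromised imputation, assuming respondents receive $\alpha (n/r) v_i + (1-\alpha) b u_i$ and non-respondents $(1-\alpha) b u_i$ with $b = \bar v_r/\bar u_r$, and writing the first-order MSE in terms of $\beta_1 = 1 - \alpha$ (our reading of the constant). The toy sample and parameter values are hypothetical; the usage lines confirm that the completed-data mean equals $\alpha\bar v_r + (1-\alpha)\bar v_r\bar u_n/\bar u_r$.

```python
import numpy as np

def singh_horn(v, u, responded, alpha):
    """Compromised imputation (assumed form) and its point estimator T_4."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    resp = np.asarray(responded, dtype=bool)
    n, r = len(u), int(resp.sum())
    b = v[resp].mean() / u[resp].mean()
    v_dot = (1 - alpha) * b * u                 # every unit receives (1 - alpha) * b * u_i
    v_dot[resp] += alpha * n * v[resp] / r      # respondents also receive alpha * n * v_i / r
    return v_dot.mean()

def mse_singh_horn(beta1, r, n, N, Vbar, Cv, Cu, rho):
    """First-order MSE of T_4 written in terms of beta1 = 1 - alpha."""
    lam_r, lam_rn = 1 / r - 1 / N, 1 / r - 1 / n
    return Vbar**2 * (lam_r * Cv**2 + lam_rn * (beta1**2 * Cu**2 - 2 * beta1 * rho * Cv * Cu))

u = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
v = np.array([3.0, 5.0, 7.0, np.nan, np.nan])
t4 = singh_horn(v, u, ~np.isnan(v), alpha=0.3)   # 0.3 * 5 + 0.7 * 5 * 6 / 4
```

At $\beta_1 = \rho_{vu}C_v/C_u$ the MSE reduces to that of regression imputation, which is why both estimators share the same minimum.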

Singh and Deo imputation techniques
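Singh and Deo [28] proposed imputation by power transformation in sample surveys, in which the study character is imputed so that the resultant point estimator of the population mean becomes

$$T_5 = \bar v_r \Big(\frac{\bar u_n}{\bar u_r}\Big)^{\beta_2},$$

where $\beta_2$ is a constant. To the first order of approximation,

$$\mathrm{MSE}(T_5) = \bar V^2\Big[\Big(\frac{1}{r} - \frac{1}{N}\Big)C_v^2 + \Big(\frac{1}{r} - \frac{1}{n}\Big)\big(\beta_2^2 C_u^2 - 2\beta_2\rho_{vu}C_vC_u\big)\Big].$$

The optimum value of $\beta_2$ is $\rho_{vu}C_v/C_u$. Substituting this optimum value, we obtain the minimum MSE of $T_5$:

$$\mathrm{MSE}_{\min}(T_5) = \Big[\Big(\frac{1}{r} - \frac{1}{N}\Big) - \Big(\frac{1}{r} - \frac{1}{n}\Big)\rho_{vu}^2\Big]\bar V^2 C_v^2.$$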
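A minimal sketch, assuming $T_5$ has the power form $\bar v_r(\bar u_n/\bar u_r)^{\beta_2}$. The imputation rule used here (non-respondents share the total $\bar v_r[n(\bar u_n/\bar u_r)^{\beta_2} - r]$ in proportion to their $u_i$) is one rule that reproduces this estimator; we believe it matches Singh and Deo's construction, but it is stated here as an assumption. The toy data are hypothetical.

```python
import numpy as np

def singh_deo(v, u, responded, beta):
    """Power-transformation imputation (assumed rule) and the resulting estimator."""
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    resp = np.asarray(responded, dtype=bool)
    miss = ~resp
    n, r = len(u), int(resp.sum())
    vr, ur, un = v[resp].mean(), u[resp].mean(), u.mean()
    v_dot = v.copy()
    # Non-respondents share the total vr * [n * (un/ur)**beta - r] in proportion to u_i.
    v_dot[miss] = vr * (n * (un / ur) ** beta - r) * u[miss] / u[miss].sum()
    return v_dot.mean()

u = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
v = np.array([3.0, 5.0, 7.0, np.nan, np.nan])
t5 = singh_deo(v, u, ~np.isnan(v), beta=0.5)
```

Setting $\beta_2 = 1$ recovers the ratio imputation estimator and $\beta_2 = 0$ the mean imputation estimator.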

Kadilar and Cingi imputation techniques
Kadilar and Cingi [34] developed three estimators under non-response, $T_6$, $T_7$ and $T_8$, where $b$ is the regression coefficient estimated by the least squares method. We obtained the MSEs of $T_6$, $T_7$ and $T_8$, in which $R = \bar V/\bar U$ and $B = S_{uv}/S_u^2$.

Toutenburg et al. imputation techniques
We consider two estimators of the population mean $\bar V$ proposed by Toutenburg et al. [35], $T_9$ and

$$T_{10} = \bar v_r + \frac{r\,\bar v_r}{n\,\bar u_n}(\bar u_n - \bar u_r). \tag{25}$$

The estimator $T_9$ is the same as the ratio estimator. The MSEs of the estimators $T_9$ and $T_{10}$ are obtained as

Singh imputation techniques
Singh [6] suggested a different imputation method and the corresponding estimator $T_{11}$, where $\gamma$ is a constant. The optimum value of $\gamma$ is $\rho_{vu}C_v/C_u$, and the minimum MSE of the estimator $T_{11}$ is given by

Gira imputation techniques
Gira [36] suggested a ratio-type imputation method and the corresponding estimator $T_{12}$, where $\phi$ is a constant. Substituting the optimum value of $\phi$, the minimum MSE of the estimator $T_{12}$ is given by

Singh et al. imputation techniques
Singh et al. [13] proposed a new method of imputation to estimate the population mean. Under this method, the resultant point estimator of the population mean $\bar V$ is $T_{13}$, where $m\ (\neq 0)$ is a real number. Substituting the optimum value of $m$, we obtain the minimum of $\mathrm{MSE}(T_{13})$, given by

Proposed imputation techniques and corresponding estimators
Following Shukla and Thakur [37] and motivated by the work of Prasad [11], we propose three imputation strategies whose resultant point estimators form classes of estimators of the population mean $\bar V$.
Strategy I: When $\bar U$ and $\bar u_r$ are used, the first proposed method of imputation takes the form: The resultant point estimator (2) of the population mean $\bar V$ under the first proposed method of imputation is denoted by $T^{*}_{g1}$, where $\delta$ is a real constant, $\alpha$ is a suitably chosen constant such that the MSE of the proposed class of estimators is minimum, and $a\ (\neq 0)$ and $b$ are either constants or functions of known population parameters, namely the standard deviation ($S_u$), the coefficient of kurtosis ($\beta_{2(u)}$) and the coefficient of variation ($C_u$) of the auxiliary variable $u$, and the correlation coefficient ($\rho_{vu}$). For suitable values of $(a \neq 0, b)$, some members of the proposed estimators are given in Table A16.
Particular cases: (i) When $\delta = 1$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{R}_{g1}$, which is the generalized ratio type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{R}_{g1}$ are given in Table A17.
(ii) When $\delta = 2$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{P}_{g1}$, which is the generalized product type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{P}_{g1}$ are given in Table A17.
(iii) When $\delta = 3$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{D}_{g1}$, which is the generalized dual to ratio type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{D}_{g1}$ are given in Table A18.
(iv) When $\delta = 4$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{E}_{g1}$, which is a generalized ratio type exponential estimator similar to that suggested by Prasad [11]. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{E}_{g1}$ are given in Table A18.
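The δ-indexed special cases mirror the classical factor-type ratio family from the factor-type literature that Shukla and Thakur [37] build on. As a sketch only, assuming the classical weights $A = (\delta-1)(\delta-2)$, $B = (\delta-1)(\delta-4)$, $C = (\delta-2)(\delta-3)(\delta-4)$ and the multiplier $[(A+C)\bar U + fB\bar u]/[(A+fB)\bar U + C\bar u]$ with $f = n/N$, the code shows how δ = 1, 2, 3 switch between ratio, product and dual to ratio forms. The exponential P, Q and R of this paper are not reproduced here and evidently differ: in the classical family δ = 4 gives the mean-per-unit estimator, whereas here it gives a ratio type exponential estimator.

```python
def factor_type_weights(delta):
    """Classical factor-type weights A, B, C (assumed; not the paper's P, Q, R)."""
    A = (delta - 1) * (delta - 2)
    B = (delta - 1) * (delta - 4)
    C = (delta - 2) * (delta - 3) * (delta - 4)
    return A, B, C

def factor_type_multiplier(Ubar, ubar, f, delta):
    """Multiplier applied to the study-variable mean in the classical family."""
    A, B, C = factor_type_weights(delta)
    return ((A + C) * Ubar + f * B * ubar) / ((A + f * B) * Ubar + C * ubar)

# Hypothetical values: Ubar = 10, sample mean of u = 8, sampling fraction f = 0.25.
ratio_form = factor_type_multiplier(10.0, 8.0, 0.25, 1)    # Ubar / ubar
product_form = factor_type_multiplier(10.0, 8.0, 0.25, 2)  # ubar / Ubar
dual_form = factor_type_multiplier(10.0, 8.0, 0.25, 3)     # (Ubar - f*ubar) / ((1 - f) * Ubar)
mean_form = factor_type_multiplier(10.0, 8.0, 0.25, 4)     # 1 (mean per unit)
```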
Strategy II: When $\bar U$ and $\bar u_n$ are used, the second proposed method of imputation takes the form: The resultant point estimator (2) of the population mean $\bar V$ under the second proposed method of imputation is denoted by $T^{*}_{g2}$. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{*}_{g2}$ are given in Table A19.
Particular cases: (i) When $\delta = 1$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{R}_{g2}$, which is the generalized ratio type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{R}_{g2}$ are given in Table A20.
(ii) When $\delta = 2$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{P}_{g2}$, which is the generalized product type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{P}_{g2}$ are given in Table A20.
(iii) When $\delta = 3$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{D}_{g2}$, which is the generalized dual to ratio type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{D}_{g2}$ are given in Table A21.
(iv) When $\delta = 4$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{E}_{g2}$, which is a generalized ratio type exponential estimator similar to that suggested by Prasad [11]. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{E}_{g2}$ are given in Table A21.
Strategy III: When $\bar u_n$ and $\bar u_r$ are used, the third proposed method of imputation takes the form: The resultant point estimator (2) of the population mean $\bar V$ under the third proposed method of imputation is $T^{*}_{g3}$, whose particular case $T^{E}_{g3}$ is the generalized ratio type exponential estimator. For suitable values of $(a \neq 0, b)$, some members of the proposed estimators $T^{E}_{g3}$ are given in Table A24.
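All three strategies use constants $a\ (\neq 0)$ and $b$ built from known population parameters. A minimal sketch computing the candidate values named in Strategy I ($S_u$, $C_u$, $\beta_{2(u)}$, $\rho_{vu}$) for a hypothetical toy population; the divisor $N-1$ for $S_u$ and the moment-ratio definition of kurtosis are assumed conventions, since the paper's exact definitions are not shown here. The pair $(a, b) = (\beta_{2(u)}, \rho_{vu})$ is the combination mentioned for the simulation study.

```python
import numpy as np

def auxiliary_constants(u, v):
    """Population quantities listed as candidate values of a and b."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    dev = u - u.mean()
    S_u = u.std(ddof=1)                                  # population s.d. with divisor N - 1
    C_u = S_u / u.mean()                                 # coefficient of variation of u
    beta2_u = np.mean(dev**4) / np.mean(dev**2) ** 2     # Pearson coefficient of kurtosis
    rho_vu = np.corrcoef(v, u)[0, 1]                     # correlation between v and u
    return {"S_u": S_u, "C_u": C_u, "beta2_u": beta2_u, "rho_vu": rho_vu}

# Hypothetical toy population.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
v = 2.0 * u + 1.0
k = auxiliary_constants(u, v)
a, b = k["beta2_u"], k["rho_vu"]
```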

Properties of the proposed imputation techniques $T^{*}_{g1}$, $T^{*}_{g2}$ and $T^{*}_{g3}$
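To obtain the bias and MSE of the imputed estimators, we write

$$\bar v_r = \bar V(1 + e_1), \qquad \bar u_r = \bar U(1 + e_2), \qquad \bar u_n = \bar U(1 + e_3),$$

so that $E(e_1) = E(e_2) = E(e_3) = 0$. Hence we have

$$E(e_1^2) = \Big(\frac{1}{r} - \frac{1}{N}\Big)C_v^2, \quad E(e_2^2) = \Big(\frac{1}{r} - \frac{1}{N}\Big)C_u^2, \quad E(e_3^2) = \Big(\frac{1}{n} - \frac{1}{N}\Big)C_u^2,$$

$$E(e_1e_2) = \Big(\frac{1}{r} - \frac{1}{N}\Big)\rho_{vu}C_vC_u, \quad E(e_1e_3) = \Big(\frac{1}{n} - \frac{1}{N}\Big)\rho_{vu}C_vC_u, \quad E(e_2e_3) = \Big(\frac{1}{n} - \frac{1}{N}\Big)C_u^2,$$

where $C_u^2 = S_u^2/\bar U^2$ and $C_v^2 = S_v^2/\bar V^2$ are the squared population coefficients of variation of the auxiliary and study variables, respectively, $\rho_{vu} = S_{vu}/(S_vS_u)$ is the correlation coefficient between the study and auxiliary variables, and $S_v^2$, $S_u^2$ and $S_{vu}$ are the population mean squares of $v$ and $u$ and the population covariance between $v$ and $u$, respectively.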
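The expectations of $e_1$, $e_2$, $e_3$ rest on SRSWOR of $n$ units followed by MCAR selection of $r$ respondents, so that respondents form an SRSWOR of size $r$ from $N$. A minimal Monte Carlo sketch on a hypothetical bivariate population (seed, sizes and parameters chosen arbitrarily) compares the simulated moments with the formulas; ratios close to 1 support them.

```python
import numpy as np

rng = np.random.default_rng(2024)
N, n, r, reps = 400, 80, 60, 10000

# Hypothetical finite population (not one of the paper's data sets).
u = rng.normal(50.0, 10.0, N)
v = 5.0 + 0.8 * u + rng.normal(0.0, 6.0, N)
Ubar, Vbar = u.mean(), v.mean()
Su2, Sv2 = u.var(ddof=1), v.var(ddof=1)
Svu = np.cov(v, u, ddof=1)[0, 1]

e = np.empty((reps, 3))
for k in range(reps):
    s = rng.choice(N, size=n, replace=False)   # SRSWOR sample of size n
    R = rng.choice(s, size=r, replace=False)   # MCAR: r respondents chosen at random
    e[k] = (v[R].mean() / Vbar - 1, u[R].mean() / Ubar - 1, u[s].mean() / Ubar - 1)

lam_r, lam_n = 1 / r - 1 / N, 1 / n - 1 / N
theory = np.array([
    lam_r * Sv2 / Vbar**2,             # E(e1^2) = (1/r - 1/N) C_v^2
    lam_r * Su2 / Ubar**2,             # E(e2^2) = (1/r - 1/N) C_u^2
    lam_n * Su2 / Ubar**2,             # E(e3^2) = (1/n - 1/N) C_u^2
    lam_r * Svu / (Vbar * Ubar),       # E(e1 e2) = (1/r - 1/N) rho C_v C_u
    lam_n * Svu / (Vbar * Ubar),       # E(e1 e3) = (1/n - 1/N) rho C_v C_u
    lam_n * Su2 / Ubar**2,             # E(e2 e3) = (1/n - 1/N) C_u^2
])
empirical = np.array([
    np.mean(e[:, 0] ** 2), np.mean(e[:, 1] ** 2), np.mean(e[:, 2] ** 2),
    np.mean(e[:, 0] * e[:, 1]), np.mean(e[:, 0] * e[:, 2]), np.mean(e[:, 1] * e[:, 2]),
])
print(np.round(empirical / theory, 3))
```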

Theorem 4.1:
The estimator $T^{*}_{g1}$ can be expressed in terms of $e_1$ and $e_2$, to the first order of approximation, as follows, where $\xi = \phi_1 - \phi_2$.
(1) The bias of $T^{*}_{g1}$, to the first order of approximation, can be expressed as:
(2) The MSE of $T^{*}_{g1}$, to the first order of approximation, can be expressed as:
The optimal value of $\alpha$ is obtained by differentiating Equation (44) with respect to $\alpha$ and equating the derivative to zero. Substituting it gives the minimum MSE of $T^{*}_{g1}$, which is similar to the MSE of the simple linear regression estimator.
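The optimization step can be made explicit under one assumption: that the first-order MSE in Equation (44) is a quadratic in $\alpha$, written generically as $A - 2\alpha C + \alpha^2 B$ with $B > 0$, where $A$, $B$ and $C$ are hypothetical placeholders for the coefficients that Equation (44) contains.

```latex
\begin{aligned}
\mathrm{MSE}(\alpha) &\approx A - 2\alpha C + \alpha^{2} B, \\
\frac{\partial\,\mathrm{MSE}(\alpha)}{\partial \alpha} &= -2C + 2\alpha B = 0
  \;\Longrightarrow\; \alpha_{\mathrm{opt}} = \frac{C}{B}, \\
\mathrm{MSE}_{\min} &= A - 2\,\frac{C}{B}\,C + \frac{C^{2}}{B^{2}}\,B = A - \frac{C^{2}}{B}.
\end{aligned}
```

For instance, with $A = \bar V^2(1/r - 1/N)C_v^2$, $B = \bar V^2(1/r - 1/n)C_u^2$ and $C = \bar V^2(1/r - 1/n)\rho_{vu}C_vC_u$ (the Singh and Horn form), $\mathrm{MSE}_{\min} = [(1/r - 1/N) - (1/r - 1/n)\rho_{vu}^2]\bar V^2 C_v^2$, the first-order MSE of the regression imputation estimator.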

Properties of the particular cases of the proposed imputation technique $T^{*}_{g1}$
(i) When $\delta = 1$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{R}_{g1}$
Bias of the estimator $T^{R}_{g1}$: MSE of the estimator $T^{R}_{g1}$: The optimal value of $\alpha$ is obtained by differentiating Equation (48) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{R}_{g1}$ is given as

(ii) When $\delta = 2$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{P}_{g1}$
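Bias of the estimator $T^{P}_{g1}$: MSE of the estimator $T^{P}_{g1}$: The optimal value of $\alpha$ is obtained by differentiating Equation (52) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{P}_{g1}$ is given as
(iii) When $\delta = 3$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{D}_{g1}$
Bias of the estimator $T^{D}_{g1}$: MSE of the estimator $T^{D}_{g1}$: The optimal value of $\alpha$ is obtained by differentiating Equation (56) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{D}_{g1}$ is given as
(iv) When $\delta = 4$ in the values of P, Q and R, $T^{*}_{g1}$ becomes $T^{E}_{g1}$
Bias of the estimator $T^{E}_{g1}$: MSE of the estimator $T^{E}_{g1}$: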

Theorem 4.2:
The estimator $T^{*}_{g2}$ can be expressed in terms of $e_1$ and $e_3$, to the first order of approximation, as:
(1) The bias of $T^{*}_{g2}$, to the first order of approximation, can be expressed as:
(2) The MSE of $T^{*}_{g2}$, to the first order of approximation, can be expressed as:
The optimal value of $\alpha$ is obtained by differentiating Equation (63) with respect to $\alpha$ and equating the derivative to zero, which gives

Properties of the particular cases of the proposed imputation technique $T^{*}_{g2}$
(i) When $\delta = 1$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{R}_{g2}$
Bias of the estimator $T^{R}_{g2}$: MSE of the estimator $T^{R}_{g2}$: The optimal value of $\alpha$ is obtained by differentiating Equation (67) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{R}_{g2}$ is given as

(ii) When $\delta = 2$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{P}_{g2}$
Bias of the estimator $T^{P}_{g2}$: MSE of the estimator $T^{P}_{g2}$: The optimal value of $\alpha$ is obtained by differentiating Equation (71) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{P}_{g2}$ is given as
(iii) When $\delta = 3$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{D}_{g2}$
Bias of the estimator $T^{D}_{g2}$: MSE of the estimator $T^{D}_{g2}$: The optimal value of $\alpha$ is obtained by differentiating Equation (75) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{D}_{g2}$ is given as
(iv) When $\delta = 4$ in the values of P, Q and R, $T^{*}_{g2}$ becomes $T^{E}_{g2}$
Bias of the estimator $T^{E}_{g2}$: MSE of the estimator $T^{E}_{g2}$:

Theorem 4.3:
The estimator $T^{*}_{g3}$ can be expressed in terms of $e_1$, $e_2$ and $e_3$, to the first order of approximation, as:
(1) The bias of $T^{*}_{g3}$, to the first order of approximation, can be expressed as:
(2) The MSE of $T^{*}_{g3}$, to the first order of approximation, can be expressed as:
(3) The optimal value of $\alpha$ is obtained by differentiating Equation (82) with respect to $\alpha$ and equating the derivative to zero, which gives

Properties of the particular cases of the proposed imputation technique $T^{*}_{g3}$
(i) When $\delta = 1$ in the values of P, Q and R, $T^{*}_{g3}$ becomes $T^{R}_{g3}$
Bias of the estimator $T^{R}_{g3}$: MSE of the estimator $T^{R}_{g3}$: The optimal value of $\alpha$ is obtained by differentiating Equation (86) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{R}_{g3}$ is given as
(ii) When $\delta = 2$ in the values of P, Q and R, $T^{*}_{g3}$ becomes $T^{P}_{g3}$
Bias of the estimator $T^{P}_{g3}$: MSE of the estimator $T^{P}_{g3}$: The optimal value of $\alpha$ is obtained by differentiating Equation (90) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{P}_{g3}$ is given as
(iii) When $\delta = 3$ in the values of P, Q and R, $T^{*}_{g3}$ becomes $T^{D}_{g3}$
Bias of the estimator $T^{D}_{g3}$: MSE of the estimator $T^{D}_{g3}$: The optimal value of $\alpha$ is obtained by differentiating Equation (94) with respect to $\alpha$ and equating the derivative to zero. The minimum MSE of the estimator $T^{D}_{g3}$ is given as
(iv) When $\delta = 4$ in the values of P, Q and R, $T^{*}_{g3}$ becomes $T^{E}_{g3}$
Bias of the estimator $T^{E}_{g3}$:

Numerical illustration
To examine the performance of the proposed estimators $T^{*}_{g(i)}$ $(i = 1, 2, 3)$ relative to the existing mean ($T_1$), ratio ($T_2$), regression ($T_3$), Singh and Horn ($T_4$), Singh and Deo ($T_5$), Kadilar and Cingi ($T_6$, $T_7$, $T_8$), Toutenburg et al. ($T_9$, $T_{10}$), Singh ($T_{11}$), Gira ($T_{12}$) and Singh et al. ($T_{13}$) estimators, four different data sets are considered for numerical illustration. The parameters of the data sets are presented in Table 1, and their descriptions are given below.
Data-set: A We use the data of Shukla and Thakur [37], with population size N = 200, sample sizes n = 66, 72 and 80, and different response rates (r).
Data-set: B We consider COVID-19 data for India, retrieved from the WHO website (download link: https://covid19.who.int/WHO-COVID-19-global-data.csv) [38]. A total of 943 days of data (01 February 2020 to 31 August 2022) were used to examine COVID-19 mortality in India. The correlation coefficient between daily new cases and new deaths is ρ_vu = 0.7393576, indicating a strong positive correlation between the two variables.
Data-set: C This data set is taken from the apple production data used by Kadilar and Cingi [39], covering 477 villages in 1999 grouped into four regions: (1) Marmara, (2) Aegean, (3) Mediterranean and (4) Central Anatolia. Here X = number of apple trees in 1999 and Y = level of apple production in 1999.
Data-set: D This data set is also taken from the apple production data used by Kadilar and Cingi [39], with X = level of apple production in 1998 and Y = level of apple production in 1999.
We consider sample sizes (n) between 33% and 40% of the population and response rates (r) between 60% and 92%, with different correlation coefficients.
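The comparisons use the percent relative efficiency (PRE). Its formula is not shown in this section; the usual definition in this literature, assumed here, is PRE = 100 × MSE(existing) / MSE(proposed), so values above 100 favour the proposed estimator. The sketch compares regression imputation ($T_3$) with mean imputation ($T_1$) through their first-order MSEs; the design values are hypothetical and not taken from Table 1.

```python
def pre(mse_existing, mse_proposed):
    """Percent relative efficiency: 100 * MSE(existing) / MSE(proposed)."""
    return 100.0 * mse_existing / mse_proposed

def mse_mean(r, N, Vbar, Cv):
    """First-order MSE (variance) of mean imputation T_1."""
    return (1 / r - 1 / N) * Vbar**2 * Cv**2

def mse_reg(r, n, N, Vbar, Cv, rho):
    """First-order MSE of regression imputation T_3."""
    return ((1 / r - 1 / N) - (1 / r - 1 / n) * rho**2) * Vbar**2 * Cv**2

# Hypothetical setting: N = 200, n = 80 (40% sample), r = 60 (75% response), rho = 0.8.
p = pre(mse_mean(60, 200, 1.0, 1.0), mse_reg(60, 80, 200, 1.0, 1.0, 0.8))
```

Because $\bar V^2 C_v^2$ cancels in the ratio, the PRE of $T_3$ over $T_1$ depends only on $r$, $n$, $N$ and $\rho_{vu}$.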

Simulation study
In this section, we carry out a simulation study using the R [40] statistical computing software. The data sets were generated with the function 'genCorGen' from the package 'simstudy' [41]. The study and auxiliary variables were generated from a normal distribution with given parameters and correlation coefficient.
The populations are generated from a correlated normal model. To obtain clearer results, we generated a population of size 5000 using the same parameters as the real apple population data (i.e. similar to Data set D) used by Kadilar and Cingi [39]. The simulation study is carried out similarly to that of Singh et al. (see [42]), and only one combination of constants is considered in the simulation (for example, a = β2(u) and b = ρ_vu).
The following steps have been used for the simulation of the required population: Step-1 Using the statistical computation software R [40], a data set with the normal distribution of variables U and V of size N = 5000 is generated using the 'genCorGen' function.
Step-2 The parameters of this artificial population of size N = 5000 are calculated.
Step-4 Sample statistics, namely the sample means and sample variances, and the values of the proposed and competing estimators are calculated for this sample under the imputation techniques.
Step-6 The MSE of every estimator is calculated as the average, over the simulated samples, of the squared difference between the estimate and the population mean.
Step-7 The PRE of each estimator is calculated using the formula described in the Numerical illustration section. The PREs of the suggested estimators $T^{*}_{g(i)}$ $(i = 1, 2, 3)$ with respect to the existing estimators $T_j$ $(j = 1, 2, \ldots, 13)$ for the simulation study are presented in Tables A13-A15.
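The paper runs its simulation in R with simstudy's genCorGen; the sketch below is a Python analogue that swaps in NumPy's multivariate normal generator. The population size N = 5000 follows the text, but the means, standard deviations, correlation, n, r and number of replications are hypothetical (the apple-data parameters are not reproduced), and only $T_1$, $T_2$ and $T_3$ are computed because the proposed estimators' formulas are not shown in this section. PRE is taken as 100 × MSE(T1)/MSE(T), an assumed definition.

```python
import numpy as np

def simulate(N=5000, n=500, r=300, reps=2000, rho=0.8, seed=1):
    """Monte Carlo MSE and PRE (w.r.t. T1) of mean, ratio and regression imputation."""
    rng = np.random.default_rng(seed)
    # Step 1 analogue: correlated normal population; all parameters are hypothetical.
    sd_v, sd_u = 20.0, 10.0
    cov = [[sd_v**2, rho * sd_v * sd_u], [rho * sd_v * sd_u, sd_u**2]]
    v, u = rng.multivariate_normal([100.0, 50.0], cov, size=N).T
    Vbar = v.mean()                               # Step 2: population parameter
    sq_err = np.zeros(3)
    for _ in range(reps):
        s = rng.choice(N, size=n, replace=False)  # SRSWOR sample of size n
        R = rng.choice(s, size=r, replace=False)  # MCAR respondents
        vr, ur, un = v[R].mean(), u[R].mean(), u[s].mean()
        b = np.cov(u[R], v[R], ddof=1)[0, 1] / u[R].var(ddof=1)
        est = np.array([vr, vr * un / ur, vr + b * (un - ur)])  # T1, T2, T3
        sq_err += (est - Vbar) ** 2
    mse = sq_err / reps                           # Step 6: average squared error
    pre = 100.0 * mse[0] / mse                    # Step 7: PRE relative to T1
    return mse, pre

mse, pre = simulate()
```

With a strong correlation, both auxiliary-based estimators should show PRE above 100 relative to mean imputation, mirroring the pattern the tables report for the proposed estimators.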

Analysis of tables
The following interpretations are drawn from Tables A1-A15.
Dataset-A
(a) From Table A1, the PRE of the proposed estimators $T^{*}_{g1}$ with respect to the existing estimators $T_i$ $(i = 1, 2, \ldots, 13)$ ranges from 263.26% to 650.24% for sample sizes of 33% to 40% and response rates of 60.60% to 91.25%.
(b) From Table A2, the PRE of the proposed estimators $T^{*}_{g2}$ with respect to the existing estimators $T_i$ $(i = 1, 2, \ldots, 13)$ ranges from 101.88% to 458.64% for sample sizes of 33% to 40% and response rates of 60.60% to 91.25%.
(c) From Table A3, the PRE of the proposed estimators $T^{*}_{g3}$ with respect to the existing estimators $T_i$ $(i = 1, 2, \ldots, 13)$ ranges from 100% to 246.98% for sample sizes of 33% to 40% and response rates of 60.60% to 91.25%.
The suggested estimators have the highest PRE for all real and simulated populations; these results are presented in Tables A1-A12 for the real populations and in Tables A13-A15 for the simulated populations. Hence, the introduced estimators are the most efficient among all the competing estimators of $\bar V$ considered under the imputation techniques.

Conclusion
In this manuscript, we present three new classes of estimators of $\bar V$ under imputation techniques and derive the bias and MSE of the introduced families of estimators. For the real and simulated data considered, the proposed estimators $T^{*}_{g(1)}$, $T^{*}_{g(2)}$ and $T^{*}_{g(3)}$ are more efficient than the mean, ratio, regression, Singh and Horn [4], Singh and Deo [28], Toutenburg et al. [35], Kadilar and Cingi [34], Singh [6], Gira [36] and Singh et al. [13] estimators.
The particular cases of the proposed estimators, $T^{R}_{g(1)}$, $T^{P}_{g(1)}$, $T^{D}_{g(1)}$, $T^{E}_{g(1)}$, $T^{R}_{g(2)}$, $T^{P}_{g(2)}$, $T^{D}_{g(2)}$, $T^{E}_{g(2)}$, $T^{R}_{g(3)}$, $T^{P}_{g(3)}$, $T^{D}_{g(3)}$ and $T^{E}_{g(3)}$, for different values of (a, b) are presented in Tables A16-A24; they are also more efficient than the existing estimators considered in this article.
Since the proposed estimators were the most efficient in these comparisons, they are recommended for use in different areas of application, including agricultural sciences, biological sciences, commerce, engineering, economics, fisheries, medical science, social science, and others.

Table 1.
Descriptions of data sets.

Table A16.
Some members of the proposed class of estimators $T^{*}_{g1}$.