Identifiability of parameters in longitudinal correlated Poisson and inflated beta regression model with non-ignorable missing mechanism

ABSTRACT The identifiability of a statistical model is an essential and necessary property. When a model is not identifiable, even an infinite number of observations cannot determine the true parameter. Non-identifiability problems are common in generalized linear models with and without random effects, and they can also occur in such models when the response variable is non-ignorably missing. The structure of the beta regression model is similar to that of generalized linear models, yet the identifiability of many commonly used beta regression models has not been investigated in the literature. We therefore study the identifiability of several types of beta regression models: the beta regression model with non-ignorable missing mechanism, the zero and one inflated beta regression model, the zero and one inflated beta regression model with non-ignorable missing mechanism, the longitudinal beta regression model, the longitudinal zero and one inflated beta regression model, the longitudinal zero and one inflated beta regression model with non-ignorable missing mechanism, and the longitudinal correlated bivariate Poisson and zero and one inflated beta regression model with non-ignorable missing mechanism. We construct estimators for the parameters of all these models based on the EM algorithm and the likelihood-based approach. Simulation results and two applications, to Facebook network and FBI datasets, are also presented.


Introduction
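Identifiability is a property of the model, not of an estimator or estimation procedure [1]. Informally, a model is identifiable if the true parameter can be recovered from an infinite number of observations (by the strong law of large numbers). The following definition is given by [2].
Definition: A statistical model {P_θ : θ ∈ Θ} is identifiable if, for all θ_1, θ_2 ∈ Θ, P_θ_1 = P_θ_2 implies θ_1 = θ_2; equivalently, the map θ ↦ P_θ is one-to-one.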
If the model is not identifiable, it is not reasonable to estimate its parameters or to draw inferences from it: if θ_1 ≠ θ_2 with θ_1, θ_2 ∈ Θ but P_θ_1 = P_θ_2, then the two parameter vectors cannot be distinguished and the true value of θ cannot be determined. Identifiability is closely related to consistency and estimability; in particular, it is a necessary condition for the existence of a consistent and asymptotically unbiased estimator [3]. Consequently, the identifiability problem has recently received considerable attention. Wang [4] studied the identifiability of the covariance parameters in linear mixed effects models. He discussed problems that can appear in software output for non-identifiable models, such as non-convergence of numerical algorithms, zero or extremely large standard errors, failure to construct confidence intervals, unreasonably wide confidence intervals, and unreasonable estimates. Moreover, statistical software often fits non-identifiable models without flagging the problem and reports invalid output, so model identifiability should be checked before fitting. Bahrami Samani [5] derived conditions for the identifiability of the covariance parameters in the latent random effect model for mixed correlated continuous and ordinal longitudinal responses. Models with a non-ignorable missing mechanism are another setting in which identifiability is often problematic. Miao et al. [6] focused on the identifiability of normal, normal mixture, and t mixture models under a missing not at random (MNAR) mechanism, and their work was extended to the exponential family by [7].
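To make the definition concrete, consider a toy example that is not one of the models studied in this paper: if Y ~ N(θ_1 + θ_2, 1), only the sum θ_1 + θ_2 enters the distribution, so P_θ = P_θ* whenever the sums agree. The Python sketch below (all names are illustrative) evaluates the log-likelihood of one simulated sample under two distinct parameter vectors with the same sum and obtains identical values, so no amount of data can separate them.

```python
import math
import random


def normal_loglik(data, theta1, theta2):
    """Log-likelihood of i.i.d. N(theta1 + theta2, 1) observations.

    Only the sum theta1 + theta2 enters, so (theta1, theta2) is not identifiable.
    """
    mean = theta1 + theta2
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (y - mean) ** 2 for y in data)


rng = random.Random(1)
sample = [rng.gauss(3.0, 1.0) for _ in range(1000)]

ll_theta = normal_loglik(sample, 1.0, 2.0)        # theta  = (1, 2)
ll_theta_star = normal_loglik(sample, -4.0, 7.0)  # theta* = (-4, 7), same sum
print(ll_theta == ll_theta_star)  # True: the likelihood cannot tell them apart
```

Restricting the parameter space, for example by fixing θ_2 = 0, makes this toy model identifiable.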
Researchers in many scientific fields are interested in the effect of independent variables on a response variable that takes values in the open unit interval (0, 1); such variables are called proportions. Examples include the proportion of household income spent on food, the proportion of crude oil converted to gasoline after distillation, and the proportion of sleep hours in a week. Various classes of regression models for proportions have been introduced in the literature, such as the beta regression model [8,9]. In the beta regression model, the beta distribution is parametrized in terms of its mean (μ) and a precision parameter (φ) as follows:
$$f_{BE}(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1},$$
where 0 < μ < 1, φ > 0, and Γ(·) is the gamma function. Note that f_BE(·; μ, φ) approaches infinity at 0 or 1 for some values of μ and φ (at 0 when μφ < 1 and at 1 when (1 − μ)φ < 1). Therefore, some authors exclude the end points from the support of the beta distribution, while others include them because this is consistent with the definition of a probability density function [10]. In real data, variables sometimes take values in [0, 1], (0, 1], or [0, 1); that is, they contain zeros or ones (such variables are called fractions). Examples of fractions include the fraction of murders involving firearms in each county of a given country, the fraction of deaths caused by traffic accidents, and the percentage of qualified nurses in a test. To apply the beta regression model to fractions, the support of the beta distribution must be taken to be [0, 1]. One way to overcome the problem that f_BE(·; μ, φ) approaches infinity at zero or one is to apply a linear transformation, such as y_new = (ε/2) + ((1 − ε)/2)y for a small ε > 0 [11] or y_new = ((n − 1)y + 0.5)/n [12], where n is the effective sample size and y ∈ [0, 1].
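The mean-precision parametrization maps to the usual Beta(a, b) shape parameters through a = μφ and b = (1 − μ)φ, which gives E(Y) = μ and Var(Y) = μ(1 − μ)/(1 + φ). The following minimal Python sketch (standard library only; function names are hypothetical) evaluates the density, shows it blowing up near 0 when μφ < 1, and applies the second rescaling transformation above.

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density f_BE(y; mu, phi) with mean mu in (0, 1) and precision phi > 0.

    Equivalent to the Beta(a, b) density with a = mu * phi and b = (1 - mu) * phi.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie in the open interval (0, 1)")
    a, b = mu * phi, (1.0 - mu) * phi
    log_const = math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def beta_variance(mu, phi):
    """Var(Y) = mu (1 - mu) / (1 + phi); larger phi means a more concentrated distribution."""
    return mu * (1.0 - mu) / (1.0 + phi)


def squeeze_unit(y, n):
    """Map y in [0, 1] into (0, 1) via ((n - 1) y + 0.5) / n, n = effective sample size."""
    return ((n - 1) * y + 0.5) / n


print(beta_pdf(0.5, 0.5, 2.0))   # about 1.0, since Beta(1, 1) is uniform
print(beta_pdf(1e-8, 0.5, 1.0))  # mu * phi = 0.5 < 1, so the density is huge near 0
print(squeeze_unit(0.0, 100), squeeze_unit(1.0, 100))
```

With n = 100 the end points 0 and 1 are moved to 0.005 and 0.995, so the beta log-density can then be evaluated at every observation.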
Alternatively, one can replace 0 by ε and 1 by 1 − ε, leaving the other observations unchanged [11]. These methods have some shortcomings, which are discussed in [13]. Another way to model data containing zeros or ones is to use inflated models. Ospina and Ferrari [14] assumed that the support of the beta distribution is (0, 1) and presented a zero-or-one inflated beta regression model based on a mixture of a beta distribution and a degenerate distribution at zero or one. The word inflation emphasizes that the probability mass at some points exceeds that allowed under a standard parametric family of distributions [15]. If, instead, the support of the beta distribution is assumed to be [0, 1], one can check the adequacy of the beta distribution with a goodness-of-fit test such as the one introduced by Papadopoulos and Li [16]. If the beta distribution does not describe the data satisfactorily and the number of zeros and/or ones is notable, a mixture of a beta distribution and either a distribution degenerate at zero or one or a Bernoulli distribution may be a suitable model for the data. In such situations, where some observations can be generated by both components, the EM algorithm is an appropriate method for finding the maximum likelihood estimates of the model parameters [17].
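Under the (0, 1)-support assumption, a zero and one inflated beta variable equals 0 with probability α(1 − λ), equals 1 with probability αλ, and otherwise has density (1 − α)f_BE(y; μ, φ) on (0, 1), where α is the probability of the degenerate Bernoulli(λ) component. The Python sketch below codes this log-likelihood contribution; with this support, each observation's component is known from whether it equals 0 or 1. Parameter names follow the text, while the function names are hypothetical.

```python
import math


def zoib_logpdf(y, alpha, lam, mu, phi):
    """Log-likelihood contribution of one zero and one inflated beta observation.

    alpha: probability that Y comes from the Bernoulli(lam) component (values 0 or 1);
    1 - alpha: probability that Y comes from Beta(mu * phi, (1 - mu) * phi) on (0, 1).
    """
    if y == 0.0:
        return math.log(alpha) + math.log1p(-lam)
    if y == 1.0:
        return math.log(alpha) + math.log(lam)
    if 0.0 < y < 1.0:
        a, b = mu * phi, (1.0 - mu) * phi
        log_beta = (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                    + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))
        return math.log1p(-alpha) + log_beta
    raise ValueError("y must lie in [0, 1]")


def zoib_loglik(ys, alpha, lam, mu, phi):
    """Total log-likelihood of independent observations sharing the same parameters."""
    return sum(zoib_logpdf(y, alpha, lam, mu, phi) for y in ys)


print(zoib_loglik([0.0, 1.0, 0.5], alpha=0.4, lam=0.75, mu=0.5, phi=2.0))
```

In a regression setting, α, λ, and μ would vary with the covariates of each observation, and the sum would run over individuals.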
In this paper, we address the problem of identifiability when the response variable takes values in the unit interval. Because there has been some research on the identifiability of generalized linear models with and without random effects, and the structure of the beta regression model is similar to that of generalized linear models, we focus on different types of beta regression models. Since part of the literature considers non-identifiability in statistical models whose responses are non-ignorably missing, we also discuss model identifiability for several beta regression models when the response variable is non-ignorably missing. Whether or not the models studied here are of interest to a general statistician, we believe that the techniques used to establish identifiability can be applied to other statistical models. When the parameters of these models are identifiable, we estimate them under both assumptions about the support of the beta distribution, using the EM algorithm and a full likelihood-based approach. In addition, two real data applications, from the Facebook network and the FBI, are presented.
The rest of this paper is organized as follows. Section 2 reviews some of the most applicable types of beta regression models, which are our target models. Sufficient conditions for the identifiability of the target models are established in Section 3. In Section 4, the finite sample properties of the estimators in the target models are evaluated through a series of simulations; the results show the advantages of these models for fitting complex data. Section 5 presents the results of applying the zero and one inflated beta regression model to the responsiveness rate [18] in Facebook. Also in Section 5, we investigate the effects of poverty and education level on the fraction of all crimes that are violent crimes (murder and non-negligent manslaughter, rape, robbery, and aggravated assault) in the counties of the United States during 2015-2016, using a joint model of longitudinal correlated Poisson and zero and one inflated beta responses with non-ignorable missing values. The proofs of the identifiability theorems for the two main models are presented in the appendix. All technical details, including the likelihood functions of the models, the remaining proofs, and the estimation procedures, are relegated to the supplementary materials.

Target models: some types of beta regression models
Here, we review several types of beta regression models that are among the most applicable in the literature. First, consider an advanced type of cross-sectional beta regression model.
(1) Inflated beta regression model with non-ignorable missing values
Let Y_1, . . . , Y_I be independent random variables, where each Y_i, i = 1, . . . , I, follows a zero and one inflated beta distribution with parameters α_i, λ_i, μ_i, and φ [18]. Suppose also that the response variable Y_i is non-ignorably missing. To handle the missing mechanism, we use the missingness indicator R_i of Y_i, with R_i = 1 indicating that Y_i is observed. We refer to this model as model (2). In it, x_i, the vector representing the whole set of K covariates, is completely observed, and θ = (γ_α, γ_λ, γ_μ, γ_η, β, φ) is the unknown parameter vector.
Two important submodels of model (2) whose identifiability has not been investigated in the literature are: (a) the beta regression model with non-ignorable missing mechanism, in which Y_i | x_i ∼ Beta(φμ_i, φ(1 − μ_i)) and R_i | y_i, x_i ∼ Bernoulli(η_i), with parameter vector (γ_μ, γ_η, β, φ); and (b) the zero and one inflated beta regression model, in which Y_i is completely observed and the parameter vector is (γ_α, γ_λ, γ_μ, φ).
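To illustrate how a non-ignorable missing mechanism arises in model (2), the Python sketch below simulates data under assumed links: logit links with an intercept and one N(0, 1) covariate for α_i, λ_i, and μ_i, and logit(η_i) = γ_η0 + γ_η1 x_i + β y_i for the probability that Y_i is observed. These link choices are assumptions for illustration (they are compatible with the coefficient names γ_η1 and β used in Theorem 3.3), not a transcription of the display defining model (2); the function names are hypothetical.

```python
import math
import random


def plogis(z):
    """Standard logistic CDF (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_model2(n, g_alpha, g_lambda, g_mu, g_eta, beta, phi, seed=0):
    """Simulate (x_i, observed y_i or None, r_i) under an ASSUMED form of model (2).

    Each g_* is an (intercept, slope) pair for a logit-linear predictor in x_i.
    The observation probability eta_i also depends on y_i through beta, so the
    missingness is non-ignorable whenever beta != 0.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        alpha = plogis(g_alpha[0] + g_alpha[1] * x)
        lam = plogis(g_lambda[0] + g_lambda[1] * x)
        mu = plogis(g_mu[0] + g_mu[1] * x)
        if rng.random() < alpha:  # degenerate component: Bernoulli(lam) on {0, 1}
            y = 1.0 if rng.random() < lam else 0.0
        else:  # continuous component: Beta(mu * phi, (1 - mu) * phi)
            y = rng.betavariate(mu * phi, (1.0 - mu) * phi)
        eta = plogis(g_eta[0] + g_eta[1] * x + beta * y)
        r = 1 if rng.random() < eta else 0  # r = 1 means y is observed
        data.append((x, y if r == 1 else None, r))
    return data


sample = simulate_model2(2000, (-0.5, 0.3), (0.4, -0.2), (0.2, 0.5), (1.0, 0.5),
                         beta=-1.5, phi=5.0, seed=1)
print(sum(r for _, _, r in sample) / len(sample))  # fraction of observed responses
```

With β < 0, larger responses are less likely to be observed, which is exactly the situation in which ignoring the missingness would bias the estimates.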
(2) Longitudinal joint modelling of Poisson and inflated beta responses with non-ignorable missing values
Let N_i = (N_i1, . . . , N_iT) and Y_i = (Y_i1, . . . , Y_iT) be the longitudinal Poisson and {0, 1}-inflated beta response vectors, respectively. The responses N_i and the covariate vectors are completely recorded for all units (individuals), while Y_i is not recorded for some units at each time t; R_it denotes the missing indicator of Y_it. To analyse the correlated multivariate responses Y_i and N_i with a non-ignorable missing mechanism, we use the joint model (3), in which U_i, with elements U^p_it (p = 1, . . . , P; t = 1, . . . , T), denotes the whole set of P covariates related to the distribution of R_it | y_it, u_it. The vector of unknown parameters is θ = (θ_(y,r), θ_n, θ_(y,r)n), in which θ_(y,r) = (γ_α, γ_λ, γ_μ, γ_η, φ, β) contains the parameters that belong only to the separate model for the (Y_i, R_i) responses, θ_n = γ_ν contains the parameters that belong only to the separate model for the N_i response, and θ_(y,r)n = σ is the parameter shared by both separate models. Here, for i = 1, . . . , I and t = 1, . . . , T, the two separate models are the model for the (Y_i, R_i) responses and the model for the N_i responses. To build a longitudinal joint model of Poisson and zero and one inflated beta responses with non-ignorable missing values, different parametrized covariance matrix structures can be used for the random effects, which leads to alternative forms of model (3).
Again, three important submodels of model (3) whose identifiability has not yet been investigated in the literature are: (a) the longitudinal beta regression model, in which Y_it | b_i ∼ Beta(φμ_it, φ(1 − μ_it)) is completely observed and the parameter vector is (γ_μ, φ, σ); (b) the longitudinal zero and one inflated beta regression model, in which Y_it is completely observed and the parameter vector is (γ_α, γ_λ, γ_μ, φ, σ); and (c) the separate model for the (Y_i, R_i) responses, with the vector of unknown parameters (γ_α, γ_λ, γ_μ, γ_η, φ, β, σ).
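A simulation sketch can make the structure of model (3) concrete. The Python code below rests on several assumptions that are not stated in this section: there are no intercepts (condition (1) of Section 3); one covariate x_it enters logit(α_it), logit(λ_it), logit(μ_it), and log(ν_it), where ν_it is the Poisson mean; a shared random effect σb_i with b_i ~ N(0, 1) enters all four of these predictors (suggested by, but not stated in, the initial-value steps of Section 5); and logit(η_it) = γ_η u_it + β y_it. Python's random module has no Poisson sampler, so Knuth's multiplication method is used. All names are hypothetical.

```python
import math
import random


def plogis(z):
    """Standard logistic CDF (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-z))


def rpois(rng, mean):
    """Poisson draw by Knuth's multiplication method (fine for moderate means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_model3(n_units, T, g_alpha, g_lambda, g_mu, g_nu, g_eta, beta, phi, sigma, seed=0):
    """Simulate longitudinal (Y_it, N_it, R_it) under an ASSUMED form of model (3).

    One covariate x_it drives alpha, lambda, mu and the Poisson mean nu; u_it drives
    the observation probability. No intercepts. A shared b_i ~ N(0, 1), scaled by
    sigma, induces the correlation between the beta part and the Poisson part.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        b = rng.gauss(0.0, 1.0)  # shared random effect of unit i
        rows = []
        for _ in range(T):
            x, u = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            alpha = plogis(g_alpha * x + sigma * b)
            lam = plogis(g_lambda * x + sigma * b)
            mu = plogis(g_mu * x + sigma * b)
            nu = math.exp(g_nu * x + sigma * b)
            if rng.random() < alpha:  # Bernoulli(lam) component on {0, 1}
                y = 1.0 if rng.random() < lam else 0.0
            else:  # Beta(mu * phi, (1 - mu) * phi) component on (0, 1)
                y = rng.betavariate(mu * phi, (1.0 - mu) * phi)
            count = rpois(rng, nu)
            r = 1 if rng.random() < plogis(g_eta * u + beta * y) else 0  # r = 1: observed
            rows.append({"x": x, "u": u, "y": y if r else None, "n": count, "r": r})
        units.append(rows)
    return units


panel = simulate_model3(300, 2, -0.5, 0.4, 0.3, 0.5, 0.8, -1.0, phi=5.0, sigma=0.5, seed=3)
```

Setting sigma=0.0 removes the shared effect, so the beta part and the Poisson counts become independent given the covariates.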

Identifiability
In this section, we present theorems showing that the identifiability of the target models is guaranteed under some mild conditions.
(1) The coefficient vectors γ_α, γ_λ, γ_μ, γ_η, and γ_ν do not contain intercepts.
(2) All the covariates X k it for k = 1, 2, . . . , K and U p it for p = 1, 2, . . . , P take all values in S X k and S U p , respectively, where S X k ⊆ R contains at least one interval and zero and S U p ⊆ R contains zero.
The proofs of the two main theorems, Theorems 3.1 and 3.2, are given in the appendix. The proof of identifiability for the parameters of model (3) with other parametrized covariance structures for the random effects follows the same main idea as the proof of identifiability of θ in model (3). The following theorems give sufficient conditions for the identifiability of the submodels of models (2) and (3) mentioned above; they are also intermediate steps toward proving Theorems 3.1 and 3.2.

Theorem 3.3: The parameter vector in a beta regression model with non-ignorable missing mechanism is identifiable if there exists at least one continuous covariate, denoted without loss of generality by X_i, taking all values in an interval with a known endpoint x_L ∈ R, or in the whole real line, and the signs of γ_η1 and β are known.

Theorem 3.5: The parameter vector in a longitudinal beta regression model is identifiable, provided the following two conditions are fulfilled:
(1) The vector of coefficients γ μ does not contain an intercept.
(2) All the covariates X k it for k = 1, 2, . . . , K take all values in S X k , where S X k ⊆ R contains at least one interval and zero.

Theorem 3.6: The parameter vector in a longitudinal zero and one inflated beta regression model is identifiable, provided the following two conditions are fulfilled:
(1) The coefficient vectors γ_α, γ_λ, and γ_μ do not contain intercepts.
(2) All the covariates X k it for k = 1, 2, . . . , K take all values in S X k , where S X k ⊆ R contains at least one interval and zero.

Theorem 3.7:
The parameter vector in a longitudinal zero and one inflated beta regression model with non-ignorable missing mechanism is identifiable, provided the following two conditions are fulfilled:
(1) The coefficient vectors γ_α, γ_λ, γ_μ, and γ_η do not contain intercepts.
(2) All the covariates X k it for k = 1, 2, . . . , K and U p it for p = 1, 2, . . . , P take all values in S X k and S U p , respectively, where S X k ⊆ R contains at least one interval and zero and S U p ⊆ R contains zero.
The proofs of the Theorems 3.3, 3.4, 3.5, 3.6, and 3.7 are provided in the supplementary materials.

Simulation studies
Here, we report the main simulation results for the estimates of models (2) and (3) obtained via the EM algorithm described in the supplementary materials. The simulation results for the submodels discussed in Section 2 are summarized in Table 1. In all data generating models, we use one covariate, so the covariate vector X_i in model (2) has a single element, denoted by X_i. In model (3), we let K = P = 1 and T = 2, so the covariate vectors X_it and U_it each have a single element, denoted by X_it and U_it, respectively. The covariates X_i, (X_i1, X_i2), and (U_i1, U_i2) take values in R, R^2, and R^2, respectively, and follow normal distributions; for example, X_i iid ∼ N(0, 1). The data generating models are identifiable by the theorems in Section 3. We use the stopping criterion |log L(θ^(p+1)) − log L(θ^(p))| / |log L(θ^(p+1))| < τ, where log L(·) is the incomplete-data log-likelihood. The "solnp" function of the "Rsolnp" package in R is used to maximize Q(θ | θ^(p)) with respect to the parameters, and the "fdHess" function is used to obtain the observed information matrix in the M-step. The results are summarized in Tables 1 and 2.
Table 1. Estimate (Est.), standard error (S.E.), lower and upper bound (L.B. and U.B., respectively), coverage rate (C.R.), relative bias (R.B.), and mean squared error (MSE) related to θ in model (2).
Table 2. Estimate (Est.), standard error (S.E.), lower and upper bound (L.B. and U.B., respectively), and coverage rate (C.R.) related to θ in model (3).
Note that we use the assumption that the support of the beta distribution is (0, 1). The estimates for all models are very close to the true values. As a standard procedure, we use the Wald method and the adjusted Wald method to compute confidence intervals for parameters with range (−∞, +∞) and [0, +∞), respectively.
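The relative stopping rule and the observed-information step can be sketched generically. In the Python code below, τ = 1e-6 is an assumed tolerance (the value used in the simulations is not stated), and numerical_hessian is a plain central-difference approximation standing in for the R function fdHess mentioned above, not a reimplementation of it. The observed information matrix is the negative Hessian of the log-likelihood at the estimate; all names are hypothetical.

```python
def em_converged(loglik_new, loglik_old, tau=1e-6):
    """Relative change |l(p+1) - l(p)| / |l(p+1)| < tau for the incomplete-data log-likelihood."""
    if loglik_new == 0.0:
        raise ValueError("relative criterion is undefined when the log-likelihood is 0")
    return abs(loglik_new - loglik_old) / abs(loglik_new) < tau


def numerical_hessian(f, theta, h=1e-4):
    """Central-difference Hessian of a scalar function f at the point theta (a list)."""
    k = len(theta)

    def f_shift(i, di, j, dj):
        t = list(theta)
        t[i] += di
        t[j] += dj
        return f(t)

    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            hess[i][j] = (f_shift(i, h, j, h) - f_shift(i, h, j, -h)
                          - f_shift(i, -h, j, h) + f_shift(i, -h, j, -h)) / (4.0 * h * h)
    return hess


def observed_information(loglik, theta_hat, h=1e-4):
    """Observed information = minus the Hessian of the log-likelihood at theta_hat."""
    return [[-v for v in row] for row in numerical_hessian(loglik, theta_hat, h)]
```

Standard errors are the square roots of the diagonal of the inverse observed information; the step size h trades truncation error against floating-point cancellation.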
If the lower Wald confidence limit of a parameter with range [0, +∞) is negative, the lower adjusted Wald confidence limit is set to zero. The 95% confidence intervals for the model parameters are calculated for two effective sample sizes, 100 and 500. Parameters whose 95% confidence intervals contain the true values are highlighted in bold. As Tables 1 and 2 show, almost all coverage rates of the confidence intervals are close to the nominal level of 95%. The tables also show that the MSE decreases as the sample size increases, which is in line with the consistency of the MLEs.
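The interval rules and summary measures of this section can be written compactly. In the Python sketch below, the adjusted Wald interval follows the description above (the Wald interval with its lower limit set to zero when negative). Relative bias is computed as (mean estimate − true value)/true value, a common definition that is an assumption here because the text does not spell it out; function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% intervals


def wald_ci(est, se, nonnegative=False, z=Z):
    """Wald interval est +/- z * se; for parameters in [0, inf) the lower limit is truncated at 0."""
    lower, upper = est - z * se, est + z * se
    if nonnegative and lower < 0.0:
        lower = 0.0  # adjusted Wald interval
    return lower, upper


def summarize(estimates, ses, true_value, nonnegative=False):
    """Mean estimate, relative bias, MSE and coverage rate over simulation replicates."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    mse = sum((e - true_value) ** 2 for e in estimates) / m
    covered = 0
    for est, se in zip(estimates, ses):
        lower, upper = wald_ci(est, se, nonnegative)
        covered += lower <= true_value <= upper
    return {
        "mean": mean_est,
        "rel_bias": (mean_est - true_value) / true_value,
        "mse": mse,
        "coverage": covered / m,
    }


print(summarize([1.0, 2.0, 3.0], [0.1, 0.1, 0.1], true_value=2.0))
```

When the estimates are normal with the reported standard errors, the coverage rate returned by summarize should be close to 0.95.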

Real data analysis
We provide two real examples of the beta regression models (2) and (3) described in Section 2, using data from the Facebook network and the FBI. The first example applies a complete data form of model (2) [the zero and one inflated beta regression model, a submodel of model (2)] to responsiveness rates in a social network, using data collected in a study by Viswanath et al. [19]. It demonstrates the performance of a finite mixture model with a beta component in a relatively simple setting. The second example is a more complex analysis of mixed data from the FBI, to which the incomplete data model (3) is fitted.

Complete data example: Facebook New Orleans network
The following example primarily demonstrates the application of one type of beta regression model to social network data. It applies the zero and one inflated beta regression model (the complete data form of model (2)) to real data. Viswanath et al. [19] studied the New Orleans regional network in Facebook, and the dataset was released here: (http://socialnetworks.mpi-sws.org/data-wosn2009.html). Based on the definition of the responsiveness rate in social networks in [18], the response variable Y_i is the responsiveness rate of the ith pair of users, i = 1, 2, . . . , 840. For example, the first pair is (45, 30), so y_1 is the rate of the 45th node's responsiveness to the 30th node. We also calculated how long each of the 840 distinct pairs of users had been friends on Facebook as of Tuesday, 23 December 2008, 03:06:45. Let L_i be the length of acquaintance of the ith pair, i = 1, 2, . . . , 840. According to max(l_i) and min(l_i), the longest-standing Facebook friends in the dataset have been friends for 72434123 seconds (2 years, 3 months, 16 days, 13 hours, 29 minutes, and 41 seconds), and the most recent Facebook friends in the dataset have been friends for 5484 s (1 h, 31 min, and 24 s). We convert seconds (L_i) to months (X_i) and use X_i as an explanatory variable. The main purpose is to investigate the effect of the length of acquaintance on the responsiveness rate [17,18] in Facebook.
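The covariate construction can be sketched with Python's datetime module. The reference time is the one stated above; the friendship timestamp in the usage line is hypothetical, and the month length (an average Gregorian month of 365.25/12 days) is an assumption, since the text does not say how seconds were converted to months.

```python
from datetime import datetime, timedelta

REFERENCE_TIME = datetime(2008, 12, 23, 3, 6, 45)  # Tuesday, 23 December 2008, 03:06:45
SECONDS_PER_MONTH = 365.25 / 12 * 24 * 3600         # assumed average month length


def acquaintance_seconds(friendship_start):
    """Length of acquaintance L_i in seconds, up to the reference time."""
    return int((REFERENCE_TIME - friendship_start).total_seconds())


def to_hms(seconds):
    """Split a number of seconds into (hours, minutes, seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


def to_months(seconds):
    """Covariate X_i in months under the assumed month length."""
    return seconds / SECONDS_PER_MONTH


shortest = acquaintance_seconds(REFERENCE_TIME - timedelta(seconds=5484))  # hypothetical pair
print(shortest, to_hms(shortest), round(to_months(shortest), 5))
```

The to_hms split reproduces the 1 h, 31 min, and 24 s quoted for the most recent pair; a decomposition into years and months depends on the calendar dates involved, so it is not attempted here.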
The frequency histogram of the response variable is presented in Figure 1. The vertical bars at zero and one show that the proportions of zeros and ones in the sample are 0.32 and 0.61, respectively. The distribution of the data appears to be 'U-shaped'.
The beta distribution may be an appropriate distribution for these data. The Kolmogorov-Smirnov test can be used as a goodness-of-fit test to check the assumption of a beta distribution [20]. To implement this test, the beta parameters must be estimated; they can be estimated by the method of moments from the first two moments (sample mean and sample variance). However, the Kolmogorov-Smirnov test assumes that there are no ties, and since the data contain ties, exact p-values are not available. We therefore use another goodness-of-fit approach, based on the method of moments, described in [16]. The test statistic T_a for H_0: the sample Y_1, . . . , Y_840 comes from a beta distribution, is given in [16], and it follows from Theorem 1 in [16] that T_a is asymptotically distributed as N(0, 1) under H_0. The observed value of the test statistic is 11.38, so |T_a| > z_{1−0.05/2} and H_0 is rejected. Given Figure 1 (zeros and ones make up 92% of the dataset) and the test result, a zero and one inflated beta distribution may be a suitable model for the data. The model is specified as the complete data form of model (2) with one explanatory variable X_i (x_i = (1, X_i)'). The optimization algorithm requires initial values θ^(0), which we determine in two steps. In the first step, we fit three separate models: (1) A beta regression for individuals with 0 < y_i < 1, which gives estimates of γ_μ0, γ_μ1, and φ. (2) A binomial regression for individuals with y_i = 0 or y_i = 1, which gives estimates of γ_λ0 and γ_λ1.
(3) A binomial regression for Z_i, which gives estimates of γ_α0 and γ_α1. In the second step, the method of moments is applied with the first seven raw moments to find θ^(0). Note that
$$E(Y^k \mid x) = \alpha\lambda + (1-\alpha)\prod_{j=0}^{k-1}\frac{\mu\phi + j}{\phi + j}, \quad k = 1, 2, \dots,$$
where α, λ, and μ are evaluated at x. Replacing x with x_i in α_i, λ_i, and μ_i gives a system of seven nonlinear equations in θ^(0). The function "nleqslv" in R is used to solve this system with a full Newton method. We use the initial values found in the first step as the first argument of "nleqslv" (an initial guess of the root of the equations). The resulting θ^(0) is (2.48510, −0.00037, 0.59863, 0.00193, 0.16312, −0.00169, 11.50222). The results of fitting the zero and one inflated beta regression model are presented in Table 3.
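Both initial-value steps rest on simple moment identities. In the Python sketch below, beta_mom gives method-of-moments values of (μ, φ) from the sample mean and variance using Var(Y) = μ(1 − μ)/(1 + φ); zoib_raw_moment evaluates E(Y^k | x) from the formula above; and moment_residuals returns the seven differences between the sample raw moments and the model moments averaged over the observed x_i. A root-finder (for example scipy.optimize.root in Python, in place of nleqslv in R) would drive these residuals to zero. Logit links with an intercept and one covariate are assumed, and averaging the model moments over x_i is one reading of the description above, not necessarily the exact system used; all names are hypothetical.

```python
import math


def plogis(z):
    """Standard logistic CDF (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-z))


def beta_mom(ys):
    """Method-of-moments (mu, phi) for observations in (0, 1)."""
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    phi = mean * (1.0 - mean) / var - 1.0
    if phi <= 0.0:
        raise ValueError("sample variance too large for a beta distribution")
    return mean, phi


def zoib_raw_moment(k, alpha, lam, mu, phi):
    """E(Y^k) = alpha * lam + (1 - alpha) * prod_{j=0}^{k-1} (mu * phi + j) / (phi + j)."""
    beta_part = 1.0
    for j in range(k):
        beta_part *= (mu * phi + j) / (phi + j)
    return alpha * lam + (1.0 - alpha) * beta_part


def moment_residuals(theta, xs, ys, n_moments=7):
    """Sample raw moments minus model raw moments averaged over the observed x_i.

    theta = (ga0, ga1, gl0, gl1, gm0, gm1, phi), with ASSUMED logit links.
    """
    ga0, ga1, gl0, gl1, gm0, gm1, phi = theta
    residuals = []
    for k in range(1, n_moments + 1):
        sample_moment = sum(y ** k for y in ys) / len(ys)
        model_moment = sum(
            zoib_raw_moment(k, plogis(ga0 + ga1 * x), plogis(gl0 + gl1 * x),
                            plogis(gm0 + gm1 * x), phi)
            for x in xs
        ) / len(xs)
        residuals.append(sample_moment - model_moment)
    return residuals
```

The degenerate part contributes αλ to every raw moment because Y^k = Y when Y ∈ {0, 1}; only the beta part changes with k.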
The parameters γ_α1, γ_λ1, and γ_μ1 are not significant at the 0.05 level. Thus, based on this dataset, there is no significant evidence that the length of acquaintance affects the responsiveness rate in Facebook.

Incomplete data example: crime in US (FBI dataset)
The second application uses a dataset on crime in the US in 2015−2016. The FBI's Uniform Crime Reporting (UCR) Program collects the number of offences that come to the attention of law enforcement for violent crime and property crime. Violent crimes are defined in the UCR Program as those offences that involve force or threat of force. Figure 2 shows that the histograms are U-shaped, with zeros and ones present. In this example, we consider the association between the percentage of violent crimes, Y_it, and the number of police officers, N_it (the count response, available at https://ucr.fbi.gov/crime-in-the-u.s/2015/crime-in-the-u.s.-2015/police-employee-data/police-employee-data or https://data.world/), in each county in 2015 and 2016. Figure 3 illustrates some features of the number of police officers data.
We also consider the following covariates:
• the percentage of people in poverty in each county of the U.S. in 2015 and 2016 (X_i1 and X_i2), with sample means 0.08 and 0.04, respectively;
• the percentage of adults with a bachelor's degree or higher in each county of the U.S. during 2013−2017 (U_i), with sample mean 21.38.
These reports are available at https://www.census.gov/programs-surveys/saipe/data/datasets.All.html (or https://data.world/dconc/2003-2016-nc-county-level-povertymedian-household-income). For a joint analysis, we consider the longitudinal joint model of Poisson and zero and one inflated beta responses with non-ignorable missing values (3) for Y_it and N_it with the above covariates. To specify initial values for the parameters, we fit the following models: (1) A Poisson regression model for (N_i1, N_i2), which gives initial values for γ_ν1 and σ. (2) Two binomial regression models, one for individuals with y_i1 = 0 or y_i1 = 1 and one for individuals with y_i2 = 0 or y_i2 = 1, which give two initial estimates of γ_λ1; their average is our initial value for γ_λ1.
(3) Two binomial regression models for Z_i1 and Z_i2, where Z_it is an indicator variable equal to 1 if Y_it comes from Bernoulli(λ_it) and 0 if Y_it comes from Beta(μ_it, φ). These give two estimates of γ_α1, and their average is our initial value for γ_α1. (4) Two beta regression models, one for individuals with 0 < y_i1 < 1 and one for individuals with 0 < y_i2 < 1, which give initial values for γ_μ1 and φ. (5) Two binomial regression models for R_i1 and R_i2, which give initial values for γ_η1 and β. In steps (2), (3), and (4), we use a random sample generated from the standard normal distribution, together with X_it, as covariates, so these steps also give initial estimates of σ. The average of all estimates of σ in steps (1) to (4) is the initial value of σ. The resulting θ^(0) is (−0.15, −0.12, −0.10, 2.01, 7.77, 41.53, 0.07, 0.78). The results of fitting the longitudinal joint model (3) are presented in Table 4.
From Table 4, the response variables Y_it and N_it are dependent, as indicated by the significance of the parameter σ. All covariates are significant at the usual 5% level. So, as expected, the percentage of people in poverty and the percentage of educated adults have significant effects on the expected percentage of violent crimes and on the number of police officers in each county. The negative values of γ_α1, γ_λ1, and γ_μ1 indicate that an increase in the percentage of people in poverty is associated with a decrease in the percentage of violent crimes. Note that the percentage of violent crimes in each county is defined as the number of violent crimes divided by the sum of the numbers of violent and property crimes. Hence, the negative values of γ_α1, γ_λ1, and γ_μ1 mean that, as the percentage of people in poverty increases, the number of property crimes grows relative to the number of violent crimes; in other words, an increase in the percentage of people in poverty is associated with an increase in the percentage of property crimes. The positive value of γ_ν1 indicates that more police officers are employed as the percentage of people in poverty increases. Finally, the significance of the β coefficient is evidence that the missing mechanism is non-ignorable.

Concluding remarks
When a non-identifiable model is fitted, statistical software such as R sometimes reports problematic output, such as non-convergence of numerical algorithms, zero or extremely large standard errors, failure to construct confidence intervals, unreasonably wide confidence intervals, and unreasonable estimates. It is also possible that no problem is reported even though the output is invalid and the resulting inferences are wrong. Therefore, identifiability should be checked before fitting. In this paper, we present theorems showing that, under sufficient conditions, the parametrizations of several types of beta regression models are identifiable. We hope that the techniques used here to establish identifiability can be applied to other statistical models. Other related topics deserving further attention are as follows:
• investigating the identifiability of generalized linear mixed models;
• finding necessary and sufficient conditions for the identifiability of the models considered here;
• investigating the identifiability of these models when they include intercepts;
• investigating the identifiability of these models when they include discrete covariates.

Disclosure statement
No potential conflict of interest was reported by the author(s).
(b) θ_n is identifiable if it is identifiable in the separate model for the N_i response. (c) θ_(y,r)n is identifiable if it is identifiable in at least one of the separate models for the (Y_i, R_i) and N_i responses.

Proof of Theorem 3.1:
Without loss of generality, we assume that there exists one continuous covariate, denoted by X_i, such that the logit-linear predictors in model (2) are linear functions of X_i. When there is more than one covariate, the same argument proves identifiability: we treat the other covariates as constants, absorb them into the intercept parameter, and let X_i vary. Let h(z) = log plogis(z), where plogis(z) is the cumulative distribution function of the standard logistic distribution. We need to show that if f_{Y,R|X}(y_i, r_i | x_i; θ) = f_{Y,R|X}(y_i, r_i | x_i; θ*) for all possible values of y_i, r_i, and x_i, then θ = θ*. The joint observed density function for the joint model (2) is presented in the supplementary material. As mentioned in [22], θ = θ* can be established by the following argument. For y_i ∈ (0, 1), taking logs of both sides of f_{Y,R|X}(y_i, r_i = 1 | x_i; θ) = f_{Y,R|X}(y_i, r_i = 1 | x_i; θ*), according to the ith individual density function (Equation (5) of the supplementary material), gives (A1). For simplicity of notation, we suppress the index i of y_i, r_i, and x_i. Applying the operators ∂³/∂x∂y², ∂⁴/∂x²∂y², ∂⁵/∂x∂y⁴, and ∂⁶/∂x²∂y⁴ to both sides of (A2) gives the corresponding derivative identities, where h^(n)(·) and plogis^(n)(·) denote the nth derivatives of these functions. Below we show by contradiction that γ Since 0 is not a root of h^(4)(·) or h^(6)(·), setting z = 0 in (A4) and (A6) gives (γ and thus Under the condition that the sign of γ  Figure A1. It is a harmonic function which is strictly decreasing on R− and R+. Because for a > 0, the function (h^(4) Applying the operators ∂/∂y and ∂²/∂x∂y to both sides of (A9) yields and We use a proof by contradiction to show that γ (A12) Note that the function g(u) = (2u − 1)/(u(1 − u)), 0 < u < 1, and plogis(·) are strictly increasing.
Since a composition of strictly increasing functions is strictly increasing, the function g ∘ plogis(·) is strictly increasing. Thus, Equation (A12) leads to γ_μ0 = γ*_μ0; then, using (A10), the equality (A1) reduces to an identity that leads to (γ_α0, γ_α1) = (γ*_α0, γ*_α1). Now, let y = 1 in f_{Y,R|X}(y, r = 1 | x; θ) = f_{Y,R|X}(y, r = 1 | x; θ*), according to the ith individual density function in (5) of the supplementary material. Under

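Two facts used in the proof of Theorem 3.1 can be checked mechanically. Writing p = plogis(z), we have h'(z) = 1 − p and dp/dz = p(1 − p), so every derivative h^(n)(z) is a polynomial in p. The Python sketch below builds these polynomials with exact rational arithmetic and evaluates them at z = 0 (p = 1/2), giving h^(4)(0) = 1/8 and h^(6)(0) = −1/4, both nonzero as the proof requires. It also checks numerically that g(plogis(z)) is strictly increasing; in fact g(plogis(z)) simplifies to 2 sinh(z). This is a sanity check under the stated definitions, not a substitute for the proof.

```python
import math
from fractions import Fraction


def poly_derivative(coeffs):
    """Derivative of a polynomial given by coefficients [c0, c1, ...] in p."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_mul(a, b):
    """Product of two polynomials given by coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_eval(coeffs, p):
    return sum(c * p ** i for i, c in enumerate(coeffs))


def h_derivative(n):
    """Polynomial in p = plogis(z) equal to the n-th derivative of h(z) = log plogis(z), n >= 1."""
    poly = [Fraction(1), Fraction(-1)]                # h'(z) = 1 - p
    dp_dz = [Fraction(0), Fraction(1), Fraction(-1)]  # dp/dz = p - p^2
    for _ in range(n - 1):
        poly = poly_mul(poly_derivative(poly), dp_dz)  # chain rule
    return poly


def plogis(z):
    return 1.0 / (1.0 + math.exp(-z))


def g(u):
    return (2.0 * u - 1.0) / (u * (1.0 - u))


h4_at_0 = poly_eval(h_derivative(4), Fraction(1, 2))
h6_at_0 = poly_eval(h_derivative(6), Fraction(1, 2))
grid = [-10.0 + 0.01 * k for k in range(2001)]
values = [g(plogis(z)) for z in grid]
g_plogis_increasing = all(b > a for a, b in zip(values, values[1:]))
print(h4_at_0, h6_at_0, g_plogis_increasing)  # 1/8 -1/4 True
```

The odd derivatives h^(3)(0), h^(5)(0), ... vanish because h(z) − z/2 is an even function, which is why the proof works with the fourth and sixth derivatives at z = 0.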
Proof of Theorem 3.2:
By Lemma A.2, it is enough to prove that θ_(y,r) is identifiable in the separate model for (Y_i, R_i), θ_n is identifiable in the separate model for N_i, and θ_(y,r)n is identifiable in at least one of the two separate models. First, we prove that θ_(y,r) and θ_(y,r)n are identifiable in the separate model for (Y_i, R_i); in other words, that the longitudinal zero and one inflated beta regression model with non-ignorable missing mechanism (Theorem 3.7) is identifiable. Without loss of generality, we assume that T = 2 and that there exist two covariates, denoted by X_it and U_it, for the ith individual. Under σ = σ* and β = β*, the above equality reduces to (A16). Multiplying both sides of (A16) by y_1(1 − y_1) and then integrating over (0, 1) with respect to y_1 yields φ = φ*. Now, under σ = σ*, β = β*, and φ = φ*, letting x_1 = 0 in (A14), we have plogis(γ 1 and β = β* result in η_t = η*_t for t = 1, 2. Substituting η_t = η*_t in f(y_1, y_2; θ) = f(y_1, y_2; θ*) (Equation (11) of the supplementary material) gives (A17), which is the same equality as f_{Y_i}(y_i; θ) = f_{Y_i}(y_i; θ*) in a longitudinal zero and one inflated beta regression model (Theorem 3.6, proved in the supplementary material). So, the same argument as in the proof of Theorem 3.6 leads to γ_α1 = γ*_α1, γ_λ1 = γ*_λ1, and γ_μ1 = γ*_μ1. It remains to show that θ_n is identifiable in the separate model for N_i. Let f_{N_i}(n_i | x_i; γ_ν, σ) = f_{N_i}(n_i | x_i; γ*_ν, σ) for