Fairness-Oriented Learning for Optimal Individualized Treatment Rules

ABSTRACT There has recently been a surge in methodological development for optimal individualized treatment rule (ITR) estimation. The standard methods in the literature are designed to maximize the potential average performance (assuming larger outcomes are desirable). A notable drawback of the standard approach, due to heterogeneity in treatment response, is that the estimated optimal ITR may be suboptimal or even detrimental to certain disadvantaged subpopulations. Motivated by the importance of incorporating an appropriate fairness constraint in optimal decision making (e.g., assign treatment with protection to those with shorter survival time, or assign a job training program with protection to those with lower wages), we propose a new framework that aims to estimate an optimal ITR to maximize the average value with the guarantee that its tail performance exceeds a prespecified threshold. The optimal fairness-oriented ITR corresponds to a solution of a nonconvex optimization problem. To handle the computational challenge, we develop a new efficient first-order algorithm. We establish theoretical guarantees for the proposed estimator. Furthermore, we extend the proposed method to dynamic optimal ITRs. The advantages of the proposed approach over existing methods are demonstrated via extensive numerical studies and real data analysis.

Typically, an optimal ITR is estimated by maximizing the potential average performance (assuming a larger outcome is desirable) if all patients in the population were to receive the treatment recommended by the decision rule. Due to the patients' diversity in their responsiveness to treatment and vulnerability to adverse effects, the treatment effects are often heterogeneous. A notable limitation of the previous work in this area is that the estimated optimal ITR may be suboptimal or even detrimental to a certain disadvantaged subpopulation. To provide a concrete illustration of this consequence, we consider a simple yet illustrative example in Section 2. In this example, applying the standard optimal ITR is actually harmful to a significant portion of the population. In the setting of recommending a medical treatment, this severe consequence demands careful attention in order to protect the vulnerable. Motivated by this concern, we propose a new fairness-aware framework that aims to estimate a mean-optimal ITR under the constraint that its induced potential outcome distribution has a lower quantile above a given threshold. For example, we may maximize the average treatment benefit for the whole population while requiring that 95% of the patients benefit from the treatment (say, by requiring the 5th percentile of the potential outcome distribution to be above a prespecified threshold).
Although the proposed fairness-aware optimal ITR (F-ITR) is conceptually intuitive, its computational and statistical theories are highly nontrivial, as the optimal F-ITR corresponds to a solution to a nonconvex optimization problem. We consider estimating the optimal F-ITR within a class of stochastic decision rules indexed by a Euclidean parameter. However, we do not need to specify an outcome regression model. Hence, our approach belongs to the category of model-free policy search methods. We study both the static and dynamic ITRs. We derive the asymptotic convergence theory of the proposed estimator using empirical process techniques. Considering the class of stochastic treatment rules alleviates some aspects of the computational challenge. Moreover, we show that doing so will lead to an optimal decision rule as good as the optimal rule within the corresponding class of deterministic decision rules. We prove that the estimated optimal decision rule satisfies the quantile constraint asymptotically, and that its value function converges at an $O_P(n^{-1/2})$ rate, where $n$ is the sample size. We further develop a new first-order dual algorithm to efficiently compute the estimator. The new algorithm and theory are of independent interest and can be useful for other optimality criteria, such as the composite criterion in Luckett, Laber, and Kosorok (2017) for balancing multiple and possibly competing outcomes and the robust criterion in Xiao, Zhang, and Lu (2019) for achieving robustness against skewed, heterogeneous, heavy-tailed errors or outliers in data.
We point out that the proposed F-ITR framework is also closely related to robust methods for deriving optimal ITRs. In particular, Wang et al. (2018a) study the quantile-optimal treatment regimes. Linn, Laber, and Stefanski (2017) and Qi et al. (2019a) propose quantile regression approaches to indirectly and approximately maximize the quantile of outcomes over some classes of decision rules. Wang, Fu, and Zeng (2018b) study estimating the mean-optimal treatment regime under a constraint on the mean of a risk outcome. Qi, Pang, and Liu (2019b) propose a general decision-rule based risk measure for individualized decision making. However, none of the work above considers both mean and quantile objectives simultaneously, though both objectives can be important in practice.
Article Organization. The rest of this article is organized as follows. In Section 2, we present the fairness-oriented individualized treatment regime. In Section 3, we present our estimator. In Section 4, we derive the asymptotic result. We extend our framework to dynamic treatment regimes in Section 5. We conduct extensive numerical studies in Section 6, and we conclude the paper and discuss future directions in Section 7.

Notation and Setup
In this article, we consider the setting of a binary treatment. We denote the treatment for patient $i$ as $A_i \in \{0, 1\}$. Let $Y_i$ be the corresponding outcome. Without loss of generality, we assume that a larger outcome is preferable. To define the optimal ITR, we adopt the counterfactual (or potential) outcome framework (Rubin 1978; Splawa-Neyman, Dabrowska, and Speed 1990) in causal inference. Specifically, let $Y_i^*(0)$ be the outcome of patient $i$ had this patient received treatment 0, and let $Y_i^*(1)$ be defined similarly. Since each patient can only be assigned to one treatment, for patient $i$, we observe either $Y_i^*(0)$ or $Y_i^*(1)$, but not both.
We assume that the observed outcome for patient $i$ is
$$Y_i = A_i Y_i^*(1) + (1 - A_i) Y_i^*(0).$$
In other words, the observed outcome is the outcome corresponding to the treatment the patient actually receives. In causal inference, this is referred to as the consistency assumption. Also, we assume that the potential outcome of one patient is not affected by treatments assigned to the other patients, which is the stable unit treatment value assumption (Rubin 1986).

A Motivating Example
For illustration, we consider a heteroscedastic outcome regression model where the covariate $X_i \sim \mathrm{Unif}[0, 1]$, the noise $\varepsilon_i \sim N(0, 1)$, and the treatment $A_i = 1$ if patient $i$ receives the treatment, and $A_i = 0$ if patient $i$ is in the control group. We consider seven decision rules, the last two of which are (6) $A_i = 1$ for all $i$, and (7) random assignment with $P(A_i = 1) = 0.5$. We evaluate the performance of the seven decision rules on a large independent sample of size $10^6$.
Table 1 summarizes the 0.10th quantile ($Q_{0.10}$) and the mean of the potential outcome distribution of each of the seven decision rules. This example was initially considered in Wang et al. (2018a) to illustrate the quantile-optimal ITR, which maximizes $Q_\tau\{Y^*(f)\}$. In this example, the mean-optimal treatment regime is Regime 2, and the optimal 0.10th quantile treatment regime is Regime 4. However, Regime 4 only achieves a mean outcome of 2.00, which is about 20% lower than the optimal mean outcome achieved by Regime 2. As an example of the F-ITR, we consider (1) with the 0.10th quantile being constrained to be nonnegative. This leads to Regime 3 as the desired choice among the seven. We observe that the mean of Regime 3 is 2.37 and is very close to the unconstrained optimal mean value 2.40 given by Regime 2. This example demonstrates that in some applications, it is sensible and beneficial to go beyond optimizing a single criterion such as the mean or a quantile. We further plot the empirical treatment effect distributions of Regime 2 and Regime 3 in Figure 1, and we observe that Regime 3 substantially lightens the left tail of the distribution. This shows that by slightly reducing the mean, we can potentially achieve much better fairness. In Table 1, the numbers discussed in the main text are bolded.

Fairness-Oriented Optimality
Our goal is to tailor the treatment recommendation to patient $i$ by considering the patient's individual characteristics, summarized in the covariate vector $X_i \in \mathbb{R}^d$, to achieve certain optimal performance regarding the treatment benefit in the population. An individualized treatment rule (ITR) is a mapping that takes the covariate vector $x_i$ as input and outputs a binary variable in $\{0, 1\}$. For computational efficiency, as discussed in the next section, we consider stochastic ITRs, which output a probability for assigning the treatment. A stochastic ITR can be represented by $f(\cdot, \cdot) : \mathbb{R}^d \times \mathcal{U} \to \{0, 1\}$, where the second argument, taking values in $\mathcal{U}$, supplies the randomization. Note that from here onward, for ease of presentation, we omit the term $U$. In particular, $Y_i^*(f)$ is the outcome following treatment regime $f(\cdot, \cdot)$, which assigns patient $i$ to treatment 1 or 0. In this article, we focus on randomized trials for which $A_i$ and $\{Y_i^*(1), Y_i^*(0)\}$ are independent. Our results can be extended to observational studies, as discussed in Remark 2.
To evaluate a treatment regime, existing work has focused on the population mean of the potential outcome distribution, that is, $E\{Y^*(f)\}$. We consider a refinement of this metric by enforcing certain fairness constraints. Intuitively, to protect the vulnerable, we wish to ensure that the lower tail of the potential outcome distribution is not too poor. Specifically, the $\tau$th quantile of $Y^*(f)$ is defined as
$$Q_\tau\{Y^*(f)\} = \inf\{y : F^*(y) \ge \tau\},$$
where $F^*$ denotes the cumulative distribution function of $Y^*(f)$, and $\tau \in (0, 1)$ is the quantile level of interest (e.g., the 0.10 quantile). Formally, given a collection $\mathcal{D}$ of treatment regimes, we propose to estimate the following fairness-oriented optimal ITR (F-ITR), which is defined as the solution to the optimization problem
$$\underset{f \in \mathcal{D}}{\text{maximize}} \;\; E\{Y^*(f)\} \quad \text{subject to} \quad Q_\tau\{Y^*(f)\} \ge q, \tag{1}$$
where $q \in \mathbb{R}$ is a prespecified threshold.
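The quantile definition and the constrained criterion can be illustrated on a finite set of candidate regimes. The following Python sketch is only an illustration: the helper names `lower_quantile` and `best_fair_regime` are hypothetical, and the outcomes under each candidate regime are assumed to be already available (for instance, from a large simulated sample).

```python
import math


def lower_quantile(values, tau):
    """Return inf{y : F_n(y) >= tau} for the empirical CDF F_n of `values`."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    ordered = sorted(values)
    n = len(ordered)
    # Smallest k with k / n >= tau; the small offset guards against
    # floating-point error when tau * n is an exact integer.
    k = max(math.ceil(tau * n - 1e-12), 1)
    return ordered[k - 1]


def best_fair_regime(outcomes_by_regime, tau, q):
    """Pick the candidate with the largest mean whose tau-quantile is >= q."""
    best_name, best_mean = None, -math.inf
    for name, outcomes in outcomes_by_regime.items():
        mean = sum(outcomes) / len(outcomes)
        if lower_quantile(outcomes, tau) >= q and mean > best_mean:
            best_name, best_mean = name, mean
    return best_name  # None when no candidate satisfies the constraint


# Toy, made-up outcomes: regime "A" has the larger mean but a poor lower tail.
toy = {"A": [-2.0, 5.0, 5.0, 5.0], "B": [1.0, 2.0, 3.0, 4.0]}
print(best_fair_regime(toy, tau=0.25, q=0.0))
```

With the constraint active, the sketch prefers the regime with the better lower tail even though its mean is smaller, which mirrors the trade-off described in the motivating example.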

Estimation
We first discuss how to estimate the F-ITR defined in Section 2.2 from a randomized trial. Denote the observed sample as $\{(x_i, y_i, a_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ denotes the covariates, $y_i \in \mathbb{R}$ denotes the response, and $a_i \in \{0, 1\}$ denotes the treatment. The optimization problem in (1) is challenging as it involves a nonsmooth nonconvex objective function subject to a nonsmooth nonconvex constraint. We show below how the computational challenges can be partly alleviated by considering stochastic ITRs.
Specifically, the stochastic ITR assigns treatment 1 to patient $i$ with probability $f(x_i, \beta)$, where $f(\cdot, \beta) \in \mathcal{D}$, a parametric class of functions. For example, we may adopt the logistic function $f(x, \beta) = \{1 + \exp(-x^\top\beta)\}^{-1}$. There has been substantial recent interest in stochastic ITRs; see Luedtke and van der Laan (2016), Díaz and van der Laan (2018), Kennedy (2019), Díaz and Hejazi (2020), and Qiu et al. (2020), among others. Luedtke and van der Laan (2016) overcome the challenges of an NP-hard knapsack problem by focusing on stochastic ITRs. Qiu et al. (2020) showed that in a nonparametric setting with instrumental variables, the optimal ITR among all stochastic rules is in fact deterministic whenever there is heterogeneity in the average treatment effect across subgroups defined by measured covariates in the population. In Section 4, we also establish a link between the stochastic ITR and the deterministic ITR for the current problem.
For a stochastic ITR induced by $f(x_i, \beta)$, we denote the mean and the $\tau$th quantile of the corresponding potential outcome distribution by $M(\beta)$ and $Q_\tau(\beta)$, respectively. The optimal F-ITR is indexed by the parameter $\beta^*$:
$$\beta^* = \arg\max_{\beta} M(\beta) \quad \text{subject to} \quad Q_\tau(\beta) \ge q,$$
where $M(\beta)$ is the mean of the potential outcome distribution induced by the ITR indexed by $\beta$. Similarly to Zhang et al. (2012) and Wang et al. (2018a), we can consistently estimate $M(\beta)$ and $Q_\tau(\beta)$ without specifying an outcome regression model by the estimators $\hat M(\beta)$ in (4) and $\hat Q_\tau(\beta)$ in (5), respectively, where $\rho_\tau(u) = u\{\tau - 1(u < 0)\}$ is the quantile loss function. The estimator $\hat\beta$ is then defined through the constrained problem (6), whose quantile constraint includes a term $C/\sqrt{n}$, where $C$ is a positive constant. Note that we introduce the term $C/\sqrt{n}$ to ensure the feasibility of $Q_\tau(\hat\beta)$ for ease of presentation in the theoretical results. However, this term can be dropped in practice.
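To make the model-free estimation idea concrete, the following Python sketch computes a weighted mean and a weighted quantile under a logistic stochastic ITR. It is an illustration under stated assumptions rather than the paper's exact estimators: the weight $c_i(\beta) = a_i f(x_i, \beta) + (1 - a_i)\{1 - f(x_i, \beta)\}$, the normalization by the sum of weights, and all function names are assumptions, and a randomized trial with a constant treatment probability is assumed so that the propensity cancels after normalization.

```python
import math


def logistic_rule(x, beta):
    """P(assign treatment 1 | x) = 1 / (1 + exp(-x'beta))."""
    score = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-score))


def rule_weights(xs, treatments, beta):
    """Assumed weight c_i(beta): probability that the rule picks the observed arm."""
    weights = []
    for x, a in zip(xs, treatments):
        p = logistic_rule(x, beta)
        weights.append(p if a == 1 else 1.0 - p)
    return weights


def weighted_mean(ys, weights):
    """Normalized weighted average of the observed outcomes."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def weighted_quantile(ys, weights, tau):
    """A minimizer of sum_i w_i * rho_tau(y_i - a) over a.

    It is the smallest observed y whose cumulative weight reaches
    tau times the total weight.
    """
    total = sum(weights)
    cumulative = 0.0
    for y, w in sorted(zip(ys, weights)):
        cumulative += w
        if cumulative >= tau * total - 1e-12:
            return y
    return max(ys)
```

Because the weights are smooth in $\beta$ for a logistic rule, the weighted mean is differentiable in $\beta$, which is the property the first-order algorithm relies on.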

Lagrangian Dual Problem and its Properties
Solving problem (6) is computationally challenging as the constraint is nonconvex in $\beta$. To generate a feasible high-quality solution, we consider its Lagrangian dual problem (7). As this dual problem is convex in the scalar $\lambda$, classical methods such as golden section search can be applied to efficiently find the optimal $\lambda^*$. Then, the corresponding $\tilde\beta$, defined in (8), is the dual optimal solution and satisfies the quantile constraint.
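The outer search over the scalar multiplier can be sketched as follows. This Python example is a toy illustration, not the paper's implementation: the candidate set is a hypothetical finite list of (mean, quantile) pairs standing in for the regimes indexed by $\beta$, the dual function is taken to be $L(\lambda) = \max_\beta \{\text{mean} + \lambda(\text{quantile} - q)\}$, and the $C/\sqrt{n}$ term is omitted.

```python
import math


def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            # The minimizer lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # The minimizer lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return (a + b) / 2.0


def dual_function(lam, candidates, q):
    """L(lambda): a maximum of affine functions of lambda, hence convex."""
    return max(mean + lam * (quantile - q) for mean, quantile in candidates)


# Hypothetical (mean, quantile) pairs for three candidate regimes.
candidates = [(3.0, -1.0), (2.8, 0.5), (2.0, 1.0)]
q = 0.0
lam_star = golden_section_min(lambda lam: dual_function(lam, candidates, q), 0.0, 10.0)
dual_value = dual_function(lam_star, candidates, q)
primal_value = max(mean for mean, quantile in candidates if quantile >= q)
duality_gap = dual_value - primal_value  # nonnegative by weak duality
```

In this toy problem the dual value slightly exceeds the best feasible mean, so the duality gap is positive but small, which is the kind of quantity the duality gap result is meant to control.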
In what follows, we present the key properties of the dual solution $\tilde\beta$ and compare it with the primal solution $\hat\beta$. We first show that the dual solution in (8) indeed exists. We provide the proof in Appendix Section A, supplementary material.
Proposition 1. Assume that there exists a feasible primal optimal solution as in (6). Then a dual optimal solution $\tilde\beta$ in (8) also exists.
The next theorem quantifies the duality gap between the primal and dual optimal solutions.
Theorem 1. Let $\hat\beta$ be the solution to problem (6), and let $\lambda^*$ and $\tilde\beta$ be some optimal Lagrangian multiplier and dual solution to problems (7) and (8), respectively. Then there exists a $\tilde\beta$ such that the duality gap is bounded by
Proof. See Appendix B, supplementary material for the detailed proof.
In the simulation studies in Section 6, we observe that our algorithm achieves small duality gaps in the different settings under consideration, which demonstrates that the proposed algorithm generates high-quality solutions in practice. We emphasize that although in this paper we adopt $f_i(\beta) = \{1 + \exp(-x_i^\top\beta)\}^{-1}$, Theorem 1 actually holds for general choices of continuous $f_i$.

Algorithm
We summarize the algorithm below. For the Lagrangian dual problem, during each iteration, given some $\lambda$, we solve the inner maximization problem over $\beta$ to evaluate the value of the Lagrangian dual function $L(\lambda)$.
In the current literature of precision medicine, this type of problem is commonly solved by genetic algorithms (Whitley 1994), which are known to be inefficient and to lack stability. We propose to solve the problem by an efficient first-order method.
In particular, at the $t$th iteration, given the current solution $\beta^t$, we compute a corresponding sub(super)-gradient $v^t \in \partial D_s(\beta^t)$.
We first consider $\hat M(\beta)$ in (4). By straightforward calculation, we obtain an expression in terms of the weights $c_i(\beta)$. Note that $c_i(\beta)$ is smooth due to the use of the stochastic ITR. We can thus compute the gradient of $\hat M(\beta^t)$ efficiently, which we denote as $v^t_M$. Next, we consider $\hat Q_\tau(\beta)$. Due to the discontinuity of the sample quantile function, we replace the indicator function in the quantile loss function $\rho_\tau(u) = u\{\tau - 1(u < 0)\}$ by a sigmoid function, and denote the resulting gradient by $v^t_Q$.
Taking $v^t = v^t_M + \lambda v^t_Q$, we update the solution by the following gradient step
$$\beta^{t+1} = \beta^t + \alpha_t v^t,$$
where $\alpha_t$ is a prespecified stepsize. This algorithm can be implemented efficiently and displays satisfactory performance in our simulation experiments.
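Two ingredients of the first-order method, the sigmoid smoothing of the indicator in the quantile loss and the gradient step, can be sketched in Python as follows. This is a minimal illustration under assumptions: the bandwidth parameter `bandwidth`, the constant stepsize, and the function names are hypothetical choices, and the gradient callback stands in for $v^t = v^t_M + \lambda v^t_Q$.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def check_loss(u, tau):
    """Quantile loss rho_tau(u) = u * (tau - 1(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u, tau, bandwidth):
    """Quantile loss with 1(u < 0) replaced by sigmoid(-u / bandwidth)."""
    return u * (tau - sigmoid(-u / bandwidth))


def gradient_ascent(grad, beta0, step_size, n_iter):
    """Repeat beta <- beta + step_size * grad(beta) for n_iter iterations."""
    beta = list(beta0)
    for _ in range(n_iter):
        direction = grad(beta)
        beta = [b + step_size * g for b, g in zip(beta, direction)]
    return beta


# Toy concave objective -(b - 3)^2 with gradient -2 (b - 3).
solution = gradient_ascent(lambda b: [-2.0 * (b[0] - 3.0)], [0.0], 0.1, 200)
```

A smaller bandwidth makes the smoothed loss closer to the original quantile loss at the cost of steeper gradients near zero, so in practice the bandwidth would need tuning.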

Statistical Theory
In this section, we present the statistical theory of the proposed estimator. In particular, we show that the estimator $\hat\beta$ achieves the optimal risk asymptotically.
Note that we consider a class of stochastic ITRs, a more general class of decision rules than deterministic ITRs. It is not difficult to see that the expected risk achieved by the optimal stochastic decision rule is always lower bounded by the risk achieved by the optimal deterministic decision rule. In addition, consider linear deterministic ITRs of the form $1(x_i^\top\beta > 0)$ and the corresponding stochastic ITRs $f_i(\beta) = P(a_i = 1 \mid x_i, \beta) = \{1 + \exp(-x_i^\top\beta)\}^{-1}$, the latter of which are our primary focus as discussed earlier. We show in Proposition 2 below that if some linear deterministic decision rule achieves the optimal risk and satisfies the quantile constraint, there exists a linear stochastic decision rule that approximates the risk and constraint up to arbitrary precision. Throughout our discussions, we assume that the responses $y_i$ and the covariates $x_i$ are bounded.
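One intuition for why a logistic stochastic rule can mimic a linear deterministic rule is that rescaling $\beta$ sharpens the logistic curve toward the indicator. The Python sketch below illustrates only this intuition; it is not taken from the proof of Proposition 2, and the function names and toy covariates are hypothetical.

```python
import math


def linear_score(x, beta):
    return sum(xj * bj for xj, bj in zip(x, beta))


def deterministic_rule(x, beta):
    """Linear deterministic ITR 1(x'beta > 0)."""
    return 1.0 if linear_score(x, beta) > 0 else 0.0


def stochastic_rule(x, beta, scale=1.0):
    """Logistic ITR evaluated at the rescaled parameter scale * beta."""
    z = scale * linear_score(x, beta)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def max_disagreement(xs, beta, scale):
    """Largest gap between the treatment probability and the 0/1 decision."""
    return max(
        abs(stochastic_rule(x, beta, scale) - deterministic_rule(x, beta))
        for x in xs
    )


# Toy covariates (intercept plus one feature); none lies on the boundary.
xs = [[1.0, 0.2], [1.0, -0.3], [1.0, 0.05]]
beta = [0.0, 1.0]
gaps = [max_disagreement(xs, beta, s) for s in (1.0, 10.0, 100.0, 1000.0)]
```

The gaps shrink as the scale grows, provided no covariate vector sits exactly on the decision boundary $x^\top\beta = 0$.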
Proposition 2. Suppose there exists a $\bar\beta$ that is an optimal solution to the corresponding problem in which the deterministic ITR $f_i = 1(x_i^\top\bar\beta > 0)$ is adopted. Considering problem (6), where the stochastic ITR $f(x_i, \beta) = P(f_i = 1) = \{1 + \exp(-x_i^\top\beta)\}^{-1}$ is adopted, denote by $\hat\beta$ an optimal solution to problem (6). Then for any given $\varepsilon > 0$, as $n \to +\infty$, $\hat\beta$ satisfies the constraint of problem (2), and its mean value approximates that achieved by $\bar\beta$ to within $\varepsilon$.
Proof. See Appendix C, supplementary material for the detailed proof.
The following theorem proves that the risk incurred by $\hat\beta$ converges to the optimal risk $R^*$. To derive the consistency, we employ empirical process techniques. The detailed proof is presented in Appendix D, supplementary material.
probability, where $M$ and $M^*$ denote the estimator's corresponding population mean treatment results and the optimal mean treatment results under the quantile constraint, respectively.
The next theorem derives the convergence rate of $M(\hat\beta) - M(\beta^*)$.
Theorem 3. Suppose that $\beta^*$ belongs to a compact set $B(M)$, where $M > 0$ is a constant. Then, for all $\xi \ge 1$, a probability bound on the deviation $M(\hat\beta) - M(\beta^*)$ holds under $P^*$ at scale $\varepsilon$, where $P^*$ denotes the outer probability for possibly nonmeasurable sets, and $\varepsilon = O(n^{-1/2})$. Or, equivalently, $M(\hat\beta) - M(\beta^*) = O_P(n^{-1/2})$.
Proof. See Appendix E, supplementary material for the detailed proof.
Remark 2. The proposed method and theoretical results can be extended to observational studies using the propensity score weighting approach. Assume that the popular no unmeasured confounder assumption $\{Y^*(1), Y^*(0)\} \perp A \mid X$ holds. Letting $\pi(X) = P(A = 1 \mid X)$ be the propensity score and denoting the induced quantity by $\pi_c(X, \beta)$, we estimate the mean and the $\tau$th quantile of the treatment effect by the propensity-weighted counterparts of the estimators in Section 3, where $\hat\pi_c(x_i, \beta)$ is an estimator of $\pi_c(x_i, \beta)$.
In our simulation, we use logistic regression to estimate $\pi_c(x_i, \beta)$, where we model $\pi(X)$ as $\pi(X, \gamma) = \exp(X^\top\gamma)/\{1 + \exp(X^\top\gamma)\}$. In practice, we may use other semiparametric or nonparametric models for the estimation.
Remark 3. In practice, the future patient population may not be exactly the same as the training sample, for example, because of a slight shift in age or other covariates. To address this challenge, we refer the readers to Mo, Qi, and Liu (2020), which addresses the covariate shift problem.

Extension to Dynamic Treatment Regime
We extend the proposed F-ITR to the dynamic setting, which involves a sequence of decision rules. For example, in treating a chronic disease, the patient's condition often needs to be reevaluated over time. Depending on the patient's clinical information and how he/she responds to the previous treatment, the doctor may need to adapt the treatment decision.
For ease of presentation, we consider a two-stage dynamic setting, but our methods and results can be extended to the general $T$-stage case by induction. Assume that patient $i$ receives treatment $a_i^{(1)} \in \{0, 1\}$ at stage 1 and treatment $a_i^{(2)} \in \{0, 1\}$ at stage 2. At the end of stage 2, we observe the outcome $Y_i$, based on which the overall treatment effect will be evaluated. A dynamic ITR has the form $f = \{f^{(1)}, f^{(2)}\}$, such that $f^{(j)}$ is a function of all information available before making the $j$th decision. We denote the baseline covariate vector as $X_i^{(1)}$, and denote the covariate at the second stage as $X_i^{(2)}$, which may depend on $X_i^{(1)}$ and $a_i^{(1)}$ and may include intermediate outcomes. Let $H_j$ denote the history available before the $j$th decision, $j = 1, 2$. Throughout our discussion, we adopt the no unmeasured confounder or sequential ignorability assumption, that is, conditioning on the history, the treatment is independent of any future information; see Robins (1997) for details. In addition, we adopt the positivity assumption that there exist positive constants $c_1 < c_2$ such that $c_1 \le P(a_j = a \mid H_j) \le c_2$ for $a \in \{0, 1\}$ and $j = 1, 2$.
Here our goal is to estimate the optimal dynamic ITR, which is defined as the one that maximizes the average final outcome under the constraint that a lower quantile of the potential outcome distribution of the ITR exceeds a given threshold. Consider the class of candidate ITRs indexed by $\beta = \{\beta^{(1)}, \beta^{(2)}\}$. Given a $\beta$, we denote the corresponding stochastic sequential decision rule by $d(\beta)$, which we sometimes write as $d(\beta) = (d_1(\beta^{(1)}), d_2(\beta^{(2)}))$ for brevity.
For sample $i$, there are four potential final outcomes, corresponding to the four possible treatment sequences. Given a dynamic ITR $d(\beta)$, the potential intermediate information is denoted by $X_i^{(2)*}(d_1(\beta^{(1)}))$ and the potential final outcome is denoted by $Y^*(d(\beta))$. The population mean $M(\beta)$ of the potential outcome distribution induced by $d(\beta)$ can then be written in terms of the functions $\mu(j_1, j_2, \cdot)$. To estimate $M(\beta)$, suppose we have a sample $\{x_i^{(1)}, a_i^{(1)}, x_i^{(2)}, a_i^{(2)}, y_i\}_{i \in [n]}$, with $h_i^{(1)}$ and $h_i^{(2)}$ denoting the observed histories at the two stages. We further assume that the sample is from a sequential multiple assignment randomized trial (SMART) (Murphy 2005; Lavori and Dawson 2000) in which the stage-wise randomization probabilities $\pi_1, \pi_2 \in (0, 1)$ are two known constants. We then estimate $M(\beta)$ and, similarly, the $\tau$th quantile of the outcome, and define our estimator $\hat\beta$ for the F-ITR as the solution to the resulting constrained problem (12). Let $M(\beta^*)$ be the optimal risk satisfying the quantile constraint $Q_\tau(\beta^*) \ge q$. We derive the risk consistency by showing that $M(\hat\beta)$ converges to $M(\beta^*)$ as the sample size $n$ goes to infinity. The results hold for general $T$-stage problems by an induction argument.
Proposition 3. Suppose there exists $\bar\beta = \{\bar\beta^{(1)}, \bar\beta^{(2)}\}$, which is the optimal solution to the dynamic treatment regime problem at the population level when the deterministic decision rule is adopted. Considering problem (12), where the stochastic decision rule is adopted, denote by $\hat\beta = \{\hat\beta^{(1)}, \hat\beta^{(2)}\}$ an optimal solution to (12). Then for any given $\varepsilon > 0$, as $n \to \infty$, $\hat\beta$ satisfies the constraint of problem (12), and its mean value approximates that achieved by $\bar\beta$ to within $\varepsilon$.
Proof. The proof is similar to the proof of Proposition 2, and we omit it here.
Next, we present the convergence rate. The proof is similar to the proof of Theorem 3.
Theorem 4. Suppose that the optimal regime $\beta^* = \{\beta^{(1)*}, \beta^{(2)*}\}$ belongs to a compact set $B(M)$, where $M > 0$ is a constant. Then, for all $\xi \ge 1$, a probability bound on the deviation $M(\hat\beta) - M(\beta^*)$ holds under $P^*$ at scale $\varepsilon$, where $P^*$ denotes the outer probability for possibly nonmeasurable sets, and $\varepsilon = O(n^{-1/2})$. Or, equivalently, $M(\hat\beta) - M(\beta^*) = O_P(n^{-1/2})$.
Remark 4. In this section, we focus on the scenario where the final outcome is of interest. In Appendix F, supplementary material, we extend the analysis to a different scenario where a reward is observed at each stage and the goal is to optimize the total reward while constraining the quantile of the potential outcome distribution at each stage to exceed some threshold. Meanwhile, suppose the goal is to maximize the sum of the outcomes of stages 1 and 2. We may then replace the original potential outcome by the sum of the stage 1 and stage 2 potential outcomes under actions $a_i^{(1)}$ and $a_i^{(2)}$, and the proposed method still applies.
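For the two-stage SMART setting, a weighted estimate of the mean final outcome can be sketched in Python as below. This is a sketch under assumptions, not the paper's exact estimator: each record is weighted by the probability that the stochastic rules select the observed treatment sequence, divided by the known randomization probability of that sequence, and the weights are normalized. The names `smart_weights`, `rule1`, and `rule2` are hypothetical.

```python
def smart_weights(records, rule1, rule2, pi1, pi2):
    """Weights for records (x1, a1, x2, a2, y) from a two-stage SMART.

    rule1(x1) and rule2(x1, a1, x2) return the probability of assigning
    treatment 1 at stages 1 and 2; pi1 and pi2 are the known probabilities
    of being randomized to treatment 1 at each stage.
    """
    weights = []
    for x1, a1, x2, a2, _y in records:
        p1 = rule1(x1)
        p2 = rule2(x1, a1, x2)
        # Probability that the regime follows the observed sequence.
        agree = (p1 if a1 == 1 else 1.0 - p1) * (p2 if a2 == 1 else 1.0 - p2)
        # Probability of the observed sequence under the trial design.
        design = (pi1 if a1 == 1 else 1.0 - pi1) * (pi2 if a2 == 1 else 1.0 - pi2)
        weights.append(agree / design)
    return weights


def weighted_mean_outcome(records, weights):
    """Normalized weighted mean of the final outcomes."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no record is compatible with the regime")
    return sum(w * rec[4] for w, rec in zip(weights, records)) / total


# Toy records covering the four treatment sequences (made-up outcomes).
records = [
    (0.1, 1, 0.2, 1, 5.0),
    (0.1, 1, 0.2, 0, 1.0),
    (0.1, 0, 0.2, 1, 2.0),
    (0.1, 0, 0.2, 0, 3.0),
]
always_treat = smart_weights(records, lambda x1: 1.0, lambda x1, a1, x2: 1.0, 0.5, 0.5)
mean_always_treat = weighted_mean_outcome(records, always_treat)
```

The same weights could be passed to a weighted quantile routine to obtain the corresponding quantile estimate for the constraint.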

Numerical Results
In this section, we conduct extensive numerical studies using both synthetic and real datasets to investigate the empirical performance of our proposed approach. Our studies demonstrate that the proposed methods achieve desired quantile constraints and obtain desirable mean outcomes.

Monte Carlo Studies
Example 1 (Static ITR). We generate the random outcome $y_i$ from a heteroscedastic outcome regression model in which the $x_{ij}$'s are independently generated from the Uniform(0, 1) distribution, and the treatment indicator $a_i$ satisfies $\log\{P(a_i = 1 \mid x_i)/P(a_i = 0 \mid x_i)\} = -0.5 - 0.5(x_{i1} + x_{i2} + x_{i3} + x_{i4})$. We consider two different distributions for the random error $\varepsilon_i$: the standard normal distribution, and a highly asymmetric distribution $\chi^2_5 - 5$. We consider two sample sizes, $n = 500$ and $n = 1000$.
We consider a parametric class of stochastic treatment regimes. For each combination of error distribution and sample size $n$, we consider two different choices of $\tau$ (0.1 and 0.25). For each $\tau$, we consider two choices of $q$. We aim to answer three essential questions through this Monte Carlo study: (1) How well does the proposed F-ITR meet the quantile constraint for fairness protection? (2) How does the F-ITR compare with the traditional mean-optimal ITR without fairness constraint? (3) How does the proposed algorithm perform compared with the traditional genetic algorithm?
NOTE (Table 2): Under different quantile constraints, where we require that the $\tau$th quantile of the treatment effect is at least $q$, we report the averaged treatment effects of sample mean $M_{\text{mean}}$, sample quantile $Q_\tau$, sample duality gap (Dual), the corresponding population mean treatment $E(M_{\text{mean}})$, the population quantile $E(Q_\tau)$, and the percentage of infeasible cases (IF) among the total 1000 simulations.
Table 2 summarizes the results based on 1000 simulation runs, including the average of the estimated mean value ($M_{\text{mean}}$) and the average of the estimated $\tau$th quantile ($Q_\tau$) based on the sample. We also report the averaged duality gap (Dual) for each setting, which is an estimate of the optimality of the achieved objectives. Furthermore, using a large independent Monte Carlo sample of size one million, we evaluate the expected mean value ($E(M_{\text{mean}})$) and the $\tau$th quantile ($E(Q_\tau)$) when the estimated ITR is used to assign treatment for each individual in the sample. We also report the number of infeasible (IF) cases in the last column, where our algorithm fails to find a feasible solution among 1000 runs. We point out that the infeasibility of the problem may be due to random samples for which no regime satisfying the quantile constraint exists.
First, the values $Q_\tau$ and $E(Q_\tau)$ reported in Table 2 confirm that the fairness constraints are satisfied. Second, we compare the performance of F-ITR with M-ITR (the mean-optimal ITR without fairness constraint) based on the testing samples. Figure 2 displays the density plots of the estimated potential outcome distribution of F-ITR and that of M-ITR under four different setups considered in the Monte Carlo experiment. It is observed that in setups 1, 3, and 4, F-ITR achieves substantially lower left-tail densities compared with M-ITR while not reducing the mean performance significantly. In setup 2, we observe that the two distributions do not differ significantly. This is because the mean-optimal treatment regime also satisfies the quantile constraint. In this case, we observe that F-ITR has almost the same distribution as M-ITR. These figures demonstrate that the proposed F-ITR leads to improved performance at the left tail, with little sacrifice in the overall average benefit. Furthermore, we observe that when the quantile constraint is relatively relaxed, the achieved mean value is very close to that of M-ITR. These observations provide strong evidence supporting the benefits of the proposed F-ITR.
Finally, we evaluate the performance and computational speed of the proposed new algorithm. We apply the proposed algorithm and the genetic algorithm to estimate the mean-optimal ITR (M-ITR) and the quantile-optimal ITR (Q-ITR, maximizing the $\tau$th quantile of the potential outcome distribution). Note that the genetic algorithm is not applicable to F-ITR. Table 3 compares the average computational time and estimated values based on 100 simulation runs. For M-ITR, the new algorithm achieves similar values as the genetic algorithm does, while reducing the computational time by about one third. For Q-ITR, the new algorithm achieves significantly better values and only requires a small fraction of the computational time of the genetic algorithm.
Example 2 (Dynamic ITR). We consider a two-stage example and generate the data from the model and Bernoulli(expit($-1 + x_{i2}$)), respectively. Similarly to the previous example, we consider two different distributions for the random error $\varepsilon_i$: the standard normal distribution $N(0, 1)$ and the asymmetric $0.5 \cdot (\chi^2_5 - 5)$ distribution. We consider sequential ITRs of the form $(A_1, A_2)$, where $A_1 = 1\{c_1 X_1 + b_1 < 0\}$ and $A_2 = 1\{c_2 X_2 + b_2 < 0\}$. We consider the dynamic F-ITR in Appendix F, supplementary material. We aim to optimize the mean value under different quantile constraints $Q_\tau \ge q$ for different $\tau$ and $q$. We consider sample size $n = 1000$ or $2000$.
Figure 2. Empirical treatment effect distribution of the mean-optimal treatment regime (Mean-Opt) and F-ITR under four setups, where we estimate the regimes using $n = 500$ samples and test on 1 million samples. In setups 1 and 2, the errors are from $N(0, 1)$. In setup 1, we set $\tau = 0.1$, $q = 1$, and in setup 2, we set $\tau = 0.25$, $q = 2$. In setups 3 and 4, the errors are from $\chi^2_5 - 5$. In setup 3, we set $\tau = 0.1$, $q = -0.75$, and in setup 4, we set $\tau = 0.25$, $q = 0.5$.
Table 3. Quantitative comparisons of the proposed algorithm (Alg) and the genetic algorithm. We report the averaged objective value achieved by different algorithms, and the averaged running times in seconds after repeating the simulation 1000 times; the values in parentheses correspond to the sample standard deviations.
Table 4. Simulation studies of F-ITR in dynamic settings. Under different quantile constraints, where we require that the $\tau$th quantile of the treatment effect is at least $q$, we report the averaged treatment effects of sample mean $M_{\text{mean}}$, sample quantile $Q_\tau$, sample duality gap (Dual), the corresponding population mean treatment $E(M_{\text{mean}})$, the population quantile $E(Q_\tau)$, and the percentage of infeasible cases (IF) among the total 1000 simulations.
The simulation results are summarized in Table 4. For both normal and chi-square errors, we provide in the first line the mean of the M-ITR (without constraint) together with its 0.10 and 0.25 quantiles as benchmarks. It is observed that for both error distributions, the F-ITR has only a slightly smaller mean than the M-ITR but satisfies the quantile constraints well. In contrast, for the normal error the M-ITR has a negative 0.10 quantile, and for the chi-square error the M-ITR has negative 0.10 and 0.25 quantiles, suggesting that the M-ITR may have undesirable effects for fragile individuals. Our proposed F-ITR achieves the desired constraints and achieves near-optimal mean treatment effects, as the duality gap is small.

Application
We apply the proposed method to analyze the ACTG175 dataset from the R package speff2trial. This dataset contains 2,139 HIV-infected patients. These patients are randomly assigned to one of four treatments: zidovudine (AZT) monotherapy, AZT + didanosine (ddI), AZT + zalcitabine (ddC), and ddI monotherapy. The goal of the trial is to determine if treatment with one drug (monotherapy) is better than treatment with two drugs (combination therapy) in patients with CD4-T cell counts between 200 and 500/mm$^3$. See Hammer et al. (1996) for more details.
In exploratory analysis, we observe heteroscedastic treatment effects and high asymmetry in the outcome variable distribution. It is known in the medical literature that, for patients who had taken AZT before entering the trial, treatment with ddI or AZT+ddI is better than continuing to take AZT alone. We thus consider the problem of how to assign patients who had taken AZT before the trial to either the AZT+ddI combination or the ddI monotherapy. Denote $A_i = 1$ if patient $i$ is assigned to the AZT+ddI therapy, and $A_i = 0$ if the patient is assigned to the ddI monotherapy. The outcome is the CD4 count at 96±5 weeks from baseline (denoted as CD496), as it is a crucial measure of disease progression for HIV-infected patients.
We consider two covariates for estimating the treatment regimes, which are the baseline weights of the patients and the baseline CD4-T cell counts. We then estimate the M-ITR, the Q-ITR (maximizing the 0.25th quantile), and the F-ITR (under the constraint that the 0.25th quantile is lower bounded by 230). (It has been observed that when CD4 is below 200 cells/mm$^3$, the risk of serious health problems increases. For example, the risk of PCP (fungal pneumonia) and chest infections rises steeply when the CD4 count falls below 200 cells/mm$^3$.) The results are summarized in Table 5. We observe that the 0.25 quantiles and the means are significantly different for the Q-ITR and the M-ITR. Meanwhile, with a quantile constraint, the estimated mean of the F-ITR is close to the mean of the M-ITR.

Discussion
To conclude, we propose a new framework for fairness-aware optimal ITR estimation under a quantile constraint. We show that the proposed estimator satisfies the quantile constraint, and achieves the optimal mean treatment effects asymptotically.
Our extensive simulation studies demonstrate that though the estimator is derived from a highly nonconvex problem, our proposed algorithm achieves high-quality solutions in practice.
In practice, it is important to properly choose the quantile level $\tau$ and the threshold $q$ in our proposed model (2). In one of the motivating examples, we aim to control the tail behavior of the treatment results. Thus, in such applications, we suggest letting $\tau$ be 0.05 or 0.10, and $q$ be 0 or a small positive number, where we assume that a positive result means that the patient benefits from the treatment. In future work, we will discuss with practitioners and make better recommendations.
Unlike the unconstrained mean-optimal ITR approach, the proposed method does not achieve Fisher consistency in general, even if the decision space increases. We argue here that by imposing a practically meaningful quantile constraint, Fisher consistency holds asymptotically. In particular, for the unconstrained approach, as the functional space of the decision rule $f$ increases, it is reasonable to assume that in expectation, the treatment effects are positive for all individuals, that is, $E\{Y_i^*(f) \mid x_i\} \ge 0$ for all $x_i$. Meanwhile, as mentioned above, the main motivation for imposing the quantile/fairness constraint is to ensure that the vast majority of the patients would benefit from the treatment. That is, some lower tail of the distribution of treatment effects is positive. Then, if we impose a constraint that $Q_{0.05}\{Y^*(f)\} \ge 0$, this constraint is satisfied by the optimal solution to the unconstrained ITR problem by our assumption. Thus, the two solutions coincide, and Fisher consistency is satisfied.
For future work, our proposed method can potentially be generalized to achieve group fairness. In particular, suppose that we have a small number $K$ of groups of patients. The groups can be defined by gender, age, or income status. We can then require the quantile constraint to be satisfied for each group by imposing multiple constraints. For example, suppose that we have two groups of patients. Denote the two groups as $G_1$ and $G_2$. With a slight abuse of notation, to ensure that the two groups get fair results, we may impose the constraints $Q_\tau(Y_i^*(f)) \ge q$ for $i \in G_k$, $k = 1, 2$. In this case, we have multiple constraints, and by a Lagrangian dual approach similar to the one we propose for our original problem with a single constraint, we can potentially solve the problem. We will study this problem from both algorithmic and statistical perspectives in the future.
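As a small illustration of the group-wise constraints, the Python sketch below checks whether the $\tau$th quantile of the outcomes within each group reaches the threshold $q$. It only evaluates the constraints on given outcomes and does not solve the multi-constraint problem; the function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def lower_quantile(values, tau):
    """Return inf{y : F_n(y) >= tau} for the empirical CDF of `values`."""
    ordered = sorted(values)
    k = max(math.ceil(tau * len(ordered) - 1e-12), 1)
    return ordered[k - 1]


def groupwise_constraints(outcomes, groups, tau, q):
    """Map each group label to True if its tau-quantile is at least q."""
    by_group = defaultdict(list)
    for y, g in zip(outcomes, groups):
        by_group[g].append(y)
    return {g: lower_quantile(ys, tau) >= q for g, ys in by_group.items()}


# Toy outcomes under some regime, split into two hypothetical groups.
outcomes = [1.0, 2.0, 3.0, -1.0, 4.0, 5.0]
groups = ["G1", "G1", "G1", "G2", "G2", "G2"]
status = groupwise_constraints(outcomes, groups, tau=0.25, q=0.0)
```

In a dual formulation, each group whose constraint fails would contribute its own multiplier, so the scalar search over $\lambda$ would become a search over a vector of multipliers.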

Figure 1. Empirical distributions of treatment Regimes 2 and 3.

Table 1. Mean and the 0.10th quantile of the outcomes of the seven different treatment regimes estimated using $10^6$ simulated samples.

Table 2. Simulation studies of F-ITR.

Table 5. Estimated quantiles and means of different treatment regimes for ACTG175 data analysis.