High-Dimensional Expensive Optimization by Classification-based Multiobjective Evolutionary Algorithm with Dimensionality Reduction

Surrogate-assisted multiobjective evolutionary algorithms (SAMOEAs) are a promising approach for solving expensive multiobjective optimization problems (EMOPs), wherein the number of function evaluations is extremely restricted due to expensive-to-evaluate objective functions. However, most SAMOEAs do not scale well to high-dimensional problems because the accuracy of surrogate models degrades as the problem dimension increases. This paper proposes a dimensionality reduction-based SAMOEA, which involves the following two strategies to address high-dimensional EMOPs. First, mapping high-dimensional training samples to a low-dimensional space when building surrogate models can boost their accuracy. Second, compared to approximation-based surrogate models, reliable classification-based models can be obtained from only a few training samples. Accordingly, the proposed algorithm integrates a dimensionality reduction technique into an existing classification-based SAMOEA, MCEA/D. It builds classification models in low-dimensional spaces and then utilizes these models to estimate good solutions without expensive function evaluations. Experimental results statistically confirm that the proposed algorithm achieves state-of-the-art performance in many experimental cases.


INTRODUCTION
Many real-world optimization problems involve multiple expensive-to-evaluate objectives, wherein calculating objective function values, i.e., the function evaluation (FE), is computationally and/or financially expensive [1]. For example, Mazda Motor Corporation conducted a 2-objective optimization of car designs in which evaluating a single design with a crashworthiness simulation took approximately 20 hours on the K supercomputer [2,3]. Such expensive multiobjective optimization problems (EMOPs) are encountered in many engineering applications, e.g., photonic waveguide design [4] and satellite constellation systems [5]. A grand challenge in EMOPs is to obtain acceptable solutions under restricted computational and financial budgets. For this challenge, sample-efficient approaches that reduce the number of FEs, such as Bayesian optimization, are promising.
Multi-objective evolutionary algorithms (MOEAs) [6] are popular black-box optimizers. However, MOEAs typically assume tens of thousands of FEs and thereby become impractical for solving EMOPs. To tackle this challenge, numerous efforts have been dedicated to developing sample-efficient MOEAs under a few hundred FEs, referred to as surrogate-assisted MOEAs (SAMOEAs) [7]. The general concept of SAMOEAs is to accelerate evolutionary optimization by replacing expensive objective functions with computationally and financially cheap surrogate models. In particular, surrogate models are designed to estimate the quality of solutions, and machine learning techniques are employed to build these models. SAMOEAs prescreen unevaluated solutions to estimate high-quality solutions by utilizing the predictions of surrogate models. Many existing SAMOEAs fall into the approximation-based approach, which utilizes approximation models of the objective functions in a Bayesian optimization manner [8]. In this category, Gaussian processes (GPs) are typically employed as surrogate models to approximate objective functions. Another category is the classification-based approach which, in its basic form, uses classification models to predict "good" or "bad" candidate solutions.
However, successful approximation-based SAMOEAs are designed for low-dimensional EMOPs and do not scale well to high-dimensional problems with more than 50 decision variables [9]. This is because the approximation accuracy of GPs deteriorates as the problem dimension increases [10]. A universal strategy for this challenge is to increase the training data (i.e., evaluated solutions), but this is hindered in EMOPs owing to the restricted number of FEs. Accordingly, recent works have attempted to improve the scalability of SAMOEAs to high-dimensional EMOPs, as summarized below.
1. Some works integrated dimensionality reduction (DR) techniques into approximation-based SAMOEAs to improve the approximation accuracy. Specifically, approximation models, including GPs, are trained with low-dimensional training data mapped from the original high-dimensional space. SA-RVEA-PCA [11] and ADSAPSO [12] are state-of-the-art algorithms in this strategy.

2. Recently, the effectiveness of classification-based SAMOEAs on high-dimensional EMOPs has been revealed [9,13]. A key insight is that, compared to approximation models, reliable classification models can be obtained even with a few training samples. State-of-the-art algorithms in this strategy include MCEA/D [9] and REMO [14].
Inspired by the above insights, integrating a DR technique into a classification-based SAMOEA can be an effective strategy to address high-dimensional EMOPs. However, to our knowledge, existing classification-based SAMOEAs are designed to construct classification models with high-dimensional training samples, which may deteriorate their classification accuracy. Accordingly, this paper presents a classification-based SAMOEA that utilizes a DR mechanism to boost the accuracy of classification models. To this end, we introduce an extension of MCEA/D based on the adaptive dropout (AD) mechanism, a linear DR technique used in ADSAPSO. The presented algorithm, called Dimensionality Reduction-based MCEA/D (DR-MCEA/D), builds support vector machines (SVMs) [15] in a lower-dimensional space mapped by the AD mechanism. To the best of our knowledge, this paper presents the first combination of a dimensionality reduction technique and a classification-based SAMOEA.
The rest of this paper is organized as follows. Section 2 explains the MCEA/D framework. Section 3 describes the DR-MCEA/D framework; note that the AD mechanism is briefly described in this section. In Section 4, experimental results are shown to validate the effectiveness of DR-MCEA/D by comparing it with state-of-the-art SAMOEAs adapted for high-dimensional EMOPs. Section 5 provides additional results with respect to the impact of dimensionality reduction. Finally, Section 6 summarizes this paper with future work.

BACKGROUND
This section introduces background information needed to explain the MCEA/D framework.

Support vector machine
For a classification problem, where a D-dimensional input x must be classified into a binary class c ∈ {+1, −1}, an SVM learns a decision function h : R^D → {+1, −1}, given by

\[
h(x) = \mathrm{sign}\left(w^{\top}\phi(x) + w_{0}\right), \tag{1}
\]

where φ(x) is a mapping function, and the weight vector w ∈ R^D and the bias w_0 ∈ R are the parameters to be optimized. Given a training dataset {(x_i, c_i)}_{i=1}^{n}, the optimization problem for w and w_0 can be simplified using the Lagrangian and the Karush-Kuhn-Tucker conditions, that is,

\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_{i} - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_{i}\alpha_{j}c_{i}c_{j}K(x_{i}, x_{j})
\quad \text{s.t.}\ \sum_{i=1}^{n}\alpha_{i}c_{i} = 0,\ \ 0 \le \alpha_{i} \le C, \tag{2}
\]

where {α_i}_{i=1}^{n} is the set of Lagrange multipliers; K(x_i, x_j) is a kernel function; and C controls the balance between the margin and the empirical loss. Given the solution of Eq. (2), denoted as {α_i^*}_{i=1}^{n}, the decision function of Eq. (1) is written as

\[
h(x) = \mathrm{sign}\left(d(x)\right), \tag{3}
\]

where

\[
d(x) = \sum_{i=1}^{n} \alpha_{i}^{*} c_{i} K(x_{i}, x) + w_{0}. \tag{4}
\]

The decision score function d(x) can serve as a metric to quantify the distance between x and the decision boundary.
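As a concrete illustration, the following minimal sketch trains a binary SVM and queries its decision score; scikit-learn is an assumed implementation choice (the paper does not prescribe a library), and all data are synthetic.

```python
# Minimal sketch: train a binary SVM and query its decision score d(x).
# scikit-learn's SVC is an assumed implementation choice; sign(d(x)) gives
# the predicted class h(x) of Eq. (3).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 10))            # 40 synthetic training samples, D = 10
c = np.where(X[:, 0] > 0.5, 1, -1)  # toy labels in {+1, -1}

svm = SVC(kernel="rbf", C=1.0)      # C balances the margin and empirical loss
svm.fit(X, c)

x_new = rng.random((1, 10))
d = svm.decision_function(x_new)    # decision score d(x) of Eq. (4)
h = np.sign(d)                      # predicted class h(x)
print(d, h)
```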
Decomposition approach for multiobjective optimization

This paper considers a multiobjective optimization problem (MOP), formalized as

\[
\text{minimize}\ \ F(x) = \left(f_{1}(x), f_{2}(x), \ldots, f_{M}(x)\right)^{\top} \quad \text{subject to}\ \ x \in S, \tag{5}
\]

where M is the number of objective functions; S ⊆ R^D is a D-dimensional feasible space; and f_j : R^D → R is the j-th objective function. The goal is to obtain a set of solutions that approximates the Pareto optimal front of Eq. (5).
A single FE corresponds to calculating F(x) once. Note that an MOP instance of Eq. (5) can be considered an EMOP instance if the number of FEs is restricted to thousands or even hundreds.
A popular approach to solving MOPs is to decompose an MOP instance into many single-objective subproblems and then solve each subproblem in a single-objective optimization manner. MCEA/D employs this approach. Each subproblem is defined with a scalarization function g : R^D → R to be optimized. Thus, the decomposition-based approach intends to minimize all the scalarization functions in parallel. In this paper, we use the Tchebycheff scalarization function, and g for the i-th subproblem is defined as

\[
g(x \mid \lambda^{i}, z) = \max_{j \in \{1, \ldots, M\}} \lambda_{j}^{i} \left| f_{j}(x) - z_{j} \right|, \tag{6}
\]

where λ^i = (λ^i_1, ..., λ^i_M) is the weight vector of the i-th subproblem and z = (z_1, ..., z_M) is a reference point. In general, each reference point element z_j is set to the minimum value of f_j among the solutions discovered during a run, and the set of weight vectors is determined by the two-layered approach to obtain uniformly distributed vectors in the objective space (see [16] for more detail).
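For illustration, a minimal sketch of Eq. (6) in code (variable names are ours):

```python
# Minimal sketch of the Tchebycheff scalarization g(x | lam, z) of Eq. (6).
# F is the objective vector of x, lam a weight vector, z the reference point.
import numpy as np

def tchebycheff(F: np.ndarray, lam: np.ndarray, z: np.ndarray) -> float:
    """g = max_j lam_j * |f_j(x) - z_j|."""
    return float(np.max(lam * np.abs(F - z)))

F = np.array([0.8, 0.3])
lam = np.array([0.5, 0.5])
z = np.array([0.1, 0.1])
print(tchebycheff(F, lam, z))  # 0.35
```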

MCEA/D
MCEA/D is a classification-based SAMOEA that utilizes a set of SVM models on a popular decomposition-based MOEA called MOEA/D-DE [17]. The basic idea of MCEA/D is threefold. First, a local SVM classifier is constructed for each subproblem to improve its classification accuracy. Second, the SVM classifiers are utilized to select "good" candidate solutions to save the number of FEs. Third, the candidate solution closest to a decision boundary is selected if no candidates are predicted as the "good" class.
Algorithm 1 describes the pseudocode of MCEA/D. Note that MCEA/D defines N subproblems with a set of weight vectors {λ^i}_{i=1}^{N}. The overall procedure is terminated when the number of FEs, FE, reaches the maximum number of FEs, FE_max.

Initialization
For the initialization process, MCEA/D determines an index set of neighbor subproblems for the i-th subproblem, denoted as B(i). In particular, a set of N initial solutions {x^1, ..., x^N} is generated and evaluated with the objective functions. An archive set A, which stores all evaluated solutions, is also initialized with the initial solutions. Each reference point element z_j is set to the minimum value of f_j among the initial solutions.

Model construction
As a main loop, MCEA/D sequentially solves each subproblem with the help of an SVM classifier until FE reaches FE_max. For the i-th subproblem, it first builds an SVM model with a training dataset T. All the solutions x in A are used as training samples. To localize the SVM classifier, the current best solution of each neighbor subproblem of the i-th one is labeled as a "good" solution; all other solutions are labeled as "bad" solutions. In particular, T is defined as

\[
T = \left\{ (x, c(x)) \mid x \in A \right\}, \tag{7}
\]

where c(x) is the true class of x, given by

\[
c(x) = \begin{cases} +1 & \text{if } x = \arg\min_{x' \in A} g(x' \mid \lambda^{j}, z) \text{ for some } j \in B(i), \\ -1 & \text{otherwise.} \end{cases} \tag{8}
\]

Note that +1 and −1 stand for the "good" and "bad" classes, respectively.
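A minimal sketch of this labeling step, under our reading of Eqs. (7) and (8) (names are illustrative; g is the scalarization function sketched above):

```python
# Minimal sketch of MCEA/D's training-set labeling for the i-th subproblem.
# Every archived solution becomes a training sample; the best archived
# solution of each neighbor subproblem j in B(i) is labeled +1, others -1.
import numpy as np

def build_training_set(A_X, A_F, lams, z, B_i, g):
    """A_X: archived decision vectors; A_F: their objective vectors;
    lams: weight vectors; z: reference point; B_i: neighbor indices;
    g: scalarization function g(F, lam, z)."""
    labels = -np.ones(len(A_X))
    for j in B_i:
        scores = np.array([g(F, lams[j], z) for F in A_F])
        labels[np.argmin(scores)] = +1  # current best of subproblem j: "good"
    return A_X, labels
```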

Solution update
The determined solution y is evaluated with the objective functions and inserted into A. For the solution-update process, the parent candidates x^k, ∀k ∈ P, may be updated with y if g(y | λ^k, z) ≤ g(x^k | λ^k, z); however, the maximum update time is bounded by n_r to maintain the diversity of solutions. The above processes are repeated for the next subproblem.
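A minimal sketch of this bounded replacement rule, under our reading of the text (names are illustrative):

```python
# Minimal sketch of the bounded neighborhood update. y replaces a parent x_k
# if it improves the k-th subproblem's scalarized value, with at most n_r
# replacements to preserve the diversity of solutions.
def update_parents(pop_X, pop_F, y, F_y, P, lams, z, n_r, g):
    updates = 0
    for k in P:
        if updates >= n_r:
            break
        if g(F_y, lams[k], z) <= g(pop_F[k], lams[k], z):
            pop_X[k], pop_F[k] = y, F_y
            updates += 1
    return updates
```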

PROPOSED ALGORITHM
The proposed algorithm, DR-MCEA/D, integrates a DR technique into the MCEA/D framework. The intent is to boost the accuracy of classification models under a few high-dimensional training samples, thereby improving the scalability of MCEA/D. To this end, we modify the AD mechanism used in ADSAPSO to apply it to the decomposition framework of MCEA/D. Note that ADSAPSO uses a Pareto-dominance framework. This section first introduces our DR mechanism. Subsequently, the overall procedure of DR-MCEA/D is explained.

Dimensionality reduction
The basic idea of the AD mechanism is to identify a decision variable as important if its values in superior solutions are sufficiently different from those in inferior solutions. This is because such decision variables may significantly affect the objective values F(x). The AD mechanism thus performs linear DR in a feature-selection manner.
Given a D-dimensional vector x = [x_1, x_2, …, x_D], our DR mechanism returns a set of D̃ indices of decision variables, I = {i_1, i_2, …, i_D̃}, wherein x_{i_1}, x_{i_2}, …, x_{i_D̃} are considered as important decision variables. Accordingly, DR-MCEA/D is designed to build SVM classifiers in a D̃-dimensional space. Here, D̃ = ⌈βD⌉ is defined with a reduction rate β ∈ (0, 1].

Algorithm 2 describes the pseudocode of our DR mechanism. To adapt the AD mechanism to the decomposition framework, our mechanism is executed for each subproblem. Supposing the i-th subproblem with λ^i, superior solutions for the i-th subproblem are defined as those having the top-ρ minimum values of g(x | λ^i, z), forming a superior set A_sup. In contrast, inferior solutions are defined as those having the top-ρ maximum values of g(x | λ^i, z), forming an inferior set A_inf. Let δ_j be the absolute difference between the average value of x_j over the superior solutions and that over the inferior solutions. That is, δ_j is defined as

\[
\delta_{j} = \left| \bar{x}_{j}^{\mathrm{sup}} - \bar{x}_{j}^{\mathrm{inf}} \right|, \tag{12}
\]

where x̄_j^sup and x̄_j^inf are the average values of x_j over the superior and inferior solutions, respectively, given by

\[
\bar{x}_{j}^{\mathrm{sup}} = \frac{1}{\rho} \sum_{x \in A_{\mathrm{sup}}} x_{j}, \qquad
\bar{x}_{j}^{\mathrm{inf}} = \frac{1}{\rho} \sum_{x \in A_{\mathrm{inf}}} x_{j}.
\]

Finally, the D̃ decision variables having the largest differences obtained by Eq. (12) are extracted as important, and their indices are added to the index set I.
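Algorithm 2 can be sketched compactly as follows (our reading; names are illustrative, and g is the scalarization function from Section 2):

```python
# Minimal sketch of the subproblem-wise DR step (Algorithm 2). The rho
# solutions with the smallest g-values form the superior set, the rho
# largest form the inferior set, and the ceil(beta * D) variables with the
# largest mean-value gaps delta_j (Eq. (12)) are kept as important.
import numpy as np

def dimension_reduction(A_X, A_F, lam, z, rho, beta, g):
    scores = np.array([g(F, lam, z) for F in A_F])
    order = np.argsort(scores)
    sup = A_X[order[:rho]]     # superior solutions (top-rho minima of g)
    inf = A_X[order[-rho:]]    # inferior solutions (top-rho maxima of g)
    delta = np.abs(sup.mean(axis=0) - inf.mean(axis=0))  # Eq. (12)
    D_red = int(np.ceil(beta * A_X.shape[1]))            # D~ = ceil(beta * D)
    return np.argsort(-delta)[:D_red]                    # index set I
```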

Algorithm
Algorithm 3 shows the pseudocode of DR-MCEA/D, wherein modifications from MCEA/D are underlined. DR-MCEA/D requires two hyperparameters for our DR mechanism, ρ and β, in addition to those of MCEA/D. To begin with, DR-MCEA/D initializes P, z, and A in the same manner as MCEA/D. Subsequently, as a main loop, the following procedures with our DR mechanism are conducted for each subproblem.

The index set of important decision variables I is obtained by executing DimensionReduction(λ^i, z, A, ρ, β).
The training dataset T is constructed in the same manner as in MCEA/D. Next, T is converted to a D̃-dimensional dataset T̃ as

\[
\tilde{T} = \left\{ (\tilde{x}, c(x)) \mid (x, c(x)) \in T \right\}, \tag{13}
\]

where x̃ = [x_{i_1}, x_{i_2}, …, x_{i_D̃}] is a training sample mapped from its corresponding sample x with the indices contained in I. Finally, an SVM classifier, h̃ : R^D̃ → {+1, −1}, is trained with T̃.

Solution generation
The following minimal modifications are added to the solution-generation process of MCEA/D: to calculate the classes and decision scores of candidate solutions y, each y is mapped to its converted form ỹ = [y_{i_1}, y_{i_2}, …, y_{i_D̃}] with the indices contained in I. In particular, the class of each candidate solution is predicted using h̃(ỹ). If no candidate solution has the "good" class, y is set to the candidate solution closest to a decision boundary according to the decision score d̃(ỹ) used in h̃(ỹ).
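A minimal sketch of the reduced-space training and prescreening described above (scikit-learn is an assumed implementation choice; the mapping of Eq. (13) reduces to column selection):

```python
# Minimal sketch of building and querying the reduced-space classifier.
# I is the index set of important variables; mapping x to x~ is a column
# selection. scikit-learn's SVC is an assumed implementation choice.
import numpy as np
from sklearn.svm import SVC

def fit_reduced_svm(T_X, T_c, I):
    """T_X: training samples; T_c: labels in {+1, -1}; I: important indices."""
    svm = SVC(kernel="rbf", C=1.0)
    return svm.fit(T_X[:, I], T_c)   # train on x~ = x[I] (Eq. (13))

def prescreen(svm, Y_X, I):
    """Predicted classes and decision scores of candidates, in reduced space."""
    Y_red = Y_X[:, I]                # map each y to y~ = y[I]
    return svm.predict(Y_red), svm.decision_function(Y_red)
```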
Note that the procedures after the solution generation, that is, lines 25-29 of Algorithm 3, are exactly the same as those of MCEA/D. The overall procedure is terminated when FE reaches FE_max.

EXPERIMENTS
This section presents experiments that validate the effectiveness of the proposed algorithm. All experiments were conducted on the evolutionary multiobjective optimization platform PlatEMO [18].
Note that ADSAPSO and DR-MCEA/D used the same reduction rate β = 0.5 for a fair comparison. For all algorithms, N initial solutions were produced by the Latin hypercube sampling method [21]. The maximum number of FEs, FE_max, was set to 300.
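For reference, a minimal sketch of Latin hypercube initialization (scipy's QMC module is an assumed implementation choice; the bounds are illustrative):

```python
# Minimal sketch: N initial solutions via Latin hypercube sampling.
import numpy as np
from scipy.stats import qmc

N, D = 100, 50
lb, ub = np.zeros(D), np.ones(D)              # illustrative box constraints
sampler = qmc.LatinHypercube(d=D, seed=0)
X0 = qmc.scale(sampler.random(n=N), lb, ub)   # N solutions in [lb, ub]^D
```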
Evaluation criteria. The modified Inverted Generational Distance (IGD+) metric [22] was used as a performance metric to evaluate both the diversity and the convergence of the obtained solutions. The IGD+ values were computed from the non-dominated solutions among all evaluated solutions when the number of FEs reached FE_max. The average IGD+ values over 21 trials with different random seeds are reported. Furthermore, the Wilcoxon rank-sum test was applied to examine significant differences at a significance level of 0.05.
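For completeness, a minimal sketch of the IGD+ indicator following its standard definition for minimization (Z is a reference point set on the Pareto front; A holds the objective vectors of the obtained non-dominated solutions):

```python
# Minimal sketch of IGD+: average, over reference points z in Z, of the
# smallest d+(z, a) = sqrt(sum_j max(a_j - z_j, 0)^2) among solutions a in A.
import numpy as np

def igd_plus(A: np.ndarray, Z: np.ndarray) -> float:
    diff = np.maximum(A[None, :, :] - Z[:, None, :], 0.0)  # |Z| x |A| x M
    dplus = np.sqrt((diff ** 2).sum(axis=2))
    return float(dplus.min(axis=1).mean())
```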

Results
Table 1 reports the average IGD+ values of the compared algorithms. The best IGD+ value for each experimental case is highlighted in color. The average ranks and statistical results of the Wilcoxon rank-sum test are summarized at the bottom of each table; the three symbols "+," "−," and "≈" indicate that the IGD+ value of the compared algorithm is significantly better than, significantly worse than, or statistically competitive with that of DR-MCEA/D, respectively. Note that M must be greater than 2 for MaF13, and thereby we did not conduct the experimental case of MaF13 with M = 2.
As shown in Table 1, MCEA/D achieves better average ranks than ADSAPSO for all problem dimensions D. This result is consistent with a recent observation that classification-based SAMOEAs tend to outperform approximation-based ones [9,14,20]; approximation-based SAMOEAs may struggle on high-dimensional EMOPs owing to the few training samples, even when a DR technique is utilized. Consequently, DR-MCEA/D clearly outperformed ADSAPSO, as the IGD+ values of DR-MCEA/D were statistically better than those of ADSAPSO for more than 30 experimental cases (see the number of "−" symbols).
From Table 1(a), there is no clear difference between DR-MCEA/D and MCEA/D when D is set to 50; the average rank of DR-MCEA/D is slightly worse than that of MCEA/D. DR-MCEA/D statistically outperformed and underperformed MCEA/D for five and three experimental cases, respectively. However, when D is increased to 100, DR-MCEA/D clearly outperformed MCEA/D, as its average rank improved. In particular, the IGD+ values of DR-MCEA/D are statistically better than those of MCEA/D for 11 experimental cases. Furthermore, this tendency is also observed for D = 150 (see Table 1(b)). These observations empirically confirm that DR-MCEA/D successfully improves the scalability of MCEA/D by utilizing the DR technique.

DISCUSSION
This section provides additional insights to confirm the effectiveness of using our DR mechanism.

Impact of dimensionality reduction mechanism
We intended to improve the classification accuracy of the SVM classifiers by utilizing our DR mechanism. However, evaluating the classification accuracy during a run is difficult because generating plausible validation samples (especially truly good solutions) for each stage of the optimization is non-trivial. Instead, we here discuss the distribution of the decision scores obtained in DR-MCEA/D during a run. When a low-quality SVM classifier is built, the decision scores of candidate solutions tend to concentrate in a narrow range; thus, a small variance of the decision scores suggests an unreliable classifier. Fig. 1 shows the variance of the decision scores of all candidate solutions for three experimental cases: MaF1 (M = 6), MaF3 (M = 2), and MaF4 (M = 3). In the figure, the variances of MCEA/D and DR-MCEA/D (the vertical axis) for all the problem dimensions (the horizontal axis) are shown; these variances were calculated from the results presented in the previous section. As shown in this figure, the variance for DR-MCEA/D is larger than that for MCEA/D except for MaF3 (M = 2, D = 50). The variance of MCEA/D significantly deteriorates as the problem dimension increases for MaF3 (M = 2) and MaF4 (M = 3). Note that for the experimental cases selected here, DR-MCEA/D outperforms MCEA/D on average for all the cases except MaF1 (M = 6, D = 50), as shown in Table 1. These observations empirically confirm that using our DR mechanism yields well-distributed decision scores; thus we can expect that DR-MCEA/D performed well because the classification accuracy improved.
Our analytical results also suggest a possible drawback of DR-MCEA/D: our DR technique may cause over-fitting; an over-fitted classifier may assign good decision scores only to candidate solutions that are similar to each other, losing the diversity of solutions. For instance, a large variance for DR-MCEA/D is obtained for MaF1 (M = 6, D = 50), but its average IGD+ value is statistically worse than that of MCEA/D. We suspect that DR-MCEA/D lost the diversity of solutions because it selected similar candidate solutions to be evaluated with the objective functions. To avoid this possible issue, selecting candidate solutions based on a diversity measure may be useful to further improve the performance of DR-MCEA/D.
Note that there is almost no significant difference between the number of "good" candidate solutions of DR-MCEA/D and that of MCEA/D; our DR mechanism does not aim to increase the production of good candidate solutions. Fig. 2 shows the number of candidate solutions that were predicted as having the "good" class during a run (denoted as n_g). As shown in this figure, DR-MCEA/D and MCEA/D generated almost the same number of such candidate solutions. MCEA/D (as well as DR-MCEA/D) is not designed to produce many good candidate solutions, which maintains the diversity of solutions; rather, it intends to accurately predict good solutions. Thus, the selection mechanism based on the decision scores is frequently executed, and our DR mechanism yields the performance improvement of MCEA/D, as discussed above.

Fig. 2 The number of candidate solutions predicted as having the "good" class, n_g.

Sensitivity to reduction rate
The reduction rate β can be an important parameter affecting the performance of DR-MCEA/D. We here investigate the dependency of the DR-MCEA/D performance on β. We set β ∈ {0.3, 0.7} in addition to its default value of 0.5, with the same experimental settings as in the previous section. Table 2 reports the summary of statistical results for DR-MCEA/D with β ∈ {0.3, 0.5, 0.7}. The three symbols "+," "−," and "≈" indicate that the IGD+ value of DR-MCEA/D with β ∈ {0.3, 0.7} is significantly better than, significantly worse than, or statistically competitive with that of the default setting β = 0.5, respectively.
The performance of DR-MCEA/D improves when β is set to 0.3 on the 100-dimensional experimental cases. However, the performance does not change significantly with β for the other problem dimensions. Thus, the improvement of the classification accuracy may be achieved even with a large reduction rate. This result indicates that DR-MCEA/D is less sensitive to the value of β, and a possible benefit is the reduction of computational time when a small value of β is used. For instance, the computational time to complete one trial (averaged over all experimental cases with different values of M and D) was 9.987, 10.071, and 10.107 sec for β = 0.3, 0.5, and 0.7, respectively. Note that the computational time of MCEA/D was 1.316 sec.

CONCLUSION
This paper considered the first combination of a dimensionality reduction technique and a classification-based SAMOEA for solving high-dimensional EMOPs. The aim is to improve the accuracy of classification models used to select good candidate solutions. The proposed algorithm, DR-MCEA/D, is an extension of MCEA/D and constructs SVM classifiers on a lower-dimensional space mapped in a linear-reduction manner. Experiments were conducted on high-dimensional benchmark problems with up to 150 dimensions under a limited budget of 300 FEs. Results showed that DR-MCEA/D outperformed two state-of-the-art algorithms, MCEA/D and ADSAPSO, for several experimental cases. Additional analysis suggested that the accuracy of the classification models likely improved, as well-distributed decision scores were obtained.
We also suggested that using a dimensionality reduction technique may cause over-fitting, losing the diversity of solutions even under a restricted number of FEs. In future work, we will revisit the selection mechanism for candidate solutions to consider both the model predictions and the diversity of solutions. We will also consider a solution-generation process adapted to our DR mechanism, e.g., generating candidate solutions in a lower-dimensional space (as done in ADSAPSO). In addition, it would be worth investigating the effect of dimensionality reduction techniques on different classification models, such as neural-based dominance predictors [14,23] and global classification modeling [24].

Fig. 1 Variance of decision score function values d(x) for three experimental cases.
In the solution-generation process of MCEA/D, candidate solutions are generated repeatedly, wherein the maximum repeat time is bounded by R_max. Initially, a set of candidate solutions Y is set to an empty set. Next, a candidate solution y is generated with the same procedure as that of MOEA/D-DE. In particular, MCEA/D builds an index set of parent candidates P as

\[
P = \begin{cases} B(i) & \text{with probability } \delta, \\ \{1, \ldots, N\} & \text{otherwise}. \end{cases} \tag{10}
\]

A candidate solution is then generated based on the parent solutions x^i, x^{r_1}, and x^{r_2}, where r_1 and r_2 are indices randomly selected from P. Then, a predicted class of y is obtained by h(y). If h(y) is +1, i.e., the "good" class, the solution-generation process is terminated; otherwise, y is added to Y and a new candidate solution is generated with different random values. When the size of Y reaches R_max, y is set to the candidate solution closest to the decision boundary drawn by h(x); that is, y is determined with the decision score function d(x) as y = arg max_{x ∈ Y} d(x).

Algorithm 1 (pseudocode of MCEA/D). Input: maximum number of FEs FE_max, population size N, neighbor size T, parent selection probability δ, maximum number of candidate solutions R_max, maximum update time n_r. Output: non-dominated solutions in A. Initialization: set the index sets of neighbor subproblems B(i) ∀i ∈ {1, …, N}; initialize P with N initial solutions {x^1, …, x^N}; evaluate all N initial solutions with the objective functions; set FE ← N; initialize z as z_j = min_{x∈P} f_j(x) ∀j ∈ {1, …, M}. After each evaluation of y: update z_j ← min{z_j, f_j(y)} ∀j ∈ {1, …, M}; replace x^j with y if g(y | λ^j, z) ≤ g(x^j | λ^j, z) ∀j ∈ P, bounded by the maximum update time n_r; if FE = FE_max, return the non-dominated solutions identified from A.
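Under our reading of this process, a minimal sketch (the DE-style variation operator is simplified and parameter values are illustrative; in DR-MCEA/D, the classifier is queried with the reduced form y[I]):

```python
# Minimal sketch of the classifier-assisted solution-generation loop.
# Up to R_max candidates are generated; the first one predicted "good" is
# returned, otherwise the candidate with the highest decision score, i.e.,
# the one closest to the decision boundary from the "bad" side.
import numpy as np

def generate_solution(svm, I, x_i, pool, lb, ub, R_max, rng, F=0.5):
    Y = []
    for _ in range(R_max):
        r1, r2 = rng.choice(len(pool), size=2, replace=False)
        y = np.clip(x_i + F * (pool[r1] - pool[r2]), lb, ub)  # DE-style variation
        if svm.predict(y[I].reshape(1, -1))[0] == 1:          # predicted "good"
            return y
        Y.append(y)
    scores = svm.decision_function(np.asarray(Y)[:, I])       # decision scores
    return Y[int(np.argmax(scores))]
```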

Table 2 Summary of statistical results for DR-MCEA/D with different values of β (300 FEs, 21 trials).