A Surrogate-Assisted Reference Vector Guided Evolutionary Algorithm for Computationally Expensive Many-Objective Optimization

We propose a surrogate-assisted reference vector guided evolutionary algorithm (EA) for computationally expensive optimization problems with more than three objectives. The proposed algorithm is based on a recently developed EA for many-objective optimization that relies on a set of adaptive reference vectors for selection. The proposed surrogate-assisted EA (SAEA) uses Kriging to approximate each objective function to reduce the computational cost. In managing the Kriging models, the algorithm focuses on the balance of diversity and convergence by making use of the uncertainty information in the approximated objective values given by the Kriging models, the distribution of the reference vectors, and the location of the individuals. In addition, we design a strategy for choosing data for training the Kriging models to limit the computation time without impairing the approximation accuracy. Empirical results comparing the new algorithm with state-of-the-art SAEAs on a number of benchmark problems demonstrate the competitiveness of the proposed algorithm.


Introduction
Many industrial optimization problems have multiple objectives to be optimized, and these objectives are typically conflicting in nature, i.e., improvement in one objective leads to deterioration of at least one of the other objectives. Such problems are known as multiobjective optimization problems (MOPs). In this paper, we consider MOPs of the following form:

minimize f(x) = (f_1(x), ..., f_k(x))^T subject to x ∈ S,

with k (≥ 2) objective functions f_i(x): S → ℝ. The vector of objective function values is denoted by f(x) = (f_1(x), ..., f_k(x))^T. The (nonempty) feasible space S is a subset of the decision space ℝ^n and consists of decision vectors x = (x_1, ..., x_n)^T that satisfy all the constraints. The image of the feasible region S in the objective space ℝ^k is called the feasible objective set, denoted by Z. The elements of Z are called feasible objective vectors, denoted by f(x) or z = (z_1, ..., z_k)^T, where z_i = f_i(x), i = 1, ..., k, are the objective function values. As the objectives are conflicting, there typically does not exist a single optimal solution but multiple so-called Pareto optimal solutions. The set of all Pareto optimal solutions in the objective space is called the Pareto front and that in the decision space the Pareto set.
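To make the notion of Pareto optimality concrete, the following is a minimal sketch (ours, not part of any algorithm in this paper) of a dominance check for minimization problems:

```python
def dominates(z1, z2):
    """True if z1 Pareto-dominates z2 (minimization): z1 is no worse in
    every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(z1, z2)) and any(a < b for a, b in zip(z1, z2))

def nondominated(Z):
    """Objective vectors in Z that are not dominated by any other member;
    their preimages in the decision space form the Pareto set."""
    return [z for z in Z if not any(dominates(w, z) for w in Z if w != z)]
```

For instance, with k = 2, the vector (1, 2) dominates (2, 3), while (1, 3) and (2, 2) are mutually nondominated.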
A large number of optimization methods have been reported in the literature. These methods can be classified into two main fields, namely, multiple criteria decision making (MCDM) [33] and evolutionary multiobjective optimization (EMO) [10,13]. Methods either find a representative set of Pareto optimal solutions or the most preferred solution for a decision maker, and they differ in the way the solutions are obtained. For instance, in the MCDM community, an MOP is often scalarized into a single objective optimization problem. Moreover, a decision maker is usually involved in the solution process to identify preferred solutions. On the other hand, EMO algorithms work with a population of candidate solutions and often aim to find a set of solutions representing the whole Pareto front. The decision maker is usually involved after a set of Pareto optimal solutions is found [42].
Evolutionary algorithms (EAs) have become popular in the past decades due to several advantages. For example, they can handle different kinds of decision variables, e.g., binary, integer, real or mixed, and they do not assume any convexity or differentiability of the objective functions and/or constraints involved. Despite these advantages, EAs are often criticized because of slow convergence and the large number of function evaluations needed before an acceptable solution can be found. For instance, in aerodynamic optimization, one function evaluation involving computational fluid dynamics simulations may take a substantial amount of time, and it becomes computationally prohibitive to use an EA to solve such problems.
One of the popular approaches to reduce computation time in evolutionary optimization is to introduce approximations, especially function approximation. Computational models for function approximation are often known as surrogates, and EAs using objective values estimated by surrogates are often referred to as surrogate-assisted evolutionary algorithms (SAEAs). A surrogate is also known as a metamodel in the literature, which can in part replace the computationally expensive objective functions. For more details about SAEAs, see [9,24,43].
Although numerous algorithms have been proposed for using surrogates in an EA, many challenges remain. One is the choice of the surrogate, as different types of surrogate techniques exist in the literature, e.g., Kriging, neural networks, support vector regression and polynomial approximation. Nevertheless, there is no simple rule for choosing the right type of surrogate for approximating the given computationally expensive objective or constraint functions. A second challenge is how to use a surrogate, i.e., what to approximate with it. The most conventional way is to approximate the objective or constraint functions. In addition, a surrogate can be used to estimate the rank of individuals [31], or some other quality measure, e.g., the distance to the nondominated solutions [38,39,40] or the hypervolume [2]. The third challenge is the computational cost of constructing the surrogate, which is often neglected in SAEAs. In practice, training a surrogate may take a substantial amount of time, and the main aim of reducing computation time will be jeopardized. The fourth challenge is how to update the surrogate, i.e., how to choose individuals in the current population to be re-evaluated using the original functions. Different ways exist in the literature for selecting these individuals, e.g., selecting a set of best solutions [25] or nondominated solutions [17] according to the surrogate, or selecting a set of representative solutions [27]. If a Kriging model is used, one can select solutions that maximize the expected improvement [47], the probability of improvement [12] or the hypervolume improvement [18]. The selection of individuals to be re-evaluated is also called the updating criterion or infill criterion. The fifth challenge is to determine when the surrogate needs to be updated. For instance, it may be possible that a surrogate is accurate enough and does not need to be updated even if new training samples are available.
In SAEAs, which individuals are to be re-evaluated using the original objective functions, how to update the surrogate and when to update the surrogate are referred to as model management, which is also known as evolution control [26]. In [26], two methods were mentioned for managing the surrogate, i.e., fixed evolution control and adaptive evolution control. In fixed evolution control, updating the surrogate is based on a prefixed criterion, while in adaptive evolution control, a surrogate is updated based on its performance.
As pointed out in [9], little work has been reported on using SAEAs for solving computationally expensive problems having more than three objectives. During the years 2008-2015, only three algorithms [6,38,41] were tested on multiobjective benchmark problems with more than three objectives. While many industrial problems, e.g., optimization of the controller of a hybrid car [36], involve more than three computationally expensive objectives, surrogate-assisted evolutionary optimization of many-objective problems has not attracted much attention in the evolutionary computation community, and SAEAs developed so far cannot be directly extended to many-objective optimization. Therefore, our work is an effort to fill this gap.
Apart from the challenges resulting from the large number of objectives, it is notoriously difficult to achieve high-quality surrogates for large-scale optimization problems due to the curse of dimensionality. For this reason, the number of decision variables that SAEAs have handled has so far been limited to about 50. According to a recent survey [9], only seven SAEAs have been tested on optimization problems with more than 20 decision variables, and six of these were benchmark problems. Note that SAEAs using neural networks as the surrogate were tested on more than 20 variables, while SAEAs using Kriging models as the surrogate have been used to solve problems with up to eight decision variables only. This can be attributed to the fact that the computational time for training the Kriging model becomes too long when the number of training samples increases [35].
This paper focuses on developing an efficient SAEA for solving computationally expensive many-objective optimization problems. One of the major reasons limiting the applicability of existing algorithms to many-objective optimization is the lack of an efficient surrogate management method suited for the evolutionary algorithm used. When managing the surrogates in SAEAs, individuals should be selected by taking into account both convergence and diversity. To select such individuals, surrogates need to be seamlessly embedded into the evolutionary algorithm. Most existing SAEAs are dominance based and thus not well suited for handling many objectives. Therefore, the major contribution of this paper is an efficient algorithm to manage the surrogates for handling a large number of objectives. To this end, we adopt the reference vector guided evolutionary algorithm (RVEA) [8] for many-objective optimization as the underlying evolutionary algorithm. Two sets of reference vectors, adaptive and fixed, together with uncertainty information from the Kriging models as well as the location of the individuals are exploited for surrogate management. To limit the computation time for training the Kriging models, a strategy for choosing training samples is proposed so that the maximum number of training data is fixed.
The rest of the paper is organized as follows. In Section 2, we provide a relatively detailed description of RVEA as well as Kriging models so that the paper is self-contained. The Kriging-assisted RVEA, called K-RVEA, is introduced in Section 3. Section 4 presents the numerical results of K-RVEA on benchmark problems and compares them with a few state-of-the-art SAEAs. Finally, conclusions are drawn and future work is briefly discussed in Section 5.

Background
In this section, we first summarize the main components of RVEA, which we use as the underlying evolutionary algorithm. Next, we present a brief description of Kriging, including a discussion of its advantages and disadvantages over other surrogate models.

Reference vector guided evolutionary algorithm
Two major difficulties in solving problems with a large number of objectives are convergence to the Pareto front and maintaining good diversity among solutions. Several evolutionary algorithms have been proposed for solving many-objective optimization problems by, for instance, using a revised dominance relationship, decomposing the multiobjective optimization problem into several single objective optimization problems, using an indicator-based objective function, or using reference points.
For more details about these algorithms and challenges in solving problems with more than three objectives, see [23,29,45].
RVEA is an EMO algorithm recently developed for many-objective optimization [8]. While MOEA/D [50] and NSGA-III [14] use a set of weights and reference points, respectively, RVEA adopts a set of reference vectors. The main difference between RVEA and MOEA/D or NSGA-III lies in its selection strategy. In RVEA, selection is based on a criterion known as the angle penalized distance (APD), which is used to manage both convergence and diversity. It has been shown [8] that APD scales better with an increasing number of objectives in maintaining a balance between convergence and diversity. APD relies on a set of reference vectors, which partition the objective space into a number of subspaces, where selection of individuals is performed independently. The main components of RVEA are presented in Algorithm 1.

Algorithm 1 RVEA
Input: t_max = maximum number of generations; N = number of reference vectors; V_0 = {v_{0,1}, v_{0,2}, ..., v_{0,N}}, a set of unit reference vectors
Output: nondominated solutions from population P_{t_max}
1: Create an initial population P_0 of size N randomly and set the generation counter t = 0
2: while t < t_max do
3:   Generate offspring Q_t
4:   Combine parent and offspring populations, L_t = P_t ∪ Q_t
5:   Select parents (P_{t+1}) for the next generation
6:   Update reference vectors (V_{t+1})
7:   t = t + 1
8: end while

RVEA uses elitism and offspring generation strategies similar to other state-of-the-art EMO algorithms such as NSGA-II [15] and NSGA-III [14]. RVEA distinguishes itself from NSGA-III in its selection strategy and the adaptation of reference vectors, which are Steps 5 and 6 in Algorithm 1. In the following, we present the four main components of RVEA, i.e., generation of reference vectors, assignment of individuals to reference vectors, selection, and adaptation of reference vectors.

Generation of reference vectors
RVEA uses a set of reference vectors in the objective space to guide the search process. To generate a uniformly distributed set of reference vectors, first a set of uniformly distributed reference points p_i is generated on a unit hyperplane using the canonical simplex-lattice design method [11,7]:

p_i = (p_i^1, p_i^2, ..., p_i^k), with p_i^j ∈ {0/H, 1/H, ..., H/H} and Σ_{j=1}^k p_i^j = 1,

where i = 1, 2, ..., N with N being the number of uniformly distributed points, k is the number of objectives and H is a positive integer used in the simplex-lattice design. An example is shown in Figure 1 for a biobjective optimization problem, where the dots represent the reference points generated on a unit hyperplane. The corresponding reference vector v_i is then obtained by projecting the reference point from the hyperplane onto a hypersphere, i.e., v_i = p_i / ||p_i||, where ||p_i|| represents the L2-norm of p_i. As a result, these reference vectors partition the objective space into a number of subspaces. In the next subsection, we describe how the individuals in a population are assigned to these reference vectors and how the population is partitioned into different subpopulations.
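The generation of reference vectors described above can be sketched as follows; the enumeration of lattice points via a stars-and-bars construction is our implementation choice, not prescribed by the paper:

```python
from itertools import combinations

def reference_vectors(k, H):
    """Uniform points on the unit simplex via the simplex-lattice design
    (components drawn from {0/H, 1/H, ..., H/H}, summing to 1), then
    projected onto the unit hypersphere: v = p / ||p||."""
    points = []
    # stars and bars: choose k-1 divider positions among H + k - 1 slots
    for dividers in combinations(range(H + k - 1), k - 1):
        prev, comp = -1, []
        for d in dividers:
            comp.append(d - prev - 1)
            prev = d
        comp.append(H + k - 2 - prev)
        points.append([c / H for c in comp])
    vectors = []
    for p in points:
        norm = sum(x * x for x in p) ** 0.5
        vectors.append([x / norm for x in p])
    return vectors
```

For k = 2 and H = 4 this yields C(5, 1) = 5 unit reference vectors; for k = 3 and H = 4, C(6, 2) = 15.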

Assignment of individuals to reference vectors
After the generation of the reference vectors, individuals are assigned to them as follows. First, the objective values of all individuals at the current generation are translated, i.e., f̃_i^j = f_i^j − f_i^*, where f_i^j represents the value of objective f_i for the j-th individual and f_i^* the minimum value of f_i at the current generation. We denote the translated objective vector by f̃ = (f̃_1, f̃_2, ..., f̃_k)^T. The translation of the objective functions ensures that the initial point of the reference vectors is always the origin and all the translated objective values are inside the positive orthant. After the translation, the acute angle is measured between an individual and each of the reference vectors. For instance, let us consider the situation shown in Figure 2, with two reference vectors v_i and v_{i+1}, and three individuals f^1, f^2 and f^3. As the angle θ_i^1 between the individual f^1 and the reference vector v_i is smaller than the angle θ_{i+1}^1 between this individual and the other reference vector v_{i+1}, the individual is assigned to the reference vector v_i. Similarly, f^2 and f^3 are assigned to the reference vectors v_i and v_{i+1}, respectively. Therefore, an individual is assigned to a reference vector if and only if the angle between them is the minimum among all reference vectors. In this way, the assignment of individuals to the reference vectors partitions the population into subpopulations. Other notations in Figure 2 are used in the next subsection.
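The angle-based assignment can be sketched as follows (a minimal illustration; function names are ours):

```python
import math

def assign_to_vectors(F_translated, V):
    """Assign each translated objective vector to the reference vector with
    the smallest acute angle (equivalently, the largest cosine).
    Returns, for each individual, the index of its reference vector."""
    def cosine(f, v):
        nf = sum(a * a for a in f) ** 0.5
        # reference vectors are unit length; guard the all-zero vector
        return sum(a * b for a, b in zip(f, v)) / nf if nf > 0 else 1.0
    assignment = []
    for f in F_translated:
        # clamp the cosine to [-1, 1] before acos for numerical safety
        angles = [math.acos(max(-1.0, min(1.0, cosine(f, v)))) for v in V]
        assignment.append(min(range(len(V)), key=angles.__getitem__))
    return assignment
```

For two unit reference vectors along the axes, an individual at (2, 0.5) is assigned to the first vector and one at (0.5, 2) to the second.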

Selection mechanism in each subpopulation
After the generation of reference vectors and the assignment of individuals to them, one individual is selected from each subpopulation (Step 5 in Algorithm 1). The selection criterion consists of two subcriteria that are meant for managing convergence and diversity, respectively. Convergence is taken care of by the distance between the translated objective vector and the origin. As selection is performed in each subpopulation independently, let us take the subpopulation corresponding to the reference vector v_i. In the illustrative example shown in Figure 2, two individuals f^1 and f^2 are assigned to the reference vector v_i, and their distances to the origin are denoted by ||f̃^1|| and ||f̃^2||, respectively. As the aim is to find solutions closer to the origin, individual f^1 is preferred over individual f^2 and will therefore be selected for this subpopulation.
In RVEA, diversity is accounted for by the angle between the translated objective vector and the reference vector the individual is assigned to. The individual with the smallest angle is preferred over the other individuals. For instance, for the reference vector v_i in Figure 2, individual f^1 with the angle θ_i^1 is preferred over individual f^2 as θ_i^1 < θ_i^2. To combine the two subcriteria for convergence and diversity, the following angle penalized distance (APD) is defined:

d_j = (1 + P(θ_j)) · ||f̃^j||,

where ||f̃^j|| is the distance from the translated objective vector corresponding to the j-th individual to the origin, and θ_j is the angle between the j-th individual and the reference vector it is assigned to. In APD, P(θ_j) is the penalty function defined as follows:

P(θ_j) = k · (t / t_max)^α · θ_j / γ_v,

where γ_v is defined as the smallest angle between the reference vector v_i and its closest neighboring reference vector v_j, i.e., γ_v = min_{j∈{1,...,N}, j≠i} angle(v_i, v_j). The angle γ_v is used to normalize the angles and is important when the distribution of the reference vectors is either too dense or too sparse. As highly dense or sparse reference vectors may generate a very small or a very large angle between an individual and its reference vector, normalization of the angle helps to alleviate this problem. The parameter α is used to change the rate of the penalty function P(θ_j). The rate of the penalty function is used to emphasize convergence at the early stage and diversity at the later stage of the evolutionary search process. For instance, at the early stage of the solution process, convergence is preferred to push the individuals closer to the Pareto front. Once the individuals have converged to the Pareto front, diversity is preferred by distributing the individuals along the Pareto front. Therefore, the rate of the penalty function depends on the current generation number t, the maximum number of generations t_max and the parameter α. As careful empirical studies for setting the parameter α have been performed in [8], we use the same parameter setting in this work. After calculating the APD for all individuals in each subpopulation, the individual with the minimum APD value is selected from each subpopulation for the next generation.
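Assuming the APD form from RVEA [8], with the penalty P(θ) = k · (t/t_max)^α · θ/γ_v and a default α = 2 (an assumption here, not taken from this paper), the criterion can be computed as:

```python
def apd(f_norm, theta, t, t_max, k, gamma_v, alpha=2.0):
    """Angle penalized distance: d = (1 + P(theta)) * ||f~||, where
    P(theta) = k * (t / t_max)**alpha * theta / gamma_v. The angle term
    gains weight as the generation counter t grows, shifting emphasis
    from convergence (early) to diversity (late)."""
    penalty = k * (t / t_max) ** alpha * theta / gamma_v
    return (1.0 + penalty) * f_norm
```

At t = 0 the penalty vanishes and APD reduces to the plain distance ||f̃||, so selection is driven purely by convergence.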

Adaptation of reference vectors
The goal is to find a set of uniformly distributed nondominated individuals as close to the Pareto front as possible. For some optimization problems, e.g., those in the WFG test suite [22] where the objective functions are scaled to different ranges, a uniformly distributed set of reference vectors is not best suited for obtaining uniformly distributed individuals. To tackle this issue, one possible solution is to adapt the reference vectors. In RVEA, the reference vectors are adaptive; in other words, they change their positions according to the location of the individuals in the objective space. The adaptation of the reference vectors for the next generation (V_{t+1}) is done as follows:

v_{t+1,i} = (v_{0,i} ∘ (z_t^max − z_t^min)) / ||v_{0,i} ∘ (z_t^max − z_t^min)||,

where ∘ represents the Hadamard product [1] that multiplies two matrices of the same size elementwise, v_{0,i} is the uniformly generated reference vector in the initialization phase of RVEA, and z_t^max and z_t^min are the maximum and minimum values of each objective function in the t-th generation, respectively. The adaptation of the reference vectors ensures that a set of uniformly distributed nondominated individuals can be obtained even for optimization problems whose objective functions have different ranges.
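A minimal sketch of this adaptation step (elementwise rescaling by the current objective ranges followed by renormalization; function name is ours):

```python
def adapt_vectors(V0, z_max, z_min):
    """Rescale the initial uniform reference vectors by the current
    objective ranges (Hadamard product), then renormalize to unit length:
    v_new = v0 * (z_max - z_min) / ||v0 * (z_max - z_min)||."""
    adapted = []
    for v in V0:
        scaled = [a * (mx - mn) for a, mx, mn in zip(v, z_max, z_min)]
        norm = sum(s * s for s in scaled) ** 0.5
        adapted.append([s / norm for s in scaled])
    return adapted
```

With objective ranges of 10 and 1, a diagonal reference vector is tilted toward the first (wider-range) objective, which is exactly how differently scaled objectives are accommodated.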

Kriging
We use Kriging, also known as a Gaussian process model, to approximate each objective function. According to the survey on computationally expensive multiobjective optimization problems [9], Kriging has frequently been used as a surrogate technique, mainly because it is able to deliver uncertainty information about the approximated values, which is very useful in managing surrogates [24]. In this work, we use the uncertainty information from the Kriging models to update the surrogates, which will be further discussed in the next section.
Kriging approximates the objective function value of an individual x as

y(x) = μ(x) + ε(x),

where μ(x) = F(β, x) is the prediction of a regression model and ε(x) is a normally distributed random process with zero mean and standard deviation σ. The regression model F(β, x) is a linear combination of l chosen basis functions with coefficients β.
To obtain an approximated value from the Kriging model for any new input, the model needs to be trained using training samples, which in SAEAs are the pre-evaluated individuals. Let the matrix X = [x^1, ..., x^{N_I}]^T represent the training data in the decision space with the corresponding objective values y = [y^1, ..., y^{N_I}]^T, where N_I is the number of samples, i.e., the size of the training data. One should note that the size of X is N_I × n, where n is the number of decision variables. For any two arbitrary inputs x^i and x^j, the covariance between the two random processes ε(x^i) and ε(x^j) is defined by cov[ε(x^i), ε(x^j)] = σ² R(x^i, x^j), where R is the correlation matrix of size N_I × N_I and R(x^i, x^j) is the correlation function between ε(x^i) and ε(x^j). A commonly used choice is the Gaussian correlation function

R(x^i, x^j) = exp(−Σ_{l=1}^n θ_l (x_l^i − x_l^j)²),

where θ_l, l = 1, ..., n, denote the hyperparameters.
For a new input x, the approximated value can be written as

ȳ(x) = μ(x) + r(x)^T R^{-1} (y − Fβ),

where y contains the objective values of the given N_I individuals, F is the matrix of regression function values at the training data, and r(x) is the correlation vector of size N_I between the new input x and the training data x^1, ..., x^{N_I}, i.e., r(x) = (R(x, x^1), ..., R(x, x^{N_I}))^T.
To obtain a new approximated value ȳ, we need to specify the coefficients β and the hyperparameters θ. The coefficients have the generalized least squares solution

β = (F^T R^{-1} F)^{-1} F^T R^{-1} y,

and the estimated variance σ² is given by

σ² = (1/N_I) (y − Fβ)^T R^{-1} (y − Fβ).

The values of θ are obtained by maximizing the likelihood function, whose logarithm is, up to a constant, −(N_I ln σ² + ln det(R))/2, where det(R) is the determinant of the correlation matrix R. After obtaining the hyperparameters θ, the coefficients β and the variance σ² are calculated from the expressions above, which are then used to approximate the objective function value. While Kriging is a very attractive surrogate model due to its ability to deliver uncertainty information, it also suffers from a potentially serious weakness resulting from the computational complexity of training. As indicated in [20], the computational complexity of training a Kriging model is cubic in the number of training samples, i.e., O(N_I³). The issue of high computational complexity becomes worse if the hyperparameters θ are determined by maximizing the likelihood function using an optimization algorithm, as has often been done in Kriging-assisted SAEAs. For instance, MOEA/D-EGO uses differential evolution [44] and SMS-EGO employs the covariance matrix adaptation evolution strategy (CMA-ES) [19] to optimize the hyperparameters, while in ParEGO, the Nelder-Mead algorithm [37] is used.
In this work, we use a modification of the Hooke and Jeeves algorithm [21], which is implemented in the DACE toolbox [30]. The main reason is that it is not practical to use population-based techniques to optimize the hyperparameters due to the prohibitively high computation time thus incurred, since in SAEAs, Kriging models need to be frequently re-trained. We provide a brief empirical comparison in Section 4.
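To illustrate how Kriging delivers both a prediction and an uncertainty estimate, the following is a minimal ordinary-Kriging sketch with a constant regression term and fixed hyperparameters, i.e., without the likelihood maximization discussed above (NumPy-based; all names are ours, not from the DACE toolbox):

```python
import numpy as np

def kriging_predict(X, y, x_new, theta):
    """Minimal ordinary-Kriging sketch: Gaussian correlation, constant
    regression, FIXED hyperparameters theta. Returns the prediction and
    an estimate of the predictive variance (the uncertainty)."""
    def corr(a, b):
        return np.exp(-np.sum(theta * (a - b) ** 2))
    N = len(X)
    R = np.array([[corr(X[i], X[j]) for j in range(N)] for i in range(N)])
    R += 1e-10 * np.eye(N)                        # tiny nugget for stability
    Ri = np.linalg.inv(R)
    ones = np.ones(N)
    beta = (ones @ Ri @ y) / (ones @ Ri @ ones)   # generalized least squares
    r = np.array([corr(x_new, X[i]) for i in range(N)])
    y_hat = beta + r @ Ri @ (y - beta * ones)     # Kriging prediction
    sigma2 = ((y - beta * ones) @ Ri @ (y - beta * ones)) / N
    s2 = max(float(sigma2 * (1.0 - r @ Ri @ r)), 0.0)
    return float(y_hat), s2
```

At a training point the model interpolates the data and the predictive variance collapses to (nearly) zero; away from the data the variance grows, which is precisely the information exploited for surrogate management in the next section.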

Surrogate-assisted reference vector guided evolutionary algorithm
Model management is crucial to the success of surrogate-assisted evolutionary algorithms [24]. It is mainly concerned with how to use and update surrogates, including choosing individuals to be re-evaluated using the original objective functions. These re-evaluated individuals can then be used as training data for updating (re-training) the surrogate. Both convergence and diversity have to be taken into account in selecting individuals to be re-evaluated, which becomes more difficult for problems with a large number of objectives. In this paper, we focus on selecting training data in such a way that both convergence and diversity are managed given a large number of objectives, which is one major contribution of this paper.
The computation time for training the surrogate depends on the size of the training data set and the type of surrogate used. The Kriging model is widely used due to its unique property of being able to predict with an error bound. Unfortunately, the computation time for training Kriging models becomes prohibitive when the number of training data is large. Therefore, the second contribution of this paper is a strategy to choose training samples so that the size of the training data can be kept sufficiently small. To this end, we use an archive to store the training data for updating the Kriging models.
The proposed Kriging-assisted reference vector guided evolutionary algorithm, called K-RVEA, is presented in Algorithm 2. The algorithm consists of three main phases: the initialization phase, followed by the phase where the surrogate is used, and finally the phase where it is updated. Algorithm 2 is composed of two other algorithms, Algorithm 3 and Algorithm 4. Algorithm 3 selects the individuals for re-evaluation, and therefore also for updating the surrogate, while Algorithm 4 manages the training data in archive A_1. In addition to the training archive A_1, a second archive A_2 is used to store nondominated solutions as the final solution set.

Initialization
In the initialization phase, an initial population is generated, e.g., using Latin hypercube sampling [32]. These individuals are evaluated with the original expensive functions and added to the two archives A_1 and A_2. The individuals stored in A_1 are used to build a Kriging model for each objective function.

Algorithm 2 K-RVEA
Input: FE_max = maximum number of expensive function evaluations; u = number of individuals to be re-evaluated and used for updating the Kriging models; w_max = prefixed number of generations before updating the Kriging models
Output: nondominated solutions among all evaluated ones from A_2
/*Initialization*/
1: Create an initial population of size N_I using, e.g., Latin hypercube sampling; initialize the number of function evaluations FE = 0, the generation counter for using the Kriging models w = 1, and the counter for the number of updates t_u = 0. Initialize two empty archives A_1 and A_2

Using the surrogate
In the phase of using the surrogate, we use the Kriging models instead of the original functions to calculate the objective function values. The Kriging models are used for fitness evaluations for a prefixed number of generations without updating them. Empirically, the prefixed number of generations should be set in such a way that it allows the evolutionary algorithm to perform an adequate search on the fitness landscape defined by the surrogate, while the search should be terminated if no further improvement in either convergence or diversity can be made. Ideally, the frequency of updating the surrogates could be made adaptive based on their performance, e.g., as done in [26]. However, a rigorous guideline for adapting the frequency is still lacking. For simplicity, in this work we adopt a prefixed frequency based on a sensitivity analysis of the performance with respect to the prefixed number of generations.
Once the function evaluations are done, simulated binary crossover and polynomial mutation are applied to generate offspring, similarly to [15]. The parent and offspring populations are combined, and then the selection criteria of RVEA detailed in Section 2 are used to select the parents for the next generation.

Updating the surrogate
After an evolutionary search using the Kriging models for a fixed number of generations, the Kriging models are updated. As previously mentioned, the selection of individuals to be re-evaluated using the original functions, which are also used for updating the surrogates, is essential for the performance of SAEAs. In this paper, we use information from the underlying evolutionary algorithm, RVEA, and the uncertainty information from the Kriging models for selecting the individuals to be re-evaluated and then for re-training the surrogate. As mentioned in Section 1, selecting uncertain individuals not only helps in finding unexplored regions but can also improve the quality of the surrogate. Therefore, we select individuals with the maximum uncertainty whenever diversity is needed. If a satisfactory degree of diversity is already achieved, we select individuals with the minimum angle penalized distance, which is the selection criterion in RVEA that contributes to convergence. Full details of the strategy for selecting individuals for re-evaluation are given in the next subsection and summarized in Algorithm 3. The selected individuals are then re-evaluated with the original functions, and these data samples are added to both archives A_1 and A_2. To keep a fixed number of individuals in the archive A_1, we eliminate extra individuals from A_1 using a strategy to be introduced in the following.
A maximum number of function evaluations is used as the termination criterion of the evolutionary optimization process. After the evolutionary search is complete, the nondominated individuals in A_2 are taken as the final solutions.

Strategy for selecting individuals for re-evaluation
In RVEA, the reference vectors are adaptive. In this work, we introduce an additional set of fixed reference vectors that are evenly distributed in the objective space. The number of fixed reference vectors is the same as the number of adaptive ones. For convenience, we denote the fixed reference vectors by V_f and the adaptive ones by V_a.
The fixed reference set is mainly for determining whether diversity or convergence should be prioritized in selecting individuals for re-evaluation. This is done by comparing the number of empty vectors in the fixed reference set during the previous surrogate update, denoted by |V_f^ia|_{t_u−1}, with that in the current update, denoted by |V_f^ia|_{t_u}. A vector is called empty or inactive if no individual is assigned to it. If |V_f^ia|_{t_u} − |V_f^ia|_{t_u−1} > δ, where δ > 0 is a small integer, which means the increase in the number of empty fixed reference vectors has exceeded a given threshold, diversity should be prioritized. In this case, individuals for re-evaluation should be selected based on the uncertainty information offered by the Kriging models. By contrast, if |V_f^ia|_{t_u} − |V_f^ia|_{t_u−1} ≤ δ, which means that the diversity of the population is not the major concern, priority will be given to the convergence criterion. In other words, individuals should be selected using the APD according to the adaptive reference vectors.
The next step is to determine which individuals should be selected according to either the amount of uncertainty or the value of APD. To this end, we divide the active adaptive reference vectors into a given number of clusters (Step 1 in Algorithm 3), and from each cluster one individual is selected that has either the maximum amount of uncertainty (in case diversity is prioritized) or the minimum APD (in case convergence is to be taken care of) in the corresponding cluster. Thus, the number of clusters is always equal to the number of individuals, denoted by u, to be selected for re-evaluation and for updating the surrogate. The process for selecting the solutions to be re-evaluated described above is summarized in Algorithm 3.
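The decision logic and the per-cluster selection can be sketched as follows; the data layout (one tuple of individual id, APD value and uncertainty per cluster member) and the function names are our assumptions:

```python
def choose_infill_criterion(n_empty_prev, n_empty_curr, delta=0):
    """Decide the infill criterion from the change in the number of empty
    fixed reference vectors between consecutive surrogate updates."""
    return "uncertainty" if n_empty_curr - n_empty_prev > delta else "apd"

def select_for_reevaluation(clusters, criterion):
    """Pick one individual per cluster of active adaptive reference
    vectors: max Kriging uncertainty if diversity is needed, else min
    APD. Each cluster is a list of (id, apd_value, uncertainty) tuples."""
    selected = []
    for members in clusters:
        if criterion == "uncertainty":
            best = max(members, key=lambda m: m[2])
        else:
            best = min(members, key=lambda m: m[1])
        selected.append(best[0])
    return selected
```

With u clusters, exactly u individuals are returned for expensive re-evaluation, matching the archive update described above.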
To elaborate the above selection process, let us consider a few different situations shown in Figure 4 for a biobjective optimization problem, where the fixed reference vectors as well as the individuals (denoted by dots) assigned to each vector when updating the Kriging models are shown. Figure 4(a) illustrates the fixed reference vectors and the individuals assigned to them during the previous update, i.e., at the update counter t_u − 1. Two different cases at the current update, i.e., at t_u, are also shown in Figure 4; note that the fixed reference vectors are always used for selecting individuals for re-evaluation.
In the following, we describe the strategy for managing the training data in A 1 when the number of available training data is larger than the predefined maximum number defined by the size of A 1 .

Managing the training data
In order to limit the computation time for re-training the Kriging models, the number of training data in archive A 1 is fixed. To this end, some data need to be discarded if the number of available training data is larger than the archive size. Which data samples should be kept in A 1 is critical for the quality of the Kriging models and thus for the overall performance of K-RVEA. In this work, the maximum size of the training data, i.e., the size of archive A 1 , is set to N I . We first add the recently evaluated individuals (u) obtained with Algorithm 3 to archive A 1 and remove duplicate data points from it (step 1 in Algorithm 4). If the number of training data is still larger than the archive size, we eliminate some training samples other than the recently evaluated ones (step 2 in Algorithm 4). For this purpose, data points other than the recently evaluated individuals are assigned to the adaptive reference vectors. For instance, consider the situation in Figure 5 for a biobjective optimization problem. In Figure 5(a), individuals (training data) obtained with Algorithm 3 are assigned to the adaptive reference vectors V a , and inactive reference vectors, denoted by V ia a , are identified. Then, we assign the A 1 \ u data points to these vectors V ia a and identify active reference vectors among them, as shown in Figure 5(b) (steps 4 and 5 in Algorithm 4). The active adaptive reference vectors are then grouped into N I − u clusters, and the data points assigned to the reference vectors in each cluster are identified (step 6 in Algorithm 4). We randomly select one data point from each cluster and eliminate the rest. In this way, a fixed-size, diverse set of training data is maintained in A 1 , thereby improving the quality of the Kriging models as much as possible while keeping the computation time limited.
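A simplified sketch of this archive management is given below. It keeps the deduplication and the fixed archive size of Algorithm 4, but replaces the reference-vector clustering and random per-cluster choice with a deterministic max-min diversity heuristic, so it is an approximation rather than the paper's exact procedure:

```python
import numpy as np

def farthest_point_subset(points, k):
    """Greedy max-min selection: keeps a spread-out subset of k points,
    standing in for the one-point-per-cluster rule of Algorithm 4."""
    points = np.asarray(points, float)
    idx = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    while len(idx) < k:
        j = int(np.argmax(d))          # point farthest from the kept set
        idx.append(j)
        d = np.minimum(d, np.linalg.norm(points - points[j], axis=1))
    return points[idx]

def trim_archive(archive, recent, max_size):
    """Keep the training archive at a fixed size: deduplicate, always
    retain the recently evaluated points, and thin the older points
    down to max_size - len(recent) while preserving diversity."""
    recent = np.asarray(recent, float)
    merged = np.unique(np.vstack([archive, recent]), axis=0)  # remove duplicates
    if len(merged) <= max_size:
        return merged
    recent_set = {tuple(r) for r in recent}
    old = np.array([p for p in merged if tuple(p) not in recent_set])
    keep_old = farthest_point_subset(old, max_size - len(recent))
    return np.vstack([recent, keep_old])
```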
The number of decision variables was set to 10. We compared K-RVEA with representative Kriging-based SAEAs, e.g., MOEA/D-EGO [51], SMS-EGO [41,46] and ParEGO [28], and with the original RVEA without surrogates, to give the reader a sense of how differently an algorithm with and without surrogates performs on computationally expensive problems. For RVEA, a population size of 50 was used, and for the other algorithms the parameters recommended by the authors in the respective articles were used. The inverted generational distance (IGD) [3] was used as the performance measure to compare the different algorithms. A Wilcoxon rank sum test was used to compare the results obtained by K-RVEA and the other four algorithms at a significance level of 0.05. The results are collected in Tables 1, 2 and 3, where the symbol ↑ indicates that K-RVEA performed statistically better than a compared algorithm, ↓ means that the other algorithm performed better than K-RVEA, and ≈ means that there is no significant difference between the results obtained by K-RVEA and the other algorithm. One should note that, for a given number of decision variables, the problems become easier as the number of objectives increases. This is due to the fact that the number of decision variables driving convergence decreases with the increase in the number of objectives [22]. Therefore, we have also included the WFG suite [22] in our experiments and present the results in the supplementary material. The results obtained with the compared algorithms over 25 independent runs are collected in Table 1. No results are given for ParEGO for more than four objectives, as the current implementation provided by the authors of ParEGO is limited to four objectives; this is denoted by 'NA' in the tables. The presented results include the minimum, mean and maximum values of IGD. The best values are highlighted.
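For reference, IGD averages, over a set of points sampled from the true Pareto front, the distance to the nearest obtained solution, so lower values are better; a minimal sketch:

```python
import numpy as np

def igd(reference_set, solutions):
    """Inverted generational distance: mean distance from each reference
    point (sampled from the true Pareto front) to its nearest solution."""
    ref = np.asarray(reference_set, float)
    sol = np.asarray(solutions, float)
    # Pairwise distances between reference points and obtained solutions.
    d = np.linalg.norm(ref[:, None, :] - sol[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Reference points on a linear front vs. two candidate solution sets.
ref = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
print(igd(ref, [[0.0, 1.0], [1.0, 0.0]]))              # ≈ 0.2357 (misses the middle)
print(igd(ref, [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))  # → 0.0
```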

Performance on DTLZ problems
Before we discuss the results, we want to mention an important issue in measuring the performance of many-objective evolutionary algorithms. IGD and hypervolume are two widely used performance indicators. However, the evaluation result may depend heavily on their parameters, particularly when the number of objectives is large. For instance, the IGD value is very sensitive to the size of the reference set, which has not been explicitly discussed. In [4,48], 100000 reference points were used irrespective of the number of objectives, while in [52], only 500 reference points were used. In [5], different sizes of the reference set were used for different numbers of objectives.
Here, we also use different sizes of the reference set for different numbers of objectives, as we believe more reference points are needed as the number of objectives increases; we refer the reader to the Supplementary material for details. Similarly, different criteria for setting the reference point in calculating the hypervolume have been adopted in different papers. It has been found in [49] that choosing a reference point slightly better than the nadir point is able to strike a balance between convergence and diversity of the solution set. Therefore, in this work, we use the worst objective function values of the nondominated solutions found by all compared algorithms plus a small threshold. The detailed comparative results in terms of hypervolume are provided in the Supplementary material due to the page limit. We must emphasize that how to fairly compare the performance of EAs for many-objective optimization has received little attention in the literature and deserves more research.
As can be seen in Table 1, overall, K-RVEA performed the best among all algorithms compared in this study, except on DTLZ1 and DTLZ3. We surmise that this is because DTLZ1 and DTLZ3 have many locally Pareto optimal solutions. Both RVEA and K-RVEA try to keep a well-distributed set of solutions because of the distribution of the reference vectors and, therefore, their convergence rate is relatively slow on these problems. The nondominated solutions of the run producing the best IGD values for K-RVEA, RVEA and ParEGO on DTLZ1 are shown in Figure 6. As can be seen, the solutions of K-RVEA and RVEA are better distributed than those of ParEGO. However, the IGD values of all three algorithms are high; in other words, the solutions obtained by these algorithms are all far from the true Pareto front. These results indicate that solving such problems requires more function evaluations to reach the Pareto front.
To compare the results with SMS-EGO, we selected a different stopping criterion. As SMS-EGO tries to maximize the expected hypervolume improvement and was implemented in MATLAB, it took about seven hours to complete one run on a computer with an i5 processor and 4 GB RAM. The large computation time required by SMS-EGO was also mentioned in [51], for which reason the authors compared their algorithm with SMS-EGO on only two test problems. In this paper, we allowed SMS-EGO to run for 24 hours with 10 parallel runs and stored all the solutions. The number of function evaluations reached during this time was used as the stopping criterion for the comparison with the other algorithms. The results for three and four objectives, obtained with 120 and 115 function evaluations, respectively, are presented in Table 2. These results, obtained with a small number of function evaluations, show that K-RVEA performed better than the compared algorithms. We did not compare K-RVEA with SMS-EGO on problems with more than four objectives, as SMS-EGO requires even more time for such problems.
The comparison of K-RVEA and MOEA/D-EGO for three objectives after 300 original function evaluations is shown in Table 3. An implementation of MOEA/D-EGO from the authors was available only for two and three objectives. As can be seen in Table 3, K-RVEA performed better than or comparably to MOEA/D-EGO on all problems except DTLZ5. As the Pareto front of DTLZ5 is a curve that covers a small subspace of the objective space, most of the reference vectors in K-RVEA and RVEA are empty, i.e., no solutions are assigned to them. We observed that for both K-RVEA and RVEA almost 70% of the reference vectors were empty, which makes the solution process converge slowly to the Pareto front. For such problems, a larger number of reference vectors could be helpful when using K-RVEA. The nondominated solutions from the run with the best IGD values obtained by the compared algorithms on the three-objective DTLZ7 are shown in Figure 7. As can be seen from the figure, the nondominated solutions obtained by K-RVEA and RVEA are much closer to the Pareto front than those of ParEGO and MOEA/D-EGO. For DTLZ7, which has a disconnected Pareto front, RVEA and K-RVEA have a good potential to obtain solutions close to the Pareto front because of the adaptation of the reference vectors. Recall that we did not run SMS-EGO on optimization problems with more than four objectives, as the runtime is prohibitive. In addition, a parallel coordinates plot for DTLZ2 with 10 objectives is presented in Figure 8, from which we can see that the ranges of the solutions obtained by K-RVEA are larger than those obtained by RVEA. In other words, the solutions obtained by K-RVEA have a better distribution. The lower density of the solutions obtained by RVEA is due to its small population size: as the maximum number of function evaluations for termination is quite low and the performance of RVEA depends on the maximum number of generations, we reduced the population size of RVEA to 50. To more convincingly
demonstrate the performance of K-RVEA, we tested and compared the algorithm on the WFG problems [22]. Results in terms of both IGD and hypervolume are provided in the Supplementary material, showing that K-RVEA was able to perform better than the compared algorithms.
In what follows, we consider the effect of the different parameters in K-RVEA. The parameter δ in Algorithm 2 is used when selecting individuals for updating the surrogates to balance between convergence and diversity. We performed a sensitivity analysis to see the effect of δ on the diversity and on the performance of the algorithm. As diversity in both RVEA and K-RVEA can easily be measured using the reference vectors, we studied the effect of δ by measuring the change in the number of empty reference vectors. We also measured the hypervolume to see the effect on the performance of the algorithm and provide the results in the supplementary material. As expected, an increase in the value of δ deteriorates the diversity and thus the hypervolume. This is due to the fact that the frequency of using uncertainty information from the Kriging models decreases as δ increases. In other words, if the change in the number of empty reference vectors is smaller than δ, individuals are selected based on convergence, and otherwise based on the uncertainty of the Kriging models. Increasing δ therefore forces the algorithm to select individuals based on convergence and thus deteriorates the diversity. In this article we fixed the value of δ to 0.05 × N, where N is the number of reference vectors used; adapting δ will be part of our future work.
Apart from δ, two other parameters can also influence the performance of K-RVEA: the number of individuals selected to update the surrogates and the number of generations (w max ) used before updating the surrogates. The first parameter depends on the characteristics of the problem, e.g., multimodality, and on the evolutionary algorithm used. We study its effect in the Supplementary material. Our results show that a suitable value of this parameter is problem-specific and that an adaptive way of setting it is needed. The second parameter, i.e., the frequency of updating the surrogates, or when to update them, is very important in surrogate management, although, unfortunately, there is no solid theory guiding when to update the surrogates. We have therefore performed a sensitivity analysis of the performance of K-RVEA given different prefixed numbers of generations before the Kriging models are re-trained. The results are also included in the Supplementary material.
We also tested K-RVEA, RVEA, ParEGO and MOEA/D-EGO on a three-objective real-world polymerization problem [34]. Even though K-RVEA and RVEA were proposed for problems with more than three objectives, they still performed better in solution quality and computation time than the compared algorithms. Details and results for this problem are given in the supplementary material.

Conclusions and future research
In this paper, a Kriging-assisted reference vector guided evolutionary algorithm, called K-RVEA, has been proposed for solving computationally expensive optimization problems with more than three objectives, where a Kriging model is used to approximate each objective function. We take care of both convergence and diversity in choosing individuals for re-evaluation with the original expensive objective functions. For this purpose, we introduced a set of fixed, uniformly distributed reference vectors in addition to the adaptive reference vectors in RVEA. In updating the Kriging models, attention is paid to limiting the computation time for training the surrogates by selecting training samples according to their relationships to the reference vectors, thereby limiting the number of training data.
We have examined the performance of K-RVEA on benchmark problems with 3, 4, 6, 8 and 10 objectives. We also compared K-RVEA with three state-of-the-art SAEAs. Empirical results show that, overall, K-RVEA obtained much better performance than the compared algorithms given the same number of function evaluations using the original expensive objective functions.
In this paper, the number of decision variables was set to 10, as done in other papers in the literature that use Kriging as the surrogate model. This can be attributed to the fact that, for solving optimization problems with a larger number of decision variables, much more training data are needed, which not only requires more computational resources but also poses more serious challenges to Kriging-based surrogate techniques. Nevertheless, it is highly desirable that SAEAs be applicable to optimization problems with a larger number of decision variables, which will be our future work. Another topic for future study is to investigate the performance of the proposed K-RVEA on constrained computationally expensive optimization problems. Finally, in the present work, we fixed the number of generations between surrogate updates. Although our empirical results indicate that the performance of K-RVEA is relatively insensitive to the frequency of updating the surrogates for the benchmark problems studied in this work, our previous work [26] indicated that adapting this frequency is likely to further enhance the performance of SAEAs. Consequently, developing an adaptive method for updating the surrogates will also be part of our future research.
Supplementary material - A Surrogate-assisted Reference Vector Guided Evolutionary Algorithm for Computationally Expensive Many-objective Optimization

In the supplementary material, we provide the parameter values for the number of reference vectors, the size of the reference set used to calculate IGD, results on the DTLZ problems using hypervolume, and results on the WFG problems with both IGD and hypervolume as quality metrics. In addition, results of K-RVEA, RVEA, ParEGO and MOEA/D-EGO on a free radical polymerization system are presented. Moreover, a sensitivity analysis of three important parameters used in K-RVEA is also provided.

Number of reference vectors
The number of reference vectors (N ) in K-RVEA is determined by the design factors (H 1 , H 2 ) for a simplex-lattice design [2] and the number of objectives (k ). The parameter values used (from [1]) are listed in Table 1 for the different numbers of objectives.

Performance on the DTLZ suite

Here, we summarize the performance of the different algorithms compared with hypervolume as the performance metric. We used the worst objective function values among the nondominated solutions of all algorithms plus a small threshold as the reference point (f * i ). For fewer than eight objectives, a recently proposed method from [7] was used to calculate the hypervolume. For eight and 10 objectives, the Monte Carlo method with 1000000 sample points was used to approximate the hypervolume. The hypervolume values obtained were normalized by dividing by the product of f * i over i = 1, . . ., k, and are shown in Table 3. As can be seen, the performances of the different algorithms when measured with hypervolume are very similar to those with the IGD values and, overall, K-RVEA performed better than the other algorithms. Note that, due to the degenerate Pareto front of DTLZ5, most of the reference vectors in both RVEA and K-RVEA are empty and, therefore, the performance of K-RVEA is worse than that of ParEGO. Results of K-RVEA and MOEA/D-EGO for problems with three objectives are given in Table 4. As can be seen, K-RVEA performed either better than or equivalently to MOEA/D-EGO except on DTLZ5. Note that we did not test SMS-EGO here due to its computational overhead, as also mentioned in the main paper.
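The Monte Carlo approximation mentioned above can be sketched as follows. Minimization is assumed, and the sampling box is taken as [0, reference point], which presumes a non-negative ideal point; the sample count is reduced here for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def monte_carlo_hypervolume(front, ref_point, n_samples=100_000):
    """Approximate the hypervolume dominated by `front` w.r.t. `ref_point`
    (minimization) by uniform sampling in the box [0, ref_point]."""
    front = np.asarray(front, float)
    ref = np.asarray(ref_point, float)
    samples = rng.uniform(0.0, ref, size=(n_samples, len(ref)))
    # A sample is dominated if some front point is <= it in every objective.
    dominated = (front[None, :, :] <= samples[:, None, :]).all(-1).any(-1)
    return dominated.mean() * ref.prod()

# Single point (0.5, 0.5) with reference point (1, 1); the exact value is 0.25.
print(monte_carlo_hypervolume([[0.5, 0.5]], [1.0, 1.0]))
```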

Performance on the WFG suite
To show the potential of K-RVEA on different problems, we extended the test setup by also considering the WFG test suite [3]. Besides, as in some of the DTLZ problems the number of decision variables driving convergence decreases with the increase in the number of objectives, it is important to test with other types of problems as well. The values of the two different types of parameters, i.e., position (d ) and distance (l ), used in the WFG problems for the different numbers of objectives are shown in Table 5.
Results of K-RVEA, RVEA and ParEGO using IGD and hypervolume are provided in Tables 6 and 7, respectively. We used the same settings for the size of the reference set to calculate IGD and for the reference point to calculate hypervolume as mentioned in the previous sections. The problem WFG2 has a disconnected Pareto front, and it is interesting to note that K-RVEA handled problems with disconnected Pareto fronts well and outperformed the others with both the hypervolume and IGD performance metrics. However, in WFG3, the degenerate Pareto front caused most of the reference vectors to be empty, thereby leading to poor performance. Problems WFG4-WFG9 pose several challenges to the optimization algorithm in the decision space, such as multimodality for WFG4, landscape deception for WFG5, and non-separability for WFG6, WFG8 and WFG9, and, as can be seen from the results, K-RVEA performed better than the other algorithms. Results of K-RVEA and MOEA/D-EGO for three objectives are provided in Tables 8 and 9, respectively. For these problems as well, K-RVEA performed similarly to or better than MOEA/D-EGO.

Performance on free-radical polymerization of vinyl acetate

In the free radical polymerization of vinyl acetate, the main aim is to find the optimal process conditions to obtain a polymer with specific properties. A high molecular weight is usually linked with a high melt strength and a low melt flow index [6]. In addition, a homogeneous molecular mass distribution of the polymer is required in a short processing time. Polyvinyl acetate has a high molecular weight, is often known as wood glue, and is processed in a batch reactor. This problem is computationally expensive, and the average computation time for one function evaluation is around 45 minutes. The reason for the high computation time is the slow reaction rate after a certain point in time, termed gelation. For more details about the kinetics and a study of multiobjective optimization for this polymerization, see [5]. In this study,
we consider three objectives: (1) maximizing the average molecular weight M w , (2) minimizing the polydispersity index PDI, and (3) minimizing the polymerization time t poly .
The decision variables, or process conditions, are the monomer concentration, the initiator concentration, the polymerization time and the temperature. The bounds for these decision variables are given in Table 10.
We ran this process for 250 function evaluations with K-RVEA, RVEA, ParEGO and MOEA/D-EGO.
The nondominated solutions from all four algorithms are shown in Figure 1. As can be seen, the solutions from K-RVEA are better distributed than those of RVEA and ParEGO. The solutions from MOEA/D-EGO are also well distributed, but some of them are dominated by those from K-RVEA. To compare the results statistically, we combined the nondominated solutions from all four algorithms and clustered them into a prefixed number of clusters. From each cluster, the individual closest to the centroid was selected, and all selected individuals were used as the reference set to calculate the IGD values. The IGD values thus obtained are given in Table 11. As can be seen, K-RVEA was able to obtain solutions of a better quality than the other algorithms in the given number of function evaluations.

Sensitivity analysis of parameters in K-RVEA
We provide a sensitivity analysis of three important parameters used in K-RVEA. The first analysis concerns the parameter δ used to select individuals based on the needs of convergence and diversity. The second concerns the number of individuals (Nu) selected for updating the surrogates. The third concerns the number of generations (w max ) used before updating the surrogates.

Effect of parameter δ
The parameter δ takes effect whenever the surrogates are updated. The main motivation for using δ is to reduce the number of empty reference vectors by exploiting the uncertainty information from the Kriging models. In K-RVEA, when updating the surrogates, the change in the number of empty reference vectors since the previous update is measured, and if this change is more than δ, uncertainty is used. In other words, whenever the surrogates are updated based on uncertainty, the number of empty reference vectors is reduced. An increase in the value of δ decreases the frequency of using uncertainty, and decreasing the use of uncertainty from the Kriging models increases the number of empty reference vectors. To elaborate, we used different values of δ, i.e., (0.05, 0.3, 0.5, 0.7, 1) × N, where N is the number of reference vectors used, and measured the number of times uncertainty from the Kriging models was used. Results on WFG1 with different numbers of objectives are provided in the first row of Figure 2, where NK denotes the frequency of using uncertainty. As can be seen, an increase in the value of δ decreases the frequency of using uncertainty from the Kriging models.
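Counting empty reference vectors is straightforward once each individual is assigned to the reference vector with which it has the largest cosine similarity, as in RVEA; a sketch with an illustrative ideal-point translation:

```python
import numpy as np

def count_empty_reference_vectors(objectives, ref_vectors):
    """Count reference vectors to which no individual is assigned, where
    each (translated) objective vector is assigned to the reference
    vector of maximum cosine similarity."""
    F = np.asarray(objectives, float)
    V = np.asarray(ref_vectors, float)
    F = F - F.min(axis=0)  # translate by the (estimated) ideal point
    Fn = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    assigned = np.argmax(Fn @ Vn.T, axis=1)  # max cosine similarity
    return len(V) - len(np.unique(assigned))
```

For instance, with three reference vectors and a population clustered near only two of them, one vector stays empty.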
To see the effect on the reference vectors, we measured the change in the number of empty reference vectors (denoted by NR in the figure) whenever uncertainty was used. The results are given in the second row of Figure 2, which shows the average change in the number of empty reference vectors. For instance, if the surrogates were updated with uncertainty five times in the solution process and the changes in the number of empty reference vectors were (5, 10, 3, 1, 1), the average, i.e., four, is shown. Moreover, no change was observed at δ = 1 × N because the surrogates were never updated using uncertainty information. In addition, the change was always a positive integer whenever uncertainty information was used, which indicates that the number of empty reference vectors decreased.
To see the effect of δ on the performance of the algorithm, we also measured the hypervolume for different numbers of objectives and present the results in Figure 3. As can be seen, the hypervolume decreases with the increase in the value of δ. This is due to the fact that the frequency of using uncertainty information from the Kriging models decreases as δ increases. We observed similar behavior for other problems. In addition, the effect of the parameter δ depends on the number of reference vectors and on the problem solved. Using δ in an adaptive way is a future research topic; in the current study, we kept it fixed at 0.05 × N.

Effect of the number of individuals to be selected to update surrogates
In K-RVEA, the surrogates need to be updated by selecting some individuals for re-evaluation using the original functions. These individuals should be selected in such a way that both convergence and diversity are taken into account. The number of individuals to be selected can be important and mainly depends on the problem solved. We performed a sensitivity analysis with the numbers 2, 5, 10, 20 and 30 on the problem WFG9 for different numbers of objectives, and the results are given in Figure 4. As can be seen, for up to eight objectives, an increase in the number of individuals (denoted by Nu) decreased the performance of the algorithm. In contrast, with 10 objectives, the performance improved with an increase in the number.
As the total number of function evaluations is set to a constant, increasing the number of individuals for updating the surrogates decreases the frequency of updating them. For instance, if the total number of function evaluations is 300 and the numbers of individuals for updating the surrogates are 2 and 10, the surrogates are updated 150 and 30 times, respectively. In other words, the number of times the surrogates are used with the evolutionary algorithm is larger when a small number of individuals is used. Using a small number of individuals may thus be helpful for achieving a good approximation of the Pareto front. On the other hand, a small number may not be enough, and the individuals selected do not necessarily contribute to the performance of the surrogates. In contrast, using a large number reduces the frequency of using the surrogates with the evolutionary algorithm and may be helpful for improving the performance of the surrogates. Therefore, this parameter depends on the type of problem solved and should be set adaptively. However, in this paper we kept it fixed at five.

Effect of the number of generations before updating the surrogate
The frequency of updating the surrogates, or when to update them, is very important in surrogate management, although, unfortunately, there is no solid theory for when to update the surrogates. We therefore present here empirical studies on the performance of K-RVEA given different prefixed numbers of generations before the surrogates are updated.

Figure 1: An illustrative example of reference vectors for a biobjective optimization problem.

Figure 2: An illustration of the assignment of individuals to a reference vector and the calculation of the selection criteria.

Figure 3: Clustering of active adaptive reference vectors V a into a predefined number of clusters u.

Figure 4: Different cases for the fixed reference vectors while updating the Kriging models. (a): An example of the fixed reference vectors. (b): An example showing the fixed reference vectors at the current update, where the number of inactive reference vectors has decreased compared to the previous update in (a). (c): An example showing the fixed reference vectors at the current update, where the number of inactive reference vectors has increased compared to the previous update in (a).

The size of the training data archive A 1 is set to N I . The main steps for managing the archive are presented in Algorithm 4.

Algorithm 4: Managing individuals in the archive
Input: archive A 1 , adaptive reference vectors V a , u = individuals selected to update the Kriging models, N I = maximum number of individuals in the archive A 1
Output: updated individuals in A 1
1: Remove duplicate individuals from the training archive A 1 and update |A 1 |
2: if |A 1 | > N I then
3: Assign the u individuals to V a and identify the inactive adaptive reference vectors, denoted by V ia a
4: Assign the A 1 \ u individuals, i.e., the individuals in the archive other than the recently evaluated ones, to V ia a
5: Identify the active reference vectors among the inactive adaptive reference vectors V ia a
6: Cluster this active set of V ia a into N I − u clusters
7: Randomly select one individual from each cluster and remove the other individuals
8: end if

Figure 5: Managing training data in the archive A 1 . (a): Assignment of the recently evaluated individuals u to the active adaptive reference vectors V a . (b): Assignment of A 1 \ u individuals to the inactive adaptive reference vectors V ia a .

Parameter settings:
1. Number of individuals evaluated using the original objective functions in the initialization phase (from the literature [28, 51]) = maximum number of individuals in the training archive, N I = 11n − 1
2. Number of independent runs = 25
3. Maximum number of function evaluations, FE max = 300
4. Number of individuals used to update the Kriging models, |u| = 5
5. Number of reference vectors (N ): determined by the design factors for a simplex-lattice design [11] and the number of objectives; the values are listed in the supplementary material
6. Parameter used while updating the Kriging models, δ = 0.05N
7. Number of generations before updating the Kriging models, w max = 20

Figure 6: Nondominated solutions obtained with K-RVEA, RVEA and ParEGO of the run with the best IGD value for three-objective DTLZ1, which are all very far away from the true Pareto front.

Figure 7: Nondominated solutions obtained by K-RVEA, RVEA, ParEGO and MOEA/D-EGO, denoted by circles, in the run with the best IGD value for three-objective DTLZ7, where the dots represent the Pareto front.

Figure 8: Parallel coordinate plot of the nondominated solutions obtained by K-RVEA and RVEA in the run with the best IGD value on the 10-objective DTLZ2.

Figure 9: Training time over the number of function evaluations in K-RVEA, ParEGO, SMS-EGO and MOEA/D-EGO on the 3-objective DTLZ2.
As mentioned in Section II.B, the computation time for training the Kriging models varies a lot depending on the specific implementation and the number of training samples, and it may become prohibitively large. One contribution of K-RVEA is the development of a strategy to select training samples according to the reference vectors so that the number of samples for training the Kriging models is kept constant; consequently, the computation time for training in K-RVEA remains constant as the number of expensive fitness evaluations increases. To empirically verify this, the run time for training the Kriging models of the different implementations in the compared SAEAs was investigated. The results over the number of training samples obtained on the 3-objective DTLZ2 are shown in Figure 9. We can observe that the training time of K-RVEA remains constant, as the maximum number of training samples is kept constant, while the training time of ParEGO increases slightly. In contrast, the training time of MOEA/D-EGO increases quickly over the number of training samples as a piecewise continuous function. As already mentioned, the computation time of SMS-EGO increases dramatically over the number of training samples. Note, however, that SMS-EGO and K-RVEA are implemented in MATLAB, ParEGO in C, and MOEA/D-EGO in Java. Therefore, the absolute times used by the different algorithms may not be directly comparable, although the different behaviours of the change in training time over the number of allowed true function evaluations are of more interest.

Figure 2: Effect of the parameter δ used in K-RVEA on the frequency of using uncertainty information and on the change in the number of empty reference vectors

1: Initialize two empty archives A 1 and A 2 , i.e., |A 1 | = |A 2 | = φ
2: Evaluate the initial population with the original functions and add the individuals to A 1 and A 2 ; update FE = FE + N I , |A 1 | = |A 1 | + N I and |A 2 | = |A 2 | + N I
3: Train a Kriging model for each objective function using the individuals in A 1
4: while FE ≤ FE max
...
9: Add the re-evaluated individuals to A 1 and A 2 and update |A 1 | = |A 1 | + u and |A 2 | = |A 2 | + u
10: Remove |A 1 | − N I individuals from A 1 using Algorithm 4, update w = 1 and t u = t u + 1, and go to step 3
end while

Table 1: Statistical results for the IGD values obtained by K-RVEA, RVEA and ParEGO. The best results are highlighted.

Table 2: Statistical results for the IGD values obtained by K-RVEA, RVEA, ParEGO, SMS-EGO and MOEA/D-EGO for three and four objectives with 120 and 115 function evaluations, respectively. The best results are highlighted.

Table 3: Statistical results for the IGD values obtained by K-RVEA and MOEA/D-EGO for three objectives after 300 function evaluations. The best results are highlighted.

Table 1: Numbers (N ) of reference vectors

In this paper, we use a different size of the reference set for different numbers of objectives to calculate the IGD values; the sizes are presented in Table 2. The test problems DTLZ7 and WFG2 have disconnected Pareto fronts and, therefore, the size of the reference set differs from that of the other problems. In addition, a specific number, like 10000 for four objectives, cannot be generated exactly with the reference set generation method used [4]; for this reason, the numbers closest to 5000, 10000, 30000, 50000 and 90000 are used for the different numbers of objectives.

Table 2: Size of the reference set used to calculate IGD.

Table 3: Statistical results for hypervolume on the DTLZ suite obtained by K-RVEA, RVEA and ParEGO. The best results are highlighted.

Table 4: Statistical results for hypervolume on the DTLZ suite obtained by K-RVEA and MOEA/D-EGO for three objectives. The best results are highlighted.

Table 5: Numbers of parameters in the WFG problems.

Table 6: Statistical results for the IGD values on the WFG suite obtained by K-RVEA, RVEA and ParEGO. The best results are highlighted.

Table 7: Statistical results for hypervolume on the WFG suite obtained by K-RVEA, RVEA and ParEGO. The best results are highlighted.