A data-driven surrogate-assisted evolutionary algorithm applied to a many-objective blast furnace optimization problem

ABSTRACT A new data-driven reference vector-guided evolutionary algorithm has been successfully implemented to construct surrogate models for various objectives pertinent to an industrial blast furnace. Eight objectives were modeled from the operational data of the furnace in terms of 12 process variables, identified through a principal component analysis, and were optimized simultaneously. The algorithm's ability to handle a large number of objectives, which earlier approaches lacked, allows the operational parameters of the furnace to be set more efficiently and leads to a precisely optimized hot metal production process.


INTRODUCTION
The iron blast furnace is an immensely complex reactor, and running it optimally is a very difficult task [1]. Although analytical models exist for this type of reactor, which produces hot metal [2], such models are often cumbersome and of limited applicability in a real-life industrial scenario. In addition, a complete understanding of the blast furnace process involves handling several objectives together, which so far has been only marginally successful [3]. Thus, it is extremely difficult, if not impossible, to build a simulator for blast furnace optimization, and one has to rely on the limited amount of noisy data collected in daily operations to perform optimization.
Another challenge in blast furnace optimization is that it involves multiple conflicting objectives; such problems are known as multiobjective optimization problems [4]. Evolutionary algorithms have been widely used to solve multiobjective optimization problems [5].
However, the efficacy of most multiobjective evolutionary algorithms deteriorates when the number of objectives exceeds four [4], which makes them less suited to blast furnace optimization. Fortunately, many-objective optimization, i.e., the solution of problems with more than three objectives, has recently received increasing attention, and many evolutionary algorithms have been developed for such problems [3,6].
Purely data-driven evolutionary optimization has received little attention, with few exceptions. Wang et al. [7] recently categorized data-driven optimization into two types: on-line and off-line. In on-line optimization, a small amount of new data can be obtained during the optimization, whereas in off-line optimization no data other than those already in hand are available. The authors also proposed a surrogate-based data-driven evolutionary approach for optimizing a trauma system with two conflicting objectives. Although trauma system optimization is an off-line data-driven problem [7], a large amount of data is available for it. By contrast, as indicated by Guo et al. [8], off-line optimization becomes extremely challenging when the amount of historical data is small and the data are noisy. Unfortunately, the blast furnace optimization studied here is an off-line problem for which only a very limited amount of data is available.
Data-driven evolutionary optimization conducted off-line with a small amount of data must address two major challenges. The first is how to construct reliable surrogate models from the limited data and how to manage the surrogates without access to the true objective functions; these are the two most important questions in surrogate-assisted evolutionary optimization [9]. The second is how to handle several objectives simultaneously in order to efficiently obtain a set of representative Pareto optimal solutions. Many complex real-world problems have no analytical objective functions or simulation models, and optimal solutions can only be obtained from the available data. Moreover, collecting data is usually very expensive and may involve higher-level information, e.g., from a decision maker. In such cases, obtaining incremental or on-line data can be cumbersome and expensive. Therefore, surrogates are built from the limited off-line data to generate the Pareto optimal solutions. In addition, solutions obtained via the off-line approach can later be used to generate incremental or on-line data based on the performance of the algorithm.
This article presents an application of a new off-line data-driven evolutionary many-objective optimization algorithm to blast furnace optimization, addressing the two main challenges of off-line data-driven optimization mentioned above. To this end, a new surrogate management strategy is incorporated into a recently proposed surrogate-assisted many-objective evolutionary algorithm. Details of the algorithm are presented below.

The Strategy For Implementing Pareto Optimality For Many Objectives
In many problems, such as the blast furnace problem studied in the present investigation, optimization leads to a set of multiple optimal solutions, each representing its own trade-off between the objectives. A solution is Pareto optimal if no objective can be improved without worsening at least one other objective. The set of such solutions is known as the Pareto optimal set, and its image in the objective space constitutes the Pareto front [10]. Evolutionary multiobjective optimization (EMO) algorithms, which imitate natural evolution to evolve a population of candidate solutions into a representative set of Pareto optimal solutions, are commonly used for this purpose [11]. However, the efficacy of most EMO algorithms, judged by their ability to generate a diverse set of representative Pareto optimal solutions, is in general limited to problems with two or three objectives [9].
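Pareto dominance, on which the notion of Pareto optimality rests, can be illustrated with a short sketch. It assumes that every objective is minimized (an objective to be maximized, such as productivity, would first be negated); the function names `dominates` and `nondominated` are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (simple O(n^2) filter)."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


# Two minimized objectives: (3.0, 4.0) and (2.5, 3.0) are dominated by (2.0, 3.0)
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.0)]
print(nondominated(pts))  # → [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The quadratic filter is adequate for small populations; the same test can be used to keep an archive of nondominated solutions up to date during a run.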
In many practical problems, ranging from aircraft design [12] to molecular design [13], the number of objectives often exceeds three. Such problems are referred to in the literature as many-objective optimization problems [6]. Traditional EMO algorithms cannot simply be applied to problems with many objectives because of difficulties in managing convergence and diversity. Many-objective optimization has recently received increased attention, and several high-performing algorithms have been proposed [6,14]. However, such algorithms are often tested on benchmark problems with a large number of function evaluations. In real-world problems, where evaluating an objective function usually takes a substantial amount of computation time, such algorithms are often impractical. To tackle this problem, several surrogate-assisted evolutionary algorithms [15] have been proposed to obtain solutions with few function evaluations. Despite these developments, mostly problems with only two or three objectives have been solved [9]. To extend surrogate-assisted optimization to many-objective problems, an algorithm called K-RVEA [16] was recently developed for computationally expensive problems, in which Kriging models [17] are used in place of the computationally expensive objective functions.
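Kriging models act as the surrogates that replace the expensive objectives. A minimal sketch of an ordinary-Kriging-style predictor follows, assuming a Gaussian correlation kernel with a hand-picked length scale and a small nugget term; the class `SimpleKriging` is hypothetical, and a full Kriging implementation such as the one used in K-RVEA would also estimate its hyperparameters by maximum likelihood and report a prediction uncertainty.

```python
import numpy as np


class SimpleKriging:
    """Ordinary-Kriging-style predictor with a fixed Gaussian correlation kernel.

    Hypothetical illustration: the length scale is set by hand instead of being
    estimated by maximum likelihood.
    """

    def __init__(self, length_scale=1.0, nugget=1e-6):
        self.length_scale = length_scale
        self.nugget = nugget  # small diagonal term for numerical stability

    def _corr(self, A, B):
        # Gaussian correlation between every row of A and every row of B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / self.length_scale ** 2)

    def _solve(self, b):
        # Solve R x = b using the Cholesky factor R = L L^T
        return np.linalg.solve(self.L.T, np.linalg.solve(self.L, b))

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        R = self._corr(self.X, self.X) + self.nugget * np.eye(len(self.X))
        self.L = np.linalg.cholesky(R)
        ones = np.ones(len(y))
        # Generalized least-squares estimate of the constant trend (ordinary Kriging)
        self.mu = (ones @ self._solve(y)) / (ones @ self._solve(ones))
        self.weights = self._solve(y - self.mu)
        return self

    def predict(self, Xq):
        Xq = np.asarray(Xq, dtype=float)
        return self.mu + self._corr(Xq, self.X) @ self.weights
```

In a data-driven setting, one such model would be fitted per objective, with each row of `X` holding the process variables of one recorded operating point; the optimizer then queries `predict` instead of the plant.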
This algorithm is based on a recently proposed evolutionary algorithm called RVEA [18] .
However, K-RVEA assumes that a small number of candidate solutions can be evaluated with the expensive fitness evaluation method during optimization, which is not possible in the off-line setting considered here. Therefore, we implement a data-driven RVEA to handle many objectives. The essential details of the data-driven RVEA approach are presented below.

RVEA: The Data-Driven Reference Vector Guided Evolutionary Algorithm
The RVEA algorithm is adapted here to optimize a problem whose objective functions are built using Kriging models. In contrast to other many-objective evolutionary algorithms, e.g., NSGA-III [19] and MOEA/D [20], which use a set of reference points and weight vectors, respectively, RVEA adopts a set of reference vectors. Another important component that makes RVEA efficient for many-objective problems is its selection strategy: selection is based on a criterion known as the angle penalized distance (APD), which balances convergence and diversity. A pseudo-code of the algorithm is presented in Figure 2. The algorithm consists of four major components: generation of reference vectors, assignment of individuals to reference vectors, selection, and adaptation of reference vectors.
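In RVEA, the canonical simplex-lattice design method [21] is used to generate a set of uniformly distributed reference vectors in the objective space. An illustration of reference vectors is presented in Figure 1. The selection criterion in RVEA accounts for both convergence and diversity, and to combine the two, the following angle penalized distance (APD) is defined:

$$\mathrm{APD}_j = \left(1 + P(\theta_j)\right)\,\lVert \mathbf{f}'_j \rVert$$

where $\lVert \mathbf{f}'_j \rVert$ is the distance from the translated objective vector corresponding to the $j$th individual to the origin, $\theta_j$ is the angle between the $j$th individual and the reference vector it is assigned to, and $P(\theta_j)$ is the penalty function. In [18], the penalty is defined as $P(\theta_j) = M\,(t/t_{\max})^{\alpha}\,\theta_j/\gamma_{\mathbf{v}}$, where $M$ is the number of objectives, $t$ and $t_{\max}$ are the current and maximum generation numbers, $\alpha$ controls how fast the penalty grows, and $\gamma_{\mathbf{v}}$ is the smallest angle between the assigned reference vector $\mathbf{v}$ and the other reference vectors. After APD is calculated for all individuals in each subpopulation (the individuals assigned to the same reference vector), the individual with the minimum APD value is selected from each subpopulation for the next generation. In addition, the reference vectors are adapted during the run to handle problems whose objective function values have different scales. For a detailed description of RVEA, see [18]. We also maintain an archive of nondominated solutions while running RVEA, which is used to obtain the final solutions.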
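The reference-vector steps described above can be sketched in NumPy. This is an illustrative reading of those steps, not the authors' code: `simplex_lattice` enumerates the lattice by stars and bars and normalizes the vectors to unit length, `rvea_select` assigns each individual to the reference vector with the smallest angle and keeps the minimum-APD individual of each subpopulation, and `adapt_reference_vectors` rescales the initial vectors by the objective ranges. The function names, the default α = 2 and the assumption that all objectives are minimized belong to this sketch.

```python
import itertools

import numpy as np


def simplex_lattice(m, h):
    """Unit reference vectors from a simplex-lattice design with h divisions."""
    points = []
    # Each way of placing m-1 bars among h stars gives one lattice point k/h
    for bars in itertools.combinations(range(h + m - 1), m - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        parts.append(h + m - 2 - prev)  # stars after the last bar
        points.append(parts)
    W = np.array(points, dtype=float) / h
    return W / np.linalg.norm(W, axis=1, keepdims=True)


def rvea_select(F, V, t, t_max, alpha=2.0):
    """Return indices of the individuals kept by APD-based selection (minimization)."""
    F = np.asarray(F, dtype=float)
    M = F.shape[1]
    Ft = F - F.min(axis=0)  # translate objectives by the population's ideal point
    norms = np.linalg.norm(Ft, axis=1)
    cosine = (Ft / np.maximum(norms, 1e-12)[:, None]) @ V.T
    assigned = np.argmax(cosine, axis=1)  # reference vector with the smallest angle
    theta = np.arccos(np.clip(cosine[np.arange(len(F)), assigned], -1.0, 1.0))
    # gamma_j: smallest angle between reference vector j and any other vector
    vv = V @ V.T
    np.fill_diagonal(vv, -1.0)
    gamma = np.arccos(np.clip(vv.max(axis=1), -1.0, 1.0))
    penalty = M * (t / t_max) ** alpha * theta / gamma[assigned]
    apd = (1.0 + penalty) * norms
    selected = []
    for j in np.unique(assigned):  # one subpopulation per non-empty reference vector
        members = np.flatnonzero(assigned == j)
        selected.append(int(members[np.argmin(apd[members])]))
    return sorted(selected)


def adapt_reference_vectors(V0, F):
    """Rescale the initial reference vectors by the current objective ranges."""
    F = np.asarray(F, dtype=float)
    scaled = V0 * (F.max(axis=0) - F.min(axis=0))
    return scaled / np.maximum(np.linalg.norm(scaled, axis=1, keepdims=True), 1e-12)
```

For eight objectives, `simplex_lattice(8, 3)` yields C(10, 7) = 120 reference vectors. Because the penalty grows with t/t_max, early generations favour convergence (small distance) and later ones favour diversity (small angle).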

The Blast Furnace Problem Description
As indicated before, in this study data obtained from an industrial blast furnace were used in an off-line data-driven evolutionary optimization algorithm to generate a representative set of Pareto optimal solutions for more than three objectives. Data covering several months of actual furnace operation were used to build surrogate models of eight objectives in terms of twelve process variables for simultaneous optimization, as shown in Table 1. The total number of process variables in an actual industrial blast furnace is formidably large; this set of 12 variables was deemed significant on the basis of a principal component analysis (PCA), a technique already in use in the materials domain [22]. The blast furnace always uses a surplus amount of coke. As explained in standard texts [23], coke in this reactor is not just the reducing agent for the iron oxides; above the melting temperature of the ore, in the so-called melting and softening zone of the furnace, it remains solid and supports the enormous weight of the charge materials above it. It also acts as a filter for soot and dust. A large excess of ore-free coke is usually present at the center of the furnace; it acts like a chimney and ensures optimum gas distribution throughout the furnace. The presence of excess coke in the blast furnace is thus not negotiable and does not lead to any environmental concerns. Therefore, the amount of coke charged into the furnace is excluded from the list of variables considered in this study.
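The text does not specify how the PCA results were turned into a selection of 12 variables. One possible screening rule, shown here only as a hypothetical sketch, standardizes the data, keeps the principal components that together explain a chosen fraction of the variance, and scores each variable by its variance-weighted squared loadings; `pca_variable_ranking` and `variance_threshold` are illustrative names.

```python
import numpy as np


def pca_variable_ranking(X, variance_threshold=0.9):
    """Rank variables (columns of X) by variance-weighted squared PCA loadings.

    Hypothetical screening rule; assumes no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    # Rows of Vt are the principal directions; s**2 is proportional to their variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Smallest number of components reaching the requested explained variance
    k = int(np.searchsorted(np.cumsum(explained), variance_threshold) + 1)
    scores = (Vt[:k] ** 2 * explained[:k, None]).sum(axis=0)
    return np.argsort(scores)[::-1], scores
```

Taking the first 12 entries of the returned order would give a candidate variable set, which in practice would still be checked against process knowledge.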
As expected, the data obtained from the blast furnace were noisy and contained outliers. To smooth the function landscape, leaving fewer peaks and troughs, local regression smoothing was used. In local regression smoothing, a locally weighted linear regression is fitted around each data point, with regression weights assigned to its neighboring points, and the smoothed value of each point is calculated from the values of its neighbors.
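Local regression smoothing of a single data series can be sketched in plain Python. The sketch follows the classical LOESS recipe (tricube weights over the nearest `frac·n` points and a weighted straight-line fit); this recipe and the default `frac` are assumptions, since the exact smoother settings used for the furnace data are not stated. Robust variants additionally down-weight outliers through extra iterations with residual-based weights, which this sketch omits.

```python
def loess(x, y, frac=0.3):
    """Locally weighted linear regression (LOESS-style) smoother.

    For each x0, a straight line is fitted by weighted least squares to the
    nearest frac*n points with tricube weights, and its value at x0 replaces y.
    """
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))
    smoothed = []
    for x0 in x:
        # Window half-width: distance to the k-th nearest point (x0 itself included)
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling smoothly to 0 at the window edge
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted regression line at x0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed
```

A linear trend passes through unchanged, while an isolated spike is pulled towards its neighbors, which is the intended effect on noisy plant data.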
Further information is available elsewhere [24]. Results obtained with clustering [25] are presented in Table 2. The blast furnace for which the current surrogate models were derived apparently functions at a substantially high coke rate. The optimized results suggest that the coke rate can indeed be lowered even when a higher productivity is targeted. A higher productivity would be expected to require a higher volume of gas flow and a larger gas velocity at the tuyere, and the optimized results indeed follow such trends.

RESULTS AND DISCUSSION
We also present a surface plot of the nondominated solutions obtained in Figure 4, showing the relationship between gas flow ($f_2$, shown in the legend) and three other objectives: productivity ($f_5$), coke rate ($f_6$) and tuyere heat loss ($f_1$). Because the objectives have different ranges of values, they are presented on a normalized scale. As can be seen, gas flow and tuyere heat loss are clearly in conflict: achieving a high blast furnace gas flow increases the heat loss. On the other hand, productivity increases and coke rate decreases as the blast furnace gas flow increases.
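The normalized scale used in the surface plots can be obtained, for example, by min-max scaling each objective to [0, 1]. The paper does not state which normalization was used, so the helper below (`normalize_columns`, a hypothetical name) is only a sketch of that assumption.

```python
def normalize_columns(F):
    """Min-max scale each objective (column) of F to the range [0, 1]."""
    cols = list(zip(*F))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    # A constant objective has no range; map it to 0 to avoid division by zero
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)]
            for row in F]


print(normalize_columns([[1, 10], [3, 20], [2, 15]]))
# → [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
```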
Another surface plot, presented in Figure 5, shows how productivity ($f_5$) varies with tuyere velocity ($f_3$), heat loss ($f_4$) and carbon rate ($f_8$). As can be seen, achieving a high productivity requires a compromise on heat loss. Another interesting observation in Figure 5 is that productivity increases with increasing tuyere velocity and decreasing carbon rate, which is desirable. Previous studies show the importance of burden distribution for the coke rate, and the present study points towards the burden composition as well. Slag basicity and flux consumption are also known factors controlling the coke rate, and this is corroborated by the present study.
A running blast furnace routinely collects information on numerous process parameters, and it is not always known which of them actually affect a given objective.
In this study, the significant process variables were successfully identified. Since the blast furnace is a mature process in which further technological breakthroughs are very unlikely, the major challenge is to operate it optimally while meeting environmental and economic requirements. Because of increasingly stiff economic challenges and environmental regulations, there remains an urgent need to fine-tune the decision variables so that several objectives, which are very often mutually conflicting, can be optimized simultaneously. This, however, is a very complicated task. For example, solving an eight-objective optimization problem such as the one reported here through the numerical solution of the pertinent fluid flow, heat transfer or thermodynamic equations would be computationally prohibitive. The major advantage of the present approach is that it replaces those theoretical equations with simple surrogate models derived from the available data, which can be evaluated very fast without any significant loss of accuracy. The results reported here are fully reproducible and verifiable through the RVEA algorithm described in this paper.

CONCLUSIONS
Conventional blast furnace modeling strategies are based on transport phenomena, thermodynamics, statistical techniques and RIST diagram-based approaches [26,27,28,29]. Recently, data-driven approaches such as neural networks, support vector machines, fuzzy logic and various evolutionary algorithms have been gaining ground, as evidenced by the large volume of blast furnace articles in these areas, some of which are cited here [30,31,32,33,34,35,36,37].
The present work extends a newly developed evolutionary algorithm to data-driven blast furnace optimization. The major contributions are the following: An efficient algorithm is proposed that has successfully reduced noise and removed outliers from the raw data generated by an operational blast furnace.
The twelve process variables most relevant to the eight objectives to be optimized were identified, for the first time, out of a very large set of possible alternatives.
A major difficulty for operational blast furnaces is running the process optimally, in particular when a large number of conflicting objectives need to be optimized simultaneously. Even with an evolutionary approach, handling a large number of objectives simultaneously is difficult, and many of the previous studies [32,33,34] could handle fewer than three objectives at a time. This work presents an efficient and novel evolutionary algorithm that is able to solve problems with eight conflicting objectives. Unlike a previous work that used the notion of k-optimality [36] to deal with a large number of objectives, the present work implements the condition of Pareto optimality, and the approach can be further utilized in blast furnace research.
The present approach uses simple surrogate models for optimization, which makes the computation very fast and acceptably accurate. Data-driven models are becoming increasingly popular in other areas of ferrous production metallurgy [38], where, again, only a small number of objectives have so far been handled in simultaneous optimization. The present approach is therefore highly relevant to many practical problems in the metallurgical and materials domain discussed earlier [39], where the relevance of an evolutionary approach [40,41] is already well established.

Figure 4 A surface plot showing the relationship between gas flow ($f_2$) and the other three objectives, namely productivity ($f_5$), coke rate ($f_6$) and tuyere heat loss ($f_1$)

Figure 5 A surface plot showing the relationship between productivity ($f_5$) and the other three objectives, namely tuyere velocity ($f_3$), heat loss ($f_4$) and carbon rate ($f_8$)