A Memetic Algorithm Based on Probability Learning for Solving the Multidimensional Knapsack Problem

The multidimensional knapsack problem (MKP) is a well-known combinatorial optimization problem with many real-life applications. In this article, a memetic algorithm based on probability learning (MA/PL) is proposed to solve MKP. The main highlights of this article are two-fold: 1) problem-dependent heuristics for MKP and 2) a novel framework of MA/PL. For the problem-dependent heuristics, we first propose two kinds of logarithmic utility functions (LUFs) based on the special structure of MKP, in which the profit value and weight vector of each item are considered simultaneously. Then, LUFs are applied to effectively guide the repair operator for infeasible solutions and the local search operator. For the framework of MA/PL, we propose two problem-dependent probability distributions to extract the special knowledge of MKP, that is, the marginal probability distribution (MPD) of each item and the joint probability distribution (JPD) of two conjoint items. Next, learning rules for MPD and JPD, which borrow ideas from competitive learning and the binary Markov chain, are proposed. Thereafter, we generate MA/PL's offspring by integrating MPD and JPD, such that the univariate probability information of each item as well as the dependency of conjoint items can be sufficiently used. Results of experiments on 179 benchmark instances and a real-life case study demonstrate the effectiveness and practical value of the proposed MA/PL.


I. INTRODUCTION
THE multidimensional knapsack problem (MKP) is a generalized version of the conventional 0-1 knapsack problem. It is to select a subset of given items and put them into a multidimensional knapsack in order to maximize the total profit subject to the limited capacity of each dimension of the knapsack.
MKP is one of the most popular combinatorial optimization problems with rich practical applications, including cargo loading [1], cutting stock [2], portfolio selection [3], capital budgeting [4], and resource allocation in distributed systems [5].
Let N be the set of given items, N = {1, . . . , N}, M be the set of dimensions, M = {1, . . . , M}, W_j be the capacity of dimension j ∈ M, p_i be the profit of item i ∈ N, and w_ij be the weight of item i in dimension j. Then, MKP can be formulated as follows:

max Σ_{i∈N} p_i x_i    (1)

subject to

Σ_{i∈N} w_ij x_i ≤ W_j, ∀j ∈ M    (2)

x_i ∈ {0, 1}, ∀i ∈ N    (3)

where x_i defined in (3) is a 0-1 decision variable indicating whether item i is selected. The objective (1) is to maximize the total profit. Constraints (2) ensure that the capacity of each dimension is not exceeded. MKP is known to be NP-hard [6], so the size of the problem that can be solved by exact algorithms [7]-[9] is limited. Even the latest exact algorithms (such as [9]) still have difficulties with instances of N = 250 and M = 30. In such cases, metaheuristics are suitable solution methods because they can obtain near-optimal solutions within acceptable running time [10]. The memetic algorithm (MA) is a metaheuristic that uses a hybrid search model integrating evolutionary algorithms [11], [12] and local search to achieve a good balance between global exploration and local exploitation. MAs have proved effective for many combinatorial optimization problems [13]-[16] and, in particular, for MKP [17]-[19]. This motivates our MA-based search framework for MKP.
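The model (1)-(3) can be sketched directly in code. The following minimal Python evaluation of a bit-string solution uses an illustrative toy instance; the data are not from the paper:

```python
# Evaluating the MKP model (1)-(3) for a 0-1 bit-string solution x.
def mkp_objective(profits, x):
    """Total profit of the selected items (objective (1))."""
    return sum(p * xi for p, xi in zip(profits, x))

def mkp_feasible(weights, capacities, x):
    """Check constraints (2): the load in every dimension j stays within W_j."""
    M = len(capacities)
    for j in range(M):
        load_j = sum(w[j] * xi for w, xi in zip(weights, x))
        if load_j > capacities[j]:
            return False
    return True

# Toy instance: N = 4 items, M = 2 dimensions.
profits = [10, 7, 4, 3]
weights = [[5, 4], [4, 3], [3, 2], [2, 3]]   # weights[i][j] = w_ij
capacities = [10, 8]                         # W_j

x = [1, 1, 0, 0]                             # select items 1 and 2
print(mkp_objective(profits, x))             # 17
print(mkp_feasible(weights, capacities, x))  # True
```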
Learning-based mechanisms have been widely advocated in enhancing the search performances of MAs [20]- [22]. For most learning-based MAs, certain kinds of probability models are adopted to extract problem-specific knowledge and generate offspring. As Gupta and Ong [23] pointed out, accurate probability models are useful for improving the global exploration of MAs. In this sense, suitable probability models are critical to learning-based MAs, which in general can be represented in different forms, such as a probability distribution or other learning-based networks [24], depending on the problem being solved. The above works motivate us to propose an MA based on probability learning (MA/PL) to solve MKP, in which a problem-specific probability model is introduced.
There are various kinds of probability models in existing learning-based metaheuristics, such as the estimation of distribution algorithm (EDA) [25]-[28], the cross-entropy algorithm (CEA) [29], and ant colony optimization (ACO) [30]. In MA/PL, a simple probability model (i.e., a probability distribution) is used. The probability model of MA/PL differs from those of EDA, CEA, and ACO in that it considers the problem-specific knowledge of MKP. Based on the structures of MKP, the probability model of MA/PL can reveal the inherent relationships among items. We consider not only individual items but also the dependency of two conjoint items. Two kinds of probability distribution are used in MA/PL: 1) the marginal probability distribution (MPD) for each item to be selected and 2) the joint probability distribution (JPD) for two conjoint items to be selected simultaneously. MPD reflects the univariate relationships of items, whereas JPD reflects the connections of conjoint items. Complementing each other, MPD and JPD work together to improve the global exploration ability of MA/PL. Very recently, Cai et al. [31], [32] proposed a novel dual-population framework which uses an external archive (ExA) to store elite solutions found during the search process, making good use of the information in ExA for generating high-quality solutions. In MA/PL, we introduce an ExA as well, which is used to build MPD and JPD.
Infeasible solutions might be generated during the search process of MA/PL. In this case, introducing suitable constraint-handling techniques is crucial for the search performance [33]. In existing works, the concept of utility has been shown useful for designing repair operators for MKP. In [34] and [35], two novel utility-based repair operators for MKP, that is, RO2 and RO3, were proposed. RO2 and RO3 are guided by the pseudo-utility ratio of each item, which is related to the objective function and the constraint violation. It is not uncommon, however, that the values of the objective function and the constraint violation are of different magnitudes, so normalization is necessary. In this article, we advance existing utility-based repair operators and embed the logarithmic utility function (LUF) [36] into the proposed repair operator. LUF obtains a composite value by weighting the preference scales of the objective function and the constraint violation. Thereby, we propose an LUF-based repair operator (RO_LUF). RO_LUF dynamically considers the profits of items and their contributions to the current overload of each dimension, so it differs from RO2 and RO3.
Local search is another important component of MAs. Given any feasible solution of MKP, local search can be viewed as an inverse process of the repair operator and so may be guided by an LUF similar to that used in the repair operator. We propose a simple yet effective local search based on LUF (LS_LUF), where neighboring solutions are generated according to another LUF that considers the profits of the items and spare capacity they need in each dimension. Such a problem-specific LS_LUF intends to enhance the local exploitation ability of MA/PL.
The framework of the proposed MA/PL is given in Fig. 1. As shown in Fig. 1, it starts with a randomly generated initial population. The individuals in the population are repaired and improved using RO_LUF and LS_LUF and then the best ones among them are selected to form the initial ExA. Next, the probability models MPD and JPD are built by using ExA, and used to generate a population of the next generation. The individuals in the new population are repaired and improved, the resulting feasible individuals are used to update ExA, and the iteration continues until some stopping criteria are satisfied.
We summarize the contributions of this article as follows.
1) Based on the features of MKP, we propose two LUFs considering the profit of each item and the overload/remaining capacity of each dimension. The LUFs are used to measure the tradeoff between the objective and the capacity constraints when performing the repair operation and the local search operation.
2) We propose probability learning rules for MPD and JPD, derived from competitive learning and the binary Markov chain. In particular, JPD considers the dependency of two conjoint items in generating solutions of MKP. Integrating MPD and JPD strengthens the global search of MA/PL.
3) We introduce a random route in MA/PL's sampling operator, such that the overall dependency of items can be potentially revealed from the dependency of two conjoint items.
The remainder of this article is organized as follows. Section II introduces related work on MAs and MKP. In Section III, based on the solution representation of MKP, the proposed RO_LUF and LS_LUF are described. Section IV presents MA/PL in detail. Results of experiments on 179 benchmark instances and a real-life case study are presented in Sections V and VI. Finally, Section VII draws conclusions and proposes future work.

II. RELATED WORK
MAs have been successfully adopted to solve various optimization problems; a comprehensive review of MAs can be found in [37]. MA is naturally suitable for combinatorial optimization problems because local search can provide basic building blocks in discrete search domains. There have been only a few studies that use MAs to solve MKP. Özcan and Başaran [17] proposed a genetic algorithm (GA) with a hill-climbing heuristic. Rezoug et al. [18] proposed two GA-based MAs which incorporated a stochastic local search and simulated annealing. Sörensen and Sevaux [19] reported an MA with a population management strategy. These works show that MAs can achieve good performance for MKP and thus motivate the proposed MA-based search model.
Machine learning has shown a powerful ability to enhance the search performance of MAs. Ong and Keane [21] proposed meta-Lamarckian learning to improve the local search of MAs. Rakshit et al. [22] reported an adaptive MA based on Q-learning for multirobot path planning. Indeed, the foremost highlight of learning-based MAs is the probability model. Different kinds of probability models have been proposed in previous learning-based metaheuristics. Liang et al. [25] proposed an EDA with a Gaussian probability model for function optimization. In [26] and [27], EDAs with Gaussian probability models were used to solve multimodal problems. Shim et al. [28] integrated an underlying EDA into a multiobjective search model. The above works show that suitable probability models should reflect the features of the problems studied. These also motivate the proposed probability models of MA/PL.
Research on MKP started in the 1950s, from seminal works of Lorie and Savage [38], Markowitz and Manne [39], Loulou and Michaelides [40], and Magazine and Oguz [41], [42]. In [43], a comprehensive review of studies on MKP is provided. Generally, solution methods for MKP can be classified as optimization-based algorithms and metaheuristics.
There are a number of optimization-based algorithms proposed for solving MKP which apply optimization methods as part of the solution process. Hill et al. [44] proposed a problem reduction heuristic for MKP, where they first formulated MKP as a Lagrangian dual problem and then presented a heuristic for solving the estimated core problem. Yoon [45] reported a Lagrangian method using the concept of Lagrangian capacity/relaxation for MKP. Rong et al. [46] solved a kind of multicriteria MKP with k-min objectives using hybrid algorithms. Other optimization-based methods, such as branch and bound [1], [7] and collaborative methods [8], were also proposed to solve MKP. Most of the above algorithms can only get optimal solutions for instances of small or medium sizes. Although some of them can obtain near-optimal solutions for large-sized ones, such as [8], [44], and [45], the computational time may increase dramatically when the scale of instances increases.
However, for MKP, studies that incorporate machine learning into metaheuristics remain relatively unexplored. Wang et al. [34] presented a hybrid EDA (HEDA) for MKP, which adopted a probability matrix to record the probability distribution of elite solutions. Arin and Rabadi [55] proposed a randomized priority search model (Meta-RaPS) for MKP and presented Meta-RaPS EDA and Meta-RaPS Q by integrating EDA and Q-learning into Meta-RaPS, respectively. Rojas-Morales et al. [56] presented an ant knapsack algorithm based on opposite learning for MKP. The above works show that there is still a gap between machine learning and metaheuristics for solving MKP.

III. PROPOSED PROBLEM-DEPENDENT HEURISTICS
In this section, we describe the solution representation and the problem-dependent heuristics, that is, RO_LUF and LS_LUF that are used as repair and local search operators in MA/PL.

A. Solution Representation
In MA/PL, a string of N bits is used to represent a solution: value 1 of a bit indicates that the corresponding item is selected, while value 0 means it is not. A given solution may or may not be feasible. If infeasible, we can repair it by taking some items out; if feasible, we can try to improve it by adding more items. In both operations, we propose to choose items based on LUFs. The idea of LUF stems from the utility concept in quality management [36]. We define an LUF as a metric that measures the effect of a given item on the total profit and the constraints, and use LUFs to determine the priority of candidate items in RO_LUF and LS_LUF. Fig. 2 shows an infeasible solution π and its candidate items for RO_LUF, as well as a feasible solution π′ and its candidate items for LS_LUF.

B. Proposed Repair Operator Based on LUF (RO_LUF)
The repair operator for infeasible solutions can be seen as a readjustment of the load in each dimension, so it is necessary to consider the overload in each dimension. Let ol = [ol_1, . . . , ol_M] be the overloads in the M dimensions for an infeasible solution π = [π_1, . . . , π_N]. Each element of ol can be calculated as

ol_j = Σ_{i∈N} w_ij π_i − W_j, ∀j ∈ M

where the first term is the load in dimension j. Obviously, a negative or zero value of ol_j indicates that the capacity constraint in dimension j is satisfied.
In case ol_j is positive for some j, the solution is infeasible. To repair it, we take items out one by one until the solution becomes feasible. Denote by Cr the set of candidate items to be taken out (e.g., Cr = {1, 2, 5, 6} in Fig. 2). To determine the priority of the candidate items, we calculate an LUF value for each item i ∈ Cr using two attributes of the item. One is its profit p_i; the other, ψ_i, reflects its impact on the capacities in all dimensions, where we set ε = 0.001 to ensure ψ_i > 0. The value σ_i of the LUF in RO_LUF then combines the two attributes with coefficients λ_1 and λ_2. Without loss of generality, we set λ_1 = λ_2 = 0.5 to give the two attributes equal importance. Obviously, when changing the bits of items in Cr from 1 to 0, an item with a larger LUF results in a smaller decrease of the total profit while contributing more to satisfying the capacity constraints.
In RO_LUF, the item in Cr with the largest LUF is taken out: the related bit is changed from 1 to 0, and the item is removed from Cr. If the solution is still infeasible, the values of ψ_i and σ_i are recalculated to identify the next item to be taken out. This process continues until the solution becomes feasible. The proposed RO_LUF is given in Algorithm 1.
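The repair loop of Algorithm 1 can be sketched as follows. Since the paper's exact ψ_i and LUF formulas appear in its equations, the capacity-impact term `psi` and the log-utility `luf` below are illustrative assumptions that only preserve the stated monotonicity: a larger LUF means a smaller profit loss and a larger contribution toward resolving the overload.

```python
import math

# A hedged sketch of the RO_LUF repair loop (Algorithm 1). psi() and luf()
# are illustrative assumptions, not the paper's formulas.
def repair_ro_luf(profits, weights, capacities, x, lam1=0.5, lam2=0.5, eps=1e-3):
    x = list(x)
    M = len(capacities)
    while True:
        # Overload ol_j = load_j - W_j in every dimension.
        ol = [sum(w[j] * xi for w, xi in zip(weights, x)) - capacities[j]
              for j in range(M)]
        if all(o <= 0 for o in ol):
            return x  # feasible: repair done
        # Candidate set Cr: items currently in the knapsack.
        cr = [i for i, xi in enumerate(x) if xi == 1]

        def psi(i):  # assumed impact of item i on the violated dimensions
            return eps + sum(weights[i][j] for j in range(M) if ol[j] > 0)

        def luf(i):  # large when profit is small and capacity impact is big
            return -lam1 * math.log(profits[i]) + lam2 * math.log(psi(i))

        x[max(cr, key=luf)] = 0  # take out the item with the largest LUF
```

After each removal the overloads are recomputed, mirroring the iterative recalculation of ψ_i and σ_i described above.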
In the illustration example in Fig. 2, RO_LUF is applied on the infeasible solution π. After one iteration, which takes out item 1 that has the largest LUF, a feasible solution π′ is obtained.

C. Proposed Local Search Based on LUF
As an inverse process of the repair operator, we here consider the spare capacity in each dimension. For a feasible solution π = [π_1, . . . , π_N], let sc = [sc_1, . . . , sc_M] be the spare capacities in the M dimensions. Each element can be calculated as

sc_j = W_j − Σ_{i∈N} w_ij π_i, ∀j ∈ M.

Let Cl be the set of items not in the knapsack (e.g., Cl = {1, 3, 4} in Fig. 2) such that adding each of them alone does not result in a capacity violation. We calculate another LUF for each item i ∈ Cl, again using two attributes: the profit p_i and a specially defined τ_i, where ε = 0.001 is used to guarantee τ_i > 0. This LUF, δ_i, again combines the two attributes with coefficients λ_1 = λ_2 = 0.5. In LS_LUF, we identify the item with the smallest LUF and change its bit from 0 to 1. LS_LUF is an iterative procedure: each time one item is added, the spare capacity of each dimension is recalculated, Cl is updated by removing every item i with sc_j − w_ij < 0 for some j ∈ M, and τ_i and δ_i are recalculated, until Cl is empty. This procedure is given in Algorithm 2.
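The improvement loop of Algorithm 2 can be sketched in the same spirit. The terms `tau` and `luf` below are illustrative assumptions: the paper only states that the LUF weighs an item's profit against the spare capacity it consumes, and that the item with the smallest LUF is inserted first.

```python
import math

# A hedged sketch of the LS_LUF improvement loop (Algorithm 2). tau() and
# luf() are illustrative assumptions, not the paper's formulas.
def local_search_ls_luf(profits, weights, capacities, x, lam1=0.5, lam2=0.5, eps=1e-3):
    x = list(x)
    M = len(capacities)
    while True:
        # Spare capacity sc_j = W_j - load_j in every dimension.
        sc = [capacities[j] - sum(w[j] * xi for w, xi in zip(weights, x))
              for j in range(M)]
        # Cl: items out of the knapsack that still fit in every dimension.
        cl = [i for i, xi in enumerate(x)
              if xi == 0 and all(sc[j] - weights[i][j] >= 0 for j in range(M))]
        if not cl:
            return x  # no item can be added anymore

        def tau(i):  # assumed share of spare capacity item i would consume
            return eps + sum(weights[i][j] / (sc[j] + eps) for j in range(M))

        def luf(i):  # small when profit is high and capacity usage is low
            return -lam1 * math.log(profits[i]) + lam2 * math.log(tau(i))

        x[min(cl, key=luf)] = 1  # insert the item with the smallest LUF
```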
Note that we use LS_LUF to improve only the feasible solutions that are generated by repairing infeasible solutions with RO_LUF, such as π′ generated by repairing π in Fig. 2. This is because, in our preliminary tests, we observed that executing LS_LUF for all feasible solutions produced by MA/PL's global search is very likely to cause premature convergence.

Algorithm 3: Initialization and Updating of ExA
Input: π^P(gen), π^e(gen)
Output: π^e(gen)
1  If gen = 0 then
2      Find the best AL individuals from π^P(gen);
3      Record these best AL individuals to π^e(gen); //initialize the ExA
4      Rank individuals of π^e(gen) in descending order of their objective values;
5  Else
6      Find the best individual π^lb in π^P(gen);
7      If f(π^lb) > f(π^e_AL(gen)) then
8          Remove π^e_AL(gen) from ExA;
9          Insert π^lb into a proper position of the ranked list of individuals in ExA;
10     End If
11 End If
12 Return π^e(gen)

IV. MA/PL FOR MKP
In MA/PL, we use two learning rules: 1) a rule based on competitive learning for MPD and 2) a rule based on the proposed binary Markov chain for JPD. Using these learning rules, the dependency of items can be captured when a new population is produced via a random sampling route.

A. ExA and Updating Mechanism
ExA is updated at each generation of MA/PL so that it always contains high-quality solutions. Let PS be the population size of MA/PL, AL the size of ExA, π^P(gen) the population, and π^e(gen) the ExA at generation gen. In Algorithm 3, ExA is incrementally updated by replacing its worst individual with the best individual in the current population if the latter is better. Thus, the information of ExA that reflects elite solutions can be gradually enriched. No more than one individual from the current population is added to ExA per generation, to avoid premature convergence. MPD and JPD are both built from ExA, so it can be seen as a set of training data for our learning rules.
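The archive logic of Algorithm 3 can be sketched as follows; the list-based layout and the function signature are assumptions for illustration.

```python
# A sketch of the ExA update in Algorithm 3: the archive keeps the AL best
# solutions seen so far, ranked by objective in descending order, and at most
# one individual per generation enters it.
def update_exa(exa, population, objective, AL):
    if not exa:  # gen = 0: initialize with the AL best individuals
        return sorted(population, key=objective, reverse=True)[:AL]
    best = max(population, key=objective)
    if objective(best) > objective(exa[-1]):   # better than ExA's worst?
        exa = exa[:-1] + [best]                # replace the worst individual
        exa.sort(key=objective, reverse=True)  # keep the descending ranking
    return exa
```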

B. Learning Rule Based on Competitive Learning for MPD
The proposed learning rule for MPD derives from competitive learning [57], which can make good use of the information of high-quality samples when learning probability models. We represent MPD as the union of the univariate marginal distributions of all items, and use the existing learning rule in [34] and [57]. However, unlike [34], we learn MPD from ExA rather than from the elite individuals in the current population. Let MPr(gen) be the union of the marginal distributions at generation gen, defined such that Σ_{k=0}^{1} mpr_{k,i}(gen) = 1 (i ∈ N), where mpr_{k,i}(gen) represents the probability that the bit of item i equals k. Note that MPr(gen) is initialized with mpr_{k,i}(0) = 0.5 (i ∈ N and k ∈ {0, 1}). Let LR be the learning rate of MA/PL. The competitive learning method for estimating MPD is given in Algorithm 4. From Algorithm 4, it can be seen that the MPDs of two consecutive generations are connected through the learning rate LR, and the probability values of items gradually move toward TmpPro/AL obtained from elite solutions (line 6). This is similar to the weight-update rule in competitive learning. In this way, MPD is determined not only by its previous statuses but also by the information of the current ExA (lines 3-6). Thus, the information of elite solutions can be adequately used by MA/PL to predict promising search regions.
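A minimal sketch of the MPD update of Algorithm 4, assuming the incremental competitive-learning form in which the probabilities move toward the bit frequencies TmpPro/AL of ExA with learning rate LR; the exact code layout is an assumption.

```python
# A sketch of the competitive-learning update for MPD (Algorithm 4):
# mpr_{1,i} is pulled toward the frequency of bit 1 at position i in ExA.
def update_mpd(mpr1, exa, LR):
    """mpr1[i] = probability of bit i being 1; mpr0 is simply 1 - mpr1."""
    AL, N = len(exa), len(mpr1)
    freq = [sum(sol[i] for sol in exa) / AL for i in range(N)]  # TmpPro/AL
    return [(1 - LR) * m + LR * f for m, f in zip(mpr1, freq)]

mpr1 = [0.5, 0.5, 0.5]                 # initialization: mpr_{1,i}(0) = 0.5
exa = [[1, 0, 1], [1, 1, 0]]           # elite archive, AL = 2
print(update_mpd(mpr1, exa, LR=0.2))   # ≈ [0.6, 0.5, 0.5]
```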

C. Learning Rule Based on Binary Markov Chain for JPD
The purpose of JPD is to get the dependency of two conjoint items. Our learning rule for JPD is motivated from the binary signal source in telegraph systems [58]. The telegraph system outputs a series of pulses with high (H) and low (L) symbols (i.e., an N-bit string) in a temporal order. It can be modeled by a Markov source with given memory length. The memory length defines the dependency structure of signals. If we only consider a dependency of two conjoint symbols, the memory length is 1 (i.e., first-order Markov source). The first-order Markov source can be analyzed by a binary Markov chain. As shown in Fig. 3, the dependency of two conjoint symbols can be captured using the binary Markov chain, such as the (i-1)th and the ith signals.
It is interesting that the output of the first-order Markov source is equivalent to the solution representation of MKP (see Fig. 4), when regarding the "temporal order" as the "spatial order." Thus, it is reasonable to extract the bivariate dependency of conjoint items by using a binary Markov chain.
In Fig. 4, the probability distributions of positions 2 to N are often different, determined by the features of MKP. Therefore, a nonstationary binary Markov chain is used in our learning rule, as shown in Fig. 5. There are N − 1 pairs of conjoint items for the order 2 → · · · → N. For each position from 2 to N, we first calculate the one-step transition probability matrix (OSTPM) from ExA and in turn obtain JPD based on OSTPM. Let JPr^i(gen) = [jpr^i_{k,l}(gen)]_{2×2}, where jpr^i_{k,l}(gen) is the probability of π_{i−1} = k and π_i = l at generation gen. Apparently, JPr^i(gen) can be seen as a discrete joint distribution of conjoint items i − 1 and i in the order 2 → · · · → N. Then, jpr^i_{k,l}(gen) is calculated from the OSTPM, where ρ is set to 1, 2, 3, and 4 for the four conditions on (k, l).
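For intuition, the joint distribution of one pair of conjoint items can be estimated from ExA by simple frequency counting; this is a simplification of the paper's OSTPM-based rule (the learning-rate blend and the ρ-indexed conditions are omitted here).

```python
# A simplified sketch of estimating JPD for one pair of conjoint items
# (i-1, i) from ExA: jpr[k][l] is the fraction of archive members whose bits
# at positions i-1 and i equal (k, l).
def estimate_jpd(exa, i):
    """Return a 2x2 matrix jpr[k][l] = P(bit_{i-1} = k, bit_i = l) over ExA."""
    jpr = [[0.0, 0.0], [0.0, 0.0]]
    for sol in exa:
        jpr[sol[i - 1]][sol[i]] += 1.0
    AL = len(exa)
    return [[c / AL for c in row] for row in jpr]

exa = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
print(estimate_jpd(exa, 1))   # [[0.0, 0.25], [0.25, 0.5]]
```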

D. Sampling Operator Based on MPD and JPD
Accurate probability models are beneficial for improving the global exploration of learning-based MAs [23]. Indeed, there are probability models that can capture the precise dependency of all random variables, such as Bayesian networks, but such models generally take more computational time. Hence, we only consider the dependency of two conjoint items in the proposed JPD. Besides, there may be a dependency between any pair of items, and many such dependencies would never be considered if one fixed sampling route were always used. Thus, we propose to take a random sampling route, so that the dependency between any pair of items can be potentially considered. In our sampling operator, we first generate a random permutation r = [r_1, . . . , r_N] of the integers 1, . . . , N and then use it as the sampling route, as shown in Fig. 6.
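A hedged sketch of this sampling operator: a random permutation r serves as the route, the first position is drawn from MPD, and each later position from the conditional distribution implied by the pairwise joint distribution of the two conjoint positions on the route. Estimating that conditional directly from ExA, as below, is an illustrative simplification of the paper's operator.

```python
import random

# A hedged sketch of the sampling operator based on MPD and JPD with a
# random sampling route. The ExA-based conditional estimate is illustrative.
def sample_solution(exa, mpr1, rng):
    N = len(mpr1)
    route = list(range(N))
    rng.shuffle(route)                       # random sampling route r
    x = [0] * N
    first = route[0]
    x[first] = 1 if rng.random() < mpr1[first] else 0  # first bit from MPD
    for prev, cur in zip(route, route[1:]):
        # P(bit_cur = 1 | bit_prev = x[prev]), estimated from ExA; fall back
        # to MPD when no archive member matches the conditioning bit.
        match = [s for s in exa if s[prev] == x[prev]]
        p1 = (sum(s[cur] for s in match) / len(match)) if match else mpr1[cur]
        x[cur] = 1 if rng.random() < p1 else 0
    return x
```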

E. Overall Procedure of the Proposed MA/PL
Let genMax be the maximum number of generations and gbest(gen) be the best feasible individual found so far. The overall procedure of MA/PL is given in Algorithm 7. It shows that MPD and JPD are both estimated from ExA (lines 5 and 10), which helps gather more information about promising search regions. Moreover, we use RO_LUF and LS_LUF to repair and improve new solutions (lines 4 and 9). With this precise model, MA/PL can hopefully achieve a good balance between exploration and exploitation.
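The overall loop of Algorithm 7 can be sketched at a high level, with the component operators passed in as callables; their concrete forms are described in Sections III and IV, and the names and signatures here are assumptions.

```python
# A high-level sketch of the MA/PL main loop (Algorithm 7).
def ma_pl(init_population, repair, local_search, update_exa, learn_models,
          sample_population, objective, gen_max):
    # Initial population, repaired and improved by RO_LUF and LS_LUF.
    pop = [local_search(repair(x)) for x in init_population()]
    exa = update_exa([], pop)                  # initialize ExA (Algorithm 3)
    gbest = max(pop, key=objective)
    for gen in range(gen_max):
        models = learn_models(exa)             # estimate MPD and JPD from ExA
        pop = [local_search(repair(x)) for x in sample_population(models)]
        exa = update_exa(exa, pop)             # at most one new elite enters
        gbest = max(pop + [gbest], key=objective)
    return gbest
```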

V. EXPERIMENTAL RESULTS

A. Experimental Setup and Parameter Settings
To verify the performance of the proposed MA/PL, we test it on four sets of well-known benchmark data for MKP of different sizes. Test set I contains 11 large-scale instances with M = 15 to 100 and N = 100 to 2500 (available from http://hces.bus.olemiss.edu/tools.html). Test set II contains 20 medium-scale instances with M = 5 and 10 and N = 100 (available from ORLIB [59]). In Tables I-XIV, each number in the row #Best is the total number of instances for which the relevant algorithm obtained the best result in terms of different quality metrics. Besides, we highlight in bold the best quality metric of each instance found by the compared algorithms, and mark the BEST column of an instance with the symbol "*" if BKS is attained. To check the significance of the differences between MA/PL and the other algorithms, statistical tests are also conducted. In Sections V-C and V-D, we adopt the results of existing algorithms from the relevant references. The average CPU times (s) (i.e., CPU (s)) of MA/PL are listed for information.

B. Effectiveness of MA/PL's Components

1) Effectiveness of Proposed Learning Rules and Local Search: To study the impact of the proposed learning rules, random sampling route, and local search, we introduce three variants, MA/PL_V1, MA/PL_V2, and MA/PL_V3, whose results are reported in Table I. In terms of the percentage deviations from BKS, that is, Dev (%), the related violin plots are given in Fig. 7.
From Table I, we can see that MA/PL_V2 outperforms MA/PL_V1, clearly showing that the integration of MPD and JPD enhances the search ability. That is, considering the dependency of conjoint items leads to better global exploration. Meanwhile, MA/PL_V3 outperforms MA/PL_V2 for almost all the instances in terms of AVG. In particular, there are 13 instances for which the SD values of MA/PL_V3 are smaller than those of MA/PL_V2. Thus, by capturing the potential dependencies among all items, the random sampling route improves the effectiveness and the robustness of MA/PL. Moreover, MA/PL outperforms MA/PL_V3, which demonstrates the effectiveness of the proposed LS_LUF. From Fig. 7, it is clear that the distributions of MA/PL_V3 are better than those of MA/PL_V1, and those of MA/PL are better than those of MA/PL_V3, in terms of both the dispersions and the interquartile ranges.
In addition, we also observe the 95% CIs for the solutions of MA/PL and the three variants on the other instances, that is, Hp2, Knap 28, MK_gk04, Pb6, Pet7, and Sento2. The related CIs are shown in Fig. 8. We can see that MA/PL obtains relatively more consistent solutions than the other three variants. These CIs also reveal the performance of MA/PL's components.
To further reveal the effectiveness of the proposed LS_LUF, two variants of MA/PL that use advanced local search methods in existing works for MKP are introduced. The two variants are MA/PL_NLS and MA/PL_ILS, which are the same as MA/PL except that LS_LUF is replaced by the neighborhood local search [62] and the intensity-based local search [34]. The same CPU times used in Table I are adopted for each algorithm. Results are given in Table II. In addition, in Fig. 9, we give the 95% CI for the solutions of MA/PL and the two variants for the same instances that are shown in Fig. 8. The results of Table II and Fig. 9 reveal the competitiveness of the proposed LS_LUF.

2) Effectiveness of the Proposed Repair Operator:
To verify the effectiveness and efficiency of the proposed RO_LUF, we compare MA/PL with two variants, namely, MA/PL_RO2 and MA/PL_RO3. They are the same as MA/PL except that RO_LUF is replaced by RO2 [34] and RO3 [35], respectively. RO2 and RO3 are among the most effective repair heuristics for MKP in existing works. Note that the same stopping criterion (genMax = 5000) is used for the two variants and MA/PL. Relevant results on Test set II are reported in Table III, and the average CPU times of the three algorithms are given in Fig. 10. Table III shows that MA/PL is better than MA/PL_RO2/3 for almost all the instances regarding BEST and AVG. Hence, RO_LUF is beneficial for enhancing the overall solution quality of MA/PL. Besides, as shown in Fig. 10, MA/PL takes less CPU time than MA/PL_RO2/3 although the same stopping criterion is adopted. These results confirm the effectiveness and efficiency of the proposed RO_LUF. In terms of Dev (%), the violin plots of MA/PL and MA/PL_RO2/3 are given in Fig. 11, showing that MA/PL obtains more robust solutions than the two variants.

3) Discussions on Coefficients for RO_LUF and LS_LUF:
The coefficients λ_1 and λ_2 used for calculating the LUFs play an important role in the performance of RO_LUF and LS_LUF. To analyze them, four variants of MA/PL with different (λ_1, λ_2) settings are compared, with results reported in Table IV. In addition, we give the trends of SD in Fig. 12. Note that the four variants are executed with genMax = 5000, the same as MA/PL. Table IV shows that the AVG values of MA/PL are much better than those of the four variants for almost all the used instances. The results verify that giving equal weight to the two attributes of candidate items in RO_LUF and LS_LUF helps enhance the effectiveness of MA/PL. Meanwhile, in Fig. 12, MA/PL has a relatively smaller SD for each instance, so setting equal coefficients for the two attributes also enhances the robustness of MA/PL. This can be attributed to the fact that the different magnitudes of the two attributes have already been normalized in the proposed LUFs before the coefficients are applied. Thereby, equal coefficients are more reliable for reflecting the problem-specific relationship between the two attributes.
Moreover, we draw the trend of CPU (s) for each algorithm in Fig. 13. It is clear that algorithms with a larger λ_2 tend to take less CPU time. This is because, during the search process of MA/PL, a larger λ_2 (and hence a smaller λ_1) emphasizes the role of the constraint violation and weakens the impact of the objective. Thereby, for MA/PL with a larger λ_2, the time complexities of RO_LUF and LS_LUF can be much lower than the worst case O(PS·N·(3·N+M)), compared with those using a smaller λ_2. The trend line of MA/PL in Fig. 13 is located in the middle area among all the lines; however, its CPU time is quite acceptable, as we can see in this section. Therefore, we set (λ_1, λ_2) = (0.5, 0.5) considering both the effectiveness and the efficiency of MA/PL.

C. Comparisons With Learning-Based Metaheuristics
We compare MA/PL with four learning-based metaheuristics for MKP, that is, HEDA1 [34], HEDA2 [34], Meta-RaPS EDA [55], and Meta-RaPS Q [55]. HEDA1 and HEDA2 use the EDA-based learning scheme; HEDA2 uses RO2 as the repair operator, whereas HEDA1 uses a repair operator based on LP. Meta-RaPS EDA and Meta-RaPS Q adopt a memoryless metaheuristic model (i.e., Meta-RaPS) and employ EDA and Q-learning, respectively, as the underlying algorithms. These are the most recently reported learning-based algorithms for MKP. The stopping criterion of MA/PL is genMax = 5000. Relevant results are listed in Tables V and VI. From Table V, we can see that MA/PL outperforms HEDA1 and HEDA2. Although MA/PL's learning mechanism is similar to those of HEDA1 and HEDA2, its probability distribution is estimated from ExA rather than from the elite individuals in the population. Such a learning method makes good use of the information of elite individuals and so speeds up the convergence. Besides, the integration of MPD and JPD can capture valuable probability information to guide the global exploration of MA/PL. Table VI reports the corresponding comparison with Meta-RaPS EDA and Meta-RaPS Q.

D. Comparisons With Other Existing Algorithms
In this section, we compare MA/PL with nine existing algorithms for MKP with respect to Test sets I-IV. Based on the results in Table XII, we draw the violin plots of KMTR-Cuckoo, KMTR-BH, and MA/PL in Fig. 14, in terms of Ave. Dev and SD, showing that MA/PL is better than the other two algorithms. Therefore, there is large potential to use MA/PL for real-life problems that can be modeled as MKP.

VI. REAL-LIFE CASE STUDY
Recently, the authors conducted a real-life project for an iron and steel company in Qian'an, Hebei, China. In this project, we focus on the production planning for the cold rolling of the iron and steel production process. Cold rolling is the most complex process in the iron and steel industry. The primary procedure of the cold rolling production planning is the order selection (OS), which involves selecting suitable orders (items) from a given order pool, on the basis of the capability of processing units (dimensions) and the due dates of orders. The sketch of OS is given in Fig. 15. We consider five bottleneck units in OS, that is, acid pickling line (APL), pickling line tandem (PLT), continuous annealing 1 and 2 (CA1 and CA2), and 20-high cold rolling mill (RCM). Note that in a given planning interval, each unit has a limited capacity (constraint) that is determined by the maintenance activity and the already assigned load.
To improve customer satisfaction, orders closer to their due dates should be given higher priority for selection. We thus model OS as an MKP whose objective, to be maximized, is the sum over the selected orders of the inverse of the time difference between each order's due date and the current time. The constraints are the limited capacities of the units in the current planning interval. In this section, OS is used as a case study, and the related factory data are collected (available from http://dx.doi.org/10.13140/RG.2.2.21263.12964).
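The mapping from order data to an MKP instance described above can be sketched as follows. The function `build_os_instance`, the field names `due` and `load`, and the small positive floor on the time difference are hypothetical illustrations, not taken from the factory system.

```python
def build_os_instance(orders, capacities, now):
    """Cast order selection (OS) as an MKP instance.

    orders     : list of dicts; each has 'due' (due-date time) and
                 'load' (resource consumption on the bottleneck units)
    capacities : remaining capacity of each bottleneck unit (knapsack dims)
    now        : current time of the planning interval
    """
    # Profit of an order is the inverse of its time slack: orders closer
    # to their due date get larger profits and hence higher priority.
    profits = [1.0 / max(o["due"] - now, 1e-6) for o in orders]
    weights = [o["load"] for o in orders]   # weight vector per order
    return profits, weights, capacities

# Toy example with two orders and two bottleneck units.
orders = [{"due": 10.0, "load": [1, 3]},
          {"due": 5.0,  "load": [2, 1]}]
profits, weights, capacities = build_os_instance(orders, [8, 8], now=0.0)
```

Once the instance is built this way, any MKP solver (such as MA/PL) can be applied unchanged; the floor of 1e-6 simply avoids division by zero for orders already at or past their due date.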
In this case study, MA/PL is independently run 30 times with the stopping condition genMax = 5000. Besides, we invited two experienced schedulers (i.e., Schedulers A and B) from the company to make the OS decision manually using the same factory data as MA/PL. The results of MA/PL and the manual method are reported in Table XV. From Table XV, it is clear that MA/PL outperforms the manual method in terms of BEST and AVG. In particular, MA/PL selects 306 orders with SD = 0.00, which is much better than the two schedulers. Moreover, the CPU time of MA/PL is 77.14 s, which is acceptable for the efficiency requirements of the company. Furthermore, Fig. 16 gives the trend of the best objective value found so far at each generation of MA/PL, showing that MA/PL converges well on OS. In summary, MA/PL can be regarded as an effective and efficient solution method for practical applications.

VII. CONCLUSION
This article proposed an MA based on probability learning (i.e., MA/PL) to solve MKP. The primary highlights of this work lie in two aspects, that is, the problem-dependent heuristics and the learning-based search framework of MA/PL.
Based on the features of MKP, we proposed two problem-dependent heuristics for the repair operation and local search of MA/PL, that is, RO_LUF and LS_LUF. These heuristics fully considered the tradeoff between the objective and the violation of constraints according to the presented LUFs. The framework of MA/PL adopted two kinds of probability distribution, that is, MPD and JPD. MPD was concerned with the univariate probability distribution of each item, whereas JPD focused on the bivariate dependency of conjoint items in a given spatial order. The learning schemes of both MPD and JPD were based on the elite solutions recorded in ExA. Specifically, MPD was first estimated by canonical competitive learning; then, JPD was constructed using the proposed learning rule based on a special binary Markov chain. By integrating MPD and JPD in a complementary manner, more accurate probability models can be constructed. Moreover, we introduced a random sampling route based on the solution structure of MKP so that the overall dependency among all items can be revealed through the bivariate dependencies of conjoint items. Thereby, promising offspring of MA/PL can be produced by integrating MPD and JPD. Results of experiments and comparisons on 179 well-known benchmark instances and a real-life case study verified the effectiveness and practical value of the proposed MA/PL.
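The offspring-generation step summarized above can be sketched as follows. This is a simplified illustration, not the paper's exact sampling rule: the function `sample_offspring`, the mixing weight `beta`, the encoding of `jpd` as an n-by-n matrix of conditional probabilities, and the fallback to the marginal when the previous item is unselected are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_offspring(mpd, jpd, beta=0.5):
    """Sample one binary solution by integrating MPD and JPD along
    a random sampling route over all items.

    mpd  : length-n marginals, mpd[j] = P(x_j = 1)
    jpd  : n x n matrix; jpd[i, j] approximates P(x_j = 1 | x_i = 1)
           for conjoint items on the route (hypothetical encoding)
    beta : mixing weight between univariate and bivariate information
    """
    n = len(mpd)
    route = rng.permutation(n)                 # random route over all items
    x = np.zeros(n, dtype=int)
    prev = route[0]
    x[prev] = int(rng.random() < mpd[prev])    # first item uses MPD only
    for j in route[1:]:
        # Use the bivariate dependency when the previous conjoint item
        # was selected; otherwise fall back to the marginal of item j.
        cond = jpd[prev, j] if x[prev] == 1 else mpd[j]
        p = beta * mpd[j] + (1.0 - beta) * cond
        x[j] = int(rng.random() < p)
        prev = j
    return x
```

Walking a random permutation each time means that, over many samples, every pair of items eventually becomes conjoint on some route, so the pairwise JPD information propagates an approximation of the overall dependency among all items.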
For further research, effective metaheuristics may be designed based on machine learning and mathematical programming to address other types of knapsack problems, such as the multiobjective multidimensional knapsack problem (MOMKP), the quadratic knapsack problem (QKP), and the multichoice knapsack problem (MCKP).