A rollout algorithm for the resource constrained elementary shortest path problem

ABSTRACT This paper presents a metaheuristic approach for the resource constrained elementary shortest path problem (RCESPP). The RCESPP arises as the pricing problem when the vehicle routing problem is solved by branch-and-price algorithms. The availability of efficient metaheuristic and optimal solution approaches has contributed to the success of solution procedures based on column-generation. We focus on rollout strategies integrated with local search strategies. The scientific literature considers metaheuristics based on a tabu search procedure in order to price out columns. A comparative analysis between the proposed rollout approaches and the tabu search is conducted and the effectiveness of our proposed algorithms is tested. A comparison with exact solution approaches is also carried out in order to assess the behaviour of the implemented solution strategies in terms of both efficiency and solution quality.


Introduction
The Resource Constrained Shortest Path Problem (RCSPP) aims at finding a minimum-cost path from a given source node to a given destination node in a digraph, satisfying a set of constraints defined over a set of resources.
Desrochers introduced the RCSPP in his Ph.D. dissertation [15] as a subproblem of a bus driver scheduling problem. The RCSPP can be used to represent several real-life problems arising in scheduling, telecommunications and transportation networks; thus, it has attracted the attention of many researchers over the last decade. Surveys of scientific contributions related to the RCSPP and its variants can be found in [16-18,26,27].
Its elementary version, referred to as the Resource Constrained Elementary Shortest Path Problem (RCESPP), deals with the specific situation in which the nodes belonging to the optimal path must appear at most once.
It is well known that the development of solution approaches for the RCESPP has contributed to the success of column-generation methods for addressing several versions of the vehicle routing problem (VRP). When the VRP is solved via a branch-and-price approach, the corresponding master problem is formulated as a set-partitioning model and the corresponding pricing problem can be mathematically represented as a RCESPP. Indeed, the variables derived from the dual problem of the set-partitioning formulation are viewed as the prizes associated with the nodes. The cost of the path is the sum of the cost for traversing the arcs minus the prizes collected at the nodes. Since the graph is assumed to be cyclic, it is necessary to introduce constraints that guarantee the elementarity of the optimal path.
In the column-generation framework, metaheuristics and/or heuristics are used to obtain negative reduced cost paths quickly. Optimal solution approaches are applied when the heuristic procedures fail to price out columns. Since the RCESPP is NP-hard in the strong sense (see [19]), keeping the number of calls to the optimal approach low is beneficial for the entire resolution process. For this reason, the availability of effective heuristics can reduce the overall computational effort. However, running an optimal algorithm remains mandatory in order to prove that no path with negative reduced cost exists. Various optimal solution strategies have been proposed in the scientific literature for the RCESPP (see e.g. [4,9,20,30,35]).
Several papers related to branch-and-price algorithms (see e.g. [8,12,32,33]) suggest applying dynamic programming-based heuristics when the metaheuristic fails to price out columns and before applying an optimal solution strategy. The dynamic programming-based heuristics make use of modified optimal solution algorithms that explore a restricted solution space. This reduces the computational effort, but the determination of an optimal solution is not guaranteed.
In general, the pricing procedure is divided into three main levels. In the first level, a metaheuristic is applied. If no path with negative cost is found, the second level is executed, where a dynamic programming-based heuristic is performed. The third level, which solves the RCESPP to optimality, is invoked only when the second level also fails.
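The three-level scheme above can be sketched as a simple cascade. The following Python fragment is only an illustrative sketch, not the authors' implementation: the function name `price_columns`, the tolerance `eps` and the convention that each level is a callable returning (path, reduced cost) pairs are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Column = Tuple[List[int], float]      # (path as a node list, reduced cost)
Pricer = Callable[[], List[Column]]   # hypothetical interface of one pricing level


def price_columns(levels: Sequence[Pricer], eps: float = 1e-9) -> Tuple[int, List[Column]]:
    """Call the pricing levels in order and stop at the first one that
    returns at least one column with negative reduced cost.

    Returns (index of the level reached, negative columns). If even the last
    level (meant to be the exact algorithm) finds nothing, the list is empty.
    """
    for depth, pricer in enumerate(levels, start=1):
        negative = [(path, rc) for path, rc in pricer() if rc < -eps]
        if negative:
            return depth, negative
    return len(levels), []
```

In this reading, the more expensive levels are only called when every cheaper level has failed, which is why an effective first level reduces the overall effort.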
This paper focuses on the first level. The aim is to define an effective metaheuristic able to price out as many columns as possible.
Heuristic procedures to address the RCESPP have been proposed in [13,14,31,36]. In particular, Desaulniers et al. [14] applied a tabu search method for solving the RCESPP arising as a subproblem in a branch-and-price-and-cut procedure for the VRP with time windows (VRPTW). Archetti et al. [1] extended the approach presented in [14] to the split delivery VRPTW.
Dell'Amico et al. [13] investigated the VRP with simultaneous distribution and collection by developing a column-generation based approach that uses fast deterministic and randomized greedy algorithms to quickly generate columns with negative reduced cost.
Liberatore et al. [31] studied the RCESPP with time windows and implemented three different pricing algorithms with increasing complexity and accuracy: a greedy constructive algorithm, a local search algorithm and an approximate dynamic programming algorithm.
Salani and Vacca [36] considered the VRP with discrete split deliveries and time window constraints. The problem is solved via a column-generation approach. The authors stated the related pricing problem as an instance of the RCESPP, and defined a simple greedy heuristic to obtain valid upper bounds on both the cost and the number of vehicles.
Pirkwieser and Raidl [34] presented a column-generation approach to address the periodic VRP with time windows. They formulated the pricing problem as an RCESPP with time windows and restrictions on path duration and developed a metaheuristic, which can be viewed as a greedy randomized adaptive search procedure.
Several procedures have been proposed in the scientific literature to determine lower bounds on the optimal solution by eliminating cycles of at most a given length z. In particular, dynamic programming approaches aimed at eliminating cycles of length two have been presented in [11,25], whereas the case of z ≥ 3 has been considered only in [28].
Very recently, Baldacci et al. [2] proposed a new relaxation of the RCESPP (i.e. the ng-route relaxation), which yields lower bounds of very good quality.
In this work, we design rollout-based approaches (RHs) to solve the RCESPP. RHs were originally introduced in [6] and [7] and can be used to solve NP-hard combinatorial optimization problems. The basic idea is to use the cost obtained by applying a heuristic method (i.e. the base heuristic) to discriminate among several search options at each step.
Even if our proposed approaches can be used merely to solve the RCESPP, our main concern is to propose a new idea for efficiently generating columns within the column-generation framework. When the RCESPP is solved as a subproblem in column-generation, it is important to rapidly obtain a considerable number of negative reduced cost columns. Thus, the definition of a rollout procedure has been motivated by two main reasons.
Firstly, these algorithms are very appealing from the practical point of view. Indeed, several experimental results, on both deterministic and stochastic problems, in sequential and in parallel computing systems (see [21-23]), have shown that RHs significantly improve the performance of the base heuristic in terms of solution quality. Moreover, they are more robust than other metaheuristic techniques (e.g. Tabu Search and Simulated Annealing, as shown in [7]), because no input parameters need to be tuned.
Secondly, a set of feasible paths is determined at each iteration. Thus, it is possible to store feasible solutions with negative reduced cost to populate the pool of paths in a column-generation procedure. Since a set of complete paths is determined at each iteration, a rollout-based approach is able to explore a large-size space.
The remainder of this paper is organized as follows: in Section 2 we introduce and mathematically formulate the RCSPP and the RCESPP. In Section 3 we give a description of the proposed solution approaches. Since we compare the defined rollout algorithms for the RCESPP with a tabu search approach, for the sake of completeness, in Section 4 we describe the tabu search. In Section 5 we present the computational results, where the performance of the proposed solution strategies is evaluated and compared with the tabu search algorithm. In Section 6 we summarize the final remarks and conclusions. A detailed account of the computational results is reported in the Appendix of the supplemental material.

Problem definition and notations
The RCSPP is formulated on a directed graph $G(N, A)$, where $N$ is the set of $n$ nodes and $A$ is the set of $m \le |N| \times (|N| - 1)$ arcs.
Let $c_{ij}$ and $R_{ij}$ be the cost and an $r$-dimensional vector associated with each arc $(i, j) \in A$, respectively. The $r = |R_{ij}|$ scalars $w_{ij}^h$, $h = 1, \ldots, r$, represent the consumption of the resource $h$ along the arc $(i, j)$. Given two distinct nodes $u$ and $v$, a path $\pi_{uv}$ from $u$ to $v$ is a sequence of nodes and arcs $\pi_{uv} = \{u = i_1, (i_1, i_2), i_2, \ldots, i_{q-1}, (i_{q-1}, i_q), i_q = v\}$. Let $c(\pi_{uv})$ be the cost of the path $\pi_{uv}$, defined as the sum of the costs associated with the arcs of the path $\pi_{uv}$. A resource extension function (REF), denoted as $w^h(\pi_{uv})$, is associated with each resource $h = 1, \ldots, r$ and represents the quantity of resource $h$ consumed along the path $\pi_{uv}$.
Depending on the nature of the resource $h$, $w^h(\pi_{uv})$ can assume different forms, i.e. additive, multiplicative, bottleneck-type (see [26] for more details on REF). Given a source node $s$ and a destination node $t$, the aim of the RCSPP is to find a path $\pi^*_{st}$ from $s$ to $t$ such that $c(\pi^*_{st}) \le c(\pi_{st})$ for each path $\pi_{st}$ in $G(N, A)$, where $\pi^*_{st}$ and $\pi_{st}$ satisfy the feasibility criterion associated with each resource $h = 1, \ldots, r$.
Depending on the values of the bounds $a_i^h$ and $b_i^h$ that this criterion imposes on the consumption of resource $h$ at node $i$, at least two types of constraints can be modelled: 1) time window constraints and 2) capacity constraints. For the former, the resource $h$ represents time and $a_i^h, b_i^h \in \mathbb{R}_+$, $\forall i \in N$. We distinguish between two kinds of time window constraints: hard and soft. In this work, only hard constraints are considered. Thus, if the node $i$ is visited before $a_i^h$, waiting is allowed at no cost, whereas all arrival times after $b_i^h$ are declared infeasible. For the capacity constraints, the consumption of resource $h$ is limited by $W^h$, the vehicle capacity. The RCESPP is the elementary version of the RCSPP and its aim is to find the elementary minimum-cost path $\pi^*_{st}$ from node $s$ to node $t$. A path is elementary when it does not contain repeated nodes. Since the cost $c_{ij}$, $\forall (i, j) \in A$, is not constrained in sign and the graph is not assumed to be acyclic, it is necessary to impose that the final solution be elementary, that is, a node cannot appear more than once in the final path.
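As an illustration of the definitions above, the sketch below evaluates the cost of a path, an additive REF, and the hard time window rule (waiting when early, infeasibility when late). It is a minimal Python sketch under stated assumptions: the dictionary-based data layout and the function names are hypothetical, the REF is taken to be additive, and the two bounds at each node are read as the earliest and latest admissible arrival times.

```python
def path_cost(path, cost):
    """c(pi): sum of the arc costs along the path (given as a list of nodes)."""
    return sum(cost[i, j] for i, j in zip(path, path[1:]))


def additive_consumption(path, usage):
    """Additive REF: total consumption of one resource along the path."""
    return sum(usage[i, j] for i, j in zip(path, path[1:]))


def respects_time_windows(path, travel, window, start=0.0):
    """Hard time windows: wait (at no cost) if early, infeasible if late.

    window[i] = (a_i, b_i); `start` is the assumed departure time from path[0].
    """
    t = start
    for i, j in zip(path, path[1:]):
        arrival = t + travel[i, j]
        a_j, b_j = window[j]
        if arrival > b_j:        # arrival after b_j: the path is infeasible
            return False
        t = max(arrival, a_j)    # arrival before a_j: wait until a_j
    return True
```

A capacity constraint fits the same pattern: the path is feasible as long as the additive consumption never exceeds the vehicle capacity.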
We highlight that the RCESPP arises as a subproblem, named the pricing problem, when the VRP is solved via a column-generation method. In this setting, the VRP is mathematically reformulated as the set-partitioning problem (1)-(3), defined over the set of all feasible paths from node $s$ to node $t$: $\mu_{ij}(\pi_{st})$, $\forall (i, j) \in A$, are binary constants that assume a value of 1 if and only if arc $(i, j)$ belongs to the path $\pi_{st}$, and the binary variables $x(\pi_{st})$ indicate whether the path $\pi_{st}$ is selected or not. Solving the set-partitioning formulation requires the determination of the set of all feasible paths. However, it is well known that enumerating all paths is impractical. Thus, a relaxation of the model (1)-(3), obtained by considering only a subset of the feasible paths, is solved instead. In this framework, a column-generation approach is used with the aim of finding promising paths not yet included in the subset. Thus, a subproblem (i.e. the pricing problem) has to be solved in order to price columns (paths) of the set-partitioning problem. In the context of the VRP, the pricing problem represents an instance of the RCESPP.
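For reference, a set-partitioning master problem consistent with the definitions above can be written as follows. This is a sketch of a standard form rather than a reproduction of the authors' model (1)-(3): the symbol $\bar{\Pi}$ for the set of feasible paths and the exact form of the covering constraints are assumptions, chosen so that one dual variable is attached to each node. In VRP models the covering constraints are typically imposed on the customer nodes only.

```latex
\begin{align}
\min \quad & \sum_{\pi_{st} \in \bar{\Pi}} c(\pi_{st})\, x(\pi_{st}) \tag{1}\\
\text{s.t.} \quad & \sum_{\pi_{st} \in \bar{\Pi}} \; \sum_{j : (i,j) \in A} \mu_{ij}(\pi_{st})\, x(\pi_{st}) = 1, \qquad \forall i \in N, \tag{2}\\
& x(\pi_{st}) \in \{0, 1\}, \qquad \forall \pi_{st} \in \bar{\Pi}. \tag{3}
\end{align}
```

Under this form, restricting $\bar{\Pi}$ to a subset of paths and relaxing integrality gives the restricted problem on which column-generation operates.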
Let $\lambda_i$, $i = 1, \ldots, |N|$, be the dual variables associated with the constraints (2). The reduced cost $\check{c}(\pi_{st})$ of the path $\pi_{st}$ is the sum, over the arcs $(i, j)$ of the path, of the reduced arc costs $\check{c}_{ij} = c_{ij} - \lambda_i$. The pricing problem aims to find a feasible minimum reduced cost path, and it can be formulated as the RCESPP mathematical model (4)-(7), where constraint (7), $x \in X$, restricts the solution to the space $X$ induced by the resource constraints. In other words, $X$ reflects the requirements imposed on the routes in the VRP in terms of resource consumption. The constraints (6) are the subcycle elimination constraints, defined over subsets of nodes $\tilde{N} \subseteq N$. We highlight that the proposed rollout algorithms work for any definition of set $X$. However, in the computational analysis, we focus our attention on capacity and time window constraints. For the sake of comprehension, we describe the set $X$ for these two types of resource constraints. Assuming that $h = 1$ is the resource associated with either the pick-up or the delivery operations and $h = 2$ is the resource time, set $X$ is defined by the corresponding capacity and time window restrictions, where $w_i^2$ is the arrival time at node $i \in N$ from source node $s$, i.e. $w_i^2 \equiv w^2(\pi_{si})$.
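Since the reduced arc cost is the arc cost minus the dual value of the tail node, the reduced cost of a candidate path can be evaluated arc by arc. The snippet below is a small Python sketch of that computation; the function name, the dictionary layout and the numerical values are illustrative assumptions.

```python
def reduced_cost(path, cost, dual):
    """Sum of the reduced arc costs c_ij - lambda_i along the path.

    `dual[i]` is the dual value (prize) collected when leaving node i;
    nodes without a dual value are assumed to contribute 0.
    """
    return sum(cost[i, j] - dual.get(i, 0.0) for i, j in zip(path, path[1:]))


cost = {(0, 1): 4.0, (1, 2): 3.0, (0, 2): 6.0}
dual = {1: 10.0}
print(reduced_cost([0, 1, 2], cost, dual))   # (4 - 0) + (3 - 10)
print(reduced_cost([0, 2], cost, dual))      # 6 - 0
```

A path prices out only when this value is negative, which in this toy example holds for the first path and not for the second.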

Rollout approach for the RCESPP
In this section, we present the RH defined to address the RCESPP. An RH can be viewed as a construction procedure that determines a solution of the problem under investigation by starting with some feasible partial solution, which is a feasible path starting at node $s$ and ending at a node $i \in N \setminus \{t\}$, and by enlarging the current solution iteratively.
A base heuristic, denoted as $H$, is used to evaluate the different search options at each step. It is assumed that, given a partial solution, a complete feasible solution (i.e. a path starting at node $s$ and ending at node $t$) is determined by applying $H$; the cost of this complete solution is the heuristic cost of the partial solution.
The neighbourhood of a partial solution is the set of all feasible solutions that can be obtained from it by applying an elementary perturbation or move.
At the generic iteration $k$, a partial solution is available; thus, its neighbourhood can be determined. This neighbourhood contains all feasible solutions that can be obtained from the current partial solution by adding one more element (i.e. a node). For each candidate node $i$, we consider the complete feasible solution forced to contain node $i$, generated starting from the current partial solution. Given an initial feasible partial solution, the general scheme of RH is depicted in Algorithm 1 [21].
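Algorithm 1. General scheme of RH
Step 1a. If the neighbourhood of the current partial solution is empty, exit. Otherwise, select from the neighbourhood the solution for which the heuristic cost is minimized.
Step 1b. Extend the partial solution with the selected element, increase $k$ by one and return to Step 1a.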
It is worth observing that if a feasible solution $\pi_{st}$ does not exist, then the neighbourhood of the initial partial solution is empty and Algorithm 1 terminates. Starting from Algorithm 1, an RH for solving the RCESPP can be derived, as described in the following. At each iteration $k$ of RH, the partial solution corresponds to a partial path starting at node $s$. The solution neighbourhood contains all the feasible solutions (i.e. feasible paths) obtained by adding one more node $i_{k+1}$ to the current partial path. Obviously, $i_{k+1}$ can be added to the current partial path if and only if it does not already belong to it and the newly created path satisfies the resource constraints. Furthermore, given a partial path, we use the base heuristic $H$ to generate a complete feasible path that is forced to contain node $i_{k+1}$. The node that results in the complete feasible solution generated by $H$ with the smallest cost is selected and added to the current partial path. In other words, the chosen node $\bar{i}_{k+1}$ minimizes, over the candidate nodes $\bar{i}$, the cost of the complete and feasible path generated by $H$ starting from the current partial path and containing node $\bar{i}$.
It is worth observing that, in order to add a node to the current solution, the base heuristic is executed $O(n)$ times, whereas each complete feasible solution (i.e. an elementary path) contains at most $n$ nodes. Consequently, the computational complexity of RH for the RCESPP is $O(n^2 \theta)$, where $O(\theta)$ represents the computational complexity of the base heuristic $H$.
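The scheme described above can be sketched in a few lines. The following Python code is an illustrative sketch only, not the authors' Java implementation: it assumes a single additive capacity resource, uses a plain nearest feasible neighbour completion as a stand-in for the base heuristic $H$ (no local search), and all names (`greedy_complete`, `rollout`, `pool`) are hypothetical.

```python
def path_cost(path, cost):
    return sum(cost[i, j] for i, j in zip(path, path[1:]))


def greedy_complete(partial, nodes, t, cost, demand, cap):
    """Complete a partial path to t by repeatedly taking the cheapest feasible arc.

    Returns None if the partial path is infeasible or a dead end is reached.
    """
    path = list(partial)
    load = sum(demand[i] for i in path)
    if load > cap:
        return None
    while path[-1] != t:
        last = path[-1]
        cands = [j for j in nodes
                 if j not in path                  # elementarity
                 and (last, j) in cost             # the arc exists
                 and load + demand[j] <= cap]      # capacity
        if not cands:
            return None
        nxt = min(cands, key=lambda j: cost[last, j])
        path.append(nxt)
        load += demand[nxt]
    return path


def rollout(s, t, nodes, cost, demand, cap):
    """Extend the partial path one node at a time, choosing the node whose
    heuristic completion is cheapest. Also returns every completion seen."""
    partial, pool = [s], []
    while partial[-1] != t:
        scored = []
        for j in nodes:
            if j in partial or (partial[-1], j) not in cost:
                continue
            full = greedy_complete(partial + [j], nodes, t, cost, demand, cap)
            if full is not None:
                scored.append((path_cost(full, cost), j, full))
        if not scored:
            return None, pool                      # empty neighbourhood
        pool.extend(full for _, _, full in scored)
        _, best_node, _ = min(scored, key=lambda x: (x[0], x[1]))
        partial.append(best_node)
    return partial, pool
```

Each iteration calls the completion once per candidate node and there are at most $n$ iterations, which matches the $O(n^2 \theta)$ bound. The `pool` collects the complete paths met along the way, which a column-generation procedure could filter for negative reduced cost.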
In the proposed RH, we have considered as base heuristic $H$ a hybrid approach defined by combining a construction heuristic, referred to as $H_c$, with a local search method, denoted by $H_{ls}$.
$H_c$ is a simple greedy construction algorithm (nearest feasible neighbour heuristic) that starts from the origin node $s$ and adds new nodes, taking into account the elementarity and resource constraints, until the destination node $t$ is reached. Given a sequence of nodes ending at node $i$, $H_c$ chooses the next node $j$ to be added so as to minimize the cost $c_{ij}$ while satisfying all constraints. $H_{ls}$ is an improvement heuristic that, starting from a feasible path from node $s$ to node $t$ determined by $H_c$, attempts to improve the solution through a sequence of arc interchanges by using the 'Or-Opt' algorithm. The 'Or-Opt' attempts to improve a current path by moving a chain of two or three consecutive nodes to a different position. Then, the nodes that preceded and succeeded the chain in the earlier path are linked by an arc. This kind of local search is widely applied when solving the VRP. For more details on 'Or-Opt', the reader is referred to [10].
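The Or-Opt move described above can be illustrated as follows. This is a minimal first-improvement sketch in Python under stated assumptions, not the procedure of [10] or the authors' code: the function name `or_opt`, the `feasible` callback (standing in for the resource constraints) and the treatment of missing arcs as infinitely expensive are choices made here for illustration.

```python
import math


def or_opt(path, cost, feasible=lambda p: True, chain_lengths=(3, 2)):
    """Move chains of consecutive interior nodes to another position while
    this strictly lowers the path cost. The endpoints s and t stay fixed."""

    def total(p):
        # Missing arcs are treated as infinitely expensive.
        return sum(cost.get((i, j), math.inf) for i, j in zip(p, p[1:]))

    def improving_move(current, current_cost):
        for length in chain_lengths:
            for start in range(1, len(current) - length):
                chain = current[start:start + length]
                rest = current[:start] + current[start + length:]
                for pos in range(1, len(rest)):
                    if pos == start:               # would rebuild the same path
                        continue
                    cand = rest[:pos] + chain + rest[pos:]
                    cand_cost = total(cand)
                    if cand_cost < current_cost and feasible(cand):
                        return cand, cand_cost
        return None

    best, best_cost = list(path), total(path)
    while True:
        move = improving_move(best, best_cost)
        if move is None:
            return best, best_cost
        best, best_cost = move
```

Because a move only reorders nodes, elementarity is preserved automatically; order-dependent resources such as time windows are the reason a feasibility check is still needed after each move.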

Tabu search approach for the RCESPP
The Tabu Search (TS) method, used for comparison, is inspired by the approach presented in [1]. In their work, the authors addressed the VRPTW. In particular, they implemented a TS algorithm in order to solve the pricing subproblem when the VRPTW is solved by using a branch-and-cut algorithm.
Let $\pi$ be the route which starts from the source node $s$, visits a set of nodes $N_\pi$ and ends at node $t$, and let $N_r = N \setminus N_\pi$. Let $K_{max}$ be the maximum number of iterations imposed for the TS. Let us indicate with $TL_{insert}$ and $TL_{remove}$ the tabu lists. The TS procedure relies on the following two operators:
(1) Remove: the main idea is to determine the most expensive node (in terms of cost) belonging to the route. For each vertex $i \in N_\pi \setminus TL_{remove}$, the operator evaluates $\delta(i)$, representing the decrement in the objective function value if node $i$ is removed from the current solution, and removes the node $i_{best}$ for which the greatest decrement is obtained, that is, $i_{best} = \arg\max_{i \in N_\pi \setminus TL_{remove}} \{\delta(i)\}$.
(2) Insert: the aim is to find the best possible insertion position for all the nodes in $N_r \setminus TL_{insert}$. The operator tries to insert each node $j \in N_r \setminus TL_{insert}$ into the current route. If all the constraints are respected, the cost $\gamma(j)$, that is, the value that the objective function assumes if the node $j$ is added to the route, is calculated. The chosen node $j_{best}$ is then added to the route, where $j_{best} = \arg\min_{j \in N_r \setminus TL_{insert}} \{\gamma(j)\}$. In addition, an Or-Opt procedure is applied to try to improve the solution obtained after the node $j_{best}$ is inserted.
When a node $i$ is removed from the route, it is added to the tabu list $TL_{insert}$, and inserting this node into the route is forbidden for a certain number of iterations, defined as $TL_{max}$. Similarly, if a node $j$ is added to the route, it is also added to the tabu list $TL_{remove}$; thus, removing this node from the route is a forbidden move for $TL_{max}$ iterations. A greedy algorithm is used in order to provide an initial solution for the TS.
The pseudo code of the TS is reported in Algorithm 2.
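The two operators described above can be sketched as follows. This Python fragment is a hedged illustration of the Remove and Insert evaluations only and does not reproduce Algorithm 2: the function names, the use of Python sets for the tabu lists and the `feasible` callback standing in for the resource constraints are assumptions.

```python
def route_cost(route, cost):
    return sum(cost[i, j] for i, j in zip(route, route[1:]))


def remove_move(route, cost, tabu_remove):
    """Return (delta, node) for the non-tabu interior node whose removal gives
    the greatest decrement delta of the route cost, or None if no move exists."""
    best = None
    for k in range(1, len(route) - 1):
        prev, i, nxt = route[k - 1], route[k], route[k + 1]
        if i in tabu_remove or (prev, nxt) not in cost:
            continue
        delta = cost[prev, i] + cost[i, nxt] - cost[prev, nxt]
        if best is None or delta > best[0]:
            best = (delta, i)
    return best


def insert_move(route, outside, cost, tabu_insert, feasible=lambda r: True):
    """Return (gamma, node, position) for the non-tabu outside node and position
    giving the cheapest feasible route after insertion, or None."""
    best = None
    for j in outside:
        if j in tabu_insert:
            continue
        for k in range(1, len(route)):
            prev, nxt = route[k - 1], route[k]
            if (prev, j) not in cost or (j, nxt) not in cost:
                continue
            cand = route[:k] + [j] + route[k:]
            if not feasible(cand):
                continue
            gamma = route_cost(cand, cost)
            if best is None or gamma < best[0]:
                best = (gamma, j, k)
    return best
```

In a full TS loop, a removed node would then be placed in the insert tabu list and an inserted node in the remove tabu list for a fixed number of iterations, as described above.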

Computational experiments
This section presents the results of the experiments carried out in order to assess the performance of the proposed algorithms.
Starting from a naive version of the rollout algorithm, named NRH, where the base heuristic is composed of only the construction heuristic $H_c$, we defined several versions by combining different local search-based heuristics. We compared the proposed solution approaches with the TS and with two optimal solution strategies, i.e. the general state space augmenting algorithm (GSSAA, for short) [9] and the branch-and-cut (BAC, for short) proposed in [29]. We coded the proposed solution approaches, the TS and the GSSAA in Java. We carried out the computational experiments on a PC with an Intel Core i5-3337U CPU @ 1.80 GHz and 4.00 GB of RAM, under the Microsoft Windows 8 operating system.
The computational study is divided into two parts. In the first part (see Section 5.2), we carried out a comparison with the TS procedure. To this aim, we consider one version of the proposed rollout algorithm, namely RH1, where the Or-Opt algorithm [10] is applied starting from the complete solution generated by the NRH. The aim of this study is to highlight the usefulness of RH1 when it is embedded in a column-generation procedure. It is well known that the efficiency of branch-and-price is strongly related to the ability of the heuristic to price out columns. The number of columns generated has a direct influence on the number of iterations. Scientific contributions highlight that, for computationally demanding problems such as the VRP, the higher the number of negative columns generated, the more efficient the branch-and-price algorithm (see e.g. [3]).
The second part deals with the evaluation of the results obtained by testing the proposed rollout algorithm and its variants (see Section 5.4). The aim of this part is to give an idea of the quality of the solutions in terms of optimality gap. Indeed, besides the number of columns priced out, high-quality solutions could accelerate the search process in the branch-and-price algorithm.

Test problems
We carried out computational experiments on two sets of benchmark instances (referred to as S1 and S2).
The instances of set S1 are those used in [16]. They are obtained from Solomon's original networks of the classes clustered (C), random (R) and random clustered (RC). The three classes differ from each other in the positions of the vertices in the field. To obtain negative costs, we associate with each vertex a price chosen randomly from the range [0, 20]. The price values are selected exactly as done in [35]. From each original instance, one RCESPP instance with 50 nodes has been derived, setting the vehicle capacity to 40.
The test class S2 refers to the networks used in [29]. They are 'pricing problems' generated by solving the VRP with Capacity Constraints (CVRP) via column-generation algorithms. The dual values have been subtracted, thus some costs are negative. According to the authors, these networks have been divided into six different sets: A, B, E, G, M and P.

Comparative analysis
In this section we present a comparative analysis of the results of the RH1 algorithm with the results of the TS.
At first, we conducted a series of tests in order to choose the values of the TS parameters. Afterwards, we focused on the comparison of the proposed approaches. Table 1 describes the three parameters used in our work. In particular, the table includes the maximum number of tabu search iterations ($K_{max}$) and the size of the tabu lists ($TL_{max}$), which depends on a random number $\rho$. This number is related to the size of the instance: it depends on the number of nodes in the network and is selected in the interval $[1, |N|]$. We tested all the instances using $K_{max}$ taken from $\{10, 20\}$. For each value of $K_{max}$, we used six different values of $TL_{max}$. We chose these parameters in a way that allows a good exploration of the solution space. Tables A2 and A3 in the Appendix of the supplemental material report the results of these tests. In particular, when the value of $TL_{max}$ depends on $\rho$, we executed the algorithm five times on each instance for each different value of $K_{max}$ and $TL_{max}$. The average of the objective function values is reported under the columns avg cost in Tables A2 and A3. When $TL_{max}$ has a fixed value, the objective function value is given under the columns cost.
From Tables A2 and A3 it is possible to observe that the algorithm obtains the best value of avg cost when $K_{max} = 20$ and $TL_{max} = 5 + 0.5\rho$. Thus, for our tests we set the parameters according to these values.
To analyse and compare the performance of the algorithms, we report in Table 2 the results attained for the classes C, R and RC belonging to the set S1, and in Table 3 those obtained for the set S2. In row AVG we report the average results for each class of problems, that is, C, R and RC for set S1 and A, B, E, M, and P for set S2. In row AVG1 we highlight the average results over all instances, for each set.
Each instance is solved nine times by the TS, where the parameter $\rho$ is chosen randomly in the interval $[1, |N|]$.
Each table contains three columns for each strategy: time is the time in milliseconds spent by RH1 to solve each instance, cost is the value of the objective function and nc is the number of negative columns generated by RH1. AVG time and AVG cost are the average time in milliseconds and the average objective function value obtained when the instance is solved by TS, respectively, while AVG nc is the average number of negative columns generated by TS.
According to Table 2, RH1 is competitive in terms of cost and number of negative columns (see row AVG1). Even if TS yields, on average, an overall reduction in computational time of about 22%, RH1 is able to find about 24% more negative columns than TS. We also note that for 89.66% of the instances, the value of the best solution obtained with RH1 is less than or equal to that obtained by TS. It is worth observing that for the instances generated from the class RC, neither RH1 nor TS finds any path with negative cost. However, RH1 behaves best in terms of solution quality. Indeed, the average cost of the solutions determined by TS is 1.35 times higher than the average cost of the solutions found by RH1. Considering the computational overhead, the two approaches show similar performance and RH1 behaves slightly better than TS: the average computational effort of RH1 is 7.50 milliseconds against 7.64 milliseconds required by TS.
Table 3 presents the average results for the instances belonging to the set S2. The average performance of RH1 in terms of cost and number of negative columns is clearly better than that obtained with TS. RH1 outperforms TS in terms of effectiveness. Indeed, RH1 finds the best solution for all the instances of the set. Although TS is faster in finding a solution, the quality of its solutions is always considerably worse (see row AVG in Table 3). Furthermore, it is worth observing that RH1 finds negative columns more quickly than TS. Indeed, RH1 finds one negative column in 0.38 milliseconds, on average, while TS spends 0.96 milliseconds, an increase of about 152%. In summary, RH1 finds a large number of negative columns in quite a short time for both the sets S1 and S2. In addition, the solutions found are not only superior in number, but also of better quality than those obtained by using TS.
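The relative increase quoted for the time per negative column can be checked directly from the two averages given above. The computation below uses the rounded values from the text, so it is only an approximate check.

```latex
\[
\frac{0.96 - 0.38}{0.38} = \frac{0.58}{0.38} \approx 1.53
\]
```

That is, TS needs roughly 150% more time per negative column than the rollout approach, in line with the figure of about 152% reported above.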

Rollout approach: implemented solution strategies
With the purpose of improving the effectiveness of the proposed approach, we considered several different versions of RH that rely on the use of local search methods. We used three different 'l-Opt algorithms' in this work, that is, 2-Opt, 3-Opt and Or-Opt [10]. In the first version, described in Section 3 and named RH1, the Or-Opt is applied starting from the complete solution determined by the NRH. In the second one, that is, RH2, all the considered local search methods (in the order Or-Opt, 2-Opt, 3-Opt) are applied starting from the complete solution determined by the NRH. In the third version, referred to as RH3, the local search is used to improve the complete solution determined by $H_c$ for every node belonging to the neighbourhood, at each iteration. In the fourth one, denoted by RH4, the local search is used to improve the complete solution obtained by $H_c$ every $p$ iterations, for each neighbourhood. In the last version, named RH5, $H_c$ is applied to a subset of the neighbours of the current partial solution at each iteration.
In addition, starting from the basic RH, we considered two enhanced versions. The former is a modified version of the 'Fortified Rollout Algorithm' F(RH) [5]. The main idea is to maintain, at each iteration $k$, in addition to the current partial solution, a complete path from node $s$ to node $t$, which represents the best feasible trajectory obtained so far, together with its cost.
If the cost of the best complete solution in the neighbourhood of the current partial solution, obtained by applying the base heuristic $H_c$, is better than the cost of the stored best path, the node $\bar{i}_{k+1}$ chosen according to (12) is added to the partial solution and the stored best path is updated accordingly. Otherwise, the partial solution at iteration $k+1$ is set equal to the first $k+1$ nodes of the stored best path. The procedure is iterated until a complete solution is found.
We obtained the second version by incorporating a two-step look-ahead strategy into the rollout framework (see [5]). The aim of this variant, denoted by 2SL, is to predict the profitability of choosing a node by estimating its effect on its descendant nodes. In particular, at the $k$-th iteration, the partial path is generated starting from a sequence of two given nodes, and the effect on the cost, in the case that these two nodes are added to the partial path, is evaluated.
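The fortified rule described above can be sketched as follows. This Python fragment is an illustrative sketch under stated assumptions, not the authors' modified F(RH): the callbacks `candidates`, `complete` and `cost_of` are hypothetical, `complete(partial)` is assumed to return a complete path that starts with `partial` (or None), and initializing the stored path with the completion of the source node is a choice made here.

```python
import math


def fortified_rollout(s, t, candidates, complete, cost_of):
    """Rollout that never moves away from the best complete path found so far.

    candidates(partial) -> nodes that may extend the partial path
    complete(partial)   -> complete s-t path with `partial` as prefix, or None
    cost_of(path)       -> cost of a complete path
    """
    partial = [s]
    best = complete(partial)                       # stored best trajectory
    best_cost = cost_of(best) if best is not None else math.inf
    while partial[-1] != t:
        trials = [complete(partial + [j]) for j in candidates(partial)]
        trials = [p for p in trials if p is not None]
        if trials:
            challenger = min(trials, key=cost_of)
            if cost_of(challenger) < best_cost:    # improvement: follow it
                best, best_cost = challenger, cost_of(challenger)
        if best is None:
            return None                            # no feasible path known
        # Either case reduces to taking one more node of the stored best path.
        partial = best[:len(partial) + 1]
    return partial
```

Since the stored path always has the current partial path as a prefix, both branches of the rule amount to advancing one node along the stored path, so the final solution is never worse than any completion seen during the run.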
In the testing phase, we implemented and tested the following strategies:
(1) NRH: the naive version of RH that uses $H_c$ as base heuristic.
(2) RH1: the Or-Opt algorithm is applied starting from the complete solution generated by NRH.
(11) 2SL(F(RH5) + (RH2)): the Or-Opt algorithm is applied to the complete solution determined by 2SL(F(RH5)).

Rollout-based algorithms results
We conducted experiments to evaluate the performance, in terms of both computational efficiency and solution quality, of the aforementioned strategies for solving the RCESPP.
It is worth observing that we do not expect our proposed algorithms to perform better than the efficient exact algorithms proposed in the literature [9,29]. Rather, we want to propose a new approach that can be embedded in column-generation, not only by generating a large number of negative columns in a short computational time, but also by providing good quality solutions. We compared the solutions found by the rollout approaches, for the instances belonging to the set S1, with those determined by using the GSSAA. For detailed results, the reader is referred to the Appendix of the supplemental material. Table A4 presents the time spent, in milliseconds, to solve each instance by the algorithms and the corresponding value of the objective function. Table 4 summarizes the average results obtained with the GSSAA, compared with the different rollout strategies. According to Table 4, the rollout algorithms are highly competitive in terms of computational time. Focusing on set R, we are able to find 11 optimal solutions out of 12 instances by using the majority of the algorithms within a significantly reduced time. According to Table 4, our algorithms also outperform the GSSAA in terms of average time for the sets C and RC. Table 5 presents the percentage gap in cost ('gap') and the ratio between the computational overhead of the heuristic approaches and the execution time required by the GSSAA ('speed up'). The results show that all the rollout-based approaches are competitive, particularly in terms of computational time. However, the 2SL(F(RH3) + (RH2)) and 2SL(F(RH5)) versions are highly competitive in terms of both computational effort and cost.
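The two indicators reported in Table 5 can be computed as in the sketch below. The exact formulas used by the authors are not given in the text, so the definitions here are assumptions: the gap is taken relative to the absolute value of the optimal cost (costs may be negative), and the ratio follows the wording above, that is, heuristic time over GSSAA time. The function names are hypothetical.

```python
def percentage_gap(heuristic_cost, optimal_cost):
    """Assumed definition: 100 * (heuristic - optimal) / |optimal|."""
    if optimal_cost == 0:
        raise ValueError("gap is undefined for a zero optimal cost")
    return 100.0 * (heuristic_cost - optimal_cost) / abs(optimal_cost)


def time_ratio(heuristic_ms, exact_ms):
    """Heuristic running time divided by the exact algorithm's running time."""
    return heuristic_ms / exact_ms
```

With these conventions, a gap of zero means the heuristic found an optimal solution, and a ratio below one means the heuristic was faster than the exact method.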
Table 4. Average of the solution values found for the instances belonging to S1 by GSSAA and rollout strategies.
To evaluate the performance of the proposed algorithms, we carried out a second series of tests by considering the instances belonging to the set S2. We then compared the solutions found by the rollout approaches with those determined by using the exact approach proposed in [29], that is, BAC. For the sake of comprehension, the computational results presented in [29] have been reported in Table A1 of the Appendix of the supplemental material and used for comparison. Tables A5 and A6, in the Appendix of the supplemental material, summarize the details of the results collected by solving the instances belonging to the set S2. Table 6 presents the comparison results for the instances belonging to set S2. Focusing our attention on the classes A, B and E, the results show that the use of more sophisticated algorithms leads to an increase in the computational times. However, the most [...] A, B and E, the results collected for the classes G, M and P (see Table 6) highlight an increase in the computational time when more sophisticated versions of the algorithm are used. The look-ahead strategy is the most time-consuming approach for this set of instances. However, only for a limited number of tests, about 37%, the look-ahead strategies are not competitive in terms of time spent to find a solution. It is worth observing that we are able to find the optimal solution with the 2SL(F(RH4)) strategy for five instances belonging to the class P in a very limited amount of time (see Table 7).

Final remarks
We carried out the experimental phase with the aim of both comparing the proposed rollout approach with the state of the art and evaluating the performance of the different versions of the proposed solution strategy.
Referring to the comparison with the state of the art, the collected computational results highlight the potential of the proposed rollout algorithm to solve the RCESPP in an efficient and effective way. Indeed, the rollout approach explores a larger portion of the solution space than TS, determining a higher number of paths with negative reduced cost, characterized by an objective function value better than that obtained by TS. This is a promising result in view of embedding the rollout algorithm for the RCESPP in a column-generation approach for solving the set-partitioning formulation of the VRP. In addition, even though the proposed rollout algorithm is slightly slower than TS for some instances, it does not require the parameter tuning that is mandatory for TS. This is an important advantage, since the rollout has the potential to be applied successfully to any kind of instance. This is not true for TS unless an in-depth tuning of the parameters is performed.
Considering the comparison among the different versions of the rollout approach, the computational results highlight that no version dominates the others. They have similar performance and, as expected, the more sophisticated ones provide better solutions in terms of objective function value, but they are more expensive in terms of computational cost. In addition, the version 2SL(F(RH4)) is able to provide the optimal solution for some instances at the price of a higher computational overhead. Thus, there exists a trade-off between solution quality and computational cost. In this respect, we cannot conclude that one version is better than the others; rather, we leave the reader to choose the version that best fits their application based on the in-depth analysis carried out in Section 5.4.

Conclusions
In this work, we have presented new approaches to address the RCESPP. The proposed solution strategies are based on a rollout metaheuristic [7,24] combined with local search procedures.
The developed approach is tested on benchmark instances. We carried out a comparative analysis with the state-of-the-art tabu search algorithm proposed in [1].
The computational results highlight the effectiveness of the proposed strategy. Indeed, on average, it is able to determine a better upper bound than that provided by the tabu search. In addition, its execution time is only slightly higher than that of the tabu search.
The search process of the proposed rollout algorithm determines a larger number of solutions with negative cost than the tabu search. These results suggest that the proposed approach behaves better than the tabu search for pricing columns in a column-generation approach.
It would be interesting to embed the rollout algorithm in a column-generation approach. In addition, the rollout framework can be parallelized, which would speed up the search process and improve its efficiency. These will be the subjects of future investigation.