Maximizing throughput in zero-buffer tandem lines with dedicated and flexible servers

For tandem queues with no buffer spaces and both dedicated and flexible servers, this article studies how flexible servers should be assigned to maximize the throughput. The optimal policy is completely characterized. Insights gained from applying the Policy Iteration algorithm to systems with three, four, and five stations are used to devise heuristics for systems of arbitrary size, and these heuristics are verified by numerical analysis. The throughput improvement obtained when, for a given server assignment, dedicated servers are made flexible is also quantified.


Introduction
Consider a tandem queueing network with N ≥ 2 stations, M ≥ 1 dedicated servers, and F ≥ 1 flexible servers. At any given time, each station can be assigned multiple servers, each server can work on only one job, and a job may have at most one server assigned to it. Assume that the service times of jobs at station i ∈ {1, . . . , N} are independent and identically distributed with rate μ_i; i.e., the service rate of each server at the ith station depends only on the station.
In the above system, we assume that dedicated servers are already assigned to the stations. We are interested in determining the dynamic assignment policy for flexible servers that maximizes the long-run average throughput. For simplicity, we assume that the travel times for jobs to progress and also the travel times for flexible servers to move between stations are negligible. We also assume that there is an infinite supply of jobs in front of the first station and infinite space for jobs completed at the last station. There are no buffer spaces between stations.
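For intuition about zero-buffer dynamics, it helps to look at the smallest instance: two stations, one dedicated server each, exponential service, and no flexible server at all. This baseline (not the model analyzed in this article, which adds flexible servers) is a three-state CTMC whose states are "both busy," "station 1 blocked," and "station 2 starved," and its throughput can be computed directly. The sketch below is purely illustrative:

```python
# Throughput of a 2-station, zero-buffer tandem line with one dedicated
# exponential server per station and NO flexible server (baseline only).
# States: 0 = both busy, 1 = station 1 blocked, 2 = station 2 starved.
def throughput(mu1, mu2):
    # Generator matrix Q of the 3-state CTMC described above.
    Q = [
        [-(mu1 + mu2), mu1, mu2],  # both busy: st.1 finishes -> blocked; st.2 finishes -> starved
        [mu2, -mu2, 0.0],          # blocked: st.2 finishes -> blocked job moves, both busy again
        [mu1, 0.0, -mu1],          # starved: st.1 finishes -> job moves to st.2, both busy again
    ]
    # Uniformize and power-iterate to obtain the stationary distribution.
    q = max(-Q[i][i] for i in range(3)) * 1.01
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(3)]
         for i in range(3)]
    pi = [1.0 / 3] * 3
    for _ in range(10000):
        pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
    # Departures occur whenever station 2 completes: states 0 and 1.
    return mu2 * (pi[0] + pi[1])

print(throughput(1.0, 1.0))  # 2/3 for the balanced line
```

For a balanced line (μ_1 = μ_2 = 1), each of the three states has stationary probability 1/3, giving the well-known throughput of 2/3; the flexible-server policies studied below aim to recover part of the capacity lost to blocking and starvation.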
We introduce the concept of a "hand-off" for flexible servers. This appears to be a new concept, as there does not appear to be any work in the literature that considers a mixture of dedicated and flexible servers, where servers cannot collaborate. A hand-off happens when a flexible server passes the job it is serving to a dedicated server at the same station. Although it is possible to perform hand-off at any time, we let it occur only in the following two cases. When a station has a busy flexible server and a free dedicated server, the flexible server can pass its job to the dedicated server and become free. When a station has a busy flexible server and a blocked dedicated server, jobs can be swapped between the two servers. In either of the two cases, we say a hand-off has taken place. In this work, we only consider the manufacturing blocking mechanism (see Avi-Itzhak and Halfin (1993) for details of manufacturing versus communication blocking).
Any allocation policy should define appropriate actions when blocking or starvation occurs. In cases where there are multiple blocked or starved servers, a policy should specify the order in which the blocking or starvation of the involved servers is resolved. We will show that when workloads (i.e., the ratio of the mean service time to the number of dedicated servers at a station) are equal, a near-optimal policy is one that clears blocking from the end of the line to the beginning and avoids starving servers. With different workloads, a near-optimal policy prioritizes clearing blocking at stations with larger mean service times.
One application of the problem described above is the bed management problem from the healthcare domain. Bed management deals with optimizing patient flow into and out of beds, so that waiting times are reduced (Department of Health and Aged Care, 1999). A patient arriving at a hospital might need to go through a sequence of units. For example, the patient might go through, in order, the emergency, express, medicine, surgery, recovery, and complex units. The role of the express unit is to accommodate the patient until a bed becomes available in the medicine unit. A number of beds, called express beds, are shared between the emergency, express, and medicine units to allow patients to be moved between these units. However, sharing is performed in an ad hoc manner and is solely limited to these three units. Several questions arise, including: How much is the throughput improvement, if beds are shared among the units? How should the shared beds be allocated dynamically to the units? If we need to take a number of dedicated beds from the units and share them, which units should be chosen? Although we do not model this application directly, our results are a first step in an analytic approach to such problems. The literature on tandem lines with multiple servers and finite buffers is not large (see the discussion in van Vuuren et al. (2005)), but there could potentially be additional applications in areas such as manufacturing in zero-buffer settings where a subset of fixtures could be reconfigured for different stations in the line (see Hu et al. (2011) for an example of a zero-buffer automobile manufacturing setting).
Beyond multiple server and finite buffer systems, there is a rich literature on the assignment of flexible servers in tandem queues. Most of the papers in this domain focus on minimizing holding costs. Iravani (1997) considers tandem queues attended by a moving server with holding and switching costs. As a basic model, he shows that for two-stage tandem queues, the policy that minimizes the total discounted and long-run average holding and switching costs is a greedy and exhaustive policy in the second stage. The first stage could then follow static, gated-limited, or double-threshold policies. Ahn et al. (2002) consider the optimal control of two parallel servers in a two-stage tandem queueing system with two flexible servers and holding costs. They examine both collaborative and non-collaborative cases. In the collaborative case, servers may collaborate to work on the same job at the same time, whereas in the non-collaborative case each job can be served by at most one server at a time. They provide simple conditions under which it is optimal to allocate both servers to station 1 or 2 in the collaborative case. In the non-collaborative case, they show that the same condition as in the collaborative case guarantees the existence of an optimal policy that is exhaustive in station 1. However the condition for exhaustive service at station 2 to be optimal does not carry over. Pandelis (2008) considers a two-station tandem line with both dedicated and flexible servers where all servers have time-varying rates. Servers can work collaboratively. With a given probability, jobs can leave the system after completion in the first station. The optimal policy to minimize holding costs is described.
For a two-class queueing system with one dedicated server, one flexible server, and no exogenous arrivals, Ahn et al. (2004) characterize the server assignment policy that minimizes the expected total holding cost incurred until all jobs initially present in the system have departed. Andradóttir et al. (2007) consider tandem queues having both flexible and dedicated servers with finite or infinite buffers between stations. They study the dynamic assignment of servers, in the collaborative case, such that the long-run average throughput is maximized. They assume that the service requirements of jobs at stations are exponentially distributed. For two stations and three servers (both flexible and dedicated), the allocation of the flexible servers is of threshold type; multiple thresholds are used to express allocation policies and the thresholds are specified. Wu et al. (2008) use the notion of dedicated and flexible servers to determine the allocation of flexible servers that minimizes holding cost in a clearing system with two queues in tandem and in which dedicated servers are subject to failure. Wu et al. (2006) consider the case of more general serial lines with external arrivals under discounted and average cost criteria. Andradóttir et al. (2001, 2003) consider the dynamic assignment of servers to maximize the long-run average throughput in queueing networks with flexible servers. They consider tandem queues with two stations and flexible servers with a finite buffer between the stations, where servers work collaboratively. They use Policy Iteration to show that with fewer than three servers, a policy that avoids blocking at the first station and starvation at the second station is optimal. Andradóttir and Ayhan (2005) study the same system with three servers.
Assuming server 1 (2) is more efficient than server 2 (3) at serving jobs at the second station, they prove that the optimal policy has three mandates: (i) server 1 works at station 1, unless the station is blocked; (ii) server 3 works at station 2, unless it is starved; (iii) server 2 is a roving server: it works at station 1 when the buffer level is low and at station 2 when the buffer level is high.
Hasenbein and Kim (2011) also consider the system introduced in the last paragraph. They prove that the conjecture proposed by Andradóttir and Ayhan (2005) for generic numbers of servers is indeed true. They use general properties of the bias of the optimal policy to show that a threshold policy on buffer levels exists that relies on ordering of server efficiency at stations. They further determine the threshold values. They also prove that restricting the buffer size decreases the maximum achievable throughput. Kırkızlar et al. (2012) analyze a tandem line that is understaffed; i.e., there are more stations than servers. They consider tandem queues with three stations, two servers, different flexibility structures, and either deterministic service times and arbitrary buffers or exponential service times and small buffers. In the deterministic setting, they prove that the best possible production rate with full server flexibility and infinite buffers can be attained with partial flexibility and zero buffers.
In contrast to Iravani (1997), Ahn et al. (2002), Ahn et al. (2004), and Pandelis (2008), who consider holding costs, we study throughput. The problem in Andradóttir and Ayhan (2005) differs from ours in that it studies the collaborative case and includes buffers between stations. The work in Andradóttir et al. (2001, 2007) and Hasenbein and Kim (2011) encompasses buffer spaces and collaborative servers, assumes heterogeneous service rates for servers, and allows a single server per station. Note that in a zero-buffer setting, server allocation in tandem lines with collaborative servers is a different problem than server allocation with non-collaborative servers. Andradóttir et al. (2003) consider the case where jobs in the system belong to different classes and service rates are heterogeneous based on server and job class. In this article, we determine optimal policies for the dynamic allocation of flexible servers in zero-buffer tandem lines with heterogeneous service rates for stations. We study the non-collaborative case and our goal is to maximize throughput. In addition, we report the throughput improvement gained from making dedicated servers flexible.
This article is organized as follows. In Section 2, the optimality of hand-off and non-idling for tandem lines with two stations is shown. In Section 3 we use Markov Decision Process theory to derive the optimal policy for tandem lines with two stations, an arbitrary number of dedicated servers, and one flexible server. We further show how to employ the Policy Iteration algorithm to construct the optimal policy. In Section 4 we apply Policy Iteration to larger instances (tandem lines with three, four, and five stations) and describe a more generic near-optimal policy. Using the insights gained from Section 4, in Section 5 we provide heuristics for allocation policies for systems of arbitrary size and with heterogeneous mean service times. We also study configurations with service times that are not exponentially distributed. Section 6 studies the effects of increasing the number of flexible servers on the throughput. Finally, Section 7 concludes the article and discusses future work. The proofs of most of our results are included in the online supplement.

Optimal policy properties
In this section we introduce and prove two properties of an optimal policy for tandem lines with two stations. By optimal we mean that the policy maximizes throughput; equivalently, the policy leads to the highest number of departures from the first station at every point in time. These properties are the optimality of hand-off and of non-idling of flexible servers. As a result, when modeling the system with Markov Decision Processes, the state and action spaces can be constrained so that any states or actions that do not perform a hand-off, or that idle flexible servers, can be excluded. This becomes helpful when search spaces are large; i.e., for systems with many stations or servers. In Theorems 1 and 2 there is one dedicated server at each station and one flexible server that can work at either station. In Section 2.4, Corollary 1 extends the theorems by showing that Theorems 1 and 2 hold for systems with arrivals and for clearing systems, and Corollary 2 shows that the results hold for arbitrary numbers of dedicated and flexible servers. The proofs of Theorems 1 and 2 are sample path proofs; i.e., we fix a sample path and generate all of the service times at the beginning. The service times can be from any distribution.

Hand-off property
In this subsection, we state and prove Theorem 1 on the optimality of hand-off.
Theorem 1. The optimal policy performs hand-off whenever possible.
Proof. There are only two scenarios under which hand-off is possible.

1. A dedicated server becomes blocked at the station where a flexible server is working (for N = 2, this can only happen at the first station). A hand-off is used to clear the blocking.

2. A dedicated server becomes available at the station where a flexible server is working (for N = 2, this can only happen at the second station). A hand-off prevents the flexible server from working at a station that has an idle dedicated server.
Let ω and ω′ be policies that always perform a hand-off, with the only exception that ω′ does not perform a hand-off at one of the times when a hand-off is possible. We define the property ahead(n) as follows. Comparing the system under policies ω and ω′ when n jobs have departed from the first station under ω, ahead(n) holds if, for each job at a station, one of the following holds: (i) the same job is being served under both policies with identical time spent in the system; (ii) the same job is being served under both policies with less residual service time under ω; or (iii) ω is serving a job that ω′ has not yet admitted. Note that when there are two jobs at a station, more than one of situations (i), (ii), and (iii) can hold. Both stations must satisfy these conditions. The proof is structured as follows. For both scenarios, Lemma 1 states that if the nth departure happens no later under ω than under ω′ and ahead(n) holds, then the (n + 1)st departure under ω′ will not occur sooner than under ω.
For both scenarios, Lemma 2 completes the proof by showing that, given a system with m > 1 departures from the first station that satisfies ahead(m), Lemma 1 can be applied iteratively to establish ahead(m + 1).
Given Lemmas 1 and 2, a simple induction argument follows. Define D_p(n) to be the time at which the nth departure from the first station occurs under policy p. We use induction on n to show that D_ω(n) ≤ D_{ω′}(n) for all n ≥ 1.

Proof of Lemmas 1 and 2
In all of the lemmas below, the following labeling convention is used. The label Case l_1.….l_{k−1}.l_k means that all of the assumptions made in Cases l_1 through l_1.….l_{k−1} hold for this case, in addition to the assumption introduced in l_k. Cases l_1.….l_{k−1}.l_k and l_1.….l_{k−1}.l′_k with l_k ≠ l′_k refer to two different branches of Case l_1.….l_{k−1} that differ only in the assumptions introduced in l_k and l′_k. In all figures in the lemmas below, the top label is the time stamp of the configuration, ovals represent stations, the first row shows ω, and the second shows ω′. Within each oval are job numbers; in each station the top job is served by a dedicated server and the bottom job by a flexible server. The notation below the stations shows the residual service time of a job appearing in the same station under both policies when the residual service times differ. In some cases, a hand-off is shown by two configurations in a row separated by a vertical line.
The following notation is used throughout the lemmas. Let X^{p,i}_{n,t} be the residual service time of job n at time t served by server X working at station i ∈ {1, 2} under policy p ∈ {ω, ω′}. Server X can be replaced by d or f, representing a dedicated or a flexible server, respectively. The residual service time of a job is the time remaining to complete the job's service. Dropping any of the indices of X^{p,i}_{n,t} means that the dropped index can take any value. Also note that the service time of job n is independent of its admission time, of the server being dedicated or flexible, and of the policy used; for example, d^{ω,i}_{n,t} is the residual service time of job n at time t being served by a dedicated server at station i under policy ω. Finally, to refer to a server working at station i, we use σ^i_X, where X ∈ {d, f} as above. Note that D_ω(n) is the time of the nth departure; the nth departing job is, in general, not the nth job to enter the system. To emphasize this fact, we do not use n to enumerate jobs according to their time of entry (k and p are used instead).
In all of the following lemmas, let t_0 be the hand-off time at which the two policies make a different choice. We give part of the proof of Lemma 1 here, to give a flavour of how the proof proceeds; the remainder is in the online supplement. The proof of Lemma 2 is also given in the online supplement.
Proof. The two primary steps of the proof (basis and inductive steps) are shown in the following. We consider the first scenario here and leave the second scenario for the online supplement.

Basis step.
If time t_0 is not reached, both systems remain the same and there is nothing to prove in the basis step. Otherwise, start the system from the empty state and let the first departure from the first station (D_ω(1) = D_{ω′}(1)) occur. If d^1_{2,t_0} < min{f^1_{3,t_0}, d^2_{1,t_0}}, the system follows the first scenario: server σ^1_d is serving job 2, σ^1_f is serving job 3, and σ^2_d is serving job 1. Up until time t_0, the two policies behave identically. When time t_0 is reached, the following cases are possible.

Case 1: Assume d^2_{1,t_0} > f^1_{3,t_0}, meaning that the flexible server completes its service before the dedicated server at the second station. Let t_1 = f^1_{3,t_0}. The policy ω′ waits t_1 time units until σ^1_f completes job 3 and then sends the server to the second station with f^{ω′,2}_{3,t_0+t_1} as its residual service time. The policy ω performs a hand-off at t_0 and sends the flexible server to the second station with a residual service time of f^{ω,2}_{2,t_0}. Therefore, at t_0 + t_1, D_ω(2) < D_{ω′}(2).

Case 2: Assume d^2_{1,t_0} ≤ f^1_{3,t_0}, meaning that the dedicated server at the second station completes its service before the flexible server. Let t_1 = d^2_{1,t_0}. The policy ω′ waits t_1 time units until σ^2_d becomes available; thus, at time t_0 + t_1, the blocked job goes to the second station with d^{ω′,2}_{2,t_0+t_1} as its residual service time. The policy ω performs a hand-off at t_0 and sends the flexible server to the second station with a residual service time of f^{ω,2}_{2,t_0} (here the residual service time equals the service time). Therefore, at t_0 + t_1, D_ω(2) < D_{ω′}(2).
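Case 1 of the basis step can be checked with simple timing arithmetic. The sketch below plugs in illustrative residual times (the numeric values are hypothetical) satisfying the Case 1 condition and confirms that the second departure from the first station occurs strictly earlier under the hand-off policy ω:

```python
# Illustrative timing for the basis step, Case 1 (numbers are hypothetical).
# At t0: sigma^1_d holds blocked job 2, sigma^1_f serves job 3,
# and sigma^2_d serves job 1. Case 1 condition: d2_1 > f1_3.
t0 = 5.0    # time of the divergence point
f1_3 = 2.0  # residual service of job 3 at station 1 (flexible server)
d2_1 = 3.0  # residual service of job 1 at station 2 (dedicated server)
assert d2_1 > f1_3  # Case 1 condition holds

# Under omega: the hand-off at t0 moves blocked job 2 to the second station
# immediately, so the second departure from the first station occurs at t0.
D_omega_2 = t0

# Under omega': the flexible server first finishes job 3 (t1 = f1_3 time
# units) and only then carries it to the second station.
t1 = f1_3
D_omega_prime_2 = t0 + t1

print(D_omega_2 < D_omega_prime_2)  # True: omega is ahead
```

The same arithmetic with t_1 = d2_1 covers Case 2.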

Inductive step.
If t_0 occurred in the basis step, Lemma 2 shows that, for each case considered in the inductive step, the system respects the ahead property; hence, departures under ω′ will not occur sooner than under ω. If time t_0 is not reached, the two systems remain the same and there is nothing to prove at this stage.
If time t_0 occurs after the (n − 1)st departure but before the nth departure, consider the following configuration: a blocked dedicated server (holding job k) and a busy flexible server (serving job k + 1) at the first station, and a busy dedicated server (serving job p) at the second station.

Case 1: Assume d^2_{p,t_0} > f^1_{k+1,t_0}, meaning that the flexible server completes its service before the dedicated server at the second station. Let t_1 = f^1_{k+1,t_0}, assume σ^1_d is holding the kth job, and suppose n − 1 departures from the first station have occurred. The policy ω′ waits t_1 time units until σ^1_f completes service and sends the server to the second station with f^{ω′,2}_{k+1,t_0+t_1} as its residual service time. (To simplify the argument, we assume that at t_0 + t_1 a hand-off occurs and the kth job is sent to the second station instead of the (k + 1)st job, so that σ^2_f serves a job with residual service time f^2_{k,t_0+t_1}; this can be done because the jobs are indistinguishable.) The policy ω performs a hand-off at t_0 and sends the flexible server to the second station with a service time of f^{ω,2}_{k,t_0}. Therefore, at t_0 + t_1, the system under ω is ahead. Considering the system at t_0 + t_1, several sub-cases can occur; note that σ^1_d is blocked at t_0 + t_1 under both policies, holding job k + 1. Figure 2 illustrates one sub-case, which we divide further; Figure 3 illustrates another sub-case, which is handled similarly to Case 1.2. Although this reference may seem circular, the fact that job k + 2 eventually leaves the system avoids circularity. A further sub-case is similar to Case 1.2.2.1, discussed in the online supplement. The remaining cases in the inductive step are also provided in the online supplement.

Non-idling property
In Theorem 1 we proved that an optimal policy performs a hand-off whenever possible. In Theorem 2, we show that an optimal policy avoids idling the flexible server. The proof of Theorem 2 uses the result of Theorem 1; i.e., the two policies used in the proof perform hand-offs whenever possible. Accordingly, we only claim non-idling to be a property of an optimal policy when hand-offs are employed. Otherwise, there are cases where idling leads to better results.
For example, assume that at time t_0 jobs k and k + 1 are being served by dedicated servers at the first and second stations, respectively, and that policies do not perform hand-offs. At t_0, a non-idling policy assigns the flexible server to the first station to serve job k + 2, whereas an idling policy lets the flexible server idle. Now assume σ^1_d completes its job sooner than the other servers. At t_0 + d^1_{k,t_0}, under the non-idling policy, σ^1_d (holding job k) becomes blocked. Under the idling policy, the flexible server is assigned to the second station to admit job k, and σ^1_d admits job k + 2. Hence, assuming there have been n − 1 departures from the first station under both policies, D_{idling}(n) < D_{non-idling}(n), which means that idling the flexible server leads to better results. As a result of this observation, the induction method in the proof of Theorem 2 below cannot be used when hand-off is not assumed.

Theorem 2. The optimal policy is non-idling under hand-off assumptions.
Proof. Let π and π′ be non-idling policies; i.e., whenever the flexible server becomes idle, the policy assigns it to

1. the second station, if the first station is blocked;
2. the first station, otherwise.

The only exception is that π′ idles the flexible server once, for t time units, starting from t_0.
Only two other non-idling policies exist. The first assigns the idle flexible server to the first station when the first station is blocked. However, since hand-offs are used, the flexible server performs a hand-off and takes the blocked job to the second station, resulting in the same behavior as π. The second assigns the idle flexible server to the second station when the first station is not blocked. Again, since hand-offs are used, the flexible server passes its job to the free dedicated server at the second station and admits a new job at the first station, leading to the same behavior as π. Therefore we only need to consider π and π′ throughout the rest of the proof.
There are three scenarios under which idling is possible.
1. A blocked server at the first station and a busy server at the second station.
2. Two busy servers, one at each station.
3. One busy server at the first station.
The proof requires explicit, careful, and exhaustive evaluation of all possibilities. Due to space limitations, we do not include the details of the proof of this theorem; however, the proof is similar to that of Theorem 1. A lemma is needed for each of the three scenarios above to perform the induction over departures. A fourth lemma is required to show that the ahead property holds when these three lemmas are used (i.e., after the next departure). The proof of Theorem 2 includes more cases than that of Theorem 1 because, in addition to comparing the residual service times of jobs to each other, we need to compare them at an extra point in time, t_0 + t (defined above).
A natural extension of Theorem 2 is the following: introducing idling to any non-idling policy decreases the throughput, assuming that hand-offs are used. If hand-offs are not used, an idling policy can lead to better results, as discussed at the beginning of Section 2.3. Theorem 2 shows that after the idling period ([t_0, t_0 + t]), the system is ahead under the policy that is always non-idling. It also shows that if a non-idling policy is applied to the system thereafter, the ahead property continues to hold.
To investigate the proposed extension, we could follow an approach similar to the proof of Theorem 2. Given a policy with idling, we produce another policy by making it non-idling for a period of time. Theorem 2 shows that after this period, the system is ahead under the modified policy. Conceptually, we believe the fact that a hand-off can be employed makes it impossible for an idling policy to provide a higher throughput than a non-idling policy.

Extensions
In this subsection, we state that the results of Theorems 1 and 2 can be extended to systems with arrivals, clearing systems, and systems with additional dedicated or flexible servers. The proofs of the following corollaries are presented in the online supplement. In this section we proved that an optimal policy performs a hand-off and does not idle. In the next section, we model tandem lines with two stations and determine the optimal allocation policy. The properties proven in Section 2 will be used to reduce the size of the action and state spaces.

Markov decision process model
We use a Markov Decision Process (MDP) to model the above problem. For our controlled Continuous-Time Markov Chain (CTMC), let S, A, and A_s represent the state space, the action space, and the set of actions permissible when the Markov chain is in state s ∈ S, respectively.
A state s ∈ S is a tuple

s = (x_1, y_1, . . . , x_{N−1}, y_{N−1}, x_N),    (1)

where x_i is the number of busy servers at station i and y_i is the number of blocked servers at station i. Note that we do not need to include y_N in the state, as no servers can be blocked at the last station. We uniformize the CTMC (see Lippman (1975)) to convert the continuous-time problem into a discrete-time problem. The normalization constant that we employ is denoted by q and is defined below.
Let v_i be the number of dedicated servers at station i. Using the non-idling and hand-off properties of optimal policies, the number of flexible servers at station i is x_i + y_i − v_i. Hence, the state (1) also determines the locations of the flexible servers.
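The identity above can be made concrete: given a state and the dedicated-server counts, the flexible-server allocation is recovered componentwise as x_i + y_i − v_i. A small sketch with hypothetical example values:

```python
# Recover flexible-server locations from a state, using the identity
# f_i = x_i + y_i - v_i (busy + blocked servers minus dedicated servers).
def flexible_allocation(x, y, v):
    assert len(x) == len(y) == len(v)
    f = [xi + yi - vi for xi, yi, vi in zip(x, y, v)]
    assert all(fi >= 0 for fi in f), "state inconsistent with server counts"
    return f

# Hypothetical 3-station example: v = (1, 2, 1) dedicated servers, F = 1.
x = [2, 1, 1]   # busy servers per station
y = [0, 1, 0]   # blocked servers per station (y_N = 0 always)
v = [1, 2, 1]
f = flexible_allocation(x, y, v)
print(f)        # [1, 0, 0]: the flexible server is at station 1
print(sum(f))   # 1 = F
```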
Constructing the transition matrices is a two-stage process. One starts from the initial state s and follows a possible action a to determine the new state s′. State s′ is an intermediate state that does not appear in the transition matrix; the transition between s and s′ is immediate. From s′, it is possible to follow further transitions and reach new states s″ with the probabilities defined in the transition matrix P_a. A transition s → s′ → s″ is reflected in the transition matrix as P_a(s, s″) = μ/q, where μ is the transition rate from s′ to s″ and q = max_{s∈S} Σ_{s″∈S∖{s}} μ(s, s″) is the normalization constant.
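Uniformization itself is mechanical: all rates are divided by a constant q at least as large as the largest total outflow rate, and a compensating self-loop is added to each state. A generic sketch (a standalone illustration, not the article's specific transition matrices):

```python
# Uniformize a CTMC generator: P = I + Q/q, with q >= max_i sum_{j!=i} Q[i][j].
def uniformize(Q, q=None):
    n = len(Q)
    if q is None:
        q = max(sum(Q[i][j] for j in range(n) if j != i) for i in range(n))
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out = 0.0
        for j in range(n):
            if j != i:
                P[i][j] = Q[i][j] / q  # rate divided by normalization constant
                out += P[i][j]
        P[i][i] = 1.0 - out  # compensating self-loop probability
    return P

# Tiny example with rates 1 and 2 (hypothetical).
Q = [[-1.0, 1.0], [2.0, -2.0]]
print(uniformize(Q))  # [[0.5, 0.5], [1.0, 0.0]] with q = 2
```

The resulting discrete-time chain has the same stationary distribution as the original CTMC, which is what allows the discrete-time PI algorithm to be applied.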
An action a ∈ A is a tuple (a_1, . . . , a_j, . . . , a_N), where a_j is the number of flexible servers assigned to station j, 0 ≤ a_j ≤ F, and Σ_{j=1}^{N} a_j = F. The size of the action space is therefore the number of such tuples, C(N + F − 1, F).
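The action count can be verified by enumerating all distributions of the F flexible servers among the N stations; the number of such tuples is the number of weak compositions of F into N parts, C(N + F − 1, F). A brute-force sketch:

```python
from itertools import product
from math import comb

# Enumerate actions (a_1, ..., a_N) with a_j >= 0 and sum of a_j equal to F.
def actions(N, F):
    return [a for a in product(range(F + 1), repeat=N) if sum(a) == F]

N, F = 3, 2
A = actions(N, F)
print(len(A))              # 6
print(comb(N + F - 1, F))  # 6 = C(4, 2)
```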

Generic MDP
We begin by specializing to the case N = 2, F = 1, v_1 = I, v_2 = J. We intend to apply the Policy Iteration (PI) algorithm to find the optimal policy.

States:
Using the state description in Equation (1), states are of the form (x_1, y_1, x_2). Our state space is constrained so that a hand-off is performed whenever possible (as we showed in Theorem 1 that an optimal policy performs hand-offs). If hand-offs were optional, additional states (such as (I, 1, J) and (i, b, J) with i, b > 0, i + b = I + 1) would need to be introduced.

Actions:
The action space is A = {a_1, a_2}, where a_i means moving the flexible server to the ith station; the permissible actions A_s depend on the current state s. In the PI algorithm, we choose the initial policy d_0 to be

d_0(s) = a_1 for s = (I + 1, 0, j) or (I + 1, 0, J),
d_0(s) = a_2 for s = (i, b, J), (I, 0, J + 1), (i, b, J + 1), or (0, I, J + 1),

where i, j, and b are constrained so that s is in the state space. The reward gained in each state is the departure rate from the last station, as defined below.
The reward function as stated here is a valid choice for maximizing the throughput, as shown in Section 3 of Andradóttir et al. (2001).
In the PI algorithm, we solve the policy evaluation equations for d_0 and then check whether d_0 is the result of one iteration of the PI algorithm; i.e., whether the policy improvement step leaves d_0 unchanged. This is equivalent to showing that inequality (2) holds for all s and all permissible actions. In inequality (2), given the transitions s → s′ → s″ under action a, the reward is r(s, a) = x_N μ_N, where x_N appears in the representation of the state according to Equation (1) and μ_N = P_a(s, s″). Note that s″ is also a function of s and a.
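For readers unfamiliar with PI, the loop alternates policy evaluation and greedy improvement until the policy stops changing. The sketch below runs PI on a toy two-state discounted MDP of my own construction, purely to show the mechanics (the article's model uses the long-run average-reward criterion and much larger state spaces):

```python
# Minimal policy iteration on a hypothetical 2-state discounted MDP.
# State 0: action 'a' -> reward 1, stay in 0; action 'b' -> reward 0, go to 1.
# State 1: single action 'c' -> reward 2, stay in 1.
succ = {('a', 0): 0, ('b', 0): 1, ('c', 1): 1}      # deterministic successor
R = {('a', 0): 1.0, ('b', 0): 0.0, ('c', 1): 2.0}   # immediate rewards
A = {0: ['a', 'b'], 1: ['c']}                        # permissible actions
gamma = 0.9                                          # discount factor

def evaluate(policy, sweeps=2000):
    # Iterative policy evaluation: V <- R + gamma * V(successor).
    V = {0: 0.0, 1: 0.0}
    for _ in range(sweeps):
        V = {s: R[(policy[s], s)] + gamma * V[succ[(policy[s], s)]] for s in V}
    return V

policy = {0: 'a', 1: 'c'}  # initial policy, analogous to d_0
while True:
    V = evaluate(policy)
    # Greedy improvement step.
    improved = {s: max(A[s], key=lambda a: R[(a, s)] + gamma * V[succ[(a, s)]])
                for s in policy}
    if improved == policy:  # unchanged -> optimal
        break
    policy = improved

print(policy[0])  # 'b': forgo reward 1 now to reach the better state 1
```

A policy is optimal exactly when one improvement step returns it unchanged, which is the check expressed by inequality (2) in the text.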
We implemented the PI algorithm in MATLAB and intended to apply it to the generic MDP stated above. However, modeling the generic MDP in MATLAB is not straightforward, as each member of the state space represents a set of states.
The result of the PI algorithm (inequality (2)) is an expression in terms of the service rates. We attempted to verify the inequality for the generic case using induction on I and J, but have not yet been able to establish such a relation. However, we proved that the expression holds for arbitrary service rates for every specific pair of values of I and J that we considered. Thus, we have the following conjecture, which has been verified for those values of I and J.
Conjecture 1. The optimal policy is the policy that clears blocking whenever possible and otherwise assigns the flexible server to the first station.
Comparing this policy with allocation policies for collaborative servers, Andradóttir et al. (2007) (Lemma 4.1) show that for a tandem queue with two stations, a finite buffer, a dedicated server at each station, and a flexible server, the optimal policy is of threshold type: it assigns the flexible server based on the number of jobs in the system (both in service and in the buffer). That optimal policy sends the flexible server to the second station whenever there is a busy dedicated server at each station (even if no server is blocked). This difference from our optimal policy arises from the server collaboration assumption in Andradóttir et al. (2007).

Larger systems
With two stations, only one station can be blocked, and as we saw in Section 3, the primary task of the optimal policy is to clear blocking. When the number of stations increases, blocking can occur at more than one station, and policies should prioritize the order in which blocked servers are cleared. Starvation might also occur: a dedicated server becomes starved when it is free and has no jobs to process. To study these effects, in this section we apply the MDP modeling and the PI algorithm to longer tandem lines, and use the results to devise heuristics. As in Section 3, the policies that we consider are non-idling and perform a hand-off whenever possible (properties of optimal policies, as shown by Theorems 1 and 2).
To begin, we study tandem lines where all stations have equal mean service times, each station has one dedicated server, and there is one flexible server. Solving the MDP for three, four, and five stations suggests the following near-optimal policy: (i) clear blocking from the end of the line to the beginning, unless doing so causes starvation; (ii) if clearing a blocked server would cause starvation, admit a new job instead.
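As a concrete illustration, the decision rule of this heuristic can be sketched as follows. The function name and the `would_starve` predicate are our own abstractions for exposition, not part of the paper's MDP formulation:

```python
def assign_flexible(blocked, would_starve):
    """Sketch of the near-optimal heuristic for one flexible server.

    blocked      -- iterable of currently blocked station indices
    would_starve -- hypothetical predicate: True if clearing the blocked
                    server at station i would starve some dedicated server
    Returns the station the flexible server should move to
    (station 1 means: admit a new job).
    """
    # (i) clear blocking from the end of the line to the beginning,
    #     skipping any station whose clearing would cause starvation
    for i in sorted(blocked, reverse=True):
        if not would_starve(i):
            return i
    # (ii) otherwise, admit a new job at the first station
    return 1
```

For example, if stations 2 and 4 are blocked and clearing station 4 would starve a server, the rule clears station 2; if no blocking can be cleared safely, the flexible server admits a new job.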
Note that as N increases, the number of states increases exponentially, which makes the overhead of constructing the MDP prohibitive. For example, when N = 5 (with the configuration introduced above), there are five actions, 236 states, and the transition matrices P_{a_1}, P_{a_2}, P_{a_3}, P_{a_4}, and P_{a_5} have 498, 496, 462, 447, and 413 non-zero entries, respectively. Calculating each of these entries is difficult to automate, and hence error-prone. In Section 5 we use simulation to show that the policy stated at the beginning of this section yields near-optimal results for different configurations.

Numerical results
A number of policies and configurations are considered here. The first set of configurations (Tables 1, 2, and 3) has v i = 1, μ i = 1, i = 1, . . . , N, and F = 1. N ranges from four to 30. The second set of configurations (Tables 4, 5, and 6) deals with heterogeneous service rates and a single server per station. We let W = (w 1 , . . . , w i , . . . , w N ) be the vector of mean service times, where w i is the mean service time at the i th station.
The policies we consider can be distinguished by how they treat blocking and starvation and whether they employ hand-offs. A policy must determine the order in which it clears blocking, specify how it resolves the trade-off between blocking and starvation, and state whether it uses hand-offs. Although Theorem 1 suggests that hand-offs should be used, here we also consider policies that do not employ them, both to show how the suggested heuristic works and to study the impact of allowing hand-offs.
In all policies if there is no blocking in the system and free flexible servers exist, flexible servers are sent to the first station to admit a new job. This is consistent with the necessity of non-idling for optimality.
The policies we study are as follows:
- Policy 8: clears blocking from beginning to end; does not use a hand-off.
Policies 4 and 5 prioritize clearing blocking caused by the station with the highest mean service time (among the stations that cause blocking). Policies 4 and 5 are applied only to the second set of configurations. In practice, Policies 2, 3, and 4 show similar behavior, as Policies 3 and 4 clear a blocked server only if all stations preceding the blocked station are blocked. The hand-off in Policy 2 pushes a flexible server through all consecutive blocked stations. Simulation results reveal that Policies 1 and 4 perform better than the other policies, depending on the configuration considered. In Policy 1, the 2N/3 parameter gives higher priority to avoiding starvation, compared with Policies 2, 3, and 4. This parameter is derived experimentally and is used to ignore a blocked server if clearing that blocking would cause simultaneous starvation in a large number of stations.
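Under one possible reading of this rule, Policy 1's starvation guard takes the following threshold form. The threshold interpretation and the `starved_if_cleared` map are our assumptions for illustration; the paper fixes only the 2N/3 value experimentally:

```python
def policy1_target(blocked, starved_if_cleared, N):
    """Sketch of Policy 1's starvation guard (assumed threshold form).

    blocked            -- blocked station indices
    starved_if_cleared -- hypothetical map: station -> number of stations
                          that would starve simultaneously if it is cleared
    N                  -- number of stations in the line
    """
    for i in sorted(blocked, reverse=True):      # clear from the end first
        if starved_if_cleared(i) <= 2 * N / 3:   # experimental 2N/3 cut-off
            return i
    return 1                                     # otherwise admit a new job
```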
We applied the above policies to several configurations (defined below) and measured throughput in a simulation environment. Each simulation is a long run, in terms of the number of departures from the system. The run is long enough (100 000 000 departures) that the effect of starting the system from an empty state becomes negligible. In the following tables, policies are ordered based on their throughputs.

Table 1 compares the throughputs of configurations with one flexible server, N ∈ {4, 5, 8, 15, 30} stations, one dedicated server at each station, and service times at each station exponentially distributed with rate 1. Policy 1 is close to what the PI algorithm suggested and yields the best throughput among these policies. It can be observed that Policies 1 and 2 lead to identical throughputs. This is because when a station and all its preceding stations are blocked, Policy 1 clears blocking from the last station in the sequence of blocked stations, and Policy 2 uses a hand-off to pass the flexible server to the last station in that sequence. Although the throughputs are identical, Policy 2 performs extra hand-offs compared with Policy 1. Comparing Policies 1 and 3, Policy 1 allows starvation at the initial stations, whereas Policy 3 avoids it at all stations; Policy 1 performs better than Policy 3. The reason is that the downside of starvation at the initial stations is smaller than the benefit of clearing blocked servers at the end stations. There are two explanations: (i) starvation close to the first station is likely to be resolved more quickly than starvation at the end stations and (ii) starvation at the initial stations, when the downstream stations are busy, does not affect the downstream stations. Hence, in Policy 1, the trade-off between clearing blocking and avoiding starvation (at the initial stations) is resolved in favor of clearing blocking.
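For reference, the throughput of the purely dedicated zero-buffer line (one server per station, no flexible server, manufacturing blocking) can be estimated with the standard departure-time recursion. This is a minimal sketch of our own, not the simulator used to produce the tables, and it omits flexible servers entirely:

```python
import random

def throughput_zero_buffer(rates, n_jobs, seed=0):
    """Estimate the throughput of a saturated zero-buffer tandem line with
    one dedicated server per station under manufacturing blocking.

    A job departs station i only when (a) its service there is done and
    (b) the previous job has already left station i + 1.
    """
    rng = random.Random(seed)
    N = len(rates)
    # D[i] = departure time of the previous job from station i;
    # D[N] is a sentinel for the sink, which never blocks (always 0.0).
    D = [0.0] * (N + 1)
    for _ in range(n_jobs):
        newD = [0.0] * (N + 1)
        for i in range(N):
            service = rng.expovariate(rates[i])
            arrive = newD[i - 1] if i > 0 else 0.0  # infinite supply at station 1
            start = max(arrive, D[i])               # wait until station i is free
            newD[i] = max(start + service, D[i + 1])  # manufacturing blocking
        D = newD
    return n_jobs / D[N - 1]
```

For a single exponential station with rate 1 the estimate approaches 1, and for two identical rate-1 stations with no intermediate buffer it approaches the known value 2/3.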

Homogeneous mean service times (the first set)
To illustrate the effect of including flexible servers, the last column of Table 1 shows the case where there are no flexible servers and the extra server is instead assigned to one of the middle stations. It can be observed that making a server flexible can lead to throughput improvements of up to 36.08%.
We also study configurations with service time distributions whose coefficients of variation differ from one. In the simulations, an Erlang distribution is used for coefficients of variation less than one and a hyper-exponential distribution for coefficients of variation greater than one. Let cv i be the coefficient of variation of the i th station and cv be the vector of coefficients of variation. We simulated a number of configurations with different coefficients of variation and reached the same conclusions as in the case where all coefficients of variation equal one. More specifically, we tested the following configurations: N = 4, cv 3 = 0.5, 1.34; N = 5, cv 2 = 0.5; N = 5, cv 4 = 1.34; N = 8, cv 3 = 0.5, cv 7 = 1.34; N = 15, cv 3 = cv 10 = 0.5, cv 7 = cv 13 = 1.34; N = 30, cv 3 = cv 10 = cv 18 = cv 25 = 0.5, cv 7 = cv 13 = cv 22 = cv 28 = 1.34. The unspecified coefficients of variation are equal to one. Table 4 compares the throughputs for configurations with various numbers of stations, different service rates, and one flexible server. Policies 4 and 5 are not applicable to the first set of configurations, but they are compared against the other policies in the second set. Table 4 illustrates that all orderings remain the same as in Table 1, except that Policy 4 outperforms Policy 1 in certain cases. The intuition is that while for small systems an optimal policy clears blocking from the end to the beginning, for large systems an optimal policy prefers to clear blocking at bottleneck stations first (i.e., stations with higher mean service times). For small systems, several adjacent blocked stations could deny admissions to the system; therefore, it is valuable to clear blocking from the end, as Policy 1 does. Table 5 is similar to Table 4 except that the coefficients of variation of all service times are 0.5. The results are almost the same as when the coefficients of variation are equal to one.
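The Erlang and hyper-exponential service times with a prescribed mean and coefficient of variation can be generated as follows. The balanced-means two-phase hyper-exponential parametrization is our choice for illustration; the paper does not specify which hyper-exponential variant it uses:

```python
import math
import random

def sample_service(mean, cv, rng):
    """Draw a service time with the given mean and coefficient of variation:
    Erlang for cv < 1, exponential for cv = 1, and a balanced two-phase
    hyper-exponential for cv > 1 (an assumed parametrization)."""
    c2 = cv * cv
    if cv < 1.0:
        # Erlang with k phases has cv = 1/sqrt(k); each phase has rate k/mean
        k = max(1, round(1.0 / c2))
        return sum(rng.expovariate(k / mean) for _ in range(k))
    if cv == 1.0:
        return rng.expovariate(1.0 / mean)
    # Balanced-means H2: phase i chosen w.p. p or 1-p, rates 2p/mean, 2(1-p)/mean
    p = 0.5 * (1.0 + math.sqrt((c2 - 1.0) / (c2 + 1.0)))
    rate = (2.0 * p / mean) if rng.random() < p else (2.0 * (1.0 - p) / mean)
    return rng.expovariate(rate)
```

With mean 1 and cv = 0.5 this reduces to a four-phase Erlang, and with cv = 1.34 both the mean and the squared coefficient of variation of the H2 mixture match the targets exactly.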
Table 6 is similar to Table 4 except that the coefficients of variation are greater than one for all stations (with different values), for W = (1, 3, 6, 9, 5, 4, 2, 3, 7, 10, 5, 2, 1), W = (1, 3, 2, 4, 5, 7, 10, 8, 6, 4, 5, 2, 1), and W = (7, 5, 6, 4, 2, 3, 1, 3, 2, 4, 5, 9, 8). Compared with the case with coefficients of variation equal to one, Policies 1 and 2 perform better than Policy 4. The intuition is that high variability increases the chance of blocking or starvation in adjacent stations.

Heterogeneous mean service times (the second set)
Tables 2, 3, and 5 suggest that if coefficients of variation of all stations are changed uniformly, the ordering between policies remains the same as when coefficients of variation are one. However, Table 6 indicates that when coefficients of variation of stations are arbitrarily modified (in this case all greater than one), the ordering between policies might change. Finally, we should comment that our results on coefficients of variation are not conclusive, in the sense that we have not considered vectors of coefficients of variation (cv) with more heterogeneous values.

Increasing the number of flexible servers
In this section we consider the effect of increasing F, the number of flexible servers. Assume that we are given a system with a particular allocation of dedicated servers. We then change dedicated servers to flexible servers, one at a time, until all servers are flexible; in other words, F takes all values between zero and N. An interesting question is how much the throughput improves as F increases.

Homogeneous W
For equal workloads, we consider a tandem line with 15 stations where originally each station possesses one dedicated server. Service times at each station follow an exponential distribution with rate 1. The deployed server allocation policy is one that clears blocking from the end to the beginning of the line, unless there is blocking caused by a station with no dedicated servers. In that case, the policy prioritizes allocating servers to the station with no dedicated server over clearing blocked servers from the end to the beginning. The policy also avoids starvation.
While increasing F, one should specify the order in which dedicated servers are changed to flexible servers. We contemplate four ordering policies: Policy (i) choosing servers from the end to the beginning of the line; Policy (ii) picking servers from the beginning to the end of the line; Policy (iii) selecting servers around the middle station in a zig-zag way; Policy (iv) choosing servers randomly. Simulations showed that Policies (i) and (ii) lead to similar results. Figure 4 contrasts Policies (i) and (iii) (for the 15-station configuration described above) and shows that throughputs are fairly close for the two policies but slightly higher for Policy (i). Policy (iv) performs slightly worse than the two other policies, suggesting that choosing the first few servers is more sensitive to the employed policy, compared with choosing the last servers. It is worthwhile to note that, as can be seen in Figure 4, the highest improvements in throughput are gained from making the first few servers flexible. Also, as F increases, the throughput improvement is diminished.
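The four orderings can be sketched as follows. The exact zig-zag pattern (which side of the middle station is visited first) is not specified in the text, so the alternation below is an assumption:

```python
import random

def flexibilization_order(N, policy, seed=0):
    """Sketch of the order in which dedicated servers are made flexible
    (stations numbered 1..N).

    'end'    -- Policy (i):   from the end of the line to the beginning
    'start'  -- Policy (ii):  from the beginning to the end
    'zigzag' -- Policy (iii): around the middle station, alternating sides
    'random' -- Policy (iv):  uniformly at random
    """
    if policy == "end":
        return list(range(N, 0, -1))
    if policy == "start":
        return list(range(1, N + 1))
    if policy == "zigzag":
        mid = (N + 1) // 2
        order = [mid]
        offset = 1
        while len(order) < N:
            for s in (mid + offset, mid - offset):
                if 1 <= s <= N and len(order) < N:
                    order.append(s)
            offset += 1
        return order
    if policy == "random":
        rng = random.Random(seed)
        order = list(range(1, N + 1))
        rng.shuffle(order)
        return order
    raise ValueError(policy)
```

For the 15-station line, the zig-zag ordering starts at station 8 and spreads outward; every policy visits each station exactly once.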

Heterogeneous W
In addition to the equal workload case, we want to consider more generic configurations with heterogeneous service rates and multiple servers per station. We propose four ordering policies that specify the order in which dedicated servers are changed to flexible servers.
- Policy I. Changing a dedicated server of a station to a flexible server reduces the total service rate of that station and can even make the station a bottleneck. Therefore, to gain better results, the order in which flexible servers are chosen should depend on the mean total service rates of the stations. As we intend to avoid introducing bottlenecks, consider the case where each station has one less dedicated server and compare the resulting total service rates. More specifically, start making servers flexible from the station with the highest μ i (v i − 1) value. If the values of μ i (v i − 1) are equal for more than one station, choose the one with the lowest μ i first. See Yarmand and Down (2013) for further details.
- Policy II. Start choosing servers from the station with the highest v i value. If more than one station has the same value of v i , choose the one with the lowest μ i .
- Policy III. Start picking servers from the station with the highest μ i value, regardless of the value of v i .
- Policy IV. Select servers randomly.
Policy I follows a selection order that intends to avoid introducing new bottlenecks (if all v i and μ i values are the same, introducing bottlenecks is inevitable). Policy II, on the other hand, makes use of the server multiplicity effect (Yarmand and Down, 2013): comparing two stations with equal total mean service times, the station with the larger number of servers leads to higher throughput. Policy II chooses the station that benefits most from this effect. Policy III is based on the fact that jobs can depart sooner from stations with higher service rates (when there is no blocking). Figure 5 compares these four policies for a configuration with W = (12, 7, 13, 3, 5, 4, 1, 10, 9) and an initial server allocation of (4, 5, 5, 2, 4, 2, 1, 4, 5). The figure shows that the throughput improvement depends on the order in which servers are chosen to become flexible. As the server allocation policy in the simulations, we employed Policy 4 (from Section 5), enhanced as follows: the policy prioritizes helping a bottleneck caused by a station with no dedicated servers.
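The selection rules of Policies I-IV can be sketched directly from their definitions. The function name and argument conventions are ours; the tie-breaking follows the text (ties go to the station with the lowest μ i):

```python
import random

def choose_station(mu, v, policy, rng=None):
    """Sketch of Policies I-IV: pick the next station whose dedicated
    server is made flexible.

    mu[i] -- per-server service rate at station i
    v[i]  -- current number of dedicated servers at station i
    Returns a station index, or None if no dedicated servers remain.
    """
    stations = [i for i in range(len(mu)) if v[i] > 0]
    if not stations:
        return None
    if policy == "I":    # highest mu_i * (v_i - 1); ties -> lowest mu_i
        return max(stations, key=lambda i: (mu[i] * (v[i] - 1), -mu[i]))
    if policy == "II":   # highest v_i; ties -> lowest mu_i
        return max(stations, key=lambda i: (v[i], -mu[i]))
    if policy == "III":  # highest mu_i, regardless of v_i
        return max(stations, key=lambda i: mu[i])
    if policy == "IV":   # random selection
        return (rng or random).choice(stations)
    raise ValueError(policy)
```

For example, with mu = [1, 2, 3] and v = [2, 2, 2], Policy I and Policy III both pick the last station (highest μ i (v i − 1) and highest μ i , respectively), while Policy II breaks the tie in v by picking the first station, which has the lowest μ i .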
We tried additional configurations to compare the performance of the proposed policies. Figure 6 illustrates the comparison for W = (9, 7, 10, 3, 5, 4, 1, 13, 12) with an initial server allocation of (11, 7, 12, 3, 5, 4, 1, 15, 14). Figure 7 presents the comparison for W = (1, 3, 4, 5, 7, 9, 10, 12, 13) with an initial server allocation of (3, 5, 6, 7, 9, 11, 12, 14, 15). Figures 5, 6, and 7 show that Policy I performs better than Policies II and III. The figures also illustrate that the highest throughput improvements are gained when the first few servers are made flexible and that, as F increases, the throughput improvement is diminished. Additionally, the performance of Policy IV in these figures shows that the ordering policy employed affects the throughput improvement (and, if not chosen well, results in lower improvements). The fact that in some cases Policy IV performs better than Policies II and III implies that Policies II and III, while intuitively reasonable, can be problematic to employ.

Conclusions
In this article we studied the allocation of flexible servers in zero-buffer tandem lines in the presence of dedicated servers. We focused on configurations with only one flexible server. The optimal policy for two stations performs a hand-off, clears blocking, and admits new jobs when there is no blocking. We used the PI algorithm to show that for small systems with equal workloads, the optimal policy clears blocking from end to beginning and avoids starvation. Simulation results illustrated that for longer lines with varying workloads the optimal policy tends to prioritize clearing blocking from stations with higher mean service times. When making dedicated servers flexible, we observed that significantly higher throughput improvements were gained when the first few servers were made flexible. The improvement was diminished as the number of flexible servers increased.
In this work we have considered non-collaborative homogeneous servers. A natural extension is to study the case where servers are heterogeneous (i.e., have different service rates at different stations) or can work collaboratively at a station. Another potential direction is to run the PI algorithm for F > 1 to identify optimal allocation policies. With F > 1, the allocation problem is more challenging, as the allocation of one flexible server depends on the assignments of the other flexible servers. The problem is especially interesting for long lines with F > 1, as more complicated coordination among the flexible servers is required. Considering costs for making a server flexible, and then minimizing the overall cost while maximizing the throughput, is also of interest. It may also be of interest to introduce hand-off costs or server switching costs.

Supplemental material
Supplemental data for this article can be accessed on the publisher's website at www.tandfonline.com/uiie.