Understanding and managing product line complexity: Applying sensitivity analysis to a large-scale MILP model to price and schedule new customer orders

This article analyzes a complex scheduling problem at a company that uses a continuous chemical production process. A detailed mixed-integer linear programming model is developed for scheduling the expansive product line, which can save the company an average of 1.5% of production capacity per production run. Furthermore, through sensitivity analysis of the model, key independent variables are identified, and regression equations are created that can estimate both the capacity usage and material waste generated by the product line complexity of a particular production run. These regression models can be used to estimate the complexity costs imposed on the system by any particular product or customer order. Such cost estimates can be used to properly price new customer orders and to most economically assign them to the production runs with the best fit. The proposed approach may be adapted for other long-production-run manufacturing companies that face uncertain demand and short customer lead times.


Introduction
Recent marketing pressures pushing for product proliferation have forced numerous manufacturing companies to reengineer their production processes toward more of a mass customization model. Managers have installed various flexible manufacturing machines and systems to ease the task of switching production from one product type to another within the same production line. Nevertheless, in many factories, product changeovers may still be costly and time-consuming. In addition, increasing product variety generally compounds the combinatorial production scheduling problem (Pinedo, 2009) and may produce system burdens such as higher capacity utilization or more waste. In this article, we investigate the notion of complexity costs (Kouvelis and Munson, 2004), loosely defined as increased costs due to the addition of a new variety of product into the product mix.
The new product development process is often extensive and may span most departments within an organization.
In many cases, however, production cost estimates of the new product may not take into account the impact of introducing the product on the cost and scheduling of other products using the same production line. Given the ever-changing nature of many scheduling environments, such an impact may be very difficult to estimate. Nevertheless, managers need answers to questions such as: "Will my overall production costs increase significantly if I add a new product to my current product mix?" or "Which products should I drop from my mix because they place too much burden on the system?" In addition to the scheduling complexity stemming from new product introduction, many firms must incorporate unanticipated orders when they determine the lengths of production runs and production frequencies for product families. Therefore, managers may also need answers to questions such as: "What is the burden on my current system if I accept the order for this customized product?" "How should I price this unplanned new customer order?" or "How do I most economically assign customer orders to the production runs with the best fit?" Our research is based on a U.S. company that produces 1700 different products on a single production line. Because of limited production capacity, the proliferation of
products poses serious burdens on this over-utilized production line. This research is also motivated by firms such as DuPont, which produces chemical products using a continuous chemical process, and Independent Can, which manufactures nearly 5000 different varieties of decorated cans. Each of these companies experiences some degree of product proliferation and struggles to meet customer needs with limited capacity. In our company's industry, customers often do not place orders early enough to be incorporated into our firm's mid-term planning schedules because these customers strive for low inventory levels of parts or components. Among all orders received at our firm, 60% must be filled within 10 days. Unfilled orders often result in lost sales to our firm's competitors. Loss of customers not only reduces short-term profits but also jeopardizes the firm's long-term survival in this competitive industry. To retain its customers and remain profitable, the firm must alter its detailed production schedule by inserting some unanticipated orders and modifying the production quantities of other products, despite having initially determined lot sizes and production-run lengths 2 to 3 months prior.
Our research helps the firm identify complexity factors that managers can use to make strategic product portfolio decisions. Our regression models of complexity cost are also used in the product portfolio selection model. To make better order acceptance decisions, demand planners and salespeople can use our model to select orders and assign the selected ones to appropriate production runs. When the joint selection/allocation decision is not possible, the salespeople can instead use the regression models to select and price orders based on the estimated production cost for the orders in question. Previously, the company had been basing order-acceptance decisions only on aggregate mid-term capacity levels, which often imposed exceedingly high costs on the system for certain poorly matched orders. We provide a tool that now enables the firm to make better-informed acceptance decisions that are based on estimated short-term production costs of each specific new order. The firm now knows which poorly matched orders to reject and which profitable orders to accept.
Clearly every manufacturing environment has its own unique circumstances, suggesting that no general complexity cost formula can be developed to apply to all cases. Although our research is based on a particular firm, our introduction of cutting patterns to the continuous-time formulation of detailed production schedules can be generalized to other continuous and semi-continuous processes. In such processes, capacity utilization and material waste are critical to production cost because sheets of material are cut into smaller sheets that are either packaged or further transformed into finished products. Furthermore, in this article, we illustrate an approach for complexity cost development of a specific chemical processing plant by applying a method related to Wagner's global sensitivity analysis technique (Wagner, 1995). The complexity cost development approach could be implemented in numerous other production environments where product proliferation increases production costs. Through the process of estimating complexity costs, the factors that most affect product line complexity are identified. Once these are known, managers can potentially better manage new customer orders and improve production scheduling. Our estimated complexity cost models suggest that consolidation of products may not always be the appropriate strategy for some firms because consolidation may actually increase the unit production cost in some cases. This is an invaluable insight for many struggling firms that attempt to reduce production cost by reducing product variety. This article has four major contributions. First, we describe how to formulate a production scheduling model for a continuous process through the use of sequenced cutting patterns. Second, we demonstrate how to apply a form of sensitivity analysis to a deterministic mixed-integer linear programming model to identify the key independent variables that drive the model results. 
Third, we show how to use regression techniques to estimate the complexity costs to the system of introducing a newly ordered product. Fourth, we explain how the firm can use the regression equations in real time to properly price new orders and potentially reject ones that would be too burdensome on the system. Although the specific complexity cost formulations in this article are company-specific, the techniques that we describe for selecting customer orders and assigning them to production runs can be used in many companies that have a large variety of made-to-order products that share production resources.
The rest of this article is organized as follows. We review some related literature in the next section. Section 3 describes the specific chemical processing line under study and the complexity of product lines. Section 4 presents the detailed formulation of the production scheduling problem for this company. Section 5 develops the complexity cost models. Section 6 illustrates how to use the regression models to price new customer orders, and it presents a method for assigning customer orders to production runs. We provide concluding remarks in Section 7.

Literature review
Marketing and operations often have conflicting goals: marketing wants to move toward a completely customized product line, whereas operations would prefer to exploit the efficiencies of producing only a few products. Companies that employ mass customization are able to successfully marry both goals, but mass customization may not be attainable in many industries given current technological constraints. In this section, we review works on product line selection and product complexity management, global sensitivity analysis of mixed-integer programs, production scheduling for continuous chemical processes, and customer order selection and rescheduling.

Product line selection and product complexity management
A few research papers provide approaches for identifying complexity drivers and estimating complexity costs. Based on interviews with six Fortune 500 firms, Closs et al. (2008) suggest that firms manage their product line complexity by coping with external complexity drivers and remedying internal organizational weakness and deficiencies in product lifecycle management. Chávez (2006) identifies the complexity drivers for Hewlett Packard's (HP) industry standard server. Cargille et al. (2005) analyze consumer personal computer product line complexity. These authors identify the complexity drivers by interviewing HP staff members and estimating the complexity cost via either empirical measures or inventory costs. Fisher and Ittner (1999) provide empirical evidence of losses due to product variety. Specifically, product variety adversely affects production planning, material handling, and assembly procedures, which in turn leads to higher overhead and a larger rework requirement. All of the above papers link product proliferation with complexity cost. At the same time, product proliferation may increase market share and brand quality. For example, via six experimental studies, Berger et al. (2007) find that larger numbers of product variants are associated with a perception of higher quality for a product line of food. We identify the product line complexity drivers for our company, and we estimate the complexity cost functions via sensitivity analysis of the detailed production scheduling model. Instead of reducing the number of products in a firm's product mix, we demonstrate how the firm can increase profits by optimally selecting customer orders and scheduling them with other products that are produced to stock.
Several papers have addressed various aspects of product line design. Nair et al. (1995) develop heuristics for solving the problem of product line design and selection. They construct products directly from attribute-level part-worth data. However, they do not consider manufacturing cost when selecting products for each line. Morgan et al. (2001) create a product mix by selecting the most favorable products for each customer segment, considering both marketing and manufacturing elements. Yano and Dobson (1998) introduce a product line design model that addresses the questions of which products to offer and how to price them. The model considers the fixed and variable costs of technologies. Yunes et al. (2007) develop tools to reduce John Deere's complexity costs. In their model, a customer may migrate to an alternative configuration if his or her first choice is not available.
In the competitive global economy, firms need to carefully consider the appropriate trade-offs when determining product variety. One of the biggest challenges to effective variety management is a good understanding of the costs of variety (Otivson and Fry, 2007). The aforementioned product line complexity management papers use surveys to identify the complexity drivers; meanwhile, the product line design models do not directly link manufacturing cost with product line complexity. On the other hand, Cattani et al. (2010) propose a capacity allocation model where they introduce a "lost-focus premium" applied to standard items that are produced at a flexible facility instead of at a focused facility.

Global sensitivity analysis
For a situation such as that experienced by the chemical processing plant with which we have been working, complexity has a definite impact on production cost. In addition to interviewing plant employees to help us identify complexity drivers, we develop a systematic approach to determine which of those drivers are most important and to estimate their specific impacts on complexity costs. We invoke an approach related to "global sensitivity analysis" (GSA; Wagner, 1995) to identify the main factors that contribute to the cost of product line complexity for the chemical company. For the application in this article, we develop a large-scale mixed-integer linear program of a detailed production scheduling problem. We identify the factors that affect production capacity and material waste through GSA of the mixed-integer linear program, and we then develop equations that estimate complexity costs based on the values of the key parameters, without the need to re-run the original detailed scheduling model. Our approach for identifying complexity factors differs from those previously described in the literature. Our method can identify drivers that may not be obvious to managers by examining all of the possible ones and quantifying the associated complexity costs. Wagner (1995) introduces practical approaches for determining objective function sensitivity with respect to various parameter changes. When several parameters (i.e., the objective function coefficients, the right-hand sides, or the coefficients in the data matrix) change, they do not necessarily affect the objective function value in an additive way. By running numerous replications of a deterministic model using different parameter values, we can analyze the effect of various combinations of parameter changes by expressing sensitivity in terms of R^2, the fraction of variation in the optimal objective value attributable to the changes.
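To make the replication-and-regression idea concrete, the sketch below runs a toy stand-in for the scheduling model over a factorial design of candidate complexity drivers and reports a simple one-factor R^2 for each driver. The toy model (a greedy two-lane pairing heuristic), the factor names, and all numerical levels are illustrative assumptions, not the paper's actual MILP or data.

```python
import itertools
import random
import statistics

def solve_toy_schedule(n_products, mean_width, width_spread, seed):
    """Brute-force a tiny two-lane pairing problem: pair products to
    minimize wasted sheet width (a stand-in for the real MILP)."""
    rng = random.Random(seed)
    W = 185.0  # usable line width (cm), from the paper's setting
    widths = [min(W, max(50.0, rng.gauss(mean_width, width_spread)))
              for _ in range(n_products)]
    waste = 0.0
    remaining = sorted(widths, reverse=True)
    while remaining:
        a = remaining.pop(0)
        # greedily find the widest partner that fits next to product a
        partner = None
        for i, b in enumerate(remaining):
            if a + b <= W:
                partner = i
                break
        if partner is not None:
            waste += W - a - remaining.pop(partner)  # trim loss
        else:
            waste += W - a                           # lane B runs empty
    return waste

# Factorial design over candidate complexity drivers
levels = {
    "n_products": [10, 20, 40],
    "mean_width": [90.0, 120.0, 150.0],
    "width_spread": [5.0, 20.0, 40.0],
}
runs = []
for combo in itertools.product(*levels.values()):
    params = dict(zip(levels, combo))
    for rep in range(5):  # replications per design point
        y = solve_toy_schedule(seed=rep, **params)
        runs.append((params, y))

# One-factor R^2: fraction of variation in the optimal objective
# explained by each factor alone (simple one-regressor OLS)
ys = [y for _, y in runs]
ybar = statistics.fmean(ys)
sst = sum((y - ybar) ** 2 for y in ys)
for factor in levels:
    xs = [p[factor] for p, _ in runs]
    xbar = statistics.fmean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r2 = (sxy ** 2) / (sxx * sst) if sxx > 0 and sst > 0 else 0.0
    print(f"{factor:>12}: R^2 = {r2:.3f}")
```

In the actual study, each design point would be solved with the full mixed-integer program, and the regression would include multiple factors and their interactions.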
A few other papers have applied GSA in other contexts, including Kouvelis and Munson (2004) and Kouvelis et al. (2013) on global network design. To our knowledge, this is the first attempt to apply GSA to determine complexity costs in a manufacturing environment. For the application of this method to continuous chemical processes, the sensitivity analysis data are generated by altering parameters in the scheduling models. We also introduce to GSA regression analysis on panel data and a method for testing for fixed or random effects in the regression models.
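As a sketch of the panel-data idea, the within (fixed-effects) estimator can be computed by demeaning each entity's observations; comparing it with pooled OLS is the intuition behind a Hausman-type specification test. All data below are synthetic, and the single-regressor setup is a deliberate simplification of the regressions described in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: several "campaign types" (entities), each observed
# over multiple replications, with one complexity driver x.
n_ent, n_obs = 6, 8
entity = np.repeat(np.arange(n_ent), n_obs)
x = rng.uniform(0, 10, size=n_ent * n_obs)
alpha_i = rng.normal(0, 3, size=n_ent)              # entity effects
y = 2.0 * x + alpha_i[entity] + rng.normal(0, 1, size=n_ent * n_obs)

def demean_by(v, groups):
    """Subtract each group's mean (the 'within' transformation)."""
    out = v.astype(float)
    for g in np.unique(groups):
        out[groups == g] -= out[groups == g].mean()
    return out

# Fixed-effects ("within") estimator: OLS on entity-demeaned data
xd, yd = demean_by(x, entity), demean_by(y, entity)
beta_fe = (xd @ yd) / (xd @ xd)

# Pooled OLS ignores entity effects; a large gap between the two
# estimates is the informal signal behind a Hausman-type test
X = np.column_stack([np.ones_like(x), x])
beta_pooled = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"within estimate: {beta_fe:.3f}")
print(f"pooled estimate: {beta_pooled:.3f}")
```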

Production scheduling for continuous chemical processes
Scheduling for continuous processes has received significant research attention, particularly in the chemical engineering literature. The typical problem is known to be NP-hard (Garey and Johnson, 1979). Most of the papers that we found attempt to schedule the problem in a quite general way based on a "resource-task network" (RTN) representation of the problem (Castro et al., 2004). In contrast, we use cutting patterns to represent the production slots for individual products in continuous time. Floudas and Lin (2004) and Kallrath (2005) provide excellent reviews of chemical process scheduling papers, which are typically formulated as either mixed-integer linear programs or mixed-integer nonlinear programs. Table 1 provides a representative sample of continuous process papers from the chemical engineering literature and presents some of the similarities and differences with our article. Due to the structure of the continuous production process that we analyze in this article, we can apply a less general model than those just described, which allows us to solve the scheduling problem for a relatively large number of products (100 or more products are scheduled in each production run). Our model schedules jobs on two parallel production lines, where lane widths change during the processing of the products in a campaign. Our process has no intermediate storage, and all of the stages work at the same pace. Our problem has two particularly defining features: (i) significant and costly sequence-dependent switchover times at the beginning of processing and (ii) two available lanes for scheduling jobs when the combined width of any pair of products allows. Since the manufacturer has limited capacity and uses expensive materials, our objective function minimizes the costs of capacity usage and wasted material (scrap).

Customer order selection and rescheduling
In the chemical process industry, demand is very uncertain and order lead times are short. The firm determines whether to take the orders with the lowest production costs or the highest sales prices (Kallrath, 2008). When firms have high utilization rates along with high product variety, they often accept customer orders via a workload policy or some other ad hoc method. Using a simulation based on industry data, Raaymakers et al. (2000) show that a regression model for accepting customer orders based on order characteristics outperforms the workload policy with respect to capacity utilization and the need for replanning. Kallrath (2005) presents a model for customer portfolio optimization. The model can select customers and ensure that a specified minimum percentage of demand from each selected customer is satisfied. We select new customer orders and assign them to existing production runs to maximize the profit from those orders using our complexity cost functions. The need for rescheduling in response to unexpected changes is commonplace in modern flexible manufacturing systems (Hall and Potts, 2004; Roundy et al., 2005; Hall et al., 2007; Hoogeveen et al., 2012). Such rescheduling approaches either schedule the new jobs after the old jobs or insert them within the existing schedule, whereas we proactively allocate capacity in each campaign for customer orders.

The continuous chemical process production line
We analyze the production process at a U.S. firm that produces plastic films to be embedded in auto windshields. Customers sandwich the films between two glass panels under heat and pressure to serve as a barrier to noise and ultraviolet light and to help prevent glass splinters from flying into and around the vehicle. The rectangular films emerge from the production process as either clear sheets or colored gradient sheets with a colored band on one side (as seen at the top of many automotive windshields). Clear films have one of three adhesive levels, while the colored gradient bands come in a choice of three colors. In addition, the films have one of two thickness levels and have an embossed or non-embossed surface. The unique combination of adhesive level, colored gradient band, and surface treatment is called a product family. The production line currently produces 17 product families for sale. The sheets come out in two standard lengths, and each plastic sheet is wound around a core and packaged immediately after emerging from the production line.
In response to the extensive competition in this industry, the firm custom manufactures a large portion of its products. Within each product family, the products are distinguished by the following specifications: length (250 or 500 m), width (any width between 50 cm and 185 cm in one-centimeter increments), color band width (any width between 10 cm and 30 cm in 0.5-cm increments), unwinding direction with the color band on top (clockwise or counterclockwise), packaging type (one, two, or four rolls on a pallet), and customer-specific requirements. Each family typically includes between 10 and 200 products since some combinations of attributes are not used. The 17 families contain 1700 products altogether. All of the 1700 different products are produced on the same production line. The switchovers due to customization place a huge burden on this continuous production process.
The production line operates 24 hours per day and 7 days per week, stopping only for routine maintenance. As described in more detail later, the line continues to operate during a product switchover; thus, any material produced during the switchover represents waste. The system can produce 7 to 11 rolls of 250-m sheets or 3.5 to 5.5 rolls of 500-m sheets per hour. The production line consists of seven main stages as shown in Fig. 1. The sheets move through every stage without stopping, and there is no work-in-process inventory between any two stages. The length of the line is about 2500 m, and the line is housed in a four-story building.
If the combined width of two products is less than the effective width of the production line, the products can be produced side-by-side, one on Lane A and the other on Lane B as illustrated in Fig. 2. In this case, the color bands are produced on either shoulder of the line, while the clear portions are produced together in the middle and are separated at the cutting stage. If a product is wide enough such that no other product can be produced on the other side of the line, the line can only produce one product at a time. In this case, there will be significant capacity loss, and the material produced on the other side must be scrapped.
Production begins by mixing two major materials: resin and plasticizer. The raw materials are then blended into polyvinyl butyral (PVB) by two machines: machine 1 mixes and blends the materials for the clear section, while machine 2 simultaneously mixes and blends materials for the color bands. At stage 2, the clear and colored PVB compounds are forced through nozzles and formed into a sheet, the "master sheet," with color bands on either shoulder. Figure 2 displays the cross section of the process at this stage. One probe in Lane A and another in Lane B regulate the flows of the liquid color materials from stage 2 to produce color bands with the required width. At stage 3, the master sheet passes through several extrusion rolls and is formed into a sheet of the desired thickness. Stage 4 takes the sheet through a hot water bath and then a cool water bath. At stage 5, two side knives trim the edges of the cool sheet, while two middle knives cut the master sheet into two and remove the center strip (Fig. 3). Stage 6 applies treads with appropriate patterns to the surfaces of the sheets. The windshield manufacturers use these treads to remove air when they adhere two layers of glass to the sheet. At the last stage, the sheets are cut into the desired length and wound into rolls, which are immediately packaged and stored in a temperature-controlled warehouse.

Detailed production scheduling of a campaign
The continuous process is operated via a sequence of campaigns. Each campaign represents the production of 10 to 100 products that have an identical chemical formulation and gradient band color (i.e., they are from the same product family). The company incurs changeover costs between campaigns because the old formulation must be flushed out before the new one can begin. These changeover costs are relatively fixed because each product family must be produced periodically to meet customer demand. In addition, however, the company incurs what it labels as turnover costs within a campaign, as production switches from one product to another. A new product on a lane may have a different sheet width, color band width, and/or roll length from the previous product on that lane. The process keeps running during the switchover time. Thus, capacity is lost and materials are wasted as the machines adjust for the new product. Changes in sheet width, along with small enough changes in color band width, are accommodated by moving the knives. This adjustment takes less than 10 seconds and is essentially independent of distance. Larger changes in color band width, however, must be accommodated by moving the probes. This adjustment can be very expensive, as it takes 12 to 14 minutes to move the probes 4 cm and 75 to 80 minutes to move them 15.5 cm. For the probes, adjustment time is roughly proportional to distance. Finally, the capacity and materials for a full lane can be lost if that lane is forced to run empty because the product on the other lane is too wide.
For a specific campaign, the scheduling problem becomes one of assigning jobs to two parallel processes to minimize the capacity usage and material waste due to turnover costs and empty lanes, subject to various restrictions on job assignment. The problem setting shares similarities with so-called p-batching problems with job incompatibilities known from machine scheduling (see, e.g., Uzsoy (1995)). (We can treat Lanes A and B as two parallel machines that are different.) This problem differs from the classic problem of scheduling jobs on parallel identical machines in the following ways. First, two products produced side-by-side must have combined widths less than the usable production line width, and they must have the same length and compatible packaging requirement since the two products come off the line simultaneously. Second, the sum of color band widths of the two products must lie within a certain range for quality control purposes. Third, a product may be produced in several places in the sequence. Fourth, some products can be produced in either lane or in both lanes, whereas other products must be produced on a specific lane due to special packaging requirements from customers. Fifth, one lane may be busy while the other is idle. In addition to these five differences, the capacity loss due to switchovers depends upon the sequence of the products. Based on these characteristics, we cannot solve this scheduling problem using models from the literature.
We ignore the due dates of individual products in formulating the problem because the firm schedules the campaigns so that the products for distribution center replenishment are completed before their due dates. We first assign lane-dedicated products to the required lanes and then assign the best match to the other lane after considering various restrictions.
To formulate the problem, we need to quantify the capacity loss due to probe movements (the less than 10-second knife movements can be ignored). The sheets produced while the probes move (a process lasting at least 12 minutes) do not meet the product specifications, so they must be threaded and recycled. Hence, the loss of production capacity is a step function of the size of the probe movement. We demonstrate how to approximate these step functions via piece-wise linear functions in the online supplement. Table 2 presents the notation used in this section.
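As an illustration of the approximation, the sketch below contrasts a step function of probe-movement size with its piece-wise linear surrogate and converts adjustment time into scrapped sheet length at an assumed constant line speed. The breakpoint times are loosely anchored to the figures quoted earlier (about 13 minutes for 4 cm, about 77 minutes for 15.5 cm); the intermediate values and the line speed are invented for illustration.

```python
import bisect

# Assumed illustration values: probe-adjustment times at a few movement
# sizes; the real breakpoints are proprietary.
breaks = [0.0, 4.0, 8.0, 12.0, 15.5]     # probe movement (cm)
minutes = [0.0, 13.0, 33.0, 55.0, 77.0]  # adjustment time (min)

def step_time(move_cm):
    """Step function: time jumps at each breakpoint."""
    i = bisect.bisect_right(breaks, move_cm) - 1
    return minutes[i]

def pw_linear_time(move_cm):
    """Piece-wise linear approximation between the same breakpoints."""
    if move_cm >= breaks[-1]:
        return minutes[-1]
    i = bisect.bisect_right(breaks, move_cm) - 1
    t = (move_cm - breaks[i]) / (breaks[i + 1] - breaks[i])
    return minutes[i] + t * (minutes[i + 1] - minutes[i])

line_speed_m_per_min = 30.0  # assumed constant line speed
for move in (2.0, 4.0, 10.0, 15.5):
    lost_m = pw_linear_time(move) * line_speed_m_per_min
    print(f"move {move:5.1f} cm: step {step_time(move):5.1f} min, "
          f"linear {pw_linear_time(move):5.1f} min, "
          f"~{lost_m:7.1f} m of sheet scrapped")
```

Because the line runs at constant speed, lost capacity (in meters of sheet) is simply the adjustment time multiplied by the speed, which is why linearizing the time function also linearizes the capacity-loss terms in the model.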

Use of cutting patterns to represent production slots
We designate a way of assigning products on Lanes A and B as a cutting pattern. To create a production schedule, we select cutting patterns and then sequence them. The number of possible cutting patterns equals the square of the number of products in the campaign, but we can eliminate infeasible and duplicate patterns to reduce this amount. The set of feasible cutting patterns is generated as an input to the scheduling problem. In this way, we eliminate many decision variables and constraints that would otherwise be in the model, thus significantly reducing the problem size.

Methods for creating feasible cutting patterns
To define the cutting patterns, we first introduce the following additional notation:

CB_max = maximum sum of gradient band widths of products on both lanes (cm);
CB_min = minimum sum of gradient band widths of products on both lanes (cm);
TM = maximum width of the clear strip removed in the center of the master sheet (cm);
E = minimum distance between the left (right) edge and the left (right) side knife (cm);
L_i = roll length of product i (m);
B_i = number of rolls of product i packed on a pallet, B_i ∈ {1, 2, 4};
CB_i = gradient band width of product i (cm).
We next introduce the restrictions on lane assignments.
1. The lengths of the products on both lanes must be the same.
2. The packaging type of the product produced on Lane A must be compatible with that of the product produced on Lane B, since the rolls coming off the production line are packaged immediately.
3. The combined width of the products produced on Lanes A and B must be smaller than the usable width W.
4. The combined color band width of the products produced on Lanes A and B must lie within [CB_min, CB_max].
Let S_j be the set of products that can be produced on Lane A if product j is assigned to Lane B: if product j is produced on Lane B, the product produced on Lane A must be from set S_j.
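The four restrictions above can be encoded directly as a pairwise compatibility check, from which each set S_j follows. The product data below are hypothetical, and packaging compatibility is simplified to equality of pallet sizes; the real rules may be richer.

```python
from dataclasses import dataclass

# Hypothetical limits, mirroring the notation in the text
W, CB_MIN, CB_MAX = 185.0, 20.0, 55.0

@dataclass
class Product:
    name: str
    length: int      # roll length L_i (m): 250 or 500
    width: float     # sheet width (cm)
    band: float      # gradient band width CB_i (cm)
    pallet: int      # rolls per pallet B_i: 1, 2, or 4

products = [
    Product("P1", 250, 80.0, 15.0, 2),
    Product("P2", 250, 95.0, 20.0, 2),
    Product("P3", 500, 120.0, 25.0, 1),
    Product("P4", 250, 170.0, 30.0, 4),
]

def compatible(a, b):
    """Restrictions 1-4: same length, compatible packaging (equality
    here), combined width under W, combined band width in range."""
    return (a.length == b.length
            and a.pallet == b.pallet
            and a.width + b.width <= W
            and CB_MIN <= a.band + b.band <= CB_MAX)

# S_j: products that may run on Lane A when product j runs on Lane B
S = {j.name: [i.name for i in products if compatible(i, j)]
     for j in products}
print(S)
```

Note that a product can be compatible with itself (it then runs on both lanes at once), while a wide product such as P4 here has an empty S_j and must run alone.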
We place the assignment vectors Φ_k, k = 1, . . . , N_p, into a matrix Φ called the assignment matrix. We arrange the assignment matrix first by the index of the product assigned on Lane B and then by the index of the product assigned on Lane A. That is, Φ has (N+1) blocks, where the jth block, j = 1, . . . , N, represents the cutting patterns that assign product j on Lane B, and the last block represents the cutting patterns in which only one product is produced and is assigned on Lane A. Each block except the last has at least one column and at most |S_j| columns, j = 1, . . . , N. If S_j is empty, block j has one column, whose jth element equals one with all other elements equal to zero. If S_j = {j}, block j also has one column, whose jth element equals two with all other elements equal to zero. If S_j has multiple elements, block j has |S_j| columns; each column's jth element equals one, its ith element equals one for the corresponding i ∈ S_j, and all other elements equal zero. In each block, the columns are arranged in ascending order of the index of the product assigned on Lane A. We number the cutting patterns sequentially by columns of the assignment matrix.
When a product must be produced alone and has no lane assignment restriction, we create two assignment vectors: one for the assignment of the product on Lane A and the other for the assignment of the product on Lane B. Since these two vectors are identical, we keep only the vector representing the cutting pattern that assigns the product on Lane A, place it in the last block of Φ, and arrange these columns by product index. The number of cutting patterns N_p is much smaller than N^2 (N_p is less than 0.5N^2 on average).

Slots
We segment the schedule into slots of varying length. One or two products of 250 or 500 m in length are produced in a slot. The maximum number of possible slots equals the number of products N produced during the campaign. If any two products are produced simultaneously, the actual number of slots will be less than the maximum.
We can position the probes (and knives) for the production of products in the first slot while the production line flushes the previous formulation. We position the probes (and knives) on Lanes A and B for the production of the products that have the widest or the narrowest gradient bands. Hence, the initial values of p_0 and q_0 are given.
To represent a cutting pattern k, we use a configuration vector and an assignment vector. The configuration vector represents the attributes of the products in the cutting pattern, with i_k and j_k denoting the products assigned on Lanes A and B, respectively. The assignment vector describes the set of products that are produced in cutting pattern k and is given by φ_k = (ϕ_1k, ϕ_2k, . . . , ϕ_Nk), where ϕ_ik = 0, 1, or 2 means that product i is not produced, is produced on one lane, or is produced on both lanes in pattern k, respectively. If no product is assigned on Lane A in cutting pattern k, we set i_k = 0.
In the objective function, the first term is the capacity usage due to probe movements, the second term is the capacity usage for producing products, the third term is the material waste due to both idleness on one lane and trims on the sides and the center, and the last term is the material waste due to over-production. We convert the capacity usage to the value of capacity by multiplying it by α, and we convert the material waste to the value of materials by multiplying it by μ. The parameter α is expressed in dollars per meter, and μ in dollars per square meter. Since the production line runs at a constant speed, α equals the speed multiplied by the cost of running the machine for a unit of time, excluding the material cost. Since the total material costs for the R_i rolls of product i, i = 1, . . . , N, are constant, we do not include those costs in the objective function.
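As a minimal sketch, the conversion of the two objective components into dollars can be written as below; the function name and the example figures are illustrative assumptions, not the firm's actual data.

```python
# Hedged sketch: combining capacity usage and material waste into a single
# cost value, as the objective function of (DS) does with alpha and mu.
# The function name and numbers are illustrative, not the firm's data.

def objective_value(capacity_usage_m, material_waste_m2, alpha, mu):
    """Convert capacity usage (m) and material waste (m^2) to dollars.

    alpha: dollars per meter of line capacity (line speed x machine cost
           per unit time, excluding materials).
    mu:    dollars per square meter of wasted material.
    """
    return alpha * capacity_usage_m + mu * material_waste_m2

# Example: 1200 m of capacity and 85 m^2 of waste at alpha=$2/m, mu=$3/m^2.
cost = objective_value(1200.0, 85.0, alpha=2.0, mu=3.0)
print(cost)  # 2655.0
```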
Constraint (2) defines the over-production of product i in number of rolls. (Using a cutting pattern, the manufacturer may produce one, two, or four rolls per slot; depending on how the products are assigned to Lanes A and B, the quantity produced for product i in a campaign may exceed R_i.) Constraint (3) defines the loss of materials (in m²). (There are two types of loss. First, if only one lane is used, the other lane does not produce any product, and its materials are wasted. Second, if the combined width of the products on both lanes is smaller than the effective width of the production line, materials must be trimmed.) Constraint (4) links the integer variable x_k with the binary variable z_k. Constraint (5) enforces that cutting pattern k can be used in at most one slot. Constraint (6) enforces that at most one cutting pattern is used in slot t. Constraint (7) links the production capacity loss with the probe movement size on Lane A in slot t, and Constraint (8) does the same for Lane B. We derive Constraints (7) to (14) in the online supplement.
Although this formulation is specific to the company we worked with, the concept of using cutting patterns as an input to a scheduling model is applicable to certain other processes. By removing the firm-specific constraints (9), (10), (13), (14), (15), and (16), Problem (DS) becomes a general model that can be adapted to many other continuous process production lines that produce products on two parallel lanes using either plastic or metal sheets. For example, our model is being adapted to the metal decorating industry. These firms print pictures onto large metal sheets through lithography and then cut the master sheets into small sheets for decorating cans and boxes. These processes face similar capacity usage and material waste considerations as our plastic film company experiences.
We generate a detailed production schedule by solving Problem (DS) via the branch-and-cut method using ILOG CPLEX 11.2. The scheduling problems can become quite large. For example, for a campaign with 74 products, we solved an instance of (DS) with 8875 constraints, 132,883 variables, and 1,040,358 non-zero coefficients; a campaign with just 34 products has 20,000 variables and 2500 constraints. We solved instances of (DS) on a computer running Windows 7 with 4.00 GB of installed memory and a 3.4-GHz CPU. For small scheduling problems, the software took a few minutes to obtain the optimal solution. For large scheduling problems, the software generally took between 2 hours and 2 days to obtain a feasible and near-optimal solution, usually terminating when producing a solution within tolerance, reaching a maximum number of iterations, or running out of computer memory. We also developed heuristics to solve Problem (DS). Although our heuristics find a solution much more quickly than the branch-and-cut method, their solutions generally prescribe the use of more capacity and material. To most accurately determine appropriate complexity factors and estimate complexity cost models, we need an optimal solution; hence, we solve model (DS) using the branch-and-cut method.

The schedules from model (DS) are implementable at the firm. Using those schedules, the firm can save an average of 1.5% of production capacity compared with the heuristic-based current scheduling practice. The capacity savings can be used to generate more revenue at this highly capacity-constrained factory.

Identification of primary complexity factors and estimation of complexity cost models
The product variety in a line influences the production cost of the products in a campaign. Any individual product's contribution to the capacity usage and material waste of a campaign is clearly a complex interaction between its characteristics and those of all of the other products in the campaign. To identify the major complexity factors, we want to find which among many possible factors seem to drive complexity cost the most. Problem (DS) is a large-scale mixed-integer linear program. The product attributes appear as coefficients in (DS) (in the objective function, in the right-hand sides, or in the constraint data matrix).
In this section, we seek to identify complexity factors via global sensitivity analysis (GSA) of (DS). Once identified, these factors can be used in regression equations to estimate the capacity usage and material waste of a campaign without the need to populate and solve the large-scale mathematical program when managing product line complexity. Before proceeding, we explain why GSA is used to identify the complexity factors and outline its main steps. As pointed out in Section 4, the current heuristic schedules used at the firm are not optimal. Therefore, the actual total production cost for each real campaign includes unnecessary elements arising from the non-optimal solution to the scheduling problem, and it would be inappropriate to use actual scheduling data collected from the firm to identify the true complexity costs. Instead, to accurately identify the product line complexity factors, we conduct GSA of Problem (DS) in the following steps. First, we approximate the objective function by two regression models, capacity usage and material waste, since the objective function separates into these two components. Second, we estimate the contribution of each possible complexity factor to the objective function variance. Last, we identify the most important complexity factors by ranking their relative contributions to the objective variance, and we estimate their respective costs.

Factors affecting product line complexity
We met with supply chain managers, production schedulers, logistics engineers, and production line managers at the company to jointly identify the factors that might contribute to product line complexity for their system. We focus on the product line complexity cost in a campaign due to turnovers (through which the production line repositions the probes), single-cuts (where only one lane is used), and trim on either side and in the center part of the master sheet.
Individual complexity cost factors for product i include its length L_i, width W_i, color band width CB_i, packaging type B_i, and order quantity R_i. Those factors appear in the coefficients of the objective function of (DS), the right-hand sides, and the data matrix. They may jointly affect the objective function value; hence, we include some interaction terms of those factors in the regression models. Note that the firm uses W_i × L_i × R_i units of material and L_i × R_i units of production capacity to produce a product, plus additional units due to single-cuts and/or trims. The necessity of single-cuts is determined by the combination of L_i, W_i, CB_i, and B_i of all of the products in the campaign. We thus include L_iR_i in the capacity usage regression equation and W_iL_iR_i in the material waste regression equation. Furthermore, as packaging type impacts scheduling and complexity costs, we introduce two indicator variables: I^B2_i equals one for packaging type B2 (two rolls per pallet) and zero otherwise, while I^B4_i equals one for packaging type B4 (four rolls per pallet) and zero otherwise. We use regression model (18) to identify the main factors of capacity usage for product i (with normally distributed error term ε_i) and regression model (19) to identify the main factors of material waste for product i (with normally distributed error term δ_i). The intercept terms in Equations (18) and (19) incorporate turnover and other campaign-level factors that depend upon the characteristics of the other products in the campaign. For instance, the costs of single-cuts and trims are included in the intercepts β_i0 and γ_i0, as we elaborate below.
The machine runs at the same speed for products with the same combination of chemical formulation and gradient band color. Hence, we (along with the engineers and analysts at the firm) expect that products in the same campaign should have the same coefficients in Equations (18) and (19). Therefore, we create campaign-level regression equations by adding models (18) and (19) across all products in a campaign and replacing the intercept terms with certain campaign-level factors.
To create campaign-level regression models, we introduce additional campaign-level factors that incorporate interactions among our individual product complexity cost factors. To help estimate the turnover level, we introduce our first independent variable, the standard deviation of the product color band widths in campaign f: SD_f = [Σ_{i=1}^{N} (CB_i − C̄B_f)² / (N − 1)]^{1/2}, where C̄B_f is the mean of the CB_i in the campaign. We also use IS_i as an independent indicator variable, which equals one if the product is produced in a single cut and zero otherwise. Finally, to create certain interaction terms for the material waste model, we define s_i as the total usable line width less the product width (for single cuts) and d_i as the total usable line width less twice the product width (for double cuts).
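The factor SD_f can be computed as in this sketch; only the sample (N − 1) divisor and the use of the campaign mean come from the definition above, and the band widths are made up for illustration.

```python
# Hedged sketch of the campaign-level factor SD_f: the sample standard
# deviation of product color band widths CB_i in a campaign (N - 1 divisor).
# The band widths below are illustrative, not real campaign data.
from statistics import stdev

def sd_f(color_band_widths_cm):
    """Sample standard deviation of the CB_i over the campaign."""
    return stdev(color_band_widths_cm)

cb = [26.0, 30.0, 22.0, 26.0]   # CB_i for four hypothetical products
print(round(sd_f(cb), 3))  # 3.266
```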
In addition to SD_f, we define 15 other campaign-level independent variables. Fourteen of the independent variables apply to our capacity usage model (20) for campaign f, where ε_f is the normally distributed error term; in that equation, SD_f, SL_f, and Dev_f represent product interactions.
Eight of the independent variables apply to our material waste model (21) for campaign f, where δ_f is the normally distributed error term; in that equation, TRIM^s_f and TRIM^d_f represent product interactions. Equations (20) and (21) represent the capacity usage and material waste in campaign f. We obtain the capacity usage and material waste for each campaign from the solution of (DS); hence, αC_f + μM_f represents the objective value of (DS). Due to product line complexity, the objective values of (DS) vary among campaigns of the same product family. We want to identify the variables in Equation (20) that explain most of the variance of capacity usage and the variables in Equation (21) that explain most of the variance of material waste. Using the techniques introduced in Section 5.3.2, we identify the main contributors to capacity usage variance and material waste variance. Combining those identified factors, we find the variables that explain most of the variance of the objective function value of (DS).
Our capacity usage and material waste models are linear functions of the independent variables. Therefore, the coefficients can be interpreted as the marginal capacity usage and marginal material waste for the corresponding independent variables in the two models. We believe that linear models work well because capacity usage due to product size, turnovers, and single-cuts is generally additive, while material waste due to trim areas, turnovers, and factors included in the CA terms is also generally additive. Nevertheless, our linear models may not capture certain interactions that may exist between some factors that influence the product line complexity cost. Recognizing this, when we estimated Equations (20) and (21) (Section 5.3), we also included certain interaction terms in our regression models. Our estimations showed that the coefficients for the interaction terms were not significantly different from zero.

Each of the independent variables in Equations (20) and (21) is a random variable whose support is a closed interval of real numbers, as the products made in each campaign are often different. We need to choose all of the possible values for those variables from their respective supports. If we had randomly drawn values of the independent variables, we would have created certain products that the firm's customers never want. Hence, we design the data sets based on the following observations.
The firm runs the campaign of a product family every month, every 2 months, or over an even longer interval. The current campaign may differ from the previous campaign for the same family in two ways: (i) some products are removed while others are introduced to the campaign and (ii) order quantities change. We observe that approximately 50% of products in a given campaign were not produced in the previous campaign.
To generate data sets for conducting GSA of (DS), we collected 28 campaigns of various families that were produced over a period of 1 year and then created additional simulated campaigns by either changing the product requirements or removing products from those 28 campaigns, based in part on observing how campaigns for the same family are related to each other. We first removed approximately 20% of the products from each campaign to create a base campaign and then added those products back, one at a time, to create multiple runs for each campaign. We also randomly removed products from each campaign to create many runs of that campaign.
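The run-generation scheme described above might be sketched as follows; the product labels, removal fraction, and random seed are illustrative assumptions, not the firm's procedure in detail.

```python
# Hedged sketch of the data-generation scheme: remove roughly 20% of a
# campaign's products to form a base run, then add them back one at a time,
# yielding one run per re-added product. Labels and seed are placeholders.
import random

def generate_runs(campaign, removal_fraction=0.2, seed=0):
    rng = random.Random(seed)
    n_remove = max(1, round(removal_fraction * len(campaign)))
    removed = rng.sample(campaign, n_remove)
    base = [p for p in campaign if p not in removed]
    runs = [list(base)]
    # Add the removed products back one at a time, recording each run.
    for p in removed:
        base = base + [p]
        runs.append(list(base))
    return runs

campaign = [f"P{i}" for i in range(10)]
runs = generate_runs(campaign)
print(len(runs))  # base run plus one run per re-added product: 3
```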
The combined data set is panel data where the cross sections are campaigns and each section contains multiple observations that are the data from multiple simulated runs of the same campaign. Since different campaigns have different numbers of products, the number of observations in one section is different from that in another section.

Model estimation and identification of primary complexity factors
We generated 225 simulated campaigns from the 28 base campaigns to estimate Equations (20) and (21). Section 5.3.1 describes how we estimated our regression models using techniques for analyzing panel data. Section 5.3.2 then explains how we selected our final set of primary factors to describe product line complexity for this company. Section 5.3.3 presents model validation data, and Section 5.3.4 describes how we used GSA to determine the most influential factors among the final set of primary ones.

When the production line switches from one campaign to another, the chemicals for the previous campaign are flushed out of the system. The switching time depends upon the sequence of the gradient band colors. Thus, we (and the engineers and analysts from the company) anticipate that the coefficients of Equations (20) and (21) are the same for families with the same gradient band colors. Accordingly, we estimate the regression models separately for each of the three gradient band colors that the firm produces: blue, green, and gray.

Estimation techniques for the complexity cost models
Let m index the runs derived from campaign f. The error term in the capacity usage model (20) can be written as ε_fm = μ_f + ν_fm. If μ_f is constant over all observations of campaign f and specific to campaign f, we can rewrite Equation (20) as

C_fm = μ_f + β_1 CB_fm + · · · + β_8 CL^B4_fm + β_9 CA^B2_fm + β_10 CA^B4_fm + β_11 CA_fm + β_12 SD_fm + β_13 SL_fm + β_14 Dev_fm + ν_fm. (22)

This is an individual-effect model, a classic regression model, which we call a fixed-effect model. Alternatively, if μ_f is a group-specific disturbance, we write Equation (20) as a random-effect model, Equation (23), in which the component μ_f characterizes the fth campaign and is constant across all runs derived from campaign f. Taking deviations from the group means eliminates μ_f from Equation (23), yielding Equation (24), in which C̄B_f·, C̄L_f·, C̄A^B4_f·, C̄A_f·, S̄D_f·, S̄L_f·, D̄ev_f·, and ν̄_f· denote averages over the observations in each campaign.
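The within (group-demeaning) transformation that eliminates μ_f in Equation (24) amounts to subtracting each campaign's mean from its observations, as in this sketch with made-up data:

```python
# Hedged sketch of the within (group-demeaning) transformation used in
# Equation (24): subtracting each campaign's group mean removes the
# campaign-specific component mu_f. Data values are illustrative.
from statistics import mean

def demean_by_group(values, groups):
    """Return values with each observation's group mean subtracted."""
    group_means = {}
    for g in set(groups):
        group_means[g] = mean(v for v, gg in zip(values, groups) if gg == g)
    return [v - group_means[g] for v, g in zip(values, groups)]

y = [10.0, 12.0, 20.0, 24.0]        # e.g., capacity usage per run
f = ["c1", "c1", "c2", "c2"]        # campaign identifiers
print(demean_by_group(y, f))  # [-1.0, 1.0, -2.0, 2.0]
```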
Since the variances of the disturbances in Equation (24) must be estimated, we use the feasible generalized least squares procedure to estimate the model. We use Hausman's test for fixed versus random effects to select between the two models. In the random-effect model, we assume that there is no correlation between μ_f and ν_jm for all f, m, and j; between ν_fm and ν_js for f ≠ j or m ≠ s; and between μ_f and μ_j for f ≠ j. Under the hypothesis of no correlation, both the ordinary least squares (OLS) estimators in the fixed-effect model and the generalized least squares (GLS) estimators in the random-effect model are consistent, but OLS is inefficient; under the alternative, the OLS estimators are consistent but the GLS estimators are not. Hausman's test is based on the difference between the two estimations. In addition, by Hausman's result, the covariance of the efficient estimator with its difference from the inefficient estimator equals zero. Hence, the Wald-type test statistic follows a chi-squared distribution. In summary, the Hausman test checks a more efficient model against a less efficient but consistent model to verify that the more efficient model also gives consistent results; one should use the random-effect model if the test favors it.

Table 3 displays the Hausman test statistics, which for all three capacity models are smaller than the critical value of the chi-squared distribution with four degrees of freedom (9.49), i.e., Pr(χ²₄ ≤ 9.49) = 0.95. The tests suggest that the error terms are uncorrelated with the other variables in the regression model (23). Therefore, we selected the random-effect capacity model for all three color bands.
For the material waste model, the Hausman test statistics are also smaller than their respective critical values (7.81 for three degrees of freedom) for the blue and gray material. Therefore, we selected the random-effect material waste model for those two color bands. For the green band, the Hausman test suggests that the error terms are correlated with other variables in the random-effect model. Meanwhile, the F-value of testing for no fixed effects and no intercept is F(N F -1, N F T -N F -K) = F(10, 73) = 9.46, while the 95th percentile of the F distribution is 1.966. Thus, we reject the hypothesis that there is no fixed effect and no intercept. However, the coefficient of variation of prediction errors for the fixed-effect model is 42%. We ultimately chose the random-effect model for the green bands because its coefficient of variation of prediction errors is only 21.8%.

Primary complexity factor identification
Although the resulting models generated high R² values, multicollinearity existed among the independent variables, suggesting unstable coefficient estimates. To remedy this problem, we eliminated some variables from the models, first invoking the forward and backward elimination methods. Next, we applied the method proposed by Farrar and Glauber (1967), which is widely used to treat multicollinearity problems. The method uses two rules to eliminate independent variables under conditions of multicollinearity. First, the correlations between independent variables should be smaller than 0.8 or 0.9. Second, an independent variable should be removed if its multiple correlation with the other members of the independent variable set is greater than the dependent variable's multiple correlation with the entire set. As an example, for the blue gradient band model, since the correlation between any pair of B2_f, CA^B2_f, and CL^B2_f was above 0.99, we included only one of them. Table 4 presents the coefficient estimates and R² values for the three capacity usage models, while Table 5 presents the respective data for the three material waste models. The last two columns of each table display data from the GSA, which is discussed in Section 5.3.4. We used 107, 87, and 27 records to estimate the coefficients of the capacity usage and material waste models for products with blue, green, and gray gradient bands, respectively. The ratios of records to independent variables for the capacity model are 27, 22, and 8 for the products with blue, green, and gray gradient bands, respectively, whereas the respective ratios for the
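The first Farrar–Glauber screening rule can be sketched as a greedy pairwise-correlation filter; the variable names, the 0.9 threshold choice, and the data below are illustrative, not the firm's.

```python
# Hedged sketch of the first Farrar-Glauber screening rule: when two
# candidate regressors correlate above a threshold (0.8-0.9), keep only
# one of them. Correlation is plain Pearson; the data are illustrative.
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def screen(variables, threshold=0.9):
    """Greedily keep variables whose pairwise |r| with kept ones <= threshold."""
    kept = []
    for name, data in variables.items():
        if all(abs(pearson(data, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

vars_ = {
    "CL_B2": [1.0, 2.0, 3.0, 4.0],
    "CA_B2": [1.1, 2.0, 3.1, 4.0],   # nearly identical to CL_B2
    "SD":    [5.0, 1.0, 4.0, 2.0],
}
print(screen(vars_))  # ['CL_B2', 'SD']
```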

Model validation
To validate the three capacity usage models, we compared the predictions of each model with the capacity usage determined by solving Problem (DS) for the holdout campaigns. The scatterplots of the capacity usage obtained from (DS) versus the predictions are shown in Figs. 4(a) to 4(c). The points lie near the 45° line for all three product types, indicating that the estimated capacity models predict capacity usage well. We define the coefficient of variation as the ratio of the standard deviation of the fit errors to the sample mean of the capacity usage; this statistic measures the relative error of the capacity usage predictions. The coefficient of variation of the fit errors is 15, 9.4, and 3.1% for products with blue, green, and gray bands, respectively. Figures 4(d) to 4(f) show the corresponding scatterplots for the validation of the three material waste models. The points again lie near the 45° line for all three product types, indicating that the estimated material waste models predict material waste well. The coefficient of variation of the prediction errors is 31, 21.7, and 12.3% for the products with blue, green, and gray bands, respectively.
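The validation statistic above can be computed as in this sketch; the actual and predicted values are made up for illustration.

```python
# Hedged sketch of the validation statistic: the coefficient of variation
# of fit errors, i.e., the standard deviation of (actual - predicted)
# divided by the sample mean of the actual capacity usage. Data made up.
from statistics import mean, stdev

def cv_of_fit_errors(actual, predicted):
    errors = [a - p for a, p in zip(actual, predicted)]
    return stdev(errors) / mean(actual)

actual = [100.0, 120.0, 110.0, 130.0]
predicted = [98.0, 125.0, 108.0, 128.0]
print(round(cv_of_fit_errors(actual, predicted), 4))  # 0.0304
```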

Global sensitivity analysis
For each model we can invoke a form of GSA to attempt to determine which factors seem to have the most influence on product line complexity. We used the one-at-a-time method described in Wagner (1995) to perform the analysis. Specifically, for each model we fixed certain factors one at a time while re-estimating the model with the remaining factors. The resulting R 2 value with one factor fixed can be compared with the R 2 value of the full model. The larger the difference, the more impact the fixed factor appears to have on the model's dependent variable.
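A minimal sketch of this one-at-a-time procedure, assuming ordinary least squares fits on toy data (the factor names and values are illustrative, not the firm's):

```python
# Hedged sketch of the one-at-a-time GSA: re-estimate the regression with
# one factor removed (held fixed) and compare the R^2 against the full
# model; a larger drop suggests a more influential complexity factor.
# OLS is solved by normal equations with Gaussian elimination; data are toy.

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def r_squared(X_cols, y):
    """R^2 of an OLS fit with intercept; X_cols is a list of predictor columns."""
    cols = [[1.0] * len(y)] + X_cols
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    beta = solve(A, rhs)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(len(y))]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

factors = {"SD": [1.0, 2.0, 3.0, 4.0, 5.0],
           "SL": [2.0, 1.0, 4.0, 3.0, 5.0]}
y = [2.1, 3.9, 6.2, 7.8, 10.1]
full = r_squared(list(factors.values()), y)
for name in factors:
    reduced = r_squared([v for k, v in factors.items() if k != name], y)
    print(name, round(full - reduced, 4))   # larger drop => more influential
```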
For the capacity usage models, the CL variable represents baseline demand. This variable certainly affects capacity usage strongly and directly, but we do not consider it to impact product line complexity per se, so we do not perform GSA on that variable. We treat the CA variable similarly for the material waste models. Table 4 contains the GSA information for the product line complexity variables of each capacity usage model. The ranking simply indicates the relative importance of each variable for impacting product line complexity. Notice that an individual variable's relative and absolute importance may change among product types. For example, SL is most important for the products with blue and gray gradient bands but least important for the products with green gradient bands. In an absolute sense, the difference of the R 2 values between the full model and the one with SL fixed is rather small for the products with blue and green bands, indicating that SL would not be of major importance in affecting capacity usage. On the other hand, for the gray bands, SL appears to have a more important influence and might be considered to be an important product line complexity factor for management to monitor. Table 5 contains the respective GSA information for the product line complexity variables of each material waste model. As compared with the capacity usage models, we observe generally larger impacts on the dependent variable from the product line complexity variables in these models.
Managers can use GSA information to better allocate products to campaigns. For example, the importance of the TRIM s variable in the material waste model for gray products indicates that single cuts often drive up costs for these products. Upon further inspection, most gray band campaigns contain fewer than 10 products, making it difficult to produce enough products in parallel. Management might consider either increasing the number of products in gray band campaigns or attempting to ensure that most of the products that are assigned to a particular campaign can be matched up. Similarly, the significance of the TRIM d variable in the model for green products indicates that the trims drive up costs for these products.

Management of product line complexity
Prior to our involvement with the company, salespeople accepted customer orders without considering the cost that those orders might impose on the production line, while planners and schedulers used an ad hoc method to allocate customer orders to campaigns. However, for any product, both the capacity usage and material waste that it imposes on the system are partially dependent upon the composition of the rest of the products in the campaign. Hence, the same product may incur a different production cost when produced in one campaign compared with another. Taking advantage of this, the manufacturer can appropriately assign products into different campaigns to minimize the total production cost of each family in a cycle. Alternatively, if the customer needs the product urgently, we can estimate the cost for producing this customer order in the first available campaign and charge the customer an appropriate premium. In the rest of this section, we first present a cost model for incorporating a new customer order into an existing campaign, and then we propose a model for allocating customer orders among a choice of campaigns.

Production cost of a customer order included in a specific campaign
An important use of the regression models from Section 5.3 would be to price customer orders. Those estimates of capacity usage and material waste can be multiplied by their respective cost coefficients to estimate the complexity costs imposed on the system of inserting an order into a campaign. These costs can be used to price a new order in real-time, without the need to re-generate the detailed schedule for an existing campaign to include the new order.
The capacity usage and material waste of a new order can be estimated by computing the difference of those values for the existing campaign with and without the customer order. Let the product ordered by a customer be product j = N + 1 in the campaign, where N is the number of products currently scheduled, not including the new order. Inclusion of this product may alter the campaign-level factors based on the other products scheduled in the current campaign. Let IS′_i and C̄B′ represent IS_i and C̄B, respectively, after including this customer order in the campaign. Using the coefficients from Table 4, we estimate the capacity usage, expressed in meters of plastic film, for the new customer order via Equation (25); similarly, using the coefficients from Table 5, we estimate the material waste, expressed in square meters of plastic film, via Equation (26). In addition to the cost of capacity usage and material waste, we introduce other costs associated with a customer order. While all orders incur a fixed setup cost, any order smaller than a minimum quantity threshold incurs an additional setup charge η. Let I^min_j be an indicator variable equal to one if the order quantity of product j is less than the minimum quantity threshold, and zero otherwise. Furthermore, 250-m products incur an additional labor cost of c_L compared with 500-m products. Finally, pallet packing costs differ by pallet type; let c_Bj be the cost of using a type B_j pallet.
Combining the aforementioned costs gives the final estimated cost of inserting a new customer order j into an existing campaign. A regression equation approach such as ours provides significant real-time benefits for salespeople (some of whom may not be well versed in operations research techniques) who are talking live on the phone with customers. Salespeople can relatively easily and quickly plug the few required terms into Equations (25) and (26) and obtain an instantaneous result.
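A hedged sketch of this pricing logic: the estimated capacity usage and material waste are priced at α and μ, and the order-specific charges are added on top. All parameter names and dollar figures are illustrative assumptions, not the firm's cost equation itself.

```python
# Hedged sketch of the order-pricing logic: price the estimated capacity
# usage and material waste at alpha and mu, then add the setup, small-order,
# short-roll labor, and pallet charges. Names and figures are illustrative.

def order_cost(cap_usage_m, waste_m2, alpha, mu,
               setup_cost, small_order_charge, is_small_order,
               labor_250m, is_250m_roll, pallet_cost):
    cost = alpha * cap_usage_m + mu * waste_m2 + setup_cost
    if is_small_order:          # order below the minimum-quantity threshold
        cost += small_order_charge
    if is_250m_roll:            # 250-m products need extra labor vs. 500-m
        cost += labor_250m
    return cost + pallet_cost

# Example: a small 250-m order with estimated 2935 m capacity, 172 m^2 waste.
print(order_cost(2935.0, 172.0, alpha=2.0, mu=3.0,
                 setup_cost=150.0, small_order_charge=80.0, is_small_order=True,
                 labor_250m=40.0, is_250m_roll=True, pallet_cost=25.0))  # 6681.0
```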

Assignment of customer orders to campaigns
Each campaign includes some products designated for distribution center (DC) replenishment and other products based on specific customer orders. The firm pre-assigns products for DC replenishment to campaigns because it can predict those requirements with high accuracy. At the beginning of each month, the manufacturer then assigns specific customer orders to different campaigns for each product family.
We provide several examples in Table 6 showing how production cost can vary from one campaign to another. (Actual proprietary cost data are concealed.) Each row shows the marginal capacity usage and material waste of a new customer order in a campaign. In Cases 1 and 2, for example, a customer orders 12 rolls of 250 m × 100 cm sheets with a 26-cm blue gradient band. The manufacturer uses 244.6 m of capacity per roll by producing this product in Campaign 1, whereas it uses 265.2 m of capacity in Campaign 2. In both cases, it wastes 14.3 m² of material per roll. By assigning orders to the campaigns with the best fit, the overall production cost has presumably been reduced. Furthermore, because some non-profitable customer orders have been eliminated and urgent customer orders are now charged a higher price, the manufacturer's profit has risen.

Table 6. Marginal capacity usage and material waste for sample cases
We now provide an estimate of the cost savings and revenue increase. Our models can save the company an average of 1.5% of production capacity by first optimally scheduling the DC replenishment products and then assigning and inserting the customer orders into campaigns. The company can thereby save an average of 1.5% of product cost, which in turn helps the firm retain and gain market share. Because of the large number of customer orders, the schedulers frequently revise schedules (two or more times each week). In addition to scheduling, they need to communicate with the salespeople about the inability to schedule certain orders. By using our models, they can select customer orders and assign them in a very short time. Most importantly, they do not need to go back and forth with the salespeople to eliminate non-schedulable customer orders. The reduction in such "fire-fighting" scheduling work reduces the workload by the equivalent of one full-time scheduler. In addition to cost reduction, our models identify more profitable customer orders and provide support for pricing customer orders, which translates into an estimated 0.5% revenue increase.
To maintain profitability in an increasingly competitive environment, firms can conduct product line complexity analysis on an annual basis to reflect new product customization trends. The company described herein re-estimates the capacity usage model (25) and material waste model (26) every year using the method proposed in Section 5.

Conclusions
We had the opportunity to work with a firm employing a unique continuous-flow production process involving highly complex product scheduling. Even with proper scheduling, the product mix of any particular production run (campaign) can generate significant complexity costs in the form of capacity usage and material waste. We identified the major factors that affect product line complexity for this firm through global sensitivity analysis (GSA) of our large mixed-integer linear programming scheduling model. In this continuous chemical process, the production cost of any particular product may vary from one campaign to another. Management can use our regression models to properly price new customer orders in this environment of constantly changing costs and product mix.
Although the specific results identified in this article would not be applicable to other firms, the method for identifying complexity cost factors could very well be applied to many other processes. A deterministic mathematical program is run many times using different parameter values to create a data set of dependent variables (outcomes from the program) and independent variables (which can be any coefficients of the program, or the variables from which those coefficients are derived). If the data sets are panel data, the methods described in Section 5 are appropriate for estimating the regression models and performing GSA. Also, the customer order selection model from Section 6 could apply to other manufacturers that estimate models for capacity usage and material waste.
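The design-of-experiments loop described above can be sketched as follows. The solver call is a toy placeholder for the deterministic MILP (a real study would invoke an optimization solver), and the parameter ranges and response function are invented for illustration.

```python
import random

def solve_scheduling_model(params):
    """Placeholder for the deterministic MILP: returns the optimized
    outcome (e.g., campaign capacity usage) for one parameter setting.
    This toy response function is illustrative only."""
    n_products, n_changeovers = params
    return 10.0 * n_products + 4.0 * n_changeovers + random.uniform(-1.0, 1.0)

random.seed(42)  # reproducible experiment design
dataset = []
for _ in range(200):
    # Independent variables: sampled model parameters (or their drivers).
    params = (random.randint(5, 30), random.randint(1, 15))
    # Dependent variable: the outcome returned by the program.
    outcome = solve_scheduling_model(params)
    dataset.append((*params, outcome))

# `dataset` now holds (independent vars..., outcome) rows on which the
# regression models can be estimated and GSA performed (Section 5).
```

Each row of the resulting data set pairs one parameter setting with one solved outcome, which is exactly the structure the regression and GSA steps require.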
For our company, the regression models allow for much easier estimation of complexity costs than using the original scheduling model would. From a planning and tactical standpoint, these cost estimates can greatly support proper pricing of new customer orders in this highly customized environment. From a strategic standpoint, since the number of products in each product family has increased in recent years, the firm has been experiencing rising production costs. However, consolidation of stock-keeping units may not be an appropriate strategy for dealing with product line complexity because, on this production line, products are assigned to the two lanes and sequenced simultaneously. Consolidating products can actually increase the unit production cost, as seen with the gray products described in Section 5. Instead, the firm should choose the right product mix and reset the prices of non-profitable products using the estimated models.

Supplemental Material
Supplemental data for this article can be accessed on the publisher's website.

Biographies
Zhili Tian has extensive work experience in information technology, transportation engineering, and supply chain management. He received his Ph.D. from Washington University in St. Louis. He currently is an Assistant Professor of Decision Sciences and Information Systems at Florida International University. His research has been published in IIE Transactions. He teaches courses on healthcare operations management, quality assessment and health outcomes in healthcare, operations management, and supply chain management. His current research focuses on product and manufacturing process development in the pharmaceutical industry, healthcare supply chain management, and quality control and management in healthcare. His broader research interests include pharmaceutical and medical device new product development, pharmacoeconomics, pharmaceutical production capacity investment, causes and mitigation of generic drug shortages, medical decision making, and healthcare service operations and quality management.