Optimal joint deployment of flow and pressure sensors for leak identification in water distribution networks

ABSTRACT A multi-objective optimization methodology is proposed herein for accurate identification of leakage in water distribution networks (WDNs) using pressure and flow sensors. We first model leakage at potential nodes using the EPANET software, and then divide the WDN into near-homogenous zones using the k-means clustering algorithm based on the geographic distribution of nodes. Finally, flow and pressure sensor locations are optimized using the NSGA-II algorithm to identify the leakage zone accurately. The novelty of the proposed approach lies in the sequential optimization of flow and pressure sensor placement, which helps improve the accuracy of leakage zone identification in WDNs. The objective functions of this study are: 1) maximizing the accuracy of the identified leakage zone and 2) minimizing the number of sensors (and hence operational costs). Simulation results for the Mesopolis WDN corroborate the efficiency and effectiveness of the proposed approach.


Introduction
The quantity and quality of water delivered to consumers are directly related to the physical condition of the water distribution network (WDN) and its pipeline systems, and to the maintenance and operation of the network (Brandt et al. 2017). The main cause of water loss in WDNs is leakage, usually due to excessively high loads, high internal pressure, corrosion, or a combination of these factors (Sharma 2008; Puust et al. 2010). The quantity of water lost depends on the network complexity, water pressure, soil characteristics, and the time that elapses before leaks are detected and repaired (Sharma 2008). It is estimated that US $184 billion is spent annually to provide clean water around the world, of which US $9.6 billion is lost to water leakage (Moser, Paal, and Smith 2015). In the US alone, the cost of leakage is more than US $2.8 billion annually (Gong et al. 2016). Utilities also face potential lawsuits for distributing unsafe drinking water as a result of cracks in the WDN. Negative pressure in the system due to leakage prompts backflow of contaminated water into the network and imposes public health risks (Mambretti and Orsi 2012; Gong et al. 2016). Moreover, water treatment and conveyance costs, as well as the costs of maintaining and controlling the network, are significant. Leak detection is therefore important to warrant sustainable management of a water network.
Several approaches have been proposed to directly or indirectly detect leakage in WDNs. These methods use passive and/or active leakage detection strategies. The active (transient) methods include physical inspection of pipes, leak-noise correlation, transient monitoring, pig-mounted acoustic sensing, ground penetrating radar, and tracer gas (Moser, Paal, and Smith 2015). Wang and Ghidaoui (2018) applied a matched-field processing (MFP) approach to estimate the leak location and the leak size. These approaches determine the precise leakage location, but are uncommon in large networks, for which operation costs are prohibitive (Sanz et al. 2015). Critical leak detection factors include environmental conditions, changes in soil characteristics, soil moisture, groundwater level, water pressure, and physical properties of pipes (plastic or metal) (Sharma 2008). Passive (non-transient) methods, on the other hand, depend on secondary evidence of leakage such as divergence between actual and estimated demand levels (Sharma 2008; Sanz et al. 2015). The advantage of the latter lies in the relatively low cost of numerical simulation of the system (Puust et al. 2010), which usually involves placing flow or pressure sensors in the WDN (Asgari and Maghrebi 2016; Raei et al. 2018) and drawing inferences about system failure using inverse transient approaches (Covas and Ramos 2010; Capponi et al. 2017), artificial neural network and wavelet analysis (Romano, Kapelan, and Savić 2010), coupled optimization and numerical simulation models (Nasirian, Maghrebi, and Yazdani 2013; Cugueró-Escofet et al. 2015; Sanz et al. 2015), error-domain model falsification (Moser, Paal, and Smith 2017), linear programming and mixed integer linear programming (Berglund, Areti, and Mahinthakumar 2017), and information theory (Khorshidi, Nikoo, and Sadegh 2018), among others.
Most leak detection studies use passive methods, given their lower operational costs, and compare the deviance between observed and estimated flow and/or pressure levels in the WDN. In the present study, the efficacy of optimizing the location and number of flow and pressure sensors to detect and identify leakages in a complex WDN is explored. The WDN is divided into homogenous zones using the k-means clustering algorithm, within the perimeter of which the number and location of flow and pressure sensors are sequentially optimized using the NSGA-II multi-objective optimization algorithm. Flow and pressure sensor locations are optimized in two stages based on the sensitivity of the network to each type and the efficiency of each sensor. In this research, both flow and pressure sensors are used, as opposed to the literature, which usually employs only one type. The proposed methodology contributes to the improvement of leak identification in WDNs, and is applicable to large and complex WDNs.
This paper studies the water distribution network of Mesopolis, a widely-used virtual WDN, to investigate the performance and accuracy of the proposed methodology for different number of zones (four to 40). Our results show that the maximum percentage of identified leakage zones is more than 97% for five zones. By increasing the number of zones, number of sensors and identification time will increase; but percentage of the correctly identified leakage zones will not necessarily decrease. In fact, rate of accurate identification of leakage zone is not directly proportional to the number of zones, but more correlated with locations of nodes, pumps and other equipment, as well as demand of the nodes, and the WDN characteristics.

Methodology
The proposed framework consists of four main parts. In the first part (steps 1 to 4 in box of part 1 of Figure 1), potential leakage scenarios for different nodes are modeled using the EPANET model. Here, leakage from only a single node is taken into account in each scenario. The simulated flow and pressure values in the control state of the network without any leakage (hereafter control state) are then compared with the corresponding values in each leakage scenario. Divergence between each scenario and the control state indicates leakage. In the second part (steps 5 to 9 of Figure 1), the WDN is divided into homogenous zones using k-means clustering (Supplementary Information, SI) algorithm and appropriate locations for potential flow and pressure sensors in each zone are identified. In the third part (steps 10 to 13 of Figure 1), potential flow sensors are optimized in terms of location and number used, while all potential pressure sensors are incorporated in the network. In the last part, pressure sensors are optimized according to the most preferable flow sensor placement solution. Most preferable solution corresponds to maximum leakage identification ratio. The proposed joint flow and pressure sensor placement aims to identify the leakage zone, and narrowing down the exact location of leakage within each zone would require more sensors and further analysis.
Since our network is more sensitive to flow as opposed to pressure, highest priority in the sequential optimization process is assigned to flow sensors, and pressure sensors are optimized at the second stage to enhance the accuracy of leakage zone identification. One may sequentially repeat optimizing the location of flow and pressure sensors to minimize the impact of prioritizing flow sensors, but our analysis shows such impact is negligible. Thus, flow and pressure sensors are positioned in the WDN using NSGA-II multi-objective optimization to find a Pareto front with number of sensors and accuracy of the identified leakage zones as objectives. The elapsed time for leak identification (from the beginning of leakage until identification) is also investigated using the optimized sensors. In the next sections, the proposed methodology is explained in more detail.

Leak simulation
Leakage can occur in a variety of forms due to longitudinal cracks, lateral cracks, and holes (Greyvenstein and Van Zyl 2007; Shafiee et al. 2016). In this study, we assume that leakage occurs due to holes in the network, which are modeled as orifices in EPANET. Equation (1), the emitter equation, is used to simulate the leakage (Sanz et al. 2015; Gamboa-Medina and Reis 2017):
$$q = c\,p^{\gamma} \qquad (1)$$
where $q$ is the flow (gpm) that leaks as a function of pressure $p$ (psi), $c$ is the emitter coefficient (gpm/psi$^{0.5}$), and $\gamma$ is the pressure exponent, which is equal to 0.5 for an orifice (Rossman 2000). Various leakages can be modeled by modifying the emitter coefficient ($c$) in this equation. It is noteworthy that the EPANET model requires a long run-time to reach equilibrium for the large Mesopolis WDN, and the software does not support modifying the emitter coefficient on the fly. The emitter coefficient can only be modified for a leakage scenario at the model set-up stage. Therefore, Equation (2) is utilized to model leakages at different times and different nodes, in which each node's demand is adapted by adding the leakage quantity. In Equation (2), $D^{new}(n_i, t)\,|_{t \ge t_l}$ is the new water demand for node $n_i$ at any time after the start of the leak ($t \ge t_l$) (gpm), $D(n_i, t)\,|_{t < t_l}$ is the water demand for node $n_i$ at time $t$ prior to leakage (gpm), $c_i \times p_t^{\gamma}$ is the leakage quantity in node $n_i$ at time $t$ ($t \ge t_l$) (gpm), $\beta_t$ is the coefficient of the demand pattern at time $t$, $t_l$ is the leak start time (hour), $n_i$ is the node in which leakage occurs, $N$ is the number of nodes in the WDN, and $T$ is the simulation time (hour). The emitter coefficient ($c_i$) is determined for each node to enable modeling leakages of 5 to 20 gpm. This requires information about the pressure at each node, and hence the pressure ($p$) for each node is calculated at different times (one-hour temporal resolution) under the control state (no leakage). Using Equation (2), $N$ hydraulic models are constructed, each representing leakage at one node for the duration of the simulation.
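The relation between the emitter equation and the demand adjustment can be illustrated with a minimal Python sketch. It assumes that the emitter coefficient is chosen so that a target leak flow is reproduced at a reference control-state pressure, and it adds the leak flow directly to the demand series; the role of the demand pattern coefficient is not reproduced here. The function names and the numbers in the usage note are hypothetical.

```python
import numpy as np

GAMMA = 0.5  # pressure exponent for an orifice (Rossman 2000)

def emitter_coefficient(target_leak_gpm, pressure_psi, gamma=GAMMA):
    """Coefficient c such that c * p**gamma equals the target leak flow (assumed sizing rule)."""
    return target_leak_gpm / pressure_psi ** gamma

def leak_flow(c, pressure_psi, gamma=GAMMA):
    """Emitter equation: q = c * p**gamma, in gpm for p in psi."""
    return c * np.asarray(pressure_psi, dtype=float) ** gamma

def demand_with_leak(demand_gpm, pressure_psi, c, t_leak, gamma=GAMMA):
    """Add the leak flow to the nodal demand from time index t_leak onward."""
    demand = np.asarray(demand_gpm, dtype=float).copy()
    q = leak_flow(c, pressure_psi, gamma)
    demand[t_leak:] += q[t_leak:]  # demand before t_leak is left unchanged
    return demand
```

For instance, a 10 gpm leak at 64 psi gives `emitter_coefficient(10.0, 64.0) == 1.25`, and the same coefficient yields a smaller leak when the pressure later drops.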
Pressures at the nodes and flows in the pipes of the WDN, which are affected by leakage, can be obtained at all time steps in the simulation period using these $N$ leakage models. As a result, there are $N$ matrices for pressure and $N$ matrices for flow, one per leakage model. Each pressure matrix has $T_{\text{time step}}$ rows and $N$ columns. Flow matrices have the same number of rows and $N_p$ columns ($N_p$ is the number of pipes in the WDN). Subtracting each of the $N$ flow and pressure matrices (corresponding to the $N$ leakage models) from the corresponding control-state matrix yields $N$ divergence matrices for flow and $N$ divergence matrices for pressure (step 4, part 1, Figure 1), which are in turn used in the optimization algorithm for sensor placement.
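As a sketch of this step, the divergence matrices can be computed in one broadcasted subtraction, assuming the simulation results have already been exported to NumPy arrays. The array layout (scenarios, time steps, elements) and the function name are assumptions made for illustration.

```python
import numpy as np

def divergence(control, leak_scenarios):
    """Return control-state values minus leakage-scenario values.

    control:        array of shape (T, M), one column per node or pipe
    leak_scenarios: array of shape (N, T, M), one (T, M) matrix per leakage model
    result:         array of shape (N, T, M) holding the divergence matrices
    """
    control = np.asarray(control, dtype=float)
    scenarios = np.asarray(leak_scenarios, dtype=float)
    return control[None, :, :] - scenarios
```

The same function serves for flow (M equal to the number of pipes) and for pressure (M equal to the number of nodes).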

Leak detection and identification
Leakage in any node prompts a pressure drop, which in turn induces a pressure gradient from the adjacent pipes and draws flow toward the leaking node. As a result, the flow changes in the pipes adjacent to the leaking node (main pipes), and consequently further connected pipes (secondary pipes) are influenced. In looped WDNs, water may reach the node by different routes. When a node leaks, network equipment such as pumps and tanks strives to compensate for the pressure drop, and the connecting pipes show a flow rate that is inconsistent with the control state. As time passes from the start of the leakage, more pipes and nodes are affected. It is also noteworthy that the pipes showing flow divergence may change from one time step to another. It is therefore not reasonable to designate as potential flow sensor locations the large number of pipes that have shown flow divergence from the control state in a leakage scenario for only a short time (a few time steps). To address this issue, we introduce a tolerance threshold for flow divergence:
$$M[i,j] = 0 \quad \text{if } |M[i,j]| < \text{tolerance threshold} \qquad (3)$$
where $M[i,j]$ is the element in row $i$ and column $j$ of the divergence matrices for flow and pressure due to the leakage at the target node. Absolute values smaller than the tolerance threshold are set to zero (step 6, part 2, Figure 1). This selection satisfies two goals:
(1) Reasonably reducing the number of pipes ($N_p$) that have shown flow divergence from the control state, and therefore can be selected as potential locations of flow sensors.
(2) Finding pipes ($N_p$) that have consistently shown flow change in a number of successive time steps (see the pseudo code presented in Figure S2 of the Supplementary Information, SI).
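The two goals above can be sketched as a threshold filter followed by a persistence check. This is an illustrative reading rather than the pseudo code of Figure S2, which is not reproduced in this text; the minimum run length is left as a parameter (`min_steps`), and all function names are hypothetical.

```python
import numpy as np

def apply_tolerance(div, tol):
    """Set divergence entries whose absolute value is below tol to zero."""
    out = np.array(div, dtype=float)
    out[np.abs(out) < tol] = 0.0
    return out

def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def persistent_columns(div, tol, min_steps):
    """Columns (pipes) whose divergence stays non-zero for at least min_steps successive rows."""
    nonzero = apply_tolerance(div, tol) != 0.0
    return [j for j in range(nonzero.shape[1])
            if longest_run(nonzero[:, j]) >= min_steps]
```

A pipe whose divergence exceeds the threshold only intermittently is dropped, while one that exceeds it over a sustained window is kept as a candidate.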

Optimal location of flow sensors
Now that the homogenous zones (from clustering), the tolerance threshold, and the potential pipes for sensor placement are identified, sensors are ranked in descending order based on the common zones among all sensors. The highest rank is '$z - 1$' ($z$: number of zones), corresponding to a sensor covering only one zone. The rank incrementally decreases to 'zero', for which the sensor is common to all available zones. Sensors with the highest ranks of '$z - 1$' and '$z - 2$' are selected as potential flow sensors for the multi-objective optimization algorithm. This minimizes the overlap between sensors, and in turn the operational cost. In other words, the selected flow sensors for each zone should at most be common with only one other zone. The presence of sensors with rank '$z - 2$' indicates that more than one zone may be selected for the leakage node. In this case, pressure sensors are examined to reduce the number of erroneously identified zones and enhance the accuracy of leakage zone identification.
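A minimal sketch of this ranking follows, assuming that each candidate pipe is already associated with the set of zones whose leakage scenarios it responds to. The rank is taken as the number of zones minus the number of zones covered, which matches the two end points stated above; the pipe identifiers are hypothetical.

```python
def rank_sensors(zones_by_sensor, n_zones):
    """Rank = n_zones minus the number of distinct zones a sensor is common with."""
    return {s: n_zones - len(set(z)) for s, z in zones_by_sensor.items()}

def candidate_flow_sensors(zones_by_sensor, n_zones):
    """Keep sensors of rank n_zones - 1 or n_zones - 2 (covering one or two zones)."""
    ranks = rank_sensors(zones_by_sensor, n_zones)
    keep = {n_zones - 1, n_zones - 2}
    return sorted(s for s, r in ranks.items() if r in keep)
```

With five zones, a pipe responding to leaks in a single zone has rank 4, one shared by two zones has rank 3, and one shared by all five zones has rank 0 and is discarded.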

Optimal location of pressure sensors
Pressure divergence matrices with absolute values greater than the tolerance threshold (which differs from the flow threshold) are used in this step. The tolerance threshold not only helps identify the most appropriate nodes but also accounts for the measurement errors of sensors. For each leakage scenario in this study, the 10 nodes with the greatest absolute values of pressure divergence from the control state are selected as potential pressure sensors. The pressure sensors in each zone are then compared with those of other zones to omit sensors that are common between two zones. This warrants specifying only the correct zone for each sensor (step 7, part 2, Figure 1), and finalizes the selection of potential pressure sensors for each zone.
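The candidate selection described here could be sketched as follows. The sketch assumes that "greatest absolute values" refers to the peak absolute divergence of each node over the simulation period, which is one possible reading; the function names are hypothetical.

```python
import numpy as np

def top_pressure_nodes(pressure_div, tol, k=10):
    """Up to k node indices with the largest peak |divergence| exceeding tol.

    pressure_div: array of shape (T, N) for one leakage scenario.
    """
    peak = np.abs(np.asarray(pressure_div, dtype=float)).max(axis=0)
    order = np.argsort(-peak, kind="stable")  # largest peak first
    return [int(j) for j in order[:k] if peak[j] > tol]

def drop_shared(candidates_by_zone):
    """Remove candidate nodes that appear in more than one zone."""
    counts = {}
    for nodes in candidates_by_zone.values():
        for n in set(nodes):
            counts[n] = counts.get(n, 0) + 1
    return {z: sorted(n for n in set(nodes) if counts[n] == 1)
            for z, nodes in candidates_by_zone.items()}
```

After `drop_shared`, every remaining candidate node points to exactly one zone, which is the property the comparison between zones is meant to guarantee.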

Multi-objective optimization algorithm
In order to select the optimal combination of pressure and flow sensors for the WDN, the NSGA-II multi-objective optimization algorithm (Deb et al. 2000) is used. For more details about this algorithm and its applications, refer to Nikoo et al. (2014) and Alizadeh et al. (2017). The sets of potential pressure and flow sensors, as described in Sections 2.3 and 2.4, are considered as decision variables in the NSGA-II multi-objective optimization algorithm. At the first stage, only flow sensors are optimized, while all potential pressure sensors are incorporated in the network. At the second stage, the location of pressure sensors is optimized given the optimal locations of flow sensors. The objective functions of the NSGA-II multi-objective optimization algorithm at the first and second stages are presented in Equations (4) to (7).
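To make the notion of a Pareto-optimal trade-off between the two objectives concrete, the following sketch filters a list of candidate solutions down to its non-dominated set. It is not an implementation of NSGA-II, only the dominance test that underlies it, and the solution tuples in the usage note are invented for illustration.

```python
def dominates(a, b):
    """True if solution a dominates b; each is (accuracy, n_sensors).

    Higher accuracy is better and fewer sensors is better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Solutions that are not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions)]
```

For example, among `(0.95, 30)`, `(0.90, 20)`, `(0.85, 25)`, `(0.60, 5)` and `(0.95, 40)`, the third and fifth are dominated and the other three form the front.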
In Equations (4) to (7), $F_1$ is the objective function that maximizes the accuracy of identified leakage zones using the combination of flow sensors proposed by the multi-objective optimization algorithm and all potential pressure sensors; $F_2$ is the objective function that minimizes the number of flow sensors; $S_1$ is the objective function that maximizes the accuracy of identified leakage zones using the combination of pressure sensors proposed by the multi-objective optimization algorithm and the flow sensors optimized at the first stage; $S_2$ is the objective function that minimizes the number of pressure sensors; $N$ is the number of potentially leaking nodes in the WDN; $n$ is the leaking node; $d_n$ is the value of identification, with $d_n = 1$ if one zone is identified properly, $d_n = 0.5$ if two zones are identified and only one is correct, and $d_n = -1$ if no zone is identified properly or more than two zones are identified for a node; $K_f$ is the number of selected flow sensors; $K_p$ is the number of selected pressure sensors; and $n_{sf}$/$n_{sp}$ is the number of potential flow/pressure sensors that are considered as decision variables in the multi-objective optimization algorithm.
At the first stage, the NSGA-II multi-objective optimization algorithm selects a number of flow sensors ($K_f$) among the potential flow sensors ($n_{sf}$) for each chromosome. For each leakage scenario, the possibility of leak identification using the selected sensors is investigated, and a matrix of size $T_{\text{time step}} \times K_f$ is constructed for the selected flow sensors. The optimization algorithm searches for flow divergence from the control state that is greater than the tolerance threshold in six successive time steps. Each flow sensor that satisfies this condition is selected. In this stage, all potential pressure sensors with pressure divergence from the control state exceeding the tolerance threshold are also selected. Now that we have the optimized flow sensor locations and all potential pressure sensors, the corresponding zones for the flow sensors are specified.
A matrix of size $N \times Z$ is then created, in which $N$ is the number of nodes and $Z$ is the number of defined zones. Each row of this matrix represents the frequency of identification of each zone using the combination of optimized flow sensors. By dividing the number of times that each leakage zone is identified by the sum of the elements in the row, the probability of identification of each zone is determined for each leakage scenario. Then, in each row, the zone(s) with an identification probability greater than 80% of the maximum probability in the same row is/are selected. This can potentially select more than one zone for each node (each leakage scenario). In this case, pressure sensors are optimized to decrease the number of identified zones for each leaking node and increase the accuracy of leakage zone identification. For this purpose, the defined zones corresponding to all pressure sensors that exceed the tolerance threshold are selected. Then, the leakage zone(s) identified by the flow sensors that is/are common with the leakage zone(s) identified by the pressure sensors is/are selected. If there is no common zone, the leakage zone(s) identified by the flow sensors is/are chosen.
After determining the common zones identified by the combination of flow and pressure sensors for the $N$ nodes ($N$ leakage models), each identified zone is compared with the actual defined zone for each node. If exactly one zone is identified for a node and it is identical to the actual defined zone, the value of identification is assigned as '$d_n = 1$', otherwise '$d_n = -1$' (Equations (4) to (7)). If two zones are identified for a node and one of them is correct, then the value of identification is '$d_n = 0.5$', otherwise '$d_n = -1$'. If more than two zones are identified for a node, the value of identification is '$d_n = -1$'. Finally, the sum of all values of identification greater than zero is divided by $N$. For more details, refer to the Supplementary Information, SI.
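The row-wise zone selection and the combination with pressure-sensor zones can be sketched as below. The sketch operates on one row of the zone-frequency matrix at a time; the function names and the counts in the usage note are hypothetical.

```python
import numpy as np

def zones_from_counts(counts_row, ratio=0.8):
    """Zones whose identification probability exceeds ratio times the row maximum."""
    counts = np.asarray(counts_row, dtype=float)
    total = counts.sum()
    if total == 0:
        return set()  # no sensor responded, so no zone is identified
    prob = counts / total
    return {int(z) for z in np.flatnonzero(prob > ratio * prob.max())}

def combine(flow_zones, pressure_zones):
    """Keep zones common to both sensor types; fall back to the flow zones if none are common."""
    common = set(flow_zones) & set(pressure_zones)
    return common if common else set(flow_zones)
```

With counts `[6, 5, 1, 0, 0]`, the probabilities are 0.5, about 0.42 and about 0.08, so zones 0 and 1 both pass the 80% rule; a pressure sensor pointing to zone 1 then narrows the result to zone 1 alone.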
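The scoring rule can be written compactly as follows, under the reading that only positive identification values contribute to the accuracy and that the sum is normalized by the number of leakage scenarios. The function names are hypothetical.

```python
def identification_value(identified, actual):
    """d_n: 1 for a single correct zone, 0.5 for two zones with one correct, otherwise -1."""
    identified = set(identified)
    if len(identified) == 1 and actual in identified:
        return 1.0
    if len(identified) == 2 and actual in identified:
        return 0.5
    return -1.0

def accuracy(identified_by_node, actual_by_node):
    """Sum of the positive identification values divided by the number of nodes."""
    values = [identification_value(i, a)
              for i, a in zip(identified_by_node, actual_by_node)]
    return sum(v for v in values if v > 0) / len(values)
```

Five scenarios with values 1, 0.5, -1, -1 and -1 therefore give an accuracy of 1.5 / 5 = 0.3.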

Case study
We use the water distribution network of Mesopolis (a virtual city) that was developed for research projects. Since real-world networks are not readily available due to security issues, a large number of studies have been conducted on this virtual network including Drake and Zechman (2012), Shafiee and Zechman (2013), and Brumbelow (2014, 2015). This network provides service to industrial, commercial, and residential sectors, as well as a university and an airport (Rasekh and Brumbelow 2015). A river that flows from south to north through the city is the main source of water, with two treatment plants on the east and west sides of the river (Shafiee and Zechman 2013). This complex network includes 1588 nodes, 2176 pipes, 13 tanks, 65 pumps, 53 valves and one reservoir (Figure S3, SI). The minimum and maximum altitudes of nodes in this network are 0 and 416 feet, respectively (see Figure S4, SI). Pumps are used in series to provide water to the university, which lies at an elevation of 390 feet in the eastern part. Pressure should remain between 35 and 80 psi in all parts of the city.
The highest water consumption occurs in the early morning and late afternoon, and the lowest at night. In order to ensure sufficient water pressure for water supply and fire flow requirements, water towers and hillside tanks are employed. Tanks are responsible for water supply when water pressure drops in the network, and hence the tanks' water level reaches its lowest point during the day and its maximum at night, when excess water is pumped back into the tanks. Since the concentration of water disinfectants (such as chlorine) can diminish over time, which may increase the risk of deteriorated water quality in the network, all stored water in the tanks is discharged during the day. In other words, tanks have a memory of only one day. Pumping stations can be turned off during the day and turned back on during the night in order to minimize operational costs (Johnston and Brumbelow 2008; Shafiee and Zechman 2013).
Since the network of Mesopolis is large, reaching an equilibrium condition in numerical simulation is time-consuming. We simulate a leak that initiates at hour 96 after the start of simulation and continues for 360 hours (until hour 456 after the simulation start time) with an hourly time step. The study area is divided into different numbers of homogenous zones (four to 40 zones) using the k-means clustering method. The proposed methodology is repeated for each number of zones. While the pseudo code presented in Figure S2 (SI) is used to select a tolerance threshold for flow sensors (5 gpm in this study), a tolerance threshold of 1 psi is selected for pressure sensors to consider their intrinsic measurement errors. In the current research, the following assumptions are considered: (1) a widely-used virtual WDN (Mesopolis) has been used with no measured field data, and therefore the EPANET model can be used without any calibration; (2) a single nodal leakage appears in the WDN in each leakage scenario (not multiple simultaneous leakages), neglecting any background leakages; (3) leaks are assumed to be from a network node, not a link between nodes (joints in pipes); (4) the leakage amount is considered to be between 5 and 20 gpm, proportional to the pressure of the node; (5) leakage occurs due to holes in the WDN, and is modeled as an orifice in EPANET; (6) the emitter equation is used to simulate the leakage; (7) leakage in any node leads to flow divergence from the control state in a number of pipes; (8) a tolerance threshold is used to consider the measurement errors of sensors and changes in the consumption patterns.
For more details, refer to Supplementary Information, SI.
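The zoning step, which clusters nodes by their geographic coordinates, can be illustrated with a small NumPy version of Lloyd's k-means algorithm. This is a sketch only: the deterministic farthest-point initialization is an assumption made here for reproducibility, and the coordinates in the usage note are invented rather than taken from Mesopolis.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Cluster 2-D node coordinates into k zones; returns (labels, centers)."""
    pts = np.asarray(points, dtype=float)
    # Farthest-point initialisation (an assumption of this sketch, not from the paper).
    centers = [pts[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(pts - c, axis=1) for c in centers], axis=0)
        centers.append(pts[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each node to its nearest center.
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([
            pts[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(coords, k)` for each `k` from 4 to 40 would produce the alternative zone configurations that the methodology is repeated over.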

Results and discussion
We now apply the two-stage optimization framework explained in Figure 1 to find optimal flow and pressure sensor locations in the Mesopolis WDN. Numerical simulation of this network confirms that as the time elapsed from the start of leakage increases, the number of pipes that are influenced increases as well. Moreover, the affected pipes are not necessarily the same in different time steps, a behavior that can be attributed to changes in the demand pattern, among other reasons. For example, the pipes in the Mesopolis WDN that show a flow divergence of more than 0.1 gpm from the control state due to leakage at node 1, at 1 hour, 1 day, 4 days and 8 days after the start of the leakage, are illustrated in Figure S5 (SI). It is noted that the Mesopolis WDN is less sensitive to pressure changes than to flow changes. The effect of flow tolerance threshold values of 1, 5, and 7 gpm on the percentage of accurate identification of leakage zone(s) is illustrated in Figure 2, for the case in which the study area is divided into 5 zones. In this figure, Pareto-optimal solutions obtained by the NSGA-II multi-objective optimization algorithm are compared. Expectedly, more data are removed by increasing the tolerance threshold value. Indeed, a tolerance threshold value greater than 7 gpm is associated with no selected sensors for most leakage scenarios in the network. In general, the number of required sensors shows a monotonic relationship with the tolerance threshold value. Moreover, as threshold values increase, the number of applied sensors has less impact on correct identification of the leakage zone (Figure 2). The tolerance threshold obtained in the proposed methodology (5 gpm) yields the maximum rate of correct identification of leakage zone(s) in the Mesopolis WDN, while the lower and higher tolerance threshold values lead to lower accuracies.
We repeat the analysis of the network with one leaking node that was carried out with the 0.1 gpm tolerance threshold, but now consider flow divergence greater than 5 gpm, at 1 hour, 1 day, 4 days and 8 days after the start of the leakage (Figure S6, SI). Similar to the previous analysis, the number of pipes affected by the leak increases with time, but not nearly to the extent observed when the tolerance threshold is 0.1 gpm. This comparison demonstrates that the number of affected pipelines increases as the tolerance threshold decreases. In other words, a leaking node can impact several different pipelines in the network, but the severity of the impact on many of them might often be negligible. Hence the selection of the tolerance threshold is an important task. In this study, using the pseudo code of Figure S2, we determined 5 gpm as a proper tolerance threshold for flow (see also Figure 2). Now that we have identified the proper (5 gpm) flow tolerance threshold, we analyze the effect of the number of zones on the percentage of accurate identification of leakage zone(s). Pareto-optimal solutions for different numbers of zones obtained by the NSGA-II multi-objective optimization algorithm are compared in Figure 3. The horizontal axis shows the percentage of accurate identification of leakage zone(s) (first objective function) and the vertical axis represents the number of flow sensors (second objective function). Expectedly, the number of sensors required for identification of leakage zone(s) increases as the number of zones grows. It is also visible here that increasing the number of sensors has the most significant impact when the number of zones is lower. As the number of zones increases, the impact of adding more sensors to the network on correct identification of leakage zones decreases. Note, however, that a lower number of zones corresponds to a rougher estimation of the actual leakage location.
We now present a solution with maximum correct identification rate (first objective function) among Pareto-optimal solutions for different number of zones in Figure 4. This figure shows that there is not a monotonic relationship between number of zones and percentage of accurate identification of leakage zones. This finding may be related to the nature of the network of Mesopolis, the geographical structure of the city, the location of the equipment (tank, pump, etc.) in the WDN, and water consumption pattern. The maximum percentage of correct identification of leakage zone(s) is 97% that is associated with five zones. Note that 97% refers to correct identification of zones, not the precise location of the leakage. As shown in Figure 4, the curve shows a non-uniform behavior with high gradient when the number of zones is low (less than 10 zones). By increasing the number of zones, the gradient of the curve will be lower and smoother.
We now focus on placement of optimized flow and pressure sensors when the network is divided into five homogenous zones using k-means clustering algorithm. These five zones are illustrated in Figure S7. First step is to identify the potential locations for flow and pressure sensors. We find locations of 163 potential flow sensors and 20 potential pressure sensors (see Figure S8), as discussed in Sections 2.3 and 2.4, which are subsequently used as decision variables in the optimization algorithm. Due to the nature of this network, which is less sensitive to pressure data, only a small number of pressure sensors are selected. Locations of the optimized flow and pressure sensors distributed in five zones using the NSGA-II optimization algorithm are presented in Figure 5. Using the proposed methodology with 36 flow and three pressure sensors that are distributed in the five zones, the leakage cannot be detected in only one node and the correct leakage zone cannot be identified for 29 nodes. Locations of these nodes (for which leakage occurrence or leakage zone cannot be identified) are illustrated in Figure 6, predominantly situated on the boundary of the zones.
We subsequently divide the study area into 10 zones using the k-means clustering algorithm (see Figure S9 for the zone configuration). In this case, 319 potential flow sensors and 18 potential pressure sensors can be identified (the locations of which are shown in Figure S10). As mentioned in Section 2.4, pressure sensors that are common between two zones are omitted, and consequently potential pressure sensors are distributed in only three zones instead of 10 (Figure S10). The NSGA-II multi-objective optimization algorithm finds that 80 flow sensors and two pressure sensors maximize the rate of correct identification of leakage zones (Figure S11). In this case, the occurrence of leakage cannot be detected in only two nodes and the correct leakage zone cannot be identified for 328 nodes, the locations of which are illustrated in Figure S12.
We now shift our attention to analyzing the impact of pressure sensors in improving correct leakage identification rate. The optimized number of pressure sensor(s) among Pareto-optimal solutions obtained by the NSGA-II multi-objective optimization algorithm for five and 10 zones is presented in Figure 7. Use of pressure sensors clearly improves correct leakage identification rate. For example, adding one pressure sensor to a network (with zero such sensors) will improve the correct identification rate by roughly 10% in the case of five zones. Such improvement, however, diminishes rapidly as the number of pressure sensors increases. Moreover, as number of zones increases (and thus the number of flow sensors) the impact of pressure sensors in improving the results diminishes.
Finally, the elapsed time for the identification of leaks using optimized flow and pressure sensors for all leakage scenarios (1588 leakage scenarios corresponding to 1588 nodes) for five and 10 zones is presented in Figure 8. The elapsed time is calculated regardless of whether the identified leakage zone(s) is/are correct or not. As shown in this figure, the elapsed time for leakage identification increases with the number of zones.

Conclusions
A novel methodology is proposed herein for leak detection and identification of the location of leakage in a water distribution network (WDN) using flow and pressure sensors, consisting of four main parts. Firstly, potential leakage scenarios are simulated in all nodes in the network using the EPANET model considering different users and corresponding demand patterns. Secondly, the WDN is divided into homogenous zones using k-means clustering algorithm and the appropriate potential flow and pressure sensors are selected. Since our network is more sensitive to flow compared to pressure, flow sensors are considered with highest priority for identification of leakage zone, and pressure sensors are optimized at the second stage of optimization.
Adding pressure sensors to the network aims to enhance the accuracy of leakage zone identification. The elapsed time for the identification of leaks for each leakage model is also investigated. A tolerance threshold is used to consider the measurement errors of sensors and changes in the consumption patterns. The pipes that show flow divergence from the control state greater than the tolerance threshold in multiple successive time steps indicate leakage in the network. The results show that by applying a tolerance threshold as an effective filtering tool, a large number of pipes (decision variables in the optimization algorithm) that do not play an important role in leakage will be omitted. This will only select main pipes that are effective in leakage. The number of required sensors for the network will increase by increasing the tolerance threshold value or increasing the number of zones. Moreover, increasing the number of zones, and hence of flow sensors, reduces the need for pressure sensors. In comparison with previous studies, the computation time of the proposed method is lower because there is no need to compare the predicted and the observed values of water demands in all nodes during the operation of the WDN. The information acquired from the flow and pressure sensors is enough for operation of the network. Note that a potential shortcoming of the proposed strategy is misidentification of a significant demand pattern change, due to a holiday for example, as leakage in the network. However, water managers are aware of such potential changes, and a combination of expert knowledge and this approach can curb such misidentifications. Also, the tolerance thresholds, considered as acceptable divergence from the normal state (no leak), can help mitigate misidentification of leakage in the face of minor perturbations of the demand pattern.
Figure 6. Locations of the nodes where the leakage cannot be detected or the correct leakage zone cannot be identified by dividing the study area into five zones.
The proposed approach is general and could be readily applied to different WDNs in future work. Also, the effect of uncertainty in the WDN (e.g. the uncertainty in the water demand), the sensitivity of flow and pressure sensors to leak start time (e.g. at midnight or at the peak hour of water demand), different clustering algorithms especially for the nodes on the boundaries between two zones, and involving potentially conflicting objectives of different stakeholders could be considered in future studies.

Disclosure statement
No potential conflict of interest was reported by the authors.