Large metropolitan water demand forecasting using DAN2, FTDNN, and KNN models: A case study of the city of Tehran, Iran

Abstract Efficient operation of urban water systems necessitates accurate water demand forecasting. We present daily, weekly, and monthly water demand forecasting using dynamic artificial neural network (DAN2), focused time-delay neural network (FTDNN), and K-nearest neighbor (KNN) models for the city of Tehran. The daily model investigates whether partitioning weekdays into weekends and non-weekends can improve forecast results; it did not. The weekly model yielded good results by using the summation of the daily forecast values into their corresponding weeks. The monthly results showed that partitioning the year into high and low seasons can improve forecast accuracy. All three models offer very good results for water demand forecasting. DAN2, the best model, yielded forecasting accuracies of 96%, 99%, and 98%, for daily, weekly, and monthly models respectively.


Introduction
Forecasting water demand is essential for managing urban water supply systems. We present daily, weekly, and monthly water demand forecasting for Tehran, Iran. Tehran has a resident population of 8 million and a daily floating population of around 10 million. Tehran's climate is largely defined by its geographic location, with the towering Alborz Mountains to its north and a central desert to the south. The city's relatively high temperature (especially in the summer), low annual precipitation, ever-increasing population, and limited renewable water resources have all caused this metropolitan city to be in a perpetual state of water crisis.
The surface water resources in Tehran are the Karaj, Lar, Latian, Mamloo and Taleghan reservoirs, with maximum annual water capacities of 330, 150, 180, 90, and 150 million cubic meters (MCM), respectively. Tehran supplements surface water with groundwater from wells (595 MCM/year) and aqueducts (96 MCM/year) (http://tww.tpww.ir/fa/p8/p15). Tehran's population increased from 2,720,000 to 7,798,000 between 1966 and 2006, while annual water consumption increased from 98 to 1100 MCM. During this period, per capita water usage increased from 99 to 386 L/day. If this trend continues, water consumption will reach 1290 MCM/year in 2026. In this situation, the city would face a water shortage of more than 100 MCM/year in drought years.
The maximum (in July) and minimum (in October for daily, and late March for weekly and monthly) volumes of water consumption (MCM) from 21 March 2003 to 13 April 2010 were 3.27 and 1.82 for daily, 22.53 and 14.29 for weekly, and 97.87 and 66.74 for monthly. The average (standard deviation) values of daily, weekly and monthly water consumption were 2.67 (0.256), 18.33 (1.63), and 79.67 (7.847) MCM, respectively. Average daily water production was higher on weekends than on weekdays. The months of May through October are characterized by higher than average daily water demand. Detailed characterization of the water demand data is presented in the associated supplement.
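The per capita figures quoted above follow from the population and annual consumption values. The short Python sketch below reproduces that arithmetic; the function name is illustrative and a 365-day year is assumed.

```python
def per_capita_litres_per_day(annual_mcm, population, days_per_year=365):
    """Convert annual consumption in MCM to litres per person per day."""
    # 1 MCM = 1e6 cubic meters, and 1 cubic meter = 1000 litres.
    litres_per_year = annual_mcm * 1e6 * 1000
    return litres_per_year / population / days_per_year


# Figures taken from the text: 1966 and 2006 population and consumption.
usage_1966 = per_capita_litres_per_day(98, 2_720_000)
usage_2006 = per_capita_litres_per_day(1100, 7_798_000)
print(round(usage_1966), round(usage_2006))
```

Rounded to whole litres, the two results agree with the 99 and 386 L/day reported in the text.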
Based on official reports from Tehran Water & Wastewater Company (TWW), the Urban Water Supply Department faces many serious challenges, in part due to lack of timely planning to make the necessary provisions for emergency conditions (http://tww.tpww.ir/fa/p8/p15). Accurate water demand forecasting can positively contribute to this challenge.
Water demand forecasting is inherently a challenging issue. This difficulty reflects the existence of a nonlinear relationship between the factors with the most direct effects on water demand. We employ a time-series based model and use the actual water demand values produced, or consumed, in Tehran as our primary independent variable(s).
When modeling the problem as a time-series, artificial neural networks (ANN), K-nearest neighbor (KNN) and autoregressive integrated moving average (ARIMA) are often employed. Recent studies have offered ANN based solutions as an effective modeling approach for water forecasting (Firat et al. 2009, Donker et al. 2014). Donker et al. (2014) recently reviewed urban water demand forecasting methods and concluded that "in general, authors find ANN to perform better than conventional methods" especially when the study targets multiple periodicities. All models are evaluated with the commonly used mean absolute percent error (MAPE), R², MSE, and SSE statistics. We next present a brief overview of each model and then compare their performance in the subsequent sections.

Focused time-delay neural network
We use the focused time-delay dynamic neural network to assess its effectiveness for water demand forecasting. The FTDNN is a dynamic network for supervised nonlinear forecasting.
An architecture with two hidden layers (the first with two neurons and the second with four) and distinct time delays for the daily, weekly, and monthly models is used, with up to 100 training cycles. The network is trained to perform multistep-ahead forecasts by repeatedly feeding its forecasts back to the network input. A detailed discussion of the FTDNN is available online (http://www.mathworks.com/help/nnet/ug/designtime-series-time-delay-neural-networks.html).
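The feedback scheme for multistep-ahead forecasting can be sketched independently of the network itself. The Python sketch below is not the MATLAB FTDNN used in this study: `predict_one_step` is a hypothetical stand-in for any trained one-step model, and `recursive_forecast` is an illustrative name.

```python
def recursive_forecast(history, predict_one_step, delays, steps):
    """Forecast `steps` values ahead by feeding each forecast back as input.

    history          -- observed values, oldest first
    predict_one_step -- callable mapping a list of lagged values to one forecast
                        (hypothetical stand-in for a trained network)
    delays           -- time delays used as inputs, e.g. (1, 2, 3)
    """
    window = list(history)
    forecasts = []
    for _ in range(steps):
        # Lagged inputs are read from the window, which mixes observed
        # values with earlier forecasts once we move past the history.
        lagged = [window[-d] for d in delays]
        value = predict_one_step(lagged)
        forecasts.append(value)
        window.append(value)
    return forecasts


# Toy usage: a "model" that simply sums its two lagged inputs.
print(recursive_forecast([1, 1], sum, delays=(1, 2), steps=3))
```

Because every forecast becomes an input to the next step, errors can accumulate as the horizon grows, which is consistent with the sensitivity to forecasting interval reported later.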

Dynamic artificial neural network model
We have developed an implementation of the DAN2 algorithm in this research. DAN2 is a dynamic, feed-forward algorithm with built-in knowledge accumulation capability. Ghiassi et al. (2008) have used DAN2 to successfully forecast water consumption for the city of San Jose, California. A recent survey of urban water forecasting methods has evaluated DAN2 against alternative models and has highlighted its effectiveness for forecasting urban water demand for multiple periodicities (Donker et al. 2014). In this research, we also use DAN2 to forecast water demand for Tehran. For a complete discussion of DAN2 see Ghiassi et al. (2005, 2008).

K-nearest neighbor method
The third method used in this research is based on the KNN approach (Lall and Sharma 1996). The k nearest neighbors in the time-series are selected in terms of the weighted Euclidean distance to the predictor variable. The selection of k is the most important factor in the KNN method. A small value of k leads to high variance in the forecasts, while a large value introduces bias in the model. For the purposes of the present study, the value k = n^(1/2) is used.
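A minimal Python sketch of a nearest-neighbor forecast with k = n^(1/2) is given below. It rests on several assumptions that the text does not specify: n is taken to be the number of historical lag patterns, the distance weights default to 1, and the forecast is the plain average of the neighbors' successors rather than the resampling scheme of Lall and Sharma (1996). The function name is illustrative.

```python
import math


def knn_forecast(series, delays, weights=None):
    """One-step KNN forecast using weighted Euclidean distance on lagged values."""
    max_delay = max(delays)
    weights = weights or [1.0] * len(delays)
    # Each pattern pairs the lagged values before time t with the value at t.
    patterns = [([series[t - d] for d in delays], series[t])
                for t in range(max_delay, len(series))]
    if not patterns:
        raise ValueError("series is too short for the requested delays")
    # The query is the most recent set of lagged values.
    query = [series[len(series) - d] for d in delays]
    k = max(1, round(math.sqrt(len(patterns))))  # k = n^(1/2), assumed n = patterns

    def distance(features):
        return math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, features, query)))

    nearest = sorted(patterns, key=lambda p: distance(p[0]))[:k]
    # Assumption: average the successors of the k nearest patterns.
    return sum(target for _, target in nearest) / len(nearest)


# Toy usage on a strictly periodic series.
print(knn_forecast([1, 2, 3] * 4, delays=(1, 2)))
```

No training step is involved: the historical patterns are searched directly at forecast time.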

Water demand forecasting results and discussion
To measure the efficacy of the models, we employed the accuracy metrics commonly used in water demand forecasting. These include MAPE, Accuracy (100-MAPE), R², MSE, and SSE. An accurate forecasting system should always result in low MAPE values. High values of R² represent a model's superior explanatory power in forecasting sudden variations.
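These metrics can be computed from paired actual and predicted values as in the Python sketch below. One assumption is made explicit: R² is computed here as the coefficient of determination, 1 - SSE/SST, since the text does not state which definition was used. The function name is illustrative.

```python
def forecast_metrics(actual, predicted):
    """Return MAPE, Accuracy (100 - MAPE), R2, MSE, and SSE for a forecast.

    Assumes no actual value is zero and that the actual values are not
    all identical (otherwise MAPE or R2 is undefined).
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sse / sst  # assumed definition: coefficient of determination
    return {"MAPE": mape, "Accuracy": 100.0 - mape,
            "R2": r2, "MSE": mse, "SSE": sse}


# Toy usage with two observations.
print(forecast_metrics([2.0, 4.0], [1.0, 5.0]))
```

MAPE is scale-free, which is why it can be compared across the daily, weekly, and monthly models, whereas MSE and SSE depend on the magnitude of the series.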
The predictive power of the models is evaluated using test data. In this research, 1.0%, 7%, and 28% of the total data are selected as the forecasting ranges for the daily, the second weekly, and the second monthly methods, respectively. For ANN models, the dataset is divided into two sets: the training and the validation sets. We use the generally accepted ratio of 80% for training and 20% for validation.
Ghiassi et al. showed the advantages of DAN2 over traditional ANN models and the non-neural alternatives such as MLR and ARIMA (Ghiassi et al. 2008). We have followed the same trend and have selected two different ANN based solutions in our research. We have also included KNN as another alternative. KNN's simplicity, availability and ease of operation make it a fast and viable alternative.
We have used the MATLAB interactive programming software for development and implementation of the DAN2, focused time-delay neural network (FTDNN), and KNN algorithms.

Literature review
The growing population and the resulting increase in water consumption create a growing need for water demand forecasting. Researchers report that economic and demographic factors can be included in models for long-term water demand forecasting, while a significant relationship between climate conditions and urban water demand has been found for short-term forecasts (Ghiassi et al. 2008). However, the effects of climate conditions on water demand can be indirectly represented within the auto-regressions of past water demand data.
Zhou et al. developed a 24-hour forecast model for the Melbourne area and noted that "models that consider the summer and winter months separately exhibited considerable improvement over a single model" (Zhou et al. 2000). In the model developed by Ghiassi et al., however, the seasonal approach to medium and long-term water forecasting yielded only relatively low improvements over those of the equivalent single model (Ghiassi et al. 2008).
Tabesh et al. used a neural network algorithm to forecast the daily water consumption in the city of Tehran based on meteorological parameters and historical water consumption data. They observed that the effects of different parameters on water consumption varied across seasons (Tabesh and Dini 2009).
Ultimately, the effectiveness of the selected model is measured by its accuracy, where the accuracy measures forecast values against their corresponding actuals. However, reliable post-implementation data is often not available and, as suggested by Billings and Jones (2008), "most studies lack this crucial information."

Research methodology
In this study we use a time-series approach with three different models (DAN2, FTDNN, and KNN) to forecast water demand for Tehran. We use the same input datasets to report the accuracy of each model and to compare their performance.
A comprehensive water demand forecasting approach spans multiple time horizons, each offering significant benefits to the operational efficiency of the Tehran water system. For the long term, we have developed models to forecast monthly demand values two years into the future, to serve as a basis for decision making on system expansion and major maintenance activities. For the medium term, we have developed models to forecast weekly demand values six months into the future, which allows TWW to better schedule maintenance activities and medium scale projects. For the short term, we have developed models to forecast daily demand values four weeks into the future, to accurately schedule water production to minimize electricity and other utility costs while maintaining system requirements such as water demand volume and pressure.
We have used a hold-out set for testing all three models. The size of the hold-out set depends on the forecasting horizon and is 28 days ahead for daily, 26 weeks ahead for weekly and 24 months ahead for monthly.
The KNN model did not require training. The dataset partitioning scheme for each forecasting approach is summarized in Table 1.
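The hold-out scheme can be illustrated with a short Python sketch. The function name and the chronological ordering of the split are assumptions, as is the exact rounding of the 80% training share, so the counts it produces need not match those reported in Table 1.

```python
def partition(series, holdout, train_ratio=0.8):
    """Split a series into training, validation, and hold-out test sets.

    The last `holdout` points are reserved for testing; the remaining
    points are split chronologically by `train_ratio` (assumed ordering).
    """
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    fit_part, test = series[:-holdout], series[-holdout:]
    n_train = round(len(fit_part) * train_ratio)
    return fit_part[:n_train], fit_part[n_train:], test


# Toy usage: 128 daily values with a 28-day hold-out set.
train, validation, test = partition(list(range(128)), holdout=28)
print(len(train), len(validation), len(test))
```

For the KNN model, which requires no training, only the hold-out portion of such a split would be relevant.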

Sensitivity analysis
Sensitivity analyses determine which input parameter exerts the most influence on model results. A simple approach to sensitivity analysis is to vary one-parameter-at-a-time while keeping other parameters constant and measuring the impact of the change on the model effectiveness. This study uses sensitivity analysis to enable the testing of model structure and to better understand model performance.
The effects of changes in time delays, size of the input data, and the forecast time interval on the R² values are evaluated by varying one factor while keeping the others constant. Sensitivity analysis of the daily models to varying forecasting time intervals, size of the input data and time delays revealed that changing the forecasting time interval led to the greatest variations in the accuracy of the model results. When the forecasting time intervals were changed from 3 to 15 days, the R² values for the DAN2, FTDNN, and KNN models varied over the ranges 0.01-0.88, 0.04-0.94, and 0.04-0.83, respectively. All three models behaved similarly. This behavior is to be expected, since all models show more variation in their accuracy as the forecasting horizon increases. The daily model, however, showed much smaller sensitivity to time delays (with a range of about 0.29) and an even smaller sensitivity to the size of the input data (with a range of only 0.16).
The weekly models behaved much like the daily models: when the forecasting time intervals were changed from 3 to 50 weeks, the R² values for DAN2, FTDNN, and KNN varied over the ranges 0.02-0.76, 0.08-0.79, and 0.04-0.78, respectively.
For the monthly models, sensitivity to time delay was also investigated. When time delays were changed from 1, 2 months to 1, 2, ..., 18 months, the R² values varied over the ranges 0.14-0.77 for FTDNN, 0.73-0.89 for KNN, and only 0.73-0.81 for the DAN2 model. Among the three models, only FTDNN was sensitive to these changes. When the time delay is increased up to 18 months, the total available training data is reduced. We note that all traditional ANN models are sensitive to reduced training data. The variation in training data size resulting from variation in time delay may explain why FTDNN produces a larger R² range.
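The one-factor-at-a-time procedure can be sketched in Python as follows. The `evaluate` callable is a hypothetical stand-in for fitting a model under a given configuration and returning its R² value; the function and parameter names are illustrative only.

```python
def one_at_a_time(evaluate, baseline, candidates):
    """Vary each factor alone and report the range of the resulting scores.

    evaluate   -- callable taking a configuration dict and returning a score
                  (hypothetical stand-in for training a model and computing R2)
    baseline   -- dict of factor name -> baseline value
    candidates -- dict of factor name -> list of values to try
    """
    ranges = {}
    for name, values in candidates.items():
        # Only `name` changes; every other factor stays at its baseline.
        scores = [evaluate({**baseline, name: value}) for value in values]
        ranges[name] = max(scores) - min(scores)
    return ranges


# Toy usage with an artificial scoring function.
toy_score = lambda cfg: 2 * cfg["horizon"] + cfg["delays"]
print(one_at_a_time(toy_score,
                    baseline={"horizon": 1, "delays": 1},
                    candidates={"horizon": [1, 2, 3], "delays": [1, 2]}))
```

The factor with the widest score range is the one the model is most sensitive to, which is how the ranges quoted above are interpreted.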

Description of Tehran water system data
Tehran Water and Wastewater Company (TWW) provided the data for this research. Daily water production data from March 2003 to April 2010 were obtained from this company (http://tww.tpww.ir/fa/p8/p15). The monthly data on Tehran domestic water production and consumption over the period of March 2003 to November 2009 were obtained from the Company's website. Water production is used as a measure of water demand in this study. Moreover, monthly demand forecasts are also made using the available data on monthly urban water consumption.
We use the daily water production data for the daily, weekly, and monthly forecasting models. Two integration methods are used for the weekly and monthly models. In the first approach, daily data are summed to produce the corresponding weekly and monthly values, which are subsequently used to forecast weekly and monthly demand over six-month and two-year horizons, respectively. In the second approach, daily values are forecast directly over the same horizons of 182 and 731 days. The predicted values are then summed to determine the corresponding weekly and monthly demands.
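The summation step of the second approach can be illustrated for the weekly case with a short Python sketch. The function name is illustrative, and fixed seven-day blocks are assumed; monthly aggregation would instead need calendar month boundaries.

```python
def sum_into_blocks(daily_values, block_length=7):
    """Sum consecutive daily values into fixed-length blocks (weeks by default)."""
    if len(daily_values) % block_length:
        raise ValueError("number of days must be a multiple of block_length")
    return [sum(daily_values[i:i + block_length])
            for i in range(0, len(daily_values), block_length)]


# Toy usage: a 182-day daily forecast becomes 26 weekly values.
weekly = sum_into_blocks([1.0] * 182)
print(len(weekly), weekly[0])
```

The 182-day horizon corresponds exactly to the 26-week hold-out set used for the weekly models.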

Daily forecast model
Daily water forecasting allows utilities to have an accurate measure of water demand for the current and the next few days. This information can guide them in operating their treatment plants and wells, and in setting pumping schedules economically (Billings and Jones 2008, Donker et al. 2014). When modeling daily water consumption, researchers have shown a significant difference between weekday and weekend water demands (Ghiassi et al. 2008). We also observed that average daily water production on weekdays was lower than on weekends. We developed two models for daily water forecasting. In the first model we did not differentiate between weekdays and weekends, whereas in the second model weekends were treated separately. As shown in Table 2, the accuracy and R² values between actual and predicted demands are lower in the partitioning method than in the first method. SSE and MSE indices are doubled in the partitioning method and the forecasting error is increased. Clearly, for this dataset, partitioning the week into weekdays and weekends has not improved forecasting accuracy. This may be due to the fact that daily models are highly sensitive to varying forecasting time intervals, as discussed in Section 4.1. Table 2 shows that partitioning data into weekdays and weekends, and subsequently reducing forecasting intervals, actually reduces forecasting accuracy in the second method as compared to the first method.
In the first method, the DAN2 model yielded slightly better results with an accuracy of 96%. This result was only marginally (about 1-2%) better than the accuracy of the FTDNN and KNN models (94% and 95%).

Table 1. Dataset partitioning scheme for each forecasting approach.
Horizon      Model         Time unit  Total  Training  Validation  Test  Time delays
Short-term   daily         day        2576   2048      497         28    1, 2, 3
             weekday       day        1840   1462      355         20    1, 2, 3
             weekend       day        736    584       142         8     1, 2
Medium-term  weekly        week       368    275       67          26    1, 2, 3, 4
             daily summed  day        2576   1925      469         182   1, 2, 3
Long-term    monthly       month      80     42        12          24    1, 2
             daily summed  day        2557   1458      365         731   1, 2, 3
             high season   day        1295   737       185         370   1, 2, 3
             low season    day        1262   718       180         361   1, 2, 3

Researchers have used two approaches for the inclusion of weather in their models. In the first approach, temperature values are used as additional model variables. Although past weather information may be available, forecasting future weather values can still be inaccurate and problematic. Alternatively, researchers have used a seasonal approach to account for the weather effect (Ghiassi et al. 2008).
Our third monthly method uses a seasonal approach instead of the direct use of weather information. Spring and summer are not only the warm seasons of the year in Tehran, but also contain many holidays, both of which can cause great variations in water demand. High temperatures in these two seasons cause water consumption to rise while holidays cause it to decrease. Therefore, average daily water demand in different months of the year was investigated for this study. The results show that the months of June through October are characterized by higher than average daily water demand and production volumes. These months are selected as the high demand season and the remaining months of the year are designated as the low demand season. The daily demand data were subsequently divided into high and low demand seasons. In the forecasting stage, these two series are predicted on a daily basis and are finally summed into their corresponding months. The results for each month are inserted in their original arrangement inside the year so that the results from the third method can be compared with those obtained from the two former methods.
We compare the performance of the three integration methods using all three modeling approaches and present the results in Table 2. Results show that the third monthly forecasting method provides better results than the others for all three models with respect to all performance metrics. The best results are obtained using the DAN2 model. The improvements as measured by the R² values are more pronounced when using DAN2. DAN2's improvements over the first two integration methods are 27% and 15% in the R² values, respectively. Similarly, the reduced values of the SSE and MSE indices also indicate a reduced forecasting error in the third method. The third method improves upon the results of the first and second monthly forecasting methods by 72% and 66% in average SSE and MSE values, respectively.
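The seasonal partitioning and the final monthly summation can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, Gregorian calendar months are assumed, and the set of high-season months is passed in as a parameter rather than fixed.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_by_season(start, daily_values, high_months):
    """Split a daily series into dated high-season and low-season series."""
    high, low = [], []
    for offset, value in enumerate(daily_values):
        day = start + timedelta(days=offset)
        (high if day.month in high_months else low).append((day, value))
    return high, low


def monthly_totals(dated_values):
    """Sum dated daily values into (year, month) totals in calendar order."""
    totals = defaultdict(float)
    for day, value in dated_values:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


# Toy usage: four days straddling the end of May, with June to October
# treated as the high season (an assumed choice for this example).
high, low = split_by_season(date(2008, 5, 30), [1.0] * 4, {6, 7, 8, 9, 10})
print(monthly_totals(high + low))
```

In practice each seasonal series would be forecast separately before the daily forecasts are merged and summed; the merge step is what places each month back in its original position within the year.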

Weekly forecast model
Accurate weekly demand and consumption information allows water companies to optimize their decisions about short term maintenance activities such as drawdown on reservoirs and wells, and pump repairs (Billings and Jones 2008).
The two integration methods described in Section 4.2 are used for the weekly models. Table 2 shows that the accuracy and the R² values of the forecasting results are better in the second approach. The MSE values of 0.82 and 0.67 for the FTDNN and KNN models in the first integration approach decreased to 0.09 and 0.05, respectively, in the second integration approach. Similarly, the R² values for the DAN2, FTDNN, and KNN models increased by 0.72, 0.78, and 0.73, respectively, in the second integration approach compared to the first. This result can be explained by the sensitivity of the models to different forecasting intervals, as shown in Section 4.1.
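The size of the MSE change can also be expressed as a relative reduction. The Python sketch below applies one common definition, (before - after) / before, to the MSE values quoted above; whether the percentage improvements reported elsewhere in this study use the same definition is an assumption.

```python
def relative_reduction(before, after):
    """Percentage reduction of an error metric relative to its starting value."""
    return 100.0 * (before - after) / before


# MSE values from the text: first versus second weekly integration approach.
ftdnn_reduction = relative_reduction(0.82, 0.09)  # roughly 89%
knn_reduction = relative_reduction(0.67, 0.05)    # roughly 92.5%
print(round(ftdnn_reduction, 1), round(knn_reduction, 1))
```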

Monthly forecasting model
The monthly forecasting models provide water demand information for the next few years. Management can use this information for system improvements, revenue forecasts, and rate setting.
Three integration methods are employed in the monthly models to forecast monthly water demand for the two years 2008 and 2009. In the first method, monthly consumption data are used with time delays of the last two months. In the second method, direct forecasting of daily values is performed using daily water production data. The predicted daily values are then summed to obtain the corresponding monthly demand values. For the third integration method, we note that weather can impact monthly water consumption.
Using an MSE metric in the third integration method, DAN2 improves upon the results of the FTDNN and KNN models by 21% and 14% (Table 2).

Conclusions
Daily, weekly, and monthly forecasting of Tehran urban water demand was performed using the data on daily water production and monthly water consumption as inputs.
The best results for forecasting daily water demand were obtained from the model that did not distinguish weekdays from weekends (accuracy of 96% with DAN2). Partitioning demand into weekdays and weekends in the daily models did not increase forecast accuracy. This result may be due to the sensitivity of the models to shorter forecast intervals. In the weekly model, direct forecasting of daily data followed by integration of the demand values into corresponding weekly values offered much better results (accuracy of 99.06% with DAN2). For the monthly models, the R² values and the accuracy increased significantly when the year was partitioned into high and low seasons. The best accuracy of 98.39% was obtained using DAN2 with the seasonal approach. Generally, the DAN2, FTDNN, and KNN models produced good results for different time intervals. However, DAN2 achieved a higher accuracy with the first daily, the second weekly, and all monthly methods.
Prior to this study, the TWW used an ad-hoc approach for managing their operations. The approach recommended in this study allows them to augment their ad-hoc decision making with an effective model-based approach. However, the true measure of effectiveness of any water policy must include an ex-post analysis of the policies after implementation, using operational and financial metrics and their comparison with the ex-ante versions. Unfortunately, most water companies (private or public) are not willing to share current operational or financial information, citing proprietary and confidentiality concerns (Billings and Jones 2008). Therefore an ex-post analysis of the operational and financial efficacy of these models is not possible at this time and is the subject of future research.