Bayesian Monte Carlo Simulation Driven Approach for Construction Schedule Risk Inference

ABSTRACT


1st round focus group discussion
The 1st round focus group discussion was designed to develop the key risk network of the case project. It lasted around 2 hours and focused on two aspects: (1) the identification and verification of construction schedule risks within the context of the case project; and (2) the identification and assessment of links among these schedule risks within the same context. Before the discussion, a brief introduction to network theory-based analysis was presented to the participants, and a list of potential construction schedule risks from the literature review (Table 1) was provided as a reference to help them overcome their cognitive limitations.
The post-discussion log and notes were carefully kept. They recorded a double-check that the recording had functioned properly, the researcher's reflections on and elaborations of the focus group discussion, and the lessons learned from the discussion. Together with the main data from the discussion notes, these post-discussion notes ensured the quality and reliability of the data for analysis.

2nd round focus group discussion
The 2nd round focus group discussion was designed for developing Bayesian […] notes were also kept, as with those from the 1st round focus group discussion.

Identification of the network boundary
As the foundation for developing the key risk network, the boundary (i.e., the specific risks) should be identified and examined first.
The classical experience-based method is one of the most popular methods for risk identification. It involves only core stakeholders in the risk identification process, which draws on the experience of one stakeholder or a small group of stakeholders regarding 'what are the risk categories' and 'what are the risks', elicited through interviews, surveys or focus group discussions. The method is convenient and efficient in providing insights into risks based on the rich experience of core stakeholders, but it is difficult for these stakeholders to overcome their cognitive limitations and draw the complete boundary (Chen, 2019; Yang and Zou, 2014).
In this research, the classical experience-based method was adopted to identify the risks for constructing the risk network through the 1st round focus group discussion. Before this discussion, a list of potential construction schedule risks from the literature review (Table 1) was also provided to participants as a reference to help them overcome their cognitive limitations and draw a comprehensive boundary.

Establishment and assessment of links
After the risk network boundary is defined, potential links are considered between each pair of risks. The risk structure matrix (RSM) method is commonly adopted to analyse risk links and was also adopted in this research. The RSM (i.e., adjacency matrix) is defined as a square matrix with entry […]
To reduce confusion and divergence in establishing and assessing links, the 1st round focus group discussion was held to develop the RSM with quantitative assessment (Yang and Zou, 2014). The outcomes identify and quantify the links between risks.
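As a rough illustration of the RSM, the following Python sketch builds a square matrix from a set of assessed links. It assumes that each entry stores the assessed strength of the link from the row risk to the column risk and that absent links are recorded as zero; the risk identifiers, strengths and the `build_rsm` helper are hypothetical.

```python
# Sketch of building a risk structure matrix (RSM) from assessed links.
# Assumption: entry [i][j] stores the assessed strength of the link
# risk i -> risk j, and 0.0 means no link was identified.
from typing import Dict, List, Tuple


def build_rsm(risks: List[str],
              links: Dict[Tuple[str, str], float]) -> List[List[float]]:
    index = {risk: k for k, risk in enumerate(risks)}
    n = len(risks)
    rsm = [[0.0] * n for _ in range(n)]
    for (src, dst), strength in links.items():
        if src == dst:
            raise ValueError(f"self-link {src} -> {dst} is not allowed")
        rsm[index[src]][index[dst]] = strength
    return rsm


# Hypothetical risks and link strengths, for illustration only.
risks = ["R1", "R2", "R3"]
links = {("R1", "R2"): 0.6, ("R2", "R3"): 0.4, ("R3", "R1"): 0.2}
rsm = build_rsm(risks, links)
for name, row in zip(risks, rsm):
    print(name, row)
```

Rows read as "influencing risk" and columns as "influenced risk", so a row sum gives a simple picture of how strongly a risk drives others under this assumed encoding.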

Visualisation of network
Once the nodes and links have been identified and assessed, a construction schedule risk network for the target infrastructure project can be developed and mapped as a directed graph G(V, E), where the identified risks are mapped as nodes (V) connected by weighted arrows (links, E).
In this research, NetMiner 4 was used to visualise the risk network because of its strength in processing and exploratory analysis of large networks (Furht, 2010). In the network graph […]

Topological analysis of risk network
With the risk network mapped as G(V, E), its structural configuration is explored and explained using the metrics of topological analysis (Table 2). This analysis is conducted at three levels. First, in the network-level analysis, network density and cohesion are calculated to characterise the network structure quantitatively. The density value indicates how closely the risks are connected in the network, and the cohesion value reflects the complexity of the network configuration in terms of node reachability. Second, the node-level analysis is conducted to determine the key risks by examining the direct and/or propagating impacts of nodes, as well as their functions and properties in the influence network. Five node-level metrics were calculated and analysed in this research, namely degree difference, ego network size, node betweenness centrality, out-status centrality and total brokerage (Table 2). Finally, the link-level analysis measures the extent to which a risk link plays a gatekeeper role in governing the influences passing through it, based on link betweenness centrality (Chen, 2019; Yang and Zou, 2014). A greater centrality value implies a more critical link.
[Insert: Table 2 Definition of metrics for topological analysis]
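To make some of these metrics concrete, the following Python sketch computes network density, degree difference, ego network size and link betweenness centrality on a small directed graph. The definitions are common ones but are assumptions here, since Table 2 is not reproduced: degree difference is taken as out-degree minus in-degree, ego network size as the number of distinct direct neighbours, and link betweenness as the unnormalised Brandes edge betweenness. Cohesion, out-status centrality and brokerage are not covered, and all function names are hypothetical.

```python
# Minimal sketch of some topological metrics on a directed risk network.
# Assumptions (not taken from the source): metrics use the unweighted
# structure; edges are unique; degree difference = out-degree - in-degree;
# ego network size = number of distinct direct neighbours in either direction;
# link betweenness = unnormalised Brandes edge betweenness over ordered pairs.
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def density(nodes: List[str], edges: List[Edge]) -> float:
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degree_difference(nodes: List[str], edges: List[Edge]) -> Dict[str, int]:
    diff = {v: 0 for v in nodes}
    for u, v in edges:
        diff[u] += 1  # outgoing link
        diff[v] -= 1  # incoming link
    return diff


def ego_network_size(nodes: List[str], edges: List[Edge]) -> Dict[str, int]:
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {v: len(s) for v, s in neighbours.items()}


def link_betweenness(nodes: List[str], edges: List[Edge]) -> Dict[Edge, float]:
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    score = {e: 0.0 for e in edges}
    for s in nodes:
        # Breadth-first search counting shortest paths from s.
        order, pred = [], {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in pred[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[(v, w)] += share
                delta[v] += share
    return score


nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]
print(density(nodes, edges))            # 2 / (3 * 2)
print(degree_difference(nodes, edges))  # {'A': 1, 'B': 0, 'C': -1}
print(ego_network_size(nodes, edges))   # {'A': 1, 'B': 2, 'C': 1}
print(link_betweenness(nodes, edges))   # {('A', 'B'): 2.0, ('B', 'C'): 2.0}
```

Libraries such as NetworkX offer comparable routines (for example `edge_betweenness_centrality`), which normalise by default, so absolute values would differ from this unnormalised version.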

Interpretation of the results
Based on the results of the analysis at the three levels, the key risks and key risk links can be identified. Key risks are distinguished in the risk network by high values in one or more of the nodal metrics, namely degree difference, ego network size, betweenness centrality, out-status centrality and brokerage. Meanwhile, key risk links are identified by high values of betweenness centrality at the link level.
The key risk network thus consists of (1) key risks, (2) key risk links, (3) non-key risks involved in key risk links, and (4) non-key risk links involving key risks. The resulting key risk network provides essential information on construction schedule risks for infrastructure projects, but with a more concise and manageable structure (Chen, 2019).
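The assembly of the key risk network from its four components can be sketched as set operations in Python. This is a minimal sketch under stated assumptions: the key risks are supplied directly, key risk links are those whose link betweenness exceeds a chosen cut-off, and component (4) is interpreted as non-key links whose two endpoints are both key risks. The network, scores, cut-off and the `key_risk_network` function are hypothetical.

```python
# Sketch of assembling the key risk network from the four components.
# Assumptions: key risks are given (chosen from the nodal metrics); key risk
# links are those whose link betweenness exceeds a chosen cut-off; "non-key
# risk links involving key risks" are read as non-key links whose two
# endpoints are both key risks.
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def key_risk_network(edges: List[Edge], key_risks: Set[str],
                     link_bc: Dict[Edge, float],
                     cutoff: float) -> Tuple[List[str], List[Edge]]:
    key_links = {e for e in edges if link_bc[e] > cutoff}
    non_key_links = {(u, v) for (u, v) in edges
                     if (u, v) not in key_links
                     and u in key_risks and v in key_risks}
    nodes = set(key_risks)
    for u, v in key_links:
        nodes.update((u, v))  # adds non-key risks involved in key links
    return sorted(nodes), sorted(key_links | non_key_links)


# Hypothetical example.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "A")]
bc = {("A", "B"): 12.0, ("B", "C"): 5.0, ("C", "D"): 15.0,
      ("A", "C"): 3.0, ("D", "A"): 8.0}
nodes, links = key_risk_network(edges, {"A", "C"}, bc, cutoff=10.0)
print(nodes)  # ['A', 'B', 'C', 'D']
print(links)  # [('A', 'B'), ('A', 'C'), ('C', 'D')]
```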

The construction of DAG structure
Constructing a DAG provides the network structure for a Bayesian network model. Two kinds of methods are commonly adopted: the expert knowledge-driven structure construction method (Hu et al., 2013; Luu et al., 2009) and the observational data-driven structure learning method (…). However, the structure learning method is not appropriate for infrastructure projects because of (1) the uniqueness and uncertainty of construction schedule risks for infrastructures, and (2) the limited data available for the training process. Although the structure construction method, which conforms to verified causalities, is more suitable for risk analysis of infrastructures, it can be time-consuming to carry out and inevitably introduces subjective bias from experts (Hu et al., 2013).
Given limited time and data, it is reasonable to use the key risk network from the network theory-based analysis as the basis for generating the DAG structure, because the topological structure of the key risk network is similar to that of the DAG in a Bayesian network model: nodes represent risks and links represent cause-effect relationships. This novel approach integrating network theory and Bayesian networks is not only more convenient and resource-saving but also reliable, because it incorporates both expert knowledge and analysis metrics (Table 2).
To transform the network from a directed cyclic graph (DCG) into a directed acyclic graph (DAG) properly, the key is to find the directed cycles in the network and eliminate them without losing essential information. A directed cycle is formed by 'starting at any vertex ν and following a consistently-directed sequence of edges that eventually loops back to ν again'.
In this research, two steps were developed to construct the DAG from the key risk network: (1) searching for cycles with the depth-first search (DFS) algorithm, and (2) constructing the DAG with the A-MWST algorithm.

Searching cycles by DFS algorithm
The DFS algorithm is adopted as the search strategy on account of its convenience and […] if a back edge exists, through which the cycles in the risk network can be identified.
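A common way to detect back edges with DFS is three-colour marking, sketched below in Python as an illustration rather than the exact procedure used in this research. A node is GREY while it lies on the current search path, so an edge that reaches a GREY node is a back edge and closes a directed cycle. The `find_back_edges` function and the example graph are hypothetical.

```python
# Sketch of back-edge detection with depth-first search (DFS) on a directed
# graph given as an adjacency list. A back edge points to a node that is still
# on the current DFS path (GREY), so each one closes a directed cycle.
# Assumption: every node appears as a key in `graph`.
from typing import Dict, List, Tuple

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished


def find_back_edges(graph: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    colour = {v: WHITE for v in graph}
    back_edges: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        colour[u] = GREY
        for w in graph[u]:
            if colour[w] == GREY:
                back_edges.append((u, w))
            elif colour[w] == WHITE:
                visit(w)
        colour[u] = BLACK

    for v in graph:
        if colour[v] == WHITE:
            visit(v)
    return back_edges


# Hypothetical network containing one cycle A -> B -> C -> A.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
print(find_back_edges(graph))  # [('C', 'A')]
```

Which edges are reported as back edges depends on the start node and the neighbour order, but a directed graph is acyclic exactly when a full DFS finds no back edge. Recursion is adequate for small networks such as a 21-node key risk network.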

Constructing DAG by A-MWST algorithm
With the cycles identified, the spanning tree transformed from the key risk network needs to be re-developed into a DAG structure by eliminating the identified cycles without losing essential information.
The maximum weight spanning tree (MWST) algorithm largely preserves the structural properties and yields the associated probability distribution closest to that of the original network, as measured by the Kullback-Leibler divergence (KLD) (Pearl, 1988). Based on the MWST algorithm, the A-MWST algorithm was developed in this research to re-develop the DAG model from the spanning tree.

In this A-MWST algorithm, the betweenness centrality of a link is taken as a reliable proxy for the mutual information of the corresponding edge and is defined as that edge's weight. The A-MWST algorithm constructs the DAG from the spanning tree as follows: (1) start from the empty tree over all variables (i.e., nodes); (2) insert the largest-weight edge (i.e., link); (3) find the next largest-weight edge and add it to the tree if no cycle is formed, otherwise discard the edge and repeat this step; and (4) repeat the third step until all edges have been examined, so that the associated DAG finally constructed has the maximum total weight, i.e., the maximum sum of link betweenness centralities.
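Steps (1) to (4) can be sketched in Python as a greedy insertion of weighted directed links with a reachability check. It is a minimal sketch assuming that each link's weight is its betweenness centrality and that ties are broken by link name; the `greedy_weighted_dag` function and the weights are hypothetical.

```python
# Sketch of the greedy edge-insertion steps (1)-(4): links are visited in
# decreasing weight order (here, link betweenness) and kept only if they do
# not close a directed cycle. Assumptions: weights are given per directed
# link; ties are broken by link name so results are reproducible.
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def greedy_weighted_dag(nodes: List[str],
                        weights: Dict[Edge, float]) -> Tuple[List[Edge], List[Edge]]:
    succ: Dict[str, Set[str]] = {v: set() for v in nodes}

    def reachable(src: str, dst: str) -> bool:
        # Iterative DFS over the links kept so far.
        stack, seen = [src], {src}
        while stack:
            x = stack.pop()
            if x == dst:
                return True
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return False

    kept: List[Edge] = []
    discarded: List[Edge] = []
    for (u, v), _w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if reachable(v, u):  # u -> v would close a directed cycle
            discarded.append((u, v))
        else:
            succ[u].add(v)
            kept.append((u, v))
    return kept, discarded


# Hypothetical weights: the cycles through C -> A are broken at that link.
w = {("A", "B"): 5.0, ("B", "C"): 4.0, ("A", "C"): 3.0, ("C", "A"): 1.0}
kept, discarded = greedy_weighted_dag(["A", "B", "C"], w)
print(kept)       # [('A', 'B'), ('B', 'C'), ('A', 'C')]
print(discarded)  # [('C', 'A')]
```

Because links are examined in decreasing order of weight, a discarded link is never heavier than the already-kept links on the cycle it would have closed, so it is the lowest-weight link of that cycle (ties aside).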

The development of CPTs
The development of CPTs is another obstacle preventing the adoption of Bayesian networks in construction schedule risk management practice with limited time and data, whose […] function (Diez, 1993). The noisy-MAX model requires only a small number of parameters to specify the entire CPT, a number that is linear rather than exponential in the number of conditioning variables (Wisse et al., 2008). It significantly reduces the effort of knowledge elicitation from experts (Wisse et al., 2008), improves the quality of distributions learned from data (Oniśko et al., 2001), and reduces the spatial and temporal complexity of algorithms for Bayesian networks (Diez and Galán, 2003).
However, in practice, it is neither feasible nor desirable to model all the variables influencing a certain node (Diez and Druzdzel, 2006). According to Diez and Druzdzel (2006), in this case, assuming that there is a large Bayesian network that properly represents the real-world […]
To start this risk inference process with the junction tree (JT) algorithm, Monte Carlo simulation (MCS) is adopted to simulate the occurrence of risks as evidence according to the updated 'risk state probability boundary'. This 'risk state probability boundary' gives the probability of occurrence of each risk state (i.e., states 1, 2 and 3), which is also the marginal probability of that state.
MCS first generates a random number in [0, 1] from a uniform probability distribution, which serves as the index for determining the state of the target risk. Giving every value between 0 and 1 an equal chance can model the real system more realistically and accurately (Ökmen and Öztaş, 2008). […] avoid the occurrence of construction delay. It is thus necessary for this project to conduct construction schedule risk inference in advance.
In past projects, the project management team relied mainly on experience to manage construction schedule risks; this can identify and classify key risks but cannot quantify their probabilities. The project management team therefore needed the developed approach for quantitative risk inference.

Case data collection
Two rounds of focus group discussions were held, in August 2017 and October 2017 respectively, to collect data for analysis.
In the 1st focus group discussion, the construction schedule risks were first provided and verified by five participants from the project management team using the classical experience-based method. In total, 32 construction schedule risks were identified as the boundary of the risk network (Table 3). The links between risks were then identified and assessed according to expert experience and opinions using the RSM method. In total, 262 links between risks were identified, which together with the 32 risks constructed the risk network G(32, 262). The link strength was also provided for each identified link (Table 4).
[Insert: Table 3 Construction schedule risks of case project]
[Insert: Table 4 Example of RSM with link strength of case project]
In the 2nd round focus group discussion, held after the construction of the DAG structure, the DAG structure was verified and the canonical parameters for all risks involved in the DAG were collected to develop the CPTs (Tables 5 and 6).
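How canonical parameters can be turned into a full CPT with the leaky noisy-MAX model (Diez and Druzdzel, 2006) is sketched below in Python. The sketch uses the standard leaky noisy-MAX combination rule, in which the child's cumulative probability P(Y ≤ y) is the product of the cumulative distributions contributed by each parent and by the leak. State ordering, the data layout, the function names and all parameter values are assumptions for illustration; they are not the elicited values in Tables 5 and 6.

```python
# Sketch of generating a CPT with the leaky noisy-MAX model from canonical
# parameters. Assumptions (not from the source): states of every variable are
# ordered by severity and indexed 0, 1, 2 (State 1 .. State 3), index 0 of each
# parent is its normal state with no effect on the child, and parameters are
# given as probability lists over the child's states.
from itertools import product
from typing import Dict, List, Sequence, Tuple


def cumulative(probs: Sequence[float]) -> List[float]:
    out, acc = [], 0.0
    for p in probs:
        acc += p
        out.append(acc)
    return out


def leaky_noisy_max_cpt(parent_params: List[List[List[float]]],
                        leak: List[float]) -> Dict[Tuple[int, ...], List[float]]:
    """parent_params[i][x][y] = P(contribution of parent i = y | parent i in
    state x); leak[y] = P(leak contribution = y). Returns P(child = y | parents)."""
    n_states = len(leak)
    leak_cum = cumulative(leak)
    cpt = {}
    for config in product(*(range(len(p)) for p in parent_params)):
        # P(Y <= y | config) is the product of the cumulative distributions.
        cum = list(leak_cum)
        for i, x in enumerate(config):
            cum = [a * b for a, b in zip(cum, cumulative(parent_params[i][x]))]
        cpt[config] = [cum[0]] + [cum[y] - cum[y - 1] for y in range(1, n_states)]
    return cpt


# Hypothetical canonical parameters for two parents of one child risk.
params = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]],  # parent A: normal, abnormal
    [[1.0, 0.0, 0.0], [0.5, 0.0, 0.5]],  # parent B: normal, abnormal
]
leak = [0.9, 0.1, 0.0]  # child may deviate even when both parents are normal
cpt = leaky_noisy_max_cpt(params, leak)
for config, row in cpt.items():
    print(config, [round(p, 3) for p in row])
```

Only one parameter list per parent state plus the leak is needed, which is the linear growth in parameters noted in the CPT section, while the full table still covers every parent configuration.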

Results and analysis
Based on the data collected, the construction schedule risk network can be constructed and visualised as G(32, 262), shown in Figure 4(a), where 32 risks (nodes) in different categories (shapes) and sub-categories (colours) are connected by 262 links (arrows). To construct the key risk network, the topological analysis was conducted based on the metrics defined in Table 2.
The node-level results showed that the top three risks by each nodal metric (degree difference, ego network size, betweenness centrality, out-status centrality and total brokerage) overlapped substantially and were consistent. In total, seven key risks were thus identified (Table 7). Meanwhile, the link-level results showed a sharp decline at a value of 10 in the L-shaped curve of link betweenness centrality, so a link betweenness centrality of 10 was set as the cut-off point to distinguish key risk links. In total, 20 key risk links were selected because their betweenness centrality values were higher than 10 (Table 7).
To give the key risk network a simple structure while retaining most of the essential information, components beyond the key risks and key risk links also need to be included, namely the 14 non-key risks involved in key risk links and the 19 non-key risk links between key risks (Table 7). As shown in Figure 4 […]
[Insert: Table 7 Components of key risk network]
After the key risk network G(21, 39) was developed, the DFS algorithm was first adopted to search for any cycles in the network by transforming the network into a spanning tree. Five back edges (i.e., links) were found in the network, namely Tw-R2→S7R4, Tw-R1→Tw-R2, Tw-R1→S7R4, S1R6→S0R2 and S7R4→S1R5, indicating that cycles existed in the spanning tree.
To further transform the cyclic spanning tree into a DAG structure, the A-MWST algorithm was then applied to eliminate these cycles and construct the DAG structure. Following the A-MWST algorithm, four risk links were eliminated from G(21, 39) because of their low weights (i.e., link betweenness centrality), namely Tw-R1→Tw-R2, S1R5→S7R4, S1R6→S0R2 and S7R4→Tw-R2. Finally, the DAG structure G(21, 35), consisting of 21 risks and 35 risk links, was developed (Figure 5); unlike the key risk network G(21, 39), it contains no cycles. Following the development of the DAG structure, the leaky-MAX model was adopted to generate the CPTs based on the determined canonical parameters […].
Finally, the MCS-driven risk inference was conducted to identify key construction schedule risks and predict the probability of risk occurrence. Based on the construction process (time sequence), the simulation sequence was determined as 'S1R3, S4R5, S4R6, S9R3 → S4R4, S1R2 → S0R2, S1R6 → Tw-R2 → Tw-R1 → Sp-R4, S6R1, S6R2, S7R4 → S1R5, S7R3 → S3R1, S7R2 → Tr-R3 → S4R8, S2R1'. After 3,000 simulation iterations, the results provided a good estimate of risk occurrence in the case project, quantifying the probability of each of the three states for every risk (Figure 6).
According to the simulated probability of 'State 3: Worse than expected', the 21 risks of the case project can be classified into three categories, high-risk, medium-risk and low-risk, which require different risk management strategies from the project management team. The high-risk category comprises the nine risks with a probability of State 3 of at least 50% (i.e., 50% ≤ P(State 3) ≤ 100%) (e.g., Figure 6a-6c): S7R3 (89.6%), S7R4 (86.9%), S7R2 (84.1%), Tw-R1 (76.7%), Tw-R2 (72.8%), Sp-R4 (69.6%), S1R6 (66.7%), S4R4 (50.0%) and S6R2 (50.7%).
The management team should focus on these risks to prevent their occurrence and prepare plans to mitigate their impact if they occur. The medium-risk category comprises the eight risks with 20% ≤ P(State 3) < 50% (e.g., Figure 6d-6f): S6R1 (49.1%), S0R2 (44.2%), S3R1 (43.4%), Tr-R3 (37.6%), S2R1 (36.4%), S1R5 (35.2%), S1R2 (27.9%) and S4R6 (20.2%); the team should address these after the high-risk ones and also prepare mitigation plans. The low-risk category comprises the four risks with 0 ≤ P(State 3) < 20% (e.g., Figure 6g-6i): S9R3 (0%), S4R8 (9.3%), S1R3 (13.1%) and S4R5 (19.4%); these need less attention from the team, although risk mitigation plans should still be prepared.
Based on these results, the risk management and mitigation of the 21 risks were prioritised so that decision-makers could draw up a risk management benchmark with four levels (from 0 to 3) and allocate resources accordingly. Level 0 indicates that no risk occurs and construction progresses as expected. Level 1 indicates that one or more risks in the low-risk category are in the 'Worse than expected' state, but the construction schedule is only slightly impacted. Level 2 indicates that one or more risks in the medium-risk category are in the 'Worse than expected' state and the construction schedule is moderately impacted. Level 3 indicates that one or more risks in the high-risk category are in the 'Worse than expected' state and the construction schedule is seriously impacted. This benchmark can help decision-makers understand risk interdependencies and the dynamic nature of risk propagation, and avoid ripple effects that disrupt project construction and delivery.
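The sampling step of the MCS and the classification and benchmark rules described above can be sketched in Python as follows. This is a simplified sketch: it samples a single risk from a fixed 'risk state probability boundary' and omits the junction tree updating between risks along the simulation sequence. The 50% and 20% thresholds follow the text; the rule that the highest applicable level sets the benchmark, the example boundary and all function names are assumptions.

```python
# Sketch of MCS state sampling and the classification/benchmark rules.
# Assumptions: a single risk is sampled from a fixed 'risk state probability
# boundary' (junction-tree updating between risks in the simulation sequence
# is omitted); probabilities are fractions rather than percentages; when
# several risks are 'Worse than expected', the highest level applies.
import random
from typing import Iterable, List, Optional, Sequence


def sample_state(boundary: Sequence[float], u: float) -> int:
    """Map a uniform number u in [0, 1) to State 1, 2 or 3."""
    acc = 0.0
    for state, p in enumerate(boundary, start=1):
        acc += p
        if u < acc:
            return state
    return len(boundary)  # guard against rounding when sum(boundary) < 1


def simulate(boundary: Sequence[float], iterations: int = 3000,
             seed: Optional[int] = None) -> List[float]:
    rng = random.Random(seed)
    counts = [0] * len(boundary)
    for _ in range(iterations):
        counts[sample_state(boundary, rng.random()) - 1] += 1
    return [c / iterations for c in counts]


def classify(p_state3: float) -> str:
    if p_state3 >= 0.5:
        return "high"
    if p_state3 >= 0.2:
        return "medium"
    return "low"


LEVEL = {"low": 1, "medium": 2, "high": 3}


def benchmark_level(worse_categories: Iterable[str]) -> int:
    """Level 0 if no risk is 'Worse than expected', else the highest level."""
    return max((LEVEL[c] for c in worse_categories), default=0)


freqs = simulate([0.2, 0.3, 0.5], iterations=3000, seed=42)  # hypothetical boundary
print(freqs, classify(freqs[2]))
# Reported values for S7R3, S6R1 and S4R5:
print(classify(0.896), classify(0.491), classify(0.194))  # high medium low
print(benchmark_level(["low", "medium"]))  # 2
```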
To verify the results, the risk probabilities were returned to the project team for further discussion, and the five experts who had participated in the earlier focus group discussions were invited to review the probability of each risk and assess how appropriate these probabilities were, based on their rich experience of similar projects. Each expert was asked the same question: "How appropriate are these probabilities to be used to predict risk states?" On a five-point Likert scale (from 1 = "Not appropriate at all" to 5 = "Very appropriate"), all the probabilities (of states 1, 2 and 3) received an average score above 3, indicating that the simulation results were considered appropriate for predicting the risk states and occurrence in this case project.