PMU Network Routing for Resilient Observability of Power Grids

Smart grid technologies have been transforming power grid operation paradigms by integrating smart sensing devices, advanced communication networks, and powerful computing resources. In addition, data-driven applications have grown significantly in recent years, accelerating the use of smart sensors, such as phasor measurement units (PMUs), in power grid monitoring. This necessitates a well-functioning communication network (CN) that transfers PMU measurement data to the control center even in the event of failures. This paper proposes a PMU network routing algorithm that ensures the data transfer needed for the control center to maintain resilient observability of the power grid. The interdependent roles of PMUs in power grid observability are first identified based on the power grid topology. Then, a failure-tolerant routing algorithm is proposed to find data transfer paths in the CN that meet the power grid monitoring needs. The resultant routing paths ensure resilience against single link failure, where resilience is defined in terms of grid observability. In addition, a cost metric is defined to minimize end-to-end delay in the network to facilitate real-time data transfer. Simulation results verify the superiority of the proposed routing algorithm over conventional fault-tolerant routing algorithms that are agnostic to the domain knowledge of power grid observability.


I. INTRODUCTION
In recent years, wide-area measurement systems (WAMS) have been deployed in power grids to enable granular real-time monitoring capabilities [1]. A WAMS consists of three main components: phasor measurement units (PMUs), a phasor data concentrator (PDC), and a communication network (CN). PMUs are placed at optimal locations over the grid and measure electrical variables synchronously with the help of the global positioning system (GPS) [2]. The PDC at the control center consolidates the PMU measurements transferred via the CN [3] and processes them for further applications [4]. A conceptual diagram of the WAMS is shown in Fig. 1, where the PMUs installed at different bus locations of the power grid transfer measurements to the control center PDC via the CN.
This work relates to the Department of Navy award N00014-20-1-2858 and N00014-22-1-2001 issued by the Office of Naval Research. The United States Government has a royalty-free license throughout the world in all copyrightable material contained herein.
With the increase of PMU measurement-based applications such as state estimation, event detection, voltage stability and control, and fault localization, the PMU network is playing a critical role in power grid monitoring [5]. To accomplish these highly time-sensitive monitoring tasks, a large volume of high-resolution PMU measurements needs to be gathered at the control center via the CN under strict end-to-end delay requirements [6], [7]. Furthermore, both the PMUs and the CN are exposed to the risk of failures due to internal or external causes.
Therefore, an efficient and resilient routing algorithm is needed to find effective paths between the PMUs and the PDC that fulfill the power grid monitoring needs even under component failures [8]. Many existing routing protocols have been adopted for PMU networks, and IP multicast is one of the most widely used [9], [10]. Most existing multicast routing assumes that each source, i.e., a PMU, sends its measurements to a group of destinations, i.e., PDCs, via an optimal tree built between the PMU and the PDCs. Multicast is adopted in WAMS because, to deliver data to multiple PDCs, the source PMU transmits a single data packet that is replicated at the routers where the tree splits towards distinct PDCs. Ref. [11] formulates PMU-PDC multicast as a bandwidth-constrained delay minimization problem and proposes a Lagrangian relaxation based solution. Multicast tree formation is formulated as an integer linear program (ILP) in Ref. [12], and the solution is used to suppress the effect of cyber-attacks in PMU networks.
In spite of these advantages, multicast may not be the most efficient routing protocol for PMU measurement aggregation. The reason lies in the needs of power grid monitoring: the WAMS architecture is hierarchical, and the final goal is to gather PMU data at the PDC in the control center, not to send the data of a single PMU to a group of destinations. Therefore, using multicast without considering the needs of grid monitoring may generate unnecessary duplicate traffic in the network. Ref. [13] proposes a solution to this problem by identifying the intermediate routes as the multicast groups while keeping the control center (PDC) as the final destination. Similarly, a minimum Steiner tree can be formed with the control center (PDC) as the root and the PMUs as the leaves, reflecting the hierarchy of WAMS, with the objective of delay minimization [14]. Alongside multicast, other methods have been explored for PMU measurement aggregation [15]. For instance, software defined networking (SDN) is used in [16] to ensure quality of service (QoS) in the network, increasing reliability and fault tolerance. Recently, content-centric networking (CCN) has been evaluated for data aggregation in WAMS; it utilizes multiple transmission interfaces and in-network caching to ensure low latency and reliability [17].
The existing literature on PMU network routing largely inherits well-established routing principles from the general networking domain. The main drawback is that these methods do not consider the specific needs of power grid monitoring and therefore cannot fully ensure resiliency with the highest possible efficiency. Specifically, they treat PMUs as absolute data sources without considering how the PMUs interact over the power grid topology, which determines their roles in power grid observability. As a result, the method for enhancing resiliency is to uniformly increase the connectivity between the PMUs and the PDC. However, being agnostic to the effect of PMUs on power grid operation, these approaches cannot ensure topological observability of the power grid under component failures. Without grid observability, the state estimation function becomes infeasible and cannot recover all the state variables necessary for proper grid operation and control [18]. Even if routing algorithms are able to satisfy the power grid monitoring needs under failures, this usually comes at the cost of overly conservative protection, which generates unnecessary traffic in the network and increases the end-to-end delay, affecting the timeliness of the grid monitoring tasks [13]. This is due to the lack of understanding of power grid observability theory and the blunt application of general routing principles.
To address the aforementioned shortcomings, this study proposes an efficient and resilient routing algorithm that meets the power grid monitoring needs. Instead of treating the PMUs as absolute data sources, the PMUs are considered as a bridge that supplies information for power grid observability. With this cross-layer information, the proposed routing algorithm ensures fault-tolerant transfer of the information needed to make the power grid observable while minimizing the data transfer delay in the CN. Furthermore, by exploiting the interdependent roles of PMUs over the power grid topology, the method has inherent resilience: it maintains grid observability against single link failure without the need for overly conservative protection with backup paths for every PMU. As such, a drastic reduction of end-to-end delay is achieved with guaranteed resiliency for power grid monitoring.

II. DEFINITIONS, CONCEPTS, AND PROBLEM STATEMENT
Power grid observability is an important concept that needs to be achieved for both monitoring and control applications. In a power grid, a bus is said to be observable when its voltage phasor is known. Accordingly, the grid is observable when the voltage phasors of all the buses are either known or can be obtained from the available measurements. PMUs are installed to obtain the measurements that make the grid observable. The voltage phasors of a bus and its neighboring buses can be obtained using the measurements from a PMU installed at the bus [3]. For instance, as shown in Fig. 1, the PMU installed at bus 4 makes bus 4 and its neighboring buses 1, 5, and 6 observable. Therefore, it is not necessary to install a PMU at every bus. The optimal placement of PMUs over the grid has been extensively studied to ensure grid observability [2]. However, the placement of PMUs alone cannot ensure observability, because a CN is required to transfer the measurements to the PDC at the control center.
Suppose a PMU set $P = \{u_1, u_2, \ldots, u_k, \ldots, u_p\}$ is installed at different bus locations in the power grid. Then a CN connecting the PMUs to the PDC $c$ at the control center can be represented by a graph $G_c(P \cup c \cup S, E_c)$, where $S$ is the set of routers in the CN and $E_c$ is the set of communication links. In the CN, a component of graph $G_c$ may fail due to device malfunction, natural disasters, or man-made attacks. Under these failure scenarios, the grid may lose observability even though the PMUs remain functional. Therefore, it is essential to maintain grid observability under the failure of any link in set $E_c$. This is referred to as the resilience requirement for grid monitoring and control. In addition, due to the strict end-to-end delay requirements of PMU measurement-based applications, the PMU measurements need to be transferred to the PDC at the control center with minimal delay. This paper proposes a routing algorithm that addresses the two needs of power grid monitoring simultaneously: i) reducing data transfer delay to enable real-time PMU-based grid applications; and ii) ensuring observability-defined resilience against any single communication link failure.
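As an illustration only, the observability rule stated above (a PMU makes its own bus and the neighboring buses observable) can be written as a short Python sketch. The function names and the six-bus toy topology are hypothetical and are not the grid of Fig. 1; only the neighborhood of bus 4 is chosen to mirror the example in the text.

```python
def observable_buses(adjacency, reporting_pmus):
    """Buses whose voltage phasor is known, given the PMU buses whose data reach the PDC."""
    observed = set()
    for bus in reporting_pmus:
        observed.add(bus)            # the PMU observes its own bus
        observed |= adjacency[bus]   # and every neighboring bus
    return observed


def is_grid_observable(adjacency, reporting_pmus):
    """The grid is observable when every bus is observed by at least one reporting PMU."""
    return observable_buses(adjacency, reporting_pmus) == set(adjacency)


# Hypothetical six-bus grid: bus -> set of neighboring buses.
toy_grid = {1: {2, 4}, 2: {1, 3}, 3: {2, 6}, 4: {1, 5, 6}, 5: {4, 6}, 6: {3, 4, 5}}

print(sorted(observable_buses(toy_grid, {4})))  # [1, 4, 5, 6]
print(is_grid_observable(toy_grid, {2, 4}))     # True
print(is_grid_observable(toy_grid, {4}))        # False
```

The argument `reporting_pmus` stands for the PMUs whose measurements actually arrive at the PDC, which is why a link failure in the CN can remove observability even when all PMUs are functional.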

III. PROPOSED PMU ROUTING FRAMEWORK
The proposed routing framework is described in the following subsections. First, the interdependent roles of PMUs in observability are determined based on the power grid topology, and the PMUs are grouped into subsets (i.e., SOPGs), followed by an explanation of how the SOPG concept improves resilience. Then, the routing algorithm that exploits the SOPG concept for resilient and efficient PMU data transfer is described.
A. Shared Observability PMU Group (SOPG)
For a given PMU configuration $P = \{u_1, u_2, \ldots, u_k, \ldots, u_p\}$, the PMUs can be broken down into smaller groups based on their contribution to bus observability in the power grid. The PMUs $u_k \in P$ that observe the same bus $i$ can be grouped together. This subset of PMUs can be obtained using the connectivity information of the buses in the power grid. Suppose $A_i = [a_{i1}, a_{i2}, \ldots, a_{ij}, \ldots, a_{in}]^\top$ is the connectivity vector of bus $i$ with the $n$ buses, where element $a_{ij}$ is 1 if bus $i$ is connected to the $j$-th bus and 0 otherwise, and $x = [x_1, x_2, \ldots, x_i, \ldots, x_n]^\top$ is the given PMU configuration vector, where element $x_i$ is 1 if a PMU is installed on bus $i$ and 0 otherwise. Let $y_i$ be a vector whose element is 1 if the corresponding PMU is a member of the PMU group observing bus $i$ and 0 otherwise. Then $y_i$ can be obtained as

$$y_i = A_i \odot x, \tag{1}$$

where $\odot$ performs element-wise multiplication, and the elements with $y_i = 1$ yield the PMU subset $\Omega_i$ observing bus $i$, which is defined as the Shared Observability PMU Group (SOPG). It is named SOPG because the PMUs in set $\Omega_i$ share the same functionality: making bus $i$ observable. A SOPG constructed from the PMU configuration has the following properties. i) All the PMUs in a SOPG observe the same bus. Therefore, as long as one PMU in $\Omega_i$ survives, bus $i$ remains observable. ii) The same PMU can be an element of multiple SOPGs, as a PMU usually observes more than one bus. As an example, for the given PMU configuration of the power grid shown in Fig. 1, SOPG $\Omega_9$ contains the PMUs installed at buses {3, 9, 6} because they all observe the same bus, i.e., bus 9 ($i = 9$).
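The SOPG construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and the six-bus topology are hypothetical, and the self entry of the connectivity vector is set to 1 so that a PMU installed on bus $i$ itself is counted, which is an assumption made to match the $\Omega_9$ example in which the PMU at bus 9 belongs to the group.

```python
def sopg_sets(adjacency, pmu_buses):
    """Map each bus i to its SOPG, i.e., the PMU buses that observe bus i."""
    buses = sorted(adjacency)
    # PMU configuration vector x: 1 where a PMU is installed, 0 elsewhere.
    x = [1 if b in pmu_buses else 0 for b in buses]
    groups = {}
    for i in buses:
        # Connectivity vector A_i; the self entry is 1 by assumption (see lead-in).
        a_i = [1 if (j == i or j in adjacency[i]) else 0 for j in buses]
        # Element-wise product of A_i and x marks the PMUs that observe bus i.
        y_i = [a * xv for a, xv in zip(a_i, x)]
        groups[i] = {b for b, y in zip(buses, y_i) if y == 1}
    return groups


# Hypothetical six-bus grid with PMUs on buses 1, 3, 4, and 6.
toy_grid = {1: {2, 4}, 2: {1, 3}, 3: {2, 6}, 4: {1, 5, 6}, 5: {4, 6}, 6: {3, 4, 5}}
groups = sopg_sets(toy_grid, {1, 3, 4, 6})
for bus, members in groups.items():
    print(bus, sorted(members))
```

In this toy configuration every bus is observed by at least two PMUs, and the PMU at bus 4 appears in several groups, which illustrates property ii).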

B. Enhancing Resilience with SOPG
The unique feature of SOPG is that it maps the availability of PMUs to the observability of power grid buses. This observability awareness captured in the SOPGs can be used to achieve observability-defined resilience when finding paths from the PMUs to the PDC in the CN. Without the SOPG concept, which is based on power grid observability theory, PMUs are regarded merely as data sources, and existing methods in the literature aim to improve connectivity from the sources to the destination, i.e., the PDC. Even though PMU-PDC connectivity increases the probability of obtaining grid observability, it does not guarantee observability, especially under failure conditions.
In the proposed resilient routing algorithm, the main purpose is not to ensure PMU-PDC connectivity but to ensure grid observability. An example is shown in Fig. 2 to demonstrate how the SOPG concept builds observability-defined resilient routing paths for the PMUs. In the figure, PMUs 1 and 2 observe the upper three grid buses, whereas PMUs 3 and 4 observe the lower three buses. In Fig. 2(a), the PMUs are considered only as data sources, without considering the power grid topological information. There is no restriction on a PMU's routing path to the PDC, so it can happen that PMUs 1 and 2 forward data along one shared route and PMUs 3 and 4 along another. In this routing configuration, bus observability is lost if a single common link in a routing path fails, as both of the PMUs observing a bus use the same data forwarding path. However, with awareness of power grid observability, it can be found that PMUs 1 and 2 are in the same SOPG, whereas PMUs 3 and 4 are in another. If the two PMUs of a SOPG are not allowed to use the same route, the routing configuration changes so that PMUs 1 and 3 share one path and PMUs 2 and 4 share another, as shown in Fig. 2(b). With the new routing configuration, bus observability is ensured even if a single common link fails: even if the path carrying the data from PMUs 1 and 3 fails, the other two PMUs are still able to report their data to the PDC over the active path, ensuring observability of all buses. Based on the above discussion, the incorporation of the SOPG concept in PMU routing ensures resilience, i.e., maintained power grid observability, against single link failure without a disjoint backup path or path recovery. This feature can be achieved only when the power grid topology and the communication network topology are both considered.
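The argument around Fig. 2 can be checked mechanically. The sketch below, given as an illustration under assumptions, fails each used link in turn and tests whether every SOPG still has a member connected to the PDC. The router names `r1` and `r2`, the link labels, and the function name are hypothetical; only the grouping of PMUs 1 to 4 into two SOPGs follows the text.

```python
def survives_any_single_link_failure(paths, sopgs):
    """True if, for every single failed link, each SOPG keeps a PMU that still reaches the PDC."""
    used_links = {link for path in paths.values() for link in path}
    for failed in used_links:
        # PMUs whose path does not traverse the failed link keep reporting.
        alive = {pmu for pmu, path in paths.items() if failed not in path}
        if any(not (group & alive) for group in sopgs):
            return False
    return True


sopgs = [{1, 2}, {3, 4}]  # PMUs 1, 2 observe the upper buses; PMUs 3, 4 the lower buses

# Fig. 2(a)-style routing: both PMUs of a SOPG share the same route.
paths_a = {
    1: [("u1", "r1"), ("r1", "pdc")],
    2: [("u2", "r1"), ("r1", "pdc")],
    3: [("u3", "r2"), ("r2", "pdc")],
    4: [("u4", "r2"), ("r2", "pdc")],
}

# Fig. 2(b)-style routing: the PMUs of a SOPG are split over different routes.
paths_b = {
    1: [("u1", "r1"), ("r1", "pdc")],
    3: [("u3", "r1"), ("r1", "pdc")],
    2: [("u2", "r2"), ("r2", "pdc")],
    4: [("u4", "r2"), ("r2", "pdc")],
}

print(survives_any_single_link_failure(paths_a, sopgs))  # False
print(survives_any_single_link_failure(paths_b, sopgs))  # True
```

Both routings use the same number of links, so the resilience of the second configuration comes from how the PMUs are assigned to routes and not from added backup paths.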

C. Routing Algorithm with SOPG Concept
Based on the SOPG concept described above, a routing algorithm is described in this subsection to find resilient paths from the PMUs to the PDC. The fundamental idea of the routing is that the PMUs of each SOPG do not all share the same path to the PDC, as described in Fig. 2. The principle of the proposed SOPG-based routing algorithm is illustrated in Fig. 3. As shown in the figure, several link-disjoint trees ($T_1, T_2, \ldots, T_j, \ldots, T_N$) are formed in the CN to connect the PMUs to the PDC considering the SOPG sets $\Omega_i$. While forming the trees, the only criterion for embedding observability-defined resilience into the routing is that a PMU $u_k$ can join a tree $T_j$ as long as $T_j$ does not then contain all the PMUs of a SOPG $\Omega_i$. Mathematically, the criterion for a PMU $u_k \in \Omega_i$ to join a tree $T_j$ can be expressed as

$$\Omega_i \not\subseteq U_{T_j} \cup \{u_k\}, \tag{2}$$

where $U_{T_j}$ is the set of PMUs already joined to tree $T_j$. The criterion of (2) is defined as the resilient observability criterion, as fulfilling it preserves grid observability under single link failure in the network. In Fig. 3, the resilient observability criterion is fulfilled: even though a single link failure can disconnect some of the PMUs, at least one PMU of each SOPG remains connected to the PDC, which ensures the observability of every bus in the power grid. The main feature of the routing is that an individual PMU does not require a backup path or path recovery to provide resilience against single link failure; rather, the resiliency is embedded into the routing problem through the SOPG concept. As such, unnecessary utilization of network resources is avoided.
Before presenting the routing algorithm, a cost metric is defined to ensure real-time PMU measurement transfer in the network. An important aspect of PMU network routing is to gather the PMU measurements with minimal delay, which is required by PMU-based real-time applications. Therefore, a cost metric is defined to reduce the end-to-end delay in the network while connecting the PMUs to different trees. Four types of delay contribute to the end-to-end delay in a network: propagation, processing, transmission, and queuing delays [12]. Among them, transmission and queuing delays have the highest impact on the end-to-end delay; propagation and processing delays, on the other hand, become negligible as technology advances. The cost metric combines two factors that are correlated with the transmission and queuing delays. The first factor is hop count, which is correlated with the transmission delay, and the second factor is load balancing, which is correlated with the queuing delay. The load is more balanced if a path other than the shortest hop-count path is selected between source and destination, which results in lower queuing delay but higher transmission delay, and vice versa [12]. Therefore, a trade-off between hop count and load balancing is obtained in the cost metric defined in (3), where $\sigma^{T_j}_{u_k}$ is the cost of connecting a PMU $u_k$ to a tree $T_j$; $h^{T_j}_{u_k}$ is the hop count; $\nu^{T}_{u_k}$ is the variance of load, in which $l^{T_j}_{u_k}$ is the load of tree $T_j$ and $l^{T}$ is the average load of all trees after connecting PMU $u_k$ to tree $T_j$; and $N$ is the number of initialized trees. The cost metric can be used to minimize the end-to-end delay by minimizing both hop count and load variance in the network, with the trade-off tuned by the parameter $\alpha$ within the range [0, 1].
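To make the trade-off concrete, the following Python sketch evaluates one plausible form of such a cost: a convex combination of hop count and load variance weighted by `alpha`. This exact form is an assumption for illustration only. In particular, which term `alpha` multiplies, whether the two terms are normalized, and how the load is counted are not taken from the paper, and the function name and numbers are hypothetical.

```python
def connection_cost(hop_count, tree_loads, tree_index, pmu_rate, alpha):
    """Assumed cost of attaching a PMU to tree `tree_index`.

    cost = alpha * hop_count + (1 - alpha) * variance of tree loads after joining.
    The weighting and the absence of normalization are assumptions of this sketch.
    """
    loads = list(tree_loads)          # copy so the caller's list is not modified
    loads[tree_index] += pmu_rate     # load of the chosen tree after the PMU joins
    mean_load = sum(loads) / len(loads)
    variance = sum((load - mean_load) ** 2 for load in loads) / len(loads)
    return alpha * hop_count + (1 - alpha) * variance


# Hypothetical state: three trees with loads 4, 2, 0 and a PMU with data rate 2.
loads = [4, 2, 0]
near_but_loaded = connection_cost(hop_count=1, tree_loads=loads, tree_index=0, pmu_rate=2, alpha=0.5)
far_but_idle = connection_cost(hop_count=3, tree_loads=loads, tree_index=2, pmu_rate=2, alpha=0.5)
print(near_but_loaded > far_but_idle)  # True: balancing outweighs the two extra hops here
```

With `alpha = 1` the sketch reduces to pure hop-count minimization, and with `alpha = 0` to pure load balancing, which is the sense in which the parameter tunes the trade-off.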
With the definition of the cost metric, the PMU routing algorithm is discussed next. Several challenges need to be dealt with in the routing algorithm. For instance, as one PMU might be an element of multiple SOPGs, the algorithm must ensure that the resilient observability criterion of (2) is met for all SOPGs having common PMUs. A heuristic-based routing algorithm is proposed to overcome the challenges and to meet the resilient observability criterion. The main steps of the algorithm are shown in Fig. 4. The algorithm starts with the digraph $G_c$, the installed PMU and PDC locations, and all SOPG sets. It contains two separate parts, namely tree initialization and tree expansion. The former initializes the trees in $G_c$, where each tree starts at one incident link of the PDC and connects the PMU with the shortest hop count. During the tree initialization, it is ensured that the trees are link disjoint. The PMU set is updated to exclude the PMUs connected to the trees. The tree expansion part of the algorithm starts with the initialized trees and the connected PMUs from the tree initialization part. This part connects the remaining PMUs to the trees such that the resilient observability criterion of (2) is met for all the grid buses. In this process, an optimization is performed to connect a PMU to the least-cost tree, where the cost is defined in (3) to minimize the delay. The tree expansion terminates when there is no remaining PMU to connect, and the trees with their links and connected PMUs are returned as the outputs of the algorithm.
Fig. 4. Main steps of the routing algorithm. The tree expansion steps are:
i) Find the potential PMU set from $R$ to be connected to $T_j$ such that the resilient observability criterion of (2) holds for each SOPG.
ii) Filter the PMUs that can join $T_j$ within the bandwidth capacity of the links.
iii) Find the cost of connecting PMU $u_k$ from the filtered PMUs to the PDC according to (3), and record the connected PMU $u_k$ and its used links $L_j$ having minimum cost.
iv) Expand the tree $T_j$ having minimum cost, and update $T_j = T_j \cup \{\{u_k\}, L_j\}$ and $R = R \setminus \{u_k\}$.
v) In $G_{temp}$, set the weights of the links in $T_j$ to zero so that they can be reused, and the weights of the links of other trees to infinity so that the trees remain disjoint.
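The tree expansion loop can be sketched as a greedy assignment. The Python sketch below is a simplified illustration and not the authors' algorithm: it keeps only PMU-to-tree membership, omits the link-level path search in $G_{temp}$ and the bandwidth filter, and replaces the cost metric with a caller-supplied function. All names (`can_join`, `expand_trees`, the `hops` table) are hypothetical.

```python
def can_join(pmu, tree_members, sopgs):
    """Resilient observability check: after joining, the tree must not hold a whole SOPG."""
    candidate = tree_members | {pmu}
    return all(not group <= candidate for group in sopgs if pmu in group)


def expand_trees(trees, remaining, sopgs, cost):
    """Greedily attach each remaining PMU to the feasible tree with the lowest cost."""
    trees = [set(t) for t in trees]
    remaining = set(remaining)
    while remaining:
        best = None
        for pmu in sorted(remaining):
            for j, members in enumerate(trees):
                if not can_join(pmu, members, sopgs):
                    continue  # joining would put an entire SOPG on one tree
                c = cost(pmu, j, trees)
                if best is None or c < best[0]:
                    best = (c, pmu, j)
        if best is None:
            raise ValueError("no feasible tree for the remaining PMUs")
        _, pmu, j = best
        trees[j].add(pmu)
        remaining.remove(pmu)
    return trees


# Fig. 2-style toy case: SOPGs {1, 2} and {3, 4}; trees initialized with PMUs 1 and 2.
sopgs = [{1, 2}, {3, 4}]
hops = {(3, 0): 1, (3, 1): 2, (4, 0): 1, (4, 1): 1}  # hypothetical hop counts
result = expand_trees([{1}, {2}], {3, 4}, sopgs, cost=lambda pmu, j, trees: hops[(pmu, j)])
print(result)  # [{1, 3}, {2, 4}]
```

In the toy case PMU 4 is as close to the first tree as to the second, but it is forced onto the second tree because PMU 3 has already joined the first, which reproduces the grouping of Fig. 2(b).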

IV. SIMULATION RESULTS
Simulation studies are carried out on IEEE standard power grid test systems to verify the performance of the proposed routing algorithm. For the simulation, the CN topology is synthesized for a given power grid topology. To synthesize the CN, the power grid topology is mapped onto a geographic region, and a communication link is considered to potentially exist between any two power grid buses if their geographic distance is below a certain limit. A redundant PMU configuration is assumed to be given, obtained following the PMU placement problems in the literature [2]. The data rate of a PMU is proportional to the number of its measurement channels, i.e., the buses the PMU observes [19]. For simplicity, the data rate of a PMU is set to the number of buses it observes, i.e., the PMU-installed bus plus its neighboring buses. The bandwidth of the communication links is set to a certain multiple of the average data rate of the PMUs so that each link can forward at least a certain number of PMU measurements. The simulations are carried out on a personal computer with 32 GB RAM and an Intel Core i7-11800H processor using MATLAB R2022a.
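The CN synthesis rule and the simplified data rate described above can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used for the reported results; the coordinates, the 30-mile limit applied to them, and the function names are hypothetical.

```python
import math
from itertools import combinations


def synthesize_links(coords, max_distance):
    """Candidate communication links: bus pairs closer than max_distance (same unit as coords)."""
    links = set()
    for a, b in combinations(sorted(coords), 2):
        if math.dist(coords[a], coords[b]) < max_distance:
            links.add((a, b))
    return links


def pmu_data_rate(adjacency, pmu_bus):
    """Simplified data rate: the PMU-installed bus plus its neighboring buses."""
    return 1 + len(adjacency[pmu_bus])


# Hypothetical bus coordinates in miles.
coords = {1: (0, 0), 2: (20, 0), 3: (20, 25), 4: (60, 60)}
print(sorted(synthesize_links(coords, max_distance=30)))  # [(1, 2), (2, 3)]

# A PMU on a bus with three neighbors observes four buses.
print(pmu_data_rate({4: {1, 5, 6}}, 4))  # 4
```

Bus 4 in the coordinate example ends up without any candidate link, which shows why the distance limit has to be chosen large enough for the synthesized CN to be connected.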

A. Test Cases for IEEE Standard Bus Systems
The routing algorithm is applied to IEEE standard bus systems. First, the IEEE 14-bus system is considered, assuming that it covers a geographic area of 100 × 100 sq. miles. A redundant PMU configuration is obtained for the IEEE 14-bus system such that each bus is observable by at least two PMUs [2], as shown in Fig. 5(a). The SOPG set $\Omega_i$ is obtained from the PMU configuration according to (1). Fig. 5(b) shows the synthesized CN for the IEEE 14-bus system, where a possible connection is established between two nodes if their geographic distance is less than 30 miles. As the PDC node 8 has three incident links, three disjoint trees are initialized in the network by connecting the three shortest-hopped PMUs, installed at buses 2, 3, and 10. Then, each tree is expanded by connecting the next PMU according to the cost metric of (3) with $\alpha = 0.2$, the value that yields the lowest cost. It is also ensured that the resilient observability criterion of (2) is met, i.e., the PMUs of the SOPG observing each bus join at least two different trees. For instance, bus 5 is observed by the SOPG $\Omega_5$, which contains PMUs {1, 2, 6}. Though two of the PMUs, {2, 6}, join the same tree (tree ID 1 in the figure), PMU 1 joins another tree (tree ID 2). Therefore, the observability of bus 5 is sustained even if a single link fails in the network, and the same holds for all the other buses. Here, the cost metric of (3) ensures that the end-to-end data transfer delay is minimized along the different trees.
To further show the effectiveness of the proposed algorithm, it is applied to a medium-scale system, i.e., the IEEE 57-bus system, assuming it spans a geographic area of 1000 × 800 sq. miles. The synthesized CN for the system is shown in Fig. 6, where a potential link between two nodes is considered if their geographic distance is less than 155 miles. For clarity, the routers are not shown with the PMU nodes in the figure. Similar to the previous test system, a redundant PMU configuration is obtained, and the PDC is installed at node 36. As the PDC node has nine incident links, nine disjoint trees are built in the network to find routes for all the PMUs. The resilient observability criterion is ensured in this case as well. For instance, from the 57-bus system topology [3], bus 1 is observed by SOPG $\Omega_1$ containing PMUs {1, 2}, and they join two different trees, i.e., tree ID 1 and tree ID 9, respectively, as depicted in Fig. 6. The simulation run-time of the routing algorithm for the above two systems is 0.0139 and 0.276 seconds, respectively, which indicates that the algorithm is computationally efficient.

B. Performance Comparison with Baseline
For the performance comparison, a baseline method with a resilience level comparable to the proposed framework is considered. The baseline method treats the PMUs as individual and unrelated data sources, similar to the literature on PMU routing, without taking into account their interdependent roles in grid observability as determined by the grid topology [11], [13].
Treating the PMUs as unrelated data sources, the baseline aims to transfer data from the PMUs to the PDC with independent protection for each PMU against single link failure. To ensure resilience against single link failure, a PMU observing bus $i$ needs two disjoint paths to the PDC, which are obtained using Suurballe's algorithm, which minimizes the hop count of both the primary and backup paths. After determining the paths, static metrics such as the number of links used in routing, the average load of each link, and the average hop count between the PMUs and the PDC are evaluated to compare the performance of the proposed and baseline methods. The metrics are shown in Table I, where it can be observed that the proposed SOPG-based method performs better in terms of all the metrics in both test systems. Note that both the number of links used and the average load of the links are higher for the baseline method because it does not exploit the interdependent roles of PMUs in power grid observability, and thus transfers the data of each PMU over a primary and a backup path, which unnecessarily increases the traffic in the CN.
The end-to-end delay determined in [12] is also used to compare the performance of the proposed and baseline methods. Fig. 7(a) plots the cumulative observability level of the power grid (i.e., the cumulative percentage of observable buses) against the end-to-end delay. The proposed method requires around 2.5 time units to receive measurement data from a sufficient number of PMUs to make the power grid observable, whereas the baseline requires approximately 3.5 time units, suggesting that the proposed framework outperforms the baseline. The cumulative percentage of PMU channels against the end-to-end delay is illustrated in Fig. 7(b), where it is also observed that the proposed method receives data from any given percentage of PMU channels with significantly lower delay. The proposed method outperforms the baseline for two primary reasons: i) it avoids unnecessary backup paths by exploiting domain knowledge of power grid observability; and ii) it uses a cost metric that reduces both queuing and transmission delays. Because it accounts for grid observability, the proposed routing algorithm requires less redundant data transfer without sacrificing resiliency.
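A curve of the kind plotted in Fig. 7(a) can be computed from per-PMU arrival delays. The Python sketch below is an illustration under assumptions: the delays, the PMU labels, the coverage sets, and the function name are hypothetical and are not the simulation data behind the reported figure.

```python
def observability_curve(arrival_delay, coverage, n_buses):
    """Cumulative percentage of observable buses as PMU data arrive at the PDC.

    arrival_delay: PMU -> end-to-end delay of its measurement.
    coverage:      PMU -> set of buses that the PMU observes.
    """
    seen = set()
    curve = []
    for pmu in sorted(arrival_delay, key=arrival_delay.get):
        seen |= coverage[pmu]  # buses made observable once this PMU's data arrive
        curve.append((arrival_delay[pmu], 100.0 * len(seen) / n_buses))
    return curve


# Hypothetical four-bus case with three PMUs.
delays = {"a": 1.0, "b": 2.0, "c": 3.0}
coverage = {"a": {1, 2}, "b": {2, 3, 4}, "c": {1, 4}}
curve = observability_curve(delays, coverage, n_buses=4)
print(curve)  # [(1.0, 50.0), (2.0, 100.0), (3.0, 100.0)]
```

In this toy case the grid is already fully observable at delay 2.0, before the last PMU reports, which is the effect the SOPG-based routing exploits when redundant PMUs observe the same buses.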

V. CONCLUSION
The paper proposes an innovative PMU network routing algorithm for efficient and resilient monitoring and control of power grids. The key innovation lies in the incorporation of power grid observability theory into the routing algorithm. Based on the power grid topology information, the concept of SOPG is proposed, which contains the PMUs observing a given bus. To protect the observability of each bus, the PMUs in the same SOPG are required to join different communication trees such that they do not lose their connections simultaneously when a communication link fails. The effectiveness of the proposed framework is verified using several standard test cases, which show that the framework significantly reduces end-to-end delay in the network while assuring resilience. This level of performance cannot be achieved by general resilient routing principles that lack an understanding of the PMUs' interdependent roles in power grid observability.
More features will be added to the routing algorithm in future work, such as considering multiple PDCs in the network, ensuring resilience against multiple component failures, and considering realistic PMU traffic and link bandwidth.

Fig. 2. An explanation of how the SOPG improves grid resilience by combining power grid bus connectivity and communication network data forwarding. PMU data forwarding (a) without consideration of SOPG and (b) with consideration of SOPG.

Fig. 3. Conceptual diagram of the proposed routing algorithm. The N disjoint trees, represented by distinct colors, are created to join the PMUs to the PDC. The dumbbell shapes represent the SOPGs.
Fig. 4 (tree initialization part). Input: digraph $G_c$, installed PMUs $P$, PDC, and SOPG sets. Starting from the incident link of the PDC $j = 1$: connect the shortest-hopped PMU $u_k$ to PDC $c$ in digraph $G_{temp}$, record the connected PMU $u_k$ and the used links, and initialize a tree $T_j = \{\{u_k\}, L_j\}$.

Fig. 7. Performance comparison between the proposed and baseline methods in terms of delay for the 57-bus system: (a) percentage of observability vs. delay and (b) percentage of reported PMU channels vs. delay.

TABLE I. PERFORMANCE COMPARISON OF SOPG-BASED AND BASELINE ROUTING