The design and implementation of a distributed program monitor
Version 2 2024-06-18, 01:18
Version 1 2017-08-04, 12:16
journal contribution
posted on 2024-06-18, 01:18, authored by W Zhou
One of the reasons that debugging distributed programs is much more difficult than debugging sequential programs is the communication among processes. The ability to capture communication events (as well as other events) as they happen during program execution is fundamental to any debugging tool. Most articles about distributed debugging and monitoring are message-passing oriented. As the remote procedure call (RPC) method becomes more popular, the need to debug RPC-oriented programs increases. This article presents the design and preliminary implementation of an RPC-oriented program monitor that can record all events of an RPC-oriented program's execution in the monitor's database. Facilities are provided for programmers to define, choose, and combine the events that will be recorded. A partial ordering among events is built after the program's execution. A user can use this relation to trace and replay the program's execution. The monitor has been tested on networks consisting of Apollo, Sun, and Digital Equipment workstations.
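
The abstract does not say how the partial ordering is constructed, so the following is only a minimal sketch of one common approach, not the article's method. It assumes each recorded event carries a process name, a local sequence number, an event kind, and an RPC identifier (the `Event` fields, the four kind names, and both function names are hypothetical). Events are ordered by program order within a process and by send-before-receive for each RPC, and a topological sort then yields one replay order consistent with that relation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # All fields are assumptions made for this sketch.
    process: str  # name of the process that logged the event
    seq: int      # position of the event in that process's local log
    kind: str     # "call_send", "call_recv", "reply_send", or "reply_recv"
    rpc_id: int   # identifier pairing the events that belong to one RPC


def build_partial_order(events):
    """Return happened-before edges as a dict: event -> set of direct successors."""
    edges = defaultdict(set)

    # Program order: events of one process are ordered by local sequence number.
    by_process = defaultdict(list)
    for e in events:
        by_process[e.process].append(e)
    for log in by_process.values():
        log.sort(key=lambda e: e.seq)
        for earlier, later in zip(log, log[1:]):
            edges[earlier].add(later)

    # Communication order: a send precedes its matching receive.
    by_rpc = defaultdict(dict)
    for e in events:
        by_rpc[e.rpc_id][e.kind] = e
    for parts in by_rpc.values():
        for send, recv in (("call_send", "call_recv"), ("reply_send", "reply_recv")):
            if send in parts and recv in parts:
                edges[parts[send]].add(parts[recv])
    return edges


def replay_order(events, edges):
    """Return one linear order consistent with the edges (Kahn's algorithm)."""
    indegree = {e: 0 for e in events}
    for successors in edges.values():
        for s in successors:
            indegree[s] += 1
    ready = deque(e for e in events if indegree[e] == 0)
    order = []
    while ready:
        e = ready.popleft()
        order.append(e)
        for s in edges.get(e, ()):
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(events):
        raise ValueError("recorded events contain a cycle; the log is inconsistent")
    return order
```

Because the relation is built from logs after execution, this sketch needs no synchronized clocks. Events that are not connected by any chain of edges are concurrent, and a replay may schedule them in either order.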