The Peer-to-Peer Human-Robot Interaction Project

The Peer-to-Peer Human-Robot Interaction (P2P-HRI) project is developing techniques to improve task coordination and collaboration between human and robot partners. Our hypothesis is that peer-to-peer interaction can enable robots to collaborate in a competent, non-disruptive (i


I. Introduction
A key element of NASA's Vision for Space Exploration is that humans and robots will work as partners, leveraging the capabilities of each where most useful.[1] Basic mission tasks, both in-space and on planetary surfaces, will demand close collaboration of humans and robots. But because cost pressures and other mission constraints (e.g., risk minimization) will keep astronaut teams small, the effectiveness of human-robot interaction (HRI) will have a major impact on the productivity and performance of future missions.
The objective of the "Peer-to-Peer Human-Robot Interaction" (P2P-HRI) project is to significantly advance the state of the art in HRI to facilitate sustained, affordable space exploration. Specifically, we are developing a range of HRI techniques so that humans and robots can work as partners across a range of team configurations: side-by-side, line-of-sight remote, and far remote (over the horizon or even interplanetary distance). While improving both teleoperated and autonomous robot operations is important for space exploration, in this research project we focus on peer-to-peer interaction as a mechanism for facilitating human-robot coordination and teaming.
There are three primary components in our approach. First, we are developing a novel interaction framework called the "Human-Robot Interaction Operating System" (HRI/OS). The HRI/OS is based on the

II. Peer-to-peer HRI
Conventional human-robot interaction is limited to "master-slave" commanding (i.e., goal/task specification) and monitoring (e.g., of status information). More precisely, the interaction model is essentially one-way: the human "speaks" and the robot "listens" (perhaps asking for clarification). As a result, system performance is strictly bound to the operator's skill and the quality of the user interface. To improve system capability, increase flexibility, and create synergy, human-robot communication needs to be richer and occur in both directions.
Our approach is to develop an interaction model in which humans and robots communicate as peers. Specifically, we are building a dialogue system that allows robots to ask questions of the human when necessary (urgent) and appropriate (human at a work breakpoint), so that robots are able to obtain human assistance with cognition and perception tasks. Two key benefits of this system are that it (1) allows humans and robots to communicate and coordinate their actions and (2) provides interaction support so that humans and robots can quickly respond and help each other resolve issues as they arise.
A key challenge is enabling robots to perform tasks on their own while giving them the ability to ask for (and make use of) human expertise and assistance when necessary. Another challenge is enabling robots to understand task-oriented commands in the same way that human teammates do. For example, human work crews routinely use spatial references (e.g., "move that panel to my left") when performing work.

III. Human-Robot Interaction Operating System
In order for humans and robots to work effectively together, they need to be able to clearly converse about goals, abilities, plans, and achievements.[5] Such communication is especially required for humans and robots to jointly solve problems or when situations exceed autonomous capabilities.
To address this need, the HRI/OS provides a structured software framework and set of core interaction services for building human-robot teams. We have designed the HRI/OS to support a variety of multimodal and perceptual interfaces (including shared workspaces and handhelds) and to facilitate integration of third-party UIs and robots through an extensible API.

A. Related work
[7][8][9] In particular, the HRI/OS provides a variety of infrastructure services (event/data distribution, delegation, etc.) to distributed and mixed teams. The HRI/OS differs from infrastructures because the "devices" (i.e., robots) have physical embodiment and can move and act in the real world. As a result, the HRI/OS includes dialogue support services (e.g., spatial perspective taking) not normally found in infrastructures.
The HRI/OS is also related to a number of recent HRI architectures, [10][11][12][13] all of which are designed to facilitate task performance by human-robot teams. The HRI/OS, however, differs from these architectures in three ways. First, it is designed to seamlessly support human-robot collaboration across multiple spatial ranges. Second, the HRI/OS includes a task executive, which provides loose coordination between humans and robots working in parallel. Finally, it allows robot control authority to pass between different users (i.e., no operator has exclusive "ownership" of a robot). This improves flexibility because which robot(s) support (i.e., work with) which human(s), and vice versa, can vary dynamically based on the situation.

B. Teamwork Model
With the HRI/OS, humans and robots work on tasks independently of each other. Tasks are delegated by a task executive, which assigns work to agents (i.e., humans or robots) that it believes are capable of performing the task and that are not currently performing other work. Agents are expected to take care of their own planning, execution, and monitoring. Once a task has been assigned to an agent, that agent is responsible for its execution and eventual completion. If it encounters a problem, however, an agent will first try to resolve it through dialogue before reporting failure.
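The delegation rule described above (assign a task only to an agent that is both capable and idle) can be illustrated with a minimal Python sketch. This is not the HRI/OS implementation; the `Agent` class, the `delegate` function, and the skill names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    skills: set[str]
    current_task: Optional[str] = None  # None means the agent is idle


def delegate(task: str, required_skill: str, agents: list[Agent]) -> Optional[Agent]:
    """Assign the task to the first idle agent that advertises the skill."""
    for agent in agents:
        if required_skill in agent.skills and agent.current_task is None:
            agent.current_task = task
            return agent
    return None  # no capable, idle agent is available right now


# Hypothetical team: the astronaut is busy, so the idle inspector robot is chosen.
agents = [
    Agent("astronaut-1", {"spot_weld", "inspect"}, current_task="mount panel 3"),
    Agent("robot-welder", {"seam_weld"}),
    Agent("robot-inspector", {"inspect"}),
]
chosen = delegate("inspect seam 1", "inspect", agents)
second = delegate("inspect seam 2", "inspect", agents)  # everyone capable is now busy
```

In this sketch a task that cannot be placed simply returns `None`; how an unplaced task is queued or retried is left open here.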
When an agent makes a request for help, the request is delegated to another agent (human, robot, or software) that is capable of providing assistance. Satisfying a "help" request typically requires communication. A human, for example, may request a tool or a service that can be provided by a robot, while a robot might require a human's image processing or reasoning ability. The HRI/OS provides dialogue services such as text-to-speech, speech recognition, spatial relationship context resolution (as a step in natural language processing), and contextual data transport. Embodied agents can make use of these services to facilitate communication with each other, with the ultimate goal of resolving a problem with their own assigned tasks.
If a robot is interrupted, it suspends its task before addressing the reason for its interruption. In the case where a human has requested the assistance of the robot, the robot will provide the needed assistance and then resume its previous task. With our teamwork model, as with purely human interactions, only one human is able to interrupt the robot at a time. A human cannot, for example, interrupt a robot while it is already providing assistance to another human. Instead, the human must wait until the robot becomes available.
In our system, human-robot dialogue is coordinated with an interaction manager. A robot can specify particular user interface modalities (e.g., a graphical user interface or a speech interface) as part of its request for dialogue. In general, humans are selected solely based on their published domain(s) of expertise. Agents working on behalf of the human can display robot dialogue using whichever interface modality is most appropriate for the human's situation, taking into consideration the available display, the human's workload, etc.
In order to take the best possible advantage of the particular skills of humans and of robots, it is important that robots be able to reason about how to communicate with humans for maximum effect.[16]

C. Implementation
The HRI/OS is implemented as a collection of agents using the Open Agent Architecture (OAA).[17] Embodied agents describe their skills at a coarse level, rather than with the detail typically used by robot planners. Software agents are designed to provide a single capability, rather than a large set of related capabilities. For example, an agent that tracks objects may provide pose information without handling coordinate frame transformations, which could be provided by other supporting agents. This design approach helps improve flexibility while reducing system brittleness.
Figure 1 shows the primary components in the HRI/OS. Software and embodied agents communicate via OAA messages, which are delegated and routed via a central OAA facilitator. Direct, point-to-point communication (used primarily to transport non-text dialogue, such as images) is performed using the "ICE" object-oriented middleware.[18] The following sections describe the major agents that make up the core of the HRI/OS.

Task Manager
The Task Manager (TM) is a task executive that coordinates execution of well-defined, operational tasks by one or more agents. The TM also provides for limited recovery in the face of unforeseen task failures. For example, if any given task fails, the TM automatically respawns another instance. The TM is written in the Task Description Language (TDL), an extension of C++ that allows for principled and managed task execution, coordination, and management.[19]
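The respawn-on-failure behavior can be sketched as follows. The TM itself is written in TDL; this Python fragment only illustrates the recovery idea, and the function name `run_with_respawn` and the `max_attempts` bound are assumptions that do not come from the HRI/OS.

```python
from typing import Callable


def run_with_respawn(task: Callable[[], bool], max_attempts: int = 3) -> int:
    """Run `task` until it reports success; return the number of attempts used.

    The bound on attempts is an assumption of this sketch, added so that a
    task that always fails cannot loop forever.
    """
    for attempt in range(1, max_attempts + 1):
        if task():
            return attempt  # success: no further instance is spawned
    raise RuntimeError(f"task failed after {max_attempts} attempts")


# Hypothetical task that fails twice and then succeeds.
outcomes = iter([False, False, True])
attempts = run_with_respawn(lambda: next(outcomes))
```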

Resource Manager
The Resource Manager (RM) works with the underlying agent system to determine which agent is best suited for performing a given task or handling dialogue. As such, it receives all requests to be delegated and may reprioritize the list of agents that will be consulted to perform a task or answer a request. The RM is designed to consider numerous factors (including the relative positions of embodied agents, their workload, and statistics such as fuel level) in order to refine the decision about which agent should be assigned a task.
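One way to picture this reprioritization is a scoring function over candidate agents. The sketch below is illustrative only: the `Candidate` fields, the equal weighting of the three factors, and the `max_range` normalization are assumptions, since the text does not specify how the RM combines them.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    position: tuple[float, float]  # planar position in meters
    workload: float                # 0.0 (idle) to 1.0 (fully busy)
    fuel: float                    # 0.0 (empty) to 1.0 (full)


def rank(candidates: list[Candidate], task_position: tuple[float, float],
         max_range: float = 100.0) -> list[Candidate]:
    """Order candidates from best to worst using an assumed equal-weight score."""
    def score(c: Candidate) -> float:
        distance = math.dist(c.position, task_position)
        closeness = max(0.0, 1.0 - distance / max_range)
        return closeness + (1.0 - c.workload) + c.fuel

    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates: robot-b is closer, but robot-a is less busy and better fueled.
ranked = rank(
    [
        Candidate("robot-b", (5.0, 0.0), workload=0.9, fuel=0.3),
        Candidate("robot-a", (10.0, 0.0), workload=0.2, fuel=0.9),
    ],
    task_position=(0.0, 0.0),
)
```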
The RM also supports switching of robot control authority between system operation (execution of tasks assigned by the Task Manager) and interrupt servicing (i.e., temporary use of a robot). For example, a user can take temporary "ownership" of a robot (e.g., so that the user can teleoperate the robot for a specific use) by making a request to the RM. When the robot reaches a breakpoint, the RM will then grant "ownership" to the user. Multiple requests are pushed onto an interrupt queue, which allows multiple users to share "ownership" of a robot.
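The ownership hand-off can be sketched as a first-in, first-out interrupt queue. The class and method names below are hypothetical, and the assumption that ownership returns to the Task Manager once the queue is empty is ours rather than something stated in the text.

```python
from collections import deque


class ControlAuthority:
    """Tracks who currently 'owns' a robot; a sketch, not the RM's actual API."""

    SYSTEM = "task_manager"

    def __init__(self) -> None:
        self.owner = self.SYSTEM
        self.pending: deque[str] = deque()  # interrupt queue of waiting users

    def request(self, user: str) -> None:
        """Queue a user's request; ownership does not change until a breakpoint."""
        self.pending.append(user)

    def at_breakpoint(self) -> str:
        """Called when the robot reaches a breakpoint in its assigned task."""
        if self.owner == self.SYSTEM and self.pending:
            self.owner = self.pending.popleft()
        return self.owner

    def release(self) -> str:
        """Current user is done: pass ownership to the next user or the system."""
        self.owner = self.pending.popleft() if self.pending else self.SYSTEM
        return self.owner


auth = ControlAuthority()
auth.request("alice")
auth.request("bob")
before = auth.owner            # still the task manager: no breakpoint yet
first = auth.at_breakpoint()   # alice is granted ownership
second = auth.release()        # bob was queued behind alice
third = auth.release()         # queue is empty, control returns to the system
```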

Interaction Manager
The Interaction Manager (IM) coordinates dialogue between humans and robots. Whenever an embodied agent needs to communicate with another agent, it contacts the IM, which works with the RM to generate a list of agents that are able to handle the communication request. Once it has this list, the IM informs the best-matching agent that it has an interaction request.
If this agent is a robot, then it immediately handles the request. If it is a human, however, the human is notified that a message is waiting, and the IM waits for acknowledgment before passing along the robot's request. If the human does not respond in a reasonable amount of time, the IM iterates through the list of agents returned by the RM to see if another one is available. If none is, it then notifies the requesting agent that the request cannot be handled.
Whenever an agent responds to a request, it becomes the responsibility of the two parties to set up and handle the rest of the communication between them. The IM will only get involved further in the dialogue if the original requesting agent notifies the IM that the communication has failed. In this case, the IM will try to connect the agent to someone else in order to process its request.
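The routing behavior described in this section can be summarized in a short sketch. The function `route_request` and its callback parameters are hypothetical; in particular, `acknowledges` stands in for "notify the human and wait a reasonable time for acknowledgment", which a real system would implement with a timeout.

```python
from typing import Callable, Optional, Sequence


def route_request(
    candidates: Sequence[str],
    is_robot: Callable[[str], bool],
    acknowledges: Callable[[str], bool],
) -> Optional[str]:
    """Walk the RM's ranked list and return the first agent that takes the request.

    Robots take the request immediately; humans must acknowledge first.
    Returns None when no agent can handle it, so the requester can be told.
    """
    for agent in candidates:
        if is_robot(agent) or acknowledges(agent):
            return agent
    return None


# Hypothetical team: the first astronaut is busy and never acknowledges.
robots = {"k10"}
responsive_humans = {"astronaut-2"}
ranked = ["astronaut-1", "astronaut-2", "k10"]

handler = route_request(ranked, robots.__contains__, responsive_humans.__contains__)
nobody = route_request(["astronaut-1"], robots.__contains__, responsive_humans.__contains__)
```

After a handler is found, the sketch stops, mirroring the text: the two parties manage the rest of the exchange themselves.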

Dialogue Agents
There are a number of supporting software agents in the HRI/OS that are used to enable and facilitate spoken dialogue. The text-to-speech agent (TTS) allows the robots' responses to humans to be verbalized for the humans to hear; the speech recognizer agent (SR) allows the humans to verbally respond to the robots; and the spatial reference agent (SRA) allows the robots to understand utterances such as "on your left" and "in front of me."
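To make the role of a spatial reference agent concrete, the sketch below classifies where a target lies relative to an observer, which is the geometric core of resolving "on your left" (addressee's frame) versus "in front of me" (speaker's frame). It is a planar simplification under our own assumptions and is not the SRA's actual method; the function name and the four-way classification are hypothetical.

```python
import math


def relative_direction(observer_xy: tuple[float, float], heading_rad: float,
                       target_xy: tuple[float, float]) -> str:
    """Classify the target as front, behind, left, or right of the observer.

    The heading is measured counterclockwise from the world +x axis. The world
    offset is rotated into the observer's frame (x forward, y to the left).
    """
    dx = target_xy[0] - observer_xy[0]
    dy = target_xy[1] - observer_xy[1]
    forward = dx * math.cos(heading_rad) + dy * math.sin(heading_rad)
    left = -dx * math.sin(heading_rad) + dy * math.cos(heading_rad)
    if abs(forward) >= abs(left):
        return "front" if forward >= 0 else "behind"
    return "left" if left > 0 else "right"


# Hypothetical scene: the same panel is described differently in each frame.
panel = (0.0, 1.0)
from_robot = relative_direction((0.0, 0.0), 0.0, panel)              # robot faces +x
from_astronaut = relative_direction((0.0, -1.0), math.pi / 2, panel)  # astronaut faces +y
```

Here the panel is to the robot's left but directly in front of the astronaut, so the correct reading of an utterance depends on whose frame of reference is intended.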

Robot Agents
Robot Agents (RAs) provide an interface between robot controllers and the HRI/OS. RAs are responsible for handling messages and requests from other agents, managing and executing their own atomic tasks, and enabling communication and interaction with others. The RA provides a programming interface for integrating robots into the HRI/OS. This interface includes support for registering robot capabilities with the Resource Manager, for broadcasting event and state information, for dispatching service requests and problem solving queries, and for processing query responses.

IV. Computational Cognitive Architectures
For a robot to work side by side with an astronaut, collaborating in a shared workspace, the robot must be able to do something that humans do naturally: understand how another person perceives space and the relative positions of objects around them, that is, see things from another person's point of view. To give robots this ability, we are building computational cognitive models (CCMs) of certain high-level cognitive skills that humans possess and that are relevant to collaborative tasks. We then use these models as reasoning mechanisms for our robots.
Why do we propose using CCMs as opposed to more traditional programming paradigms for robots? We believe that by giving the robots similar representations and reasoning mechanisms to those used by humans, we will build robots that act in a way that is more compatible with humans.
In the P2P-HRI project, we are developing computational, cognitive, and linguistic models that can deal with spatial perspective-taking and frames of reference. Issues include dealing with constantly changing frames of reference, changes in spatial perspective, and maintaining common ground among team members. Perspective taking, in particular, is a critical cognitive ability for humans, particularly when they want to collaborate.

A. Spatial perspective in space
To determine just how important perspective and frames of reference are in collaborative tasks in shared space, we analyzed a series of tapes of two astronauts and a ground controller training in the Neutral Buoyancy Lab at NASA JSC for an assembly task for Space Station mission 9A. We performed a protocol analysis of several hours of these tapes, focusing on the use of spatial language and commands from one person to another. We found that the astronauts changed their frame of reference approximately every other utterance.
As an example of how prevalent these changes in frame of reference are, consider the following utterance from ground control:

"...if you come straight down from where you are, uh, and uh, kind of peek down under the rail on the nadir side, by your right hand, almost straight nadir, you should see the..."

Here we see five changes in frame of reference (highlighted in italics) in a single sentence! These rates of change in frame of reference are consistent with work by Franklin, Tversky and Coon.[20] In addition, we found that the astronauts had to take other perspectives, or forced others to take their perspective, about 25% of the time. Obviously, the ability to handle changing frames of reference and to understand spatial perspective will be a critical skill for robots such as NASA's Robonaut [21] and, we would argue, any other robotic system that needs to communicate with people in spatial contexts (e.g., construction tasks, direction giving, etc.).

B. Models of perspective taking
Consider the following task, as illustrated in Figure 2. An astronaut and his robotic assistant are working together to assemble a structure in a shared space. The human, who can see one wrench, says to the robot, "Pass me the wrench." Meanwhile, from the robot's point of view, two identical wrenches are visible, while the human has a partially occluded view and can only see one wrench. What should the robot do? Evidence suggests that humans, in similar situations, will pass the wrench that they know the other human can see [22] since this is a jointly salient feature.[15][16]
Currently, we are building a model of perspective-taking that can handle the above scenario in a general sense. The approach uses the Java version of the ACT-R [23] cognitive architecture system, jACT-R, to model frames of reference and perspective-taking.
In essence, whenever the model wishes to take the perspective of a person, it performs the equivalent of a mental simulation, virtually placing the robot in the human's position in order to reason about the human's perspective (e.g., what objects the human can see). This mental simulation is accomplished using the Stage component of the Player/Stage robot simulation system.[24] Specifically, we model the current world in the Stage simulation using information obtained from the robot (e.g., current pose) and other sensors (e.g., object trackers). When we need to understand the human's perspective, or resolve a frame of reference, we "imagine" a scene from an appropriate perspective and then resolve any ambiguous references, as shown in Figure 3.
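The wrench scenario can be illustrated with a small stand-in for the mental simulation. The sketch below does not use jACT-R or Player/Stage; it substitutes a plain 2D line-of-sight test, and all names, coordinates, and the single occluding wall are hypothetical. The idea it demonstrates is the same: evaluate visibility from the human's position and prefer the object both parties can see.

```python
from typing import Optional

Point = tuple[float, float]
Segment = tuple[Point, Point]


def _ccw(a: Point, b: Point, c: Point) -> float:
    """Signed area test: positive if a, b, c make a counterclockwise turn."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _crosses(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 strictly crosses q1-q2 (touching is ignored)."""
    return (_ccw(q1, q2, p1) * _ccw(q1, q2, p2) < 0
            and _ccw(p1, p2, q1) * _ccw(p1, p2, q2) < 0)


def visible_from(viewpoint: Point, objects: dict[str, Point],
                 walls: list[Segment]) -> set[str]:
    """Names of objects whose line of sight from the viewpoint crosses no wall."""
    return {name for name, pos in objects.items()
            if not any(_crosses(viewpoint, pos, w[0], w[1]) for w in walls)}


def resolve_reference(robot: Point, human: Point, objects: dict[str, Point],
                      walls: list[Segment]) -> Optional[str]:
    """Pick the object both can see; None means the reference stays ambiguous."""
    shared = visible_from(robot, objects, walls) & visible_from(human, objects, walls)
    return next(iter(shared)) if len(shared) == 1 else None


# Hypothetical layout: a wall hides wrench_b from the human but not from the robot.
wrenches = {"wrench_a": (2.0, 0.0), "wrench_b": (2.0, 3.0)}
walls = [((0.0, 1.0), (1.5, 1.0))]
human_sees = visible_from((0.0, 0.0), wrenches, walls)
robot_sees = visible_from((4.0, 1.5), wrenches, walls)
choice = resolve_reference((4.0, 1.5), (0.0, 0.0), wrenches, walls)
```

When the shared set does not contain exactly one object, the sketch returns `None`, which is the point at which a robot would fall back on dialogue to ask which wrench is meant.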

A. HRI metrics
The development of human-robot systems needs to be based on proven theories and guidelines. Although metrics from other fields (human-computer interaction, human factors, etc.) can be applied to satisfy specific needs, HRI has characteristics (e.g., physical interaction by an embodied robot) that set it apart. Thus we need to develop metrics and evaluation procedures that are appropriate for HRI.
Our approach focuses on assessing our overall progress in improving the productivity of EVA with human-robot teams. Thus, we have chosen metrics to evaluate task effectiveness, teamwork efficiency, and astronaut workload. As our research progresses, we intend to implement techniques for assessing situational awareness of both humans and robots.
To develop our evaluation method, we first defined a hierarchy of metrics and ground truth data to be collected (Table 1). For each of these metrics we then identified lower level measures that could potentially inform these metrics. For example, to assess teamwork, we can look at measures such as the number and types of problems that occur during a task and determine how these problems impact overall success.

Metric                 Measured by
Effectiveness          Task success
Teamwork efficiency    Analysis of breakdowns, subjective questionnaire data, time measures
Astronaut workload     NASA TLX [25]

Efficient teamwork measures include several workflow measures:
• Number of problems (breakdowns) encountered during the task
• The percentage of breakdowns detected by the robot and the percentage detected by the human
• The level(s) of autonomy used to perform the tasks and a classification of events that necessitate changes in autonomy level
Breakdowns are further divided into four categories: robot-task, robot-tool, human-robot, and robot-human. In the robot-tool category, we focus on the ability of the robot to use the tool correctly, assuming the tool is adequate for the task. In our current work, an appropriate tool is always used. In future work, however, when novel tasks are presented, this may not be the case. For the other three categories, we will examine lower level dialogue measures and system logs. Breakdown classifications are summarized in Table 2.
We will use time measures to determine if there is any relationship between workload and efficient teamwork. The time measures that we will calculate include: time to complete the task, percentage of time the robot worked alone, percentage of time the human and the robot interact, percentage of time the human worked alone, and percentage of time the human is able to provide assistance remotely (e.g., from inside the habitat).
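As an illustration of how such time measures could be derived from a logged timeline, the following sketch computes the total task time and the percentage of time spent in each mode. The interval format and the mode labels are assumptions made for this example, not the project's log format.

```python
def time_measures(intervals: list[tuple[float, float, str]]) -> tuple[float, dict[str, float]]:
    """Return total task time and the percentage of it spent in each mode.

    Each interval is (start_s, end_s, mode); intervals are assumed not to overlap.
    """
    total = sum(end - start for start, end, _ in intervals)
    if total <= 0:
        raise ValueError("timeline must cover a positive duration")
    durations: dict[str, float] = {}
    for start, end, mode in intervals:
        durations[mode] = durations.get(mode, 0.0) + (end - start)
    return total, {mode: 100.0 * d / total for mode, d in durations.items()}


# Hypothetical 200-second task timeline.
total, shares = time_measures([
    (0, 60, "robot_alone"),
    (60, 90, "interaction"),
    (90, 200, "human_alone"),
])
```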
To examine breakdowns that result from communication issues, we will use a variety of dialogue measures. These include: number of dialogue turns, percentage of turns used by human and by robot, percentage of turns that contain content, percentage of turns needed for clarification (how, what, where, who, or when), percentage of inappropriate messages, and percentage of communications not handled in time.
By design, these lower-level measures can inform multiple higher-level metrics.
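Several of the dialogue measures listed above reduce to simple counts over annotated turns. The sketch below assumes a hypothetical `Turn` record in which each turn has already been coded by speaker, content, and clarification status; the coding itself is the expensive step and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str            # "human" or "robot"
    has_content: bool       # the turn carries task-relevant content
    is_clarification: bool  # the turn asks how, what, where, who, or when


def dialogue_measures(turns: list[Turn]) -> dict[str, float]:
    """Compute turn counts and percentages from a coded dialogue transcript."""
    n = len(turns)
    if n == 0:
        raise ValueError("no dialogue turns to analyze")

    def pct(count: int) -> float:
        return 100.0 * count / n

    return {
        "turns": n,
        "pct_human_turns": pct(sum(t.speaker == "human" for t in turns)),
        "pct_robot_turns": pct(sum(t.speaker == "robot" for t in turns)),
        "pct_content_turns": pct(sum(t.has_content for t in turns)),
        "pct_clarification_turns": pct(sum(t.is_clarification for t in turns)),
    }


# Hypothetical four-turn exchange.
measures = dialogue_measures([
    Turn("human", True, False),
    Turn("robot", True, True),
    Turn("human", True, False),
    Turn("robot", False, False),
])
```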

B. Methodology
During the P2P-HRI project, we plan to enact several use cases with our human-robot team. During each of these scenarios we will collect both video and audio data, log files from the system, body position of the astronauts, position of the robots, and position of the tools. In addition, the astronauts will be given the NASA TLX questionnaire to assess workload and a questionnaire to assess their opinion of the efficiency of the teamwork. Finally, observers will be present to make notes of any unexpected events or breakdowns that occur. The early scenarios that we enact will be scripted so that we will be able to easily note deviations that occur.
We will then divide the overall scenario into smaller tasks. Example tasks might be a robot performing a weld task, a robot inspecting a weld, or an astronaut indicating a location for a robot to weld. We will calculate the measures per task and will also collect our higher level metrics of success, workflow questionnaire, and workload by task. Once we have calculated these measures, we will determine how the lower level measures correlate with our observations of breakdowns, the astronauts' ratings of workflow, and the NASA TLX scores for workload. We will do this over the range of primitive tasks. This should enable us to identify the best measures, i.e., those that correlate well and are lowest cost to collect. Dialogue measures, for example, are much more expensive to compute than time measures. Workflow measures can be obtained in real-time by observers, but necessitate several observers who have been trained in the coding scheme. Log files of the actions of the robots, including dialogue interactions, are inexpensive to calculate as they can be processed automatically. As we conduct evaluations with more scenarios we will follow the same procedure. That is, we will collect multiple measures to determine those that allow us to discriminate tasks that have successful human-robot interactions from those that do not, as judged against our ground truth measures.
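The selection step (finding which lower level measures track a ground truth metric) can be sketched with a Pearson correlation computed per candidate measure. The data values below are invented purely to exercise the code and are not experimental results; the measure names are hypothetical, and collection cost, which the text also weighs, is not modeled.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Made-up per-task values: one ground truth score and two candidate measures.
tlx_workload = [25.0, 45.0, 65.0, 85.0]
candidates = {
    "clarification_pct": [10.0, 20.0, 30.0, 40.0],
    "task_time_s": [300.0, 280.0, 310.0, 290.0],
}
scores = {name: pearson(values, tlx_workload) for name, values in candidates.items()}
best = max(scores, key=lambda name: abs(scores[name]))
```

With real data, a measure would be retained only if its correlation holds across the range of primitive tasks, not on a single small sample like this one.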

C. Initial use case
During Fall 2005, we will study a use case that centers on seam welding and inspection by a human-robot team. [27][28][29] For example, linear welds might be used to construct pressure vessels, work hangers, and emergency shelters too large to raise into space in one piece. In this study, three humans (two in EVA and one in a habitat) will work with two robots (Figure 4), the ARC K-10 and JSC Robonaut-B, to weld panels to a truss. The humans will act as master welders and provide initial panel mounts (e.g., spot welds). The robots will perform two types of tasks. Robonaut-B will work as a junior welder and seam weld once the panels are placed. K-10 will work as a seam inspector and inspect the quality of the welds. Humans and robots will work in parallel, supporting each other as necessary.
This work scenario provides numerous opportunities for dynamic and flexible human-robot interaction. For example, a variety of communication acts are useful: human-generated commands, questions from the robots to the human, etc. Additionally, the human may remotely interact with the robots (e.g., he may deploy the inspection robots via a control room inside a habitat) as well as work side-by-side with others (e.g., leading welders to a new site and showing them where to weld).

VI. Conclusion
We believe that peer-to-peer HRI will enable more effective and productive human-robot teams for space exploration. In particular, we believe that tools such as the HRI/OS and computational cognitive models will enable humans and robots to work efficiently and effectively together, regardless of spatial distribution, communication channel, and user interface.
In more general terms, we expect that peer-to-peer HRI will be appropriate whenever humans and intelligent systems must collaborate in order to execute complex tasks such as inspection and maintenance. It is also probable that, in low-bandwidth situations and scenarios (e.g., lunar robots operated from terrestrial ground control), the interaction technology developed by P2P-HRI will help reduce data transmission demands.
Thus, during the next few years, our goal will be to apply peer-to-peer HRI directly to a wide variety of mission systems, including the Crew Exploration Vehicle (CEV) and lunar landers.

Figure 1. The Human-Robot Interaction Operating System (HRI/OS) is an agent-based system.

Figure 2. Perspective taking example.The astronaut can only see one wrench.The robot can see both wrenches.The astronaut asks the robot to "Pass me the wrench".

Figure 3. Computational cognitive model for perspective taking and spatial referencing.

Figure 4. Left, the K-10 rover (NASA Ames) is a low-cost rover designed to operate at human interaction speeds; right, Robonaut-B (NASA JSC) is a mobile manipulation system attached to a Segway RMP differential-drive base.

Table 1. Top level metrics