Learning Task-Oriented Dialog with Neural Network Methods

2018-10-22T20:53:47Z (GMT) by Bing Liu
A dialog system is a class of intelligent system that interacts with humans via natural language interfaces with a coherent structure. Based on the nature of the conversation, dialog systems are generally divided into two sub-classes: task-oriented dialog systems, which are created to solve specific problems, and chit-chat systems, which are designed for casual chat and entertainment. This thesis focuses on task-oriented dialog systems.

Conventional systems for task-oriented dialog are highly handcrafted, usually built with complex logic and rules. These systems typically consist of a pipeline of separately developed components for spoken language understanding, dialog state tracking, dialog policy, and response generation. Despite recent progress in spoken language processing and dialog learning, a variety of major challenges remain with current systems. Firstly, handcrafted modules designed with domain-specific rules inherently make it hard to extend an existing system to new domains. Moreover, modules in current systems are interdependent in the processing pipeline: updating an upstream module may change its output distribution, which can make downstream modules sub-optimal. Last but not least, current systems are mostly configured and trained offline; they lack the flexibility to learn continuously via interaction with users.

In this thesis, we address the limitations of conventional systems and propose a data-driven dialog learning framework. We design a neural-network-based dialog system that can robustly track dialog state, interface with knowledge bases, and incorporate structured query results into system responses to successfully complete task-oriented dialogs. The system can be optimized end-to-end, with error signals backpropagating from the system output to the raw natural language input.
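The state-tracking idea can be illustrated with a toy sketch. This is not the thesis architecture: it assumes a hypothetical bag-of-words encoder, a simple recurrent update carrying dialog state across turns, and a softmax head over candidate values for a single slot. In the full system such a tracker would be trained end-to-end by backpropagation; only the forward pass is shown here.

```python
import numpy as np

# Hypothetical vocabulary and slot values, for illustration only.
VOCAB = ["book", "cheap", "expensive", "italian", "french", "restaurant"]
PRICE_VALUES = ["cheap", "expensive", "none"]

rng = np.random.default_rng(0)
HIDDEN = 8
W_in = rng.normal(scale=0.1, size=(HIDDEN, len(VOCAB)))          # input -> hidden
W_rec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))             # hidden -> hidden
W_out = rng.normal(scale=0.1, size=(len(PRICE_VALUES), HIDDEN))  # hidden -> slot

def encode(utterance):
    """Bag-of-words encoding of a user utterance."""
    x = np.zeros(len(VOCAB))
    for tok in utterance.lower().split():
        if tok in VOCAB:
            x[VOCAB.index(tok)] += 1.0
    return x

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def track(utterances):
    """Run the tracker over a multi-turn dialog; return per-turn slot beliefs."""
    h = np.zeros(HIDDEN)
    beliefs = []
    for utt in utterances:
        h = np.tanh(W_in @ encode(utt) + W_rec @ h)  # recurrent state update
        beliefs.append(softmax(W_out @ h))           # belief over slot values
    return beliefs

beliefs = track(["book a cheap restaurant", "italian please"])
```

Each turn refines the hidden dialog state, so the belief over slot values can change as new evidence arrives, which is what lets a tracker remain robust over multiple turns.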
In learning such a system, we propose imitation- and reinforcement-learning-based methods for hybrid offline training and online interactive learning with a human in the loop. The system is thus enabled to continuously improve itself through interaction with users. In addition, we address several practical concerns in interactive dialog learning. To address the impact of inconsistent user ratings (i.e., the rewards) on dialog policy optimization, we propose an adversarial learning method that can effectively estimate the reward for a dialog. To address the sample efficiency issue in online interactive learning with users, we propose a method that integrates learning experience from real and imagined interactions to improve dialog learning efficiency. We evaluate the system in both simulated environments and real-user settings. Empirical results show that the proposed system can robustly track dialog state over multiple dialog turns and produce reasonable system responses. The proposed interactive learning methods also lead to promising improvements in task success rate and human user ratings.
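The policy-optimization loop can be sketched as a minimal REINFORCE-style update. This is an illustrative toy, not the thesis method: the states, actions, and reward signal are hypothetical, and the per-dialog reward here is a stub standing in for a user rating or a learned reward estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES, N_ACTIONS = 4, 3
theta = np.zeros((N_STATES, N_ACTIONS))  # tabular policy logits

def policy(state):
    """Softmax policy over actions for a given dialog state."""
    z = theta[state] - theta[state].max()
    e = np.exp(z)
    return e / e.sum()

def sample_dialog(length=5):
    """Roll out a toy dialog: sampled actions, random state transitions."""
    traj, state = [], 0
    for _ in range(length):
        action = rng.choice(N_ACTIONS, p=policy(state))
        traj.append((state, action))
        state = int(rng.integers(N_STATES))
    return traj

def reinforce_update(traj, reward, lr=0.1):
    """theta += lr * reward * grad log pi(a|s), summed over dialog turns."""
    for state, action in traj:
        grad = -policy(state)
        grad[action] += 1.0  # gradient of log-softmax w.r.t. the logits
        theta[state] += lr * reward * grad

# Stub reward: dialogs that use action 0 more often are rated higher.
for _ in range(500):
    traj = sample_dialog()
    reward = sum(1.0 for _, a in traj if a == 0) / len(traj)
    reinforce_update(traj, reward)
```

After these interactions the policy shifts probability mass toward the rewarded action; swapping the stub reward for a learned estimator is what makes the approach tolerant of noisy or inconsistent user ratings.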