Converse Task-Oriented Dialogue System Simplifies Chatbot Building, Handles Complex Tasks

AUTHORS: Tian Xie, Xinyi Yang, Angela Lin, Donald Rose

Introduction and Background

Creating a system capable of conducting a meaningful conversation with a human and helping them accomplish tasks is one of the ultimate goals of Artificial Intelligence (AI), and has been since AI’s beginnings. Meanwhile, as real conversational AI research has progressed, science fiction has built up popular expectations of what an intelligent chatbot can do for us. Building a powerful chatbot that can help people with highly complex tasks through natural and fluent conversations (like J.A.R.V.I.S. in the Marvel films, or HAL 9000 in the movie 2001) has been a dream of many researchers for decades.

A great deal has been accomplished in this area recently, with voice assistant products entering our daily lives and chatbots becoming commonplace in customer service. Task-oriented dialogue systems use conversation with users to help complete tasks, and these systems (often referred to as “chatbots”) now assist us in many of our regular activities. For example, task-oriented dialogue systems can help you reserve a table at a restaurant, book flight tickets, check vaccination appointment availability, or check your order status.

Problem: Chatbots Often Frustrate Users and Developers

Despite their many benefits and successful applications, chatbots often cause us great frustration. If you need something urgently, and a chatbot gets stuck in an infinite loop or misunderstands what you are talking about, you will immediately want to bypass the chatbot and connect to a human agent — which defeats the purpose of having the chatbot in the first place.

Science fiction visions of perfect AI voice assistants are all well and good, and will come in time, but the reality is that today’s existing chatbots usually can handle only relatively simple tasks. In addition, chatbot developers often find it difficult to build a powerful chatbot that can deal with multiple complex tasks.

Solution: Create Smart Chatbots with Converse

In an effort to address the challenges faced by people who interact with chatbots, as well as by chatbot developers, we created Converse, a flexible and modular task-oriented dialogue system that bot builders can use to easily create smart bots that help users complete tasks.

To complete a meaningful task:

Converse figures out the task that the user wants help with.
The system collects information from the user and processes it.
The system then provides useful information directly to the user.

Bot builders can quickly design tasks in Converse using our interactive configuration tool and a bit of code. For each task, bot builders only need to specify:

Examples of how a user will ask for the task
- For instance, “I want to see a doctor” or “Can I see a doctor?”
Types of information to collect from the user
- Examples: appointment date or medical department
Actions Converse will execute after the info is collected
- Examples: look up appointment availability or book appointment
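
The three ingredients above can be pictured as a small data structure. The Python sketch below is purely illustrative: `TaskSpec`, its field names, and the action names are hypothetical and are not Converse's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for what a bot builder specifies per task."""
    name: str
    # Example utterances a user might say to ask for the task.
    samples: list[str] = field(default_factory=list)
    # Types of information to collect from the user.
    entities: list[str] = field(default_factory=list)
    # Actions to run once the information has been collected.
    actions: list[str] = field(default_factory=list)


see_doctor = TaskSpec(
    name="see_doctor",
    samples=["I want to see a doctor", "Can I see a doctor?"],
    entities=["appointment_date", "medical_department"],
    actions=["look_up_appointment_availability", "book_appointment"],
)
```

Everything else (asking for missing entities, switching tasks, resuming) would be handled by the system rather than spelled out per task.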

Task-oriented vs. Chit-chat Dialogue Systems

Task-oriented dialogue (TOD) agents, like Converse, use conversations with users to help complete tasks; that is the goal.

By contrast, chit-chat (non-task-oriented) dialogue systems are designed for extended conversations, to mimic the unstructured conversations or ‘chats’ characteristic of human-human interaction, often for entertainment purposes.

The overall purposes of TOD systems and non-TOD systems are different. People use TOD systems to get help finishing certain well-defined tasks, so TOD systems are expected to accelerate the process, cut costs, and get the task done efficiently. People usually interact with non-TOD systems to make conversation for fun, without a specific end goal in mind, so non-TOD systems are usually expected to engage in more human-like, natural conversation.

In short, TOD systems like Converse should be:

expected to help us get tasks done fast, by engaging in focused conversations designed to accomplish user goals
not expected to carry on extended general conversations about any topic.

What Makes Converse Unique and Powerful

Simplifies Bot Building While Handling Complex Tasks

Other chatbot building tools require bot builders to script every step of the conversation, which takes a great deal of effort given the meandering, dynamic nature of human conversation (for example, changing topics and adapting when the user provides incorrect information).

In contrast, Converse aims to simplify the bot building process while handling complex tasks. We observed that task-oriented dialogues have a common structure: collect information from the user and then provide them with information they want. Thus, we tried to remove the redundant parts that are shared between tasks from the bot configuration, such as switching between tasks and repeating tasks.

In Converse, bot builders define the “happy path” of the conversation — the ideal conversational flow if the user answers the bot’s questions correctly. Converse provides the “guardrails” to steer the conversation towards completing tasks, handling the messiness of real conversations for bot builders.

Low-Code - or Even No-Code

Converse is a low-code system, or even no-code in some scenarios. In some cases, you can create a chatbot without any programming at all. In other cases, around 10-20% of the bot building effort may involve programming, depending on how complex the tasks are.

Examples of Converse’s Capabilities

Before we dive into the technical details of Converse, let’s look at a few examples of what the system can do.

Improving customer experience by reusing sub-tasks

Let's begin with an online shopping assistant bot that can help users check their order status and update their order. For security, the bot will verify the user's identity first before starting these two tasks.

User: Hi, I would like to check my order status.
Bot: Oh sure, I'd be happy to help you check your order status. First, I need to pull up your account. Could you please tell me your email address?

Checking the order status is the main task, which has a sub-task to verify the user's identity. We also use this sub-task in the task for updating the order. However, bot builders only need to define a sub-task once. Then, the sub-task can be reused in other tasks.

Reusing sub-tasks reduces the amount of configuration that the bot builder must write. In addition, reusing sub-tasks allows Converse to skip a sub-task if it has already been completed, which improves the user experience. For example, the chatbot would verify the user's identity only once per session: if the identity check is completed when the user checks the order status, the bot won't verify the user's identity again when they wish to update the order.
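
The once-per-session behavior can be sketched with a per-session record of completed sub-tasks. This is a minimal illustration under our own assumptions: the `Session` class, the task names, and the always-succeeding handler are hypothetical, not Converse's implementation.

```python
class Session:
    """Tracks which sub-tasks have been completed in this conversation."""

    def __init__(self):
        self.completed = set()

    def run_subtask(self, name, handler):
        # Skip the sub-task if it already succeeded earlier in the session.
        if name in self.completed:
            return "skipped"
        if handler():
            self.completed.add(name)
            return "completed"
        return "failed"


# Both main tasks reference the same sub-task definition.
TASKS = {
    "check_order_status": ["verify_user"],
    "update_order": ["verify_user"],
}

session = Session()
results = []
for task in ("check_order_status", "update_order"):
    for subtask in TASKS[task]:
        # The lambda stands in for the real identity check.
        outcome = session.run_subtask(subtask, lambda: True)
        results.append((task, subtask, outcome))
```

Here the identity check runs for the first task and is skipped for the second, because the sub-task is defined once and its result is shared.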

Handling branching conversations with Task Trees

Tasks can be very complex. Completing a task may not be as simple as filling the slots one by one. There might be multiple ways to complete a task under different conditions. For example, if we want to authenticate the user, we can verify the user’s identity by sending a push to the user’s authenticator app, or sending the user a verification email, or verifying the user’s personal information such as their name and birthday. Building a chatbot that can handle tasks with complex conditions and branches can be a headache for many chatbot developers.

To handle branching conversations, Converse uses a unique design: an and-or tree structure that represents and tracks the tasks. We call the and-or tree structure in Converse the Task Tree. With the Task Tree, we can convert almost every task into a combination of and/or relations.

Let’s look at the following conversation (a continuation of the earlier example), in which the bot adjusts its response based on whether the user provides the correct email address:

Bot: Oh sure, I'd be happy to help you check your order status. First, I need to pull up your account. Could you please tell me your email address?

Trajectory 1:
User: [the correct email address]
Bot: I have verified your identity. Please provide your order id to obtain your order status.

Trajectory 2:
User: I don't remember it.
Bot: I am sorry, but I could not recognize your email address. Could you please tell me your zip code?
User: [the correct zip code]
Bot: I have verified your identity. Please provide your order id to obtain your order status.

Note that the bot builders only need to create the and-or relations (in this case, verify email address or zip code), instead of specifying every possible linear conversational path. The Task Tree in Converse is able to handle more nested and complex structures than in the above example, as long as they can be represented by and-or relationships.

Handling task interrupts (switching tasks during a task)

Here's an example of a flight-booking bot that illustrates Converse's multi-task conversation handling:

User: I'd like to book a round-trip flight.
Bot: I'd be happy to help you book a flight. Where will you depart from?
User: Oh wait, could you please help me check tomorrow's weather first?
Bot: I'd be happy to help you check the weather. What is the zip code of your area?

The user started a flight-booking task, but after the bot asked about the departure location, the user changed their mind and started a new weather query task. The bot successfully detected the new intent, switched to the new task (weather query), and asked for the zip code.

User: [a zip code].
Bot: The weather in [city name of the zip code] is [some weather]. That's all I have about the weather. Where will you depart from?

The bot helped the user get a weather report, then picked up where it left off in the previous conversation; it successfully switched back to the original flight-booking task and asked the user about the origin location again.
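
One simple way to picture this interrupt-and-resume behavior is a stack of active tasks. The sketch below is an assumption made for illustration (this post does not describe how Converse stores interrupted tasks), and `TaskStack` and its methods are hypothetical names.

```python
class TaskStack:
    """Keeps interrupted tasks so the bot can resume them later."""

    def __init__(self):
        self._stack = []

    def start(self, task, pending_question):
        # A new task goes on top; any task underneath is paused.
        self._stack.append({"task": task, "question": pending_question})

    def finish_current(self):
        # Drop the finished task and return the paused task's pending
        # question, if there is one.
        self._stack.pop()
        return self._stack[-1]["question"] if self._stack else None

    @property
    def current(self):
        return self._stack[-1]["task"] if self._stack else None


stack = TaskStack()
stack.start("book_flight", "Where will you depart from?")
stack.start("check_weather", "What is the zip code of your area?")  # interrupt
resumed_question = stack.finish_current()  # the weather task is done
```

After the weather task finishes, the flight-booking task is on top again and its unanswered question is the one the bot repeats.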

Configuring responses to frequently asked questions

Converse can not only switch between tasks, but also answer frequently asked questions (FAQs). The following conversation continues from the previous example:

User: San Francisco.
Bot: Got it. Where is your destination?
User: Do I have free checked bags?
Bot: All frequent flyer program members will have one free checked bag. Where is your destination?

Converse defines FAQs as one-turn dialogues that can happen anywhere in a conversation. FAQs can be easily configured in the task configuration by listing the question and answer pairs, such as “Do I have free checked bags? - All frequent flyer program members will have one free checked bag.” In our example, the bot detects the question from the conversation and fetches the answer from its FAQ configuration, then continues the flight booking task.
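
A one-turn FAQ can be sketched as a lookup followed by a repeat of the question the task was waiting on. The exact-match lookup below is a deliberate simplification and the function name is hypothetical; how Converse actually matches a user question to a configured FAQ is not covered here.

```python
FAQS = {
    "do i have free checked bags?":
        "All frequent flyer program members will have one free checked bag.",
}


def respond(utterance, pending_question):
    """Answer a configured FAQ, then repeat the task's pending question."""
    answer = FAQS.get(utterance.strip().lower())
    if answer is None:
        return None  # Not an FAQ; the normal task flow should handle it.
    return f"{answer} {pending_question}"


reply = respond("Do I have free checked bags?", "Where is your destination?")
```

Because the FAQ takes exactly one turn, the task state is untouched and the bot simply asks for the destination again.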

Deep Dive: System Architecture

We constructed Converse with a design typical of modern task-oriented dialogue systems:

The Orchestrator module acts as the “receptionist” of Converse. It handles communications between the user and the system, and also coordinates with other modules.
The core of Converse is the Dialogue Management (DM) module, which is connected to a unique component in Converse called the Dialogue Tree Manager.
The Dialogue Tree Manager manages the Task Tree and tree-related operations.

Main System Components

Natural Language Understanding (NLU)

Natural Language Understanding (NLU) focuses on understanding language input from users. In Converse, NLU consists of:

two deep learning models (the intent detection model and the named-entity recognition model)
one rule-based model (the negation detection model), and
rule-based intent resolution logic.

The intent detection model helps Converse understand which task the user wants to do (for example, checking the order status or adding more items to the cart). Converse needs to understand the user’s intent to decide which task to complete. Bot builders only need to provide a couple of example sentences of intents for each task to start using their bots, making it easy to create new bots. In Converse, the intent model is based on Natural Language Inference (NLI). Bot builders don’t need to retrain the intent model when they add new tasks to an existing bot.

The named-entity recognition model extracts relevant information from the user’s response to complete the current task. This information could be, for example, the number of pizzas a user wants to order, the delivery address, or the delivery time for the pizza order. Users may say this information in different ways; for example, the delivery time can be expressed as “6 pm today”, “tomorrow at 2”, or “01/08/2022 22:00”. The named-entity recognition model extracts this information from the user’s response despite variation in expression, and standardizes it so that the rest of Converse can digest it more easily (for example, by converting all dates to the same format). We have already trained the named-entity recognition model, so bot builders can use it directly without adding data.

Besides the named-entity recognition model, Converse also supports other entity extraction methods, like regular expressions and picklists. A picklist contains selectable options for an entity. For example, if we have a flight booking bot, we may define a fare_class entity, and let users choose from the picklist: Economy, Premium Economy, Business.
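
To make the two rule-based extraction methods concrete, here is a minimal sketch of a picklist matcher and a regular-expression extractor. The function names, the five-digit zip code pattern, and the longest-match rule are our own assumptions for illustration, not Converse's API.

```python
import re

FARE_CLASSES = ["Economy", "Premium Economy", "Business"]
ZIP_CODE = re.compile(r"\b\d{5}\b")  # assumed pattern: a five-digit zip code


def extract_picklist(utterance, options):
    """Return the longest picklist option mentioned in the utterance, or None."""
    text = utterance.lower()
    matches = [opt for opt in options if opt.lower() in text]
    # Prefer "Premium Economy" over "Economy" when both match.
    return max(matches, key=len) if matches else None


def extract_zip_code(utterance):
    """Return the first five-digit number in the utterance, or None."""
    match = ZIP_CODE.search(utterance)
    return match.group(0) if match else None


fare = extract_picklist("I'd like premium economy, please", FARE_CLASSES)
zip_code = extract_zip_code("My zip code is 94105.")
```

Both extractors return a normalized value (the canonical picklist option, or the bare digits), which mirrors the idea of standardizing entities before the rest of the system uses them.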

The negation detection model determines whether negation words (also called negation cues) exist, and then finds the corresponding negation scopes. Negation words are words or phrases that have negative meaning. Negation scopes are the words or phrases whose meaning is inverted by the negation words. For example, in the sentence “X does not Y”, “not” is the negation word, and “Y” is the negation scope.

It is fairly common to have an intent detection model that treats “X does Y” and “X does not Y” as having similar meanings. To prevent this, the NLU module uses a rule-based intent resolution logic to combine the results from the intent detection model and the negation detection model to decide the final intent. The basic idea of the resolution logic is that, if the negation detection model finds negation, and the intent model treats “X does not Y” as “X does Y”, then the bot can use the negation model’s result to fix the intent detection results and output None as the final intent, instead of the intent detected by the intent detection model.
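
The resolution rule can be sketched in a few lines. This is a simplified illustration: the cue list is a toy stand-in for the negation detection model, and the sketch ignores negation scope, which the real logic takes into account. All names are hypothetical.

```python
NEGATION_CUES = {"not", "don't", "never", "no"}


def detect_negation(utterance):
    """Toy rule-based check: does the utterance contain a negation cue?"""
    tokens = utterance.lower().replace(",", " ").split()
    return any(token in NEGATION_CUES for token in tokens)


def resolve_intent(detected_intent, utterance):
    """Output None as the final intent when the utterance is negated."""
    if detected_intent is not None and detect_negation(utterance):
        return None
    return detected_intent


# Suppose the intent model returns "check_order" for both utterances.
kept = resolve_intent("check_order", "I want to check my order status")
dropped = resolve_intent("check_order", "I do not want to check my order status")
```

The first call keeps the detected intent, while the second overrides it with None because a negation cue is present.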

In the diagram above, NLU classifies “I want to check my order status” as the “check order” intent (the final intent after going through the rule-based intent resolution logic) and finds no entities.

Dialogue Management (DM)

Dialogue Management updates Converse’s memory based on new information from the user and decides how to respond to the user. In all conversations, context is essential for interpreting the other person’s response. Dialogue Management stores the conversation context as the dialogue state, runs external function calls to the knowledge base or backend based on the dialogue state, then updates the dialogue state based on new information from the user using the Dialogue State Manager. Using the updated dialogue state and a set of rules based on the dialogue state, the Dialogue Policy decides the next action and the type of response to generate.

Dialogue Tree Manager and the Task Tree

Based on the new information provided by the Dialogue State Manager, the Dialogue Tree Manager traverses the Task Tree, checking if the new information matches the current entity and moving on to the next entity and task that has not been completed. This process is similar to checking off an item on your to-do list and checking what task to do next.

The and-or tree is a classic concept in AI — a structure for complex problem solving — and we can use the and-or tree structure to represent tasks in a task-oriented dialogue system. With the and-or tree, a problem can be represented as a combination of several subproblems, so we can break down a complex problem into easier subproblems and maintain the original problem’s search space in a tree structure.

Completing a task in Converse is equivalent to traversing the Task Tree. Because we have a generic way to traverse the Task Tree, we have a generic way to solve all tasks defined in Converse.

The And-Or Task Tree keeps track of the information that the user has provided and information still needed to complete the current task. The Task Tree is a representation of a logical expression that evaluates as true if the task is completed successfully and false otherwise. The tree includes logical expression nodes (And nodes / Or nodes) and entity leaf nodes.

The logical expression nodes include And nodes and Or nodes that combine the results of their child nodes. For And nodes, all child nodes must be true for the node to be true. For Or nodes, at least one child node must be true for the node to be true.
The entity nodes represent the information Converse collects from the user, such as their phone number and address. An entity node also contains an operation type that can instruct the Dialogue State Manager on how to process the entity information, such as verifying the entities provided by the user in the database. After the entity information is processed by the Dialogue State Manager, the Dialogue Tree Manager updates the Task Tree by storing the processed information in the entity nodes and evaluating the logical expression nodes.
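
The evaluation rules for the two node types translate directly into code. The sketch below is a minimal illustration with hypothetical class names; it models the earlier order-status example (verify email address or zip code, then collect the order id) and omits operation types and stored values.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Leaf node: one piece of information collected from the user."""
    name: str
    verified: bool = False

    def evaluate(self):
        return self.verified


@dataclass
class And:
    """True only if every child node is true."""
    children: list = field(default_factory=list)

    def evaluate(self):
        return all(child.evaluate() for child in self.children)


@dataclass
class Or:
    """True if at least one child node is true."""
    children: list = field(default_factory=list)

    def evaluate(self):
        return any(child.evaluate() for child in self.children)


email = Entity("email_address")
zip_code = Entity("zip_code")
order_id = Entity("order_id")
check_order = And([Or([email, zip_code]), order_id])

zip_code.verified = True  # the email was not recognized, but the zip code matched
before = check_order.evaluate()  # identity verified, order id still missing
order_id.verified = True
after = check_order.evaluate()  # every required piece is now in place
```

The root only becomes true once the Or branch has one verified child and the order id has been collected.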

The figure below shows a simple example of a Task Tree:

There is one task, Make an appointment, made up of two subtasks (Verify user identity and Appointment details)
- Make an appointment is an And node: both of its child nodes must be completed to complete the task.
The numbers show the order that the tree manager visits nodes during tree traversal while completing this task.
Verify user identity is an Or node: when one of its child nodes (Verify birthday) is completed successfully, the other child node (Verify zip code) can be skipped.
The Appointment details node is an And node: none of its child nodes can be skipped.
The current entity is time; once Converse gets the info from the user, the Make an appointment task will be finished.
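
The traversal described in this example can be sketched as a depth-first search for the first entity that still needs to be collected. This is an illustration under stated assumptions: the nested-tuple representation and function names are hypothetical, and the children of Appointment details other than time (here `department` and `date`) are assumed, since only time is named above.

```python
# ("and" | "or", children); a plain string is an entity leaf.
TREE = ("and", [
    ("or", ["birthday", "zip_code"]),        # Verify user identity
    ("and", ["department", "date", "time"]),  # Appointment details (partly assumed)
])


def is_done(node, filled):
    """Evaluate a node given the set of entities collected so far."""
    if isinstance(node, str):
        return node in filled
    kind, children = node
    results = [is_done(child, filled) for child in children]
    return all(results) if kind == "and" else any(results)


def next_entity(node, filled):
    """Depth-first search for the first entity that still needs to be asked."""
    if isinstance(node, str):
        return None if node in filled else node
    if is_done(node, filled):
        return None  # e.g. an Or node with one completed child: skip the rest
    _, children = node
    for child in children:
        found = next_entity(child, filled)
        if found is not None:
            return found
    return None


after_birthday = next_entity(TREE, {"birthday"})
current = next_entity(TREE, {"birthday", "department", "date"})
```

Once the birthday is verified, the search skips the zip code and moves into Appointment details; with everything but time collected, time is the current entity, as in the example.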

Natural Language Generation (NLG)

We use a simple template-based Natural Language Generation (NLG) module to generate natural language responses. NLG fills in simple templates with information from the Dialogue Policy to generate responses.

For example, “What is your <Info>?” is a template for asking the user for information. NLG replaces <Info> with the current entity provided by the Dialogue Policy, such as “What is your name?”.

In many existing chatbot frameworks, if you want to configure a new chatbot, you need to create the response templates for each task. Converse is different. There are two kinds of responses: general responses for all tasks, and task-specific responses. You only need to add the necessary task-specific responses for each task, and use the general responses for all tasks. You can even use just the general responses we provide, so you don’t need to add any responses when you create a new chatbot. These templates can be changed in configuration files without writing code.
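
Template filling with a general fallback can be sketched as follows. The dictionary layout, the function name, and the task-specific template wording are hypothetical; only the “What is your <Info>?” template comes from the text above.

```python
GENERAL_TEMPLATES = {"ask_info": "What is your <Info>?"}

# Optional per-task overrides (hypothetical example wording).
TASK_TEMPLATES = {
    "book_flight": {"ask_info": "Which <Info> would you like for your flight?"},
}


def generate(task, template_name, entity):
    """Fill <Info> using a task-specific template if one exists."""
    task_specific = TASK_TEMPLATES.get(task, {})
    # Fall back to the general response shared by all tasks.
    template = task_specific.get(template_name, GENERAL_TEMPLATES[template_name])
    return template.replace("<Info>", entity)


general = generate("check_order", "ask_info", "name")
specific = generate("book_flight", "ask_info", "fare class")
```

A task with no overrides still produces a sensible question, which is why a new bot can start with the general responses alone.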

Benefits of Converse’s Modular/Flexible Design

Converse follows a pipeline design, as shown in the system diagram. Each component is independent from the others.

For example, Converse uses independent modules for intent and entity detection. If a user doesn’t like the intent and entity detection models we provide, they can simply replace them without affecting other parts of the system. (The modules for intent and entity detection detect the intent and extract common entity information, such as address, number, and name, from the user’s utterance. For example, for the input “tell me about the weather in SF”, the intent module compares the input with the tasks the admin defined for our system and outputs the most likely one, or None; the entity detection module outputs “SF, city_name”.)
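
The idea of replaceable modules can be illustrated with a small interface. In the sketch below, `IntentDetector`, `KeywordIntentDetector`, and `Pipeline` are hypothetical names, and the keyword matcher is a toy replacement for a real intent model; none of this is Converse's actual API.

```python
from typing import Optional, Protocol


class IntentDetector(Protocol):
    """Anything with this method can serve as the intent module."""

    def detect(self, utterance: str, tasks: list[str]) -> Optional[str]: ...


class KeywordIntentDetector:
    """Toy detector: picks the first task whose keyword appears in the utterance."""

    def __init__(self, keywords: dict[str, str]):
        self.keywords = keywords  # task name -> keyword

    def detect(self, utterance: str, tasks: list[str]) -> Optional[str]:
        text = utterance.lower()
        for task in tasks:
            keyword = self.keywords.get(task)
            if keyword and keyword in text:
                return task
        return None


class Pipeline:
    """The rest of the system depends only on the IntentDetector interface."""

    def __init__(self, intent_detector: IntentDetector):
        self.intent_detector = intent_detector

    def handle(self, utterance: str, tasks: list[str]) -> Optional[str]:
        return self.intent_detector.detect(utterance, tasks)


pipeline = Pipeline(KeywordIntentDetector({"check_weather": "weather"}))
intent = pipeline.handle("tell me about the weather in SF",
                         ["book_flight", "check_weather"])
```

Swapping in a different detector only requires passing another object with the same `detect` method; `Pipeline` itself does not change.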

Besides entity and intent detection, other parts of the system are also designed to be as independent as possible. For example, Dialogue Management is handled by several modules. Some extract info from the user’s current utterance, some track the progress of the current task, and some decide what responses the system should give.

Some of the benefits and advantages of Converse’s flexible and modular design:

Converse’s design makes it an easy-to-maintain and easy-to-develop system.
Non-professional developers can make their own chatbots without creating/editing too many configurations.
- Converse has many default settings and response templates, so less-experienced developers can create a new bot just by changing a few configurations.
Professional chatbot developers can just customize part of the system to satisfy their own use cases.
- Many components in Converse (like the NLU models, the response generator, etc.) can be replaced by professional developers’ own modules.
If a better model/technology is developed for a module (e.g., a new advance in NLP research), that module can be improved independently, by replacing or revising it, without affecting other parts of the system.

More Examples of Converse Chatbots

Let's look at two more examples of Converse chatbots, to give you a more detailed "peek under the hood" of how Converse works and illustrate the chatbots' internal states.

Our first detailed example is for the task of making health appointments. The figure below shows the Task Tree on the left, and a typical dialogue between user and chatbot on the right.

Below is another detailed example. This time, the main task is booking a flight. The chatbot's internal states are shown in the middle green section, as it collects relevant user information and flight choices are confirmed.

To learn more about how to build a bot using Converse, please check out our GitHub repo.

Future Work: Expanding Converse’s Abilities

While we have been striving to make Converse a powerful framework, the system does have a few limitations, which could provide fruitful avenues for future research. Examples of areas where we hope to expand Converse's abilities include:

Multilingual support

Currently, Converse only supports English. However, you can add your own models and response templates to support other languages.

Canceling an ongoing task

Converse doesn’t support canceling an ongoing task. Here is an example conversation:

Bot: Hi there, I am the digital assistant for Northern Trail Information Center. What can I do for you?
User: I want to check order status
Bot: Oh sure, I'd be happy to help you check your order status. First, I need to pull up your account. What is your email address?
User: I don't want to check order status anymore
Bot: I am sorry, but I could not recognize your email address. What is your zip code?

In this example, the user changes their mind, and doesn’t want to continue the “check order status” task anymore. However, the bot is still trying to extract entity (slot) information from the user.

Discriminate between entities that have the same entity type

Here's an example of a flight booking conversation that has multiple entities with the same entity type mentioned in one user's utterance.

User: I'd like to book a round trip flight from San Francisco to Los Angeles for 2 people.
Bot: I'd be happy to help you book a flight, I got multiple possible answers for origin: San Francisco and Los Angeles, which one did you mean? Could you walk me through the details?

In the above conversation, the user already specified the origin and the destination. Unfortunately, Converse detects entities based on type instead of name; it treats San Francisco and Los Angeles equally as location entities and fits them both into the first required location entity (origin).

Expand Commonsense Reasoning

Converse can handle only a limited amount of commonsense reasoning. In some cases, this may cause the bot to incorrectly process user input. Here’s an example:

Bot: And here are your returning flights:
Oceanic 443, Depart at 4:16am, 800USD
Ajira 232, Depart at 15:42pm, 1500USD
Qantas 424, Depart at 23:08pm, 2300USD
User: the late night one
Bot: Alright, your returning flight is Oceanic 443.

In this case, the bot is not able to relate “late night” to “23:08pm”, even though this is very easy for humans. However, the bot is able to understand “the last one” or “Qantas 424” based on the rules we have implemented.

Handling unexpected conversation changes (widening the scope of user dialogue)

In Converse, the bot always leads the conversations. Sometimes, users may not follow the bot, and may provide something not expected by the bot. Here’s an example:

Bot: Hi there, I am the digital assistant for Northern Trail Information Center. What can I do for you?
User: I want to check order status
Bot: Oh sure, I'd be happy to help you check your order status. First, I need to pull up your account. What is your email address?
User: Can I use my name to verify my identity?
Bot: Oh sure, I'd be happy to help you check your order status. First, I need to pull up your account. What is your email address?

Like other existing task-oriented dialogue system frameworks, Converse doesn’t allow users to lead the conversation. As a result, conversations may not be as natural as human conversations when users don’t follow the bot.

Talk to users about open-ended topics

Converse is not an open-domain dialogue system. The conversations are restricted by the defined tasks.

However, bot builders can define FAQs to make Converse able to handle chitchat. Since Converse does not have a model-based Natural Language Generation (NLG) module, the system's ability to chitchat is limited by the FAQs defined by bot builders.

The Bottom Line

Converse is a flexible, modular task-oriented dialogue system that bot builders can use to easily create smart bots that help users complete tasks.
Other bot building tools require bot builders to script every step of the conversation, requiring a great deal of effort to handle the dynamic nature of human conversation. In contrast, Converse aims to simplify the bot building process while handling complex tasks. Our approach is designed to remove redundant parts shared between tasks from the bot configuration, such as switching between tasks and repeating tasks.
Converse is a low-code system, or even no-code in some scenarios.
When conversing with the bot, the user must (usually) follow the conversation, not lead with new unexpected questions. However, there are some exceptions, such as when one wants to know the weather. The user can interrupt the conversation flow to ask about the weather (or other concepts Converse knows about), and the system will go back to the original flow (where it left off in the dialogue) after the weather query is answered.

Explore More

Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our website to get regular updates on this and other research projects.

Deep Dive: Read more about Converse in our research paper
Code: Check out the Converse GitHub page
Feedback? Questions? Email Tian Xie at txie@salesforce.com
Follow us on Twitter: @SFResearch
Learn more about all of the projects we’re working on at our main site: https://www.SalesforceAIResearch.com

About the Authors

Tian Xie is a Senior Research Engineer at Salesforce AI Research. He works on building intelligent task-oriented dialogue systems, and making the bot-building process easier and smarter.

Xinyi Yang is a Senior Research Engineer at Salesforce AI Research. She works on conversational AI research and applications.

Angela Lin is a Senior Research Engineer at Salesforce AI Research. She works on machine learning and natural language processing.

Donald Rose is a Technical Writer at Salesforce AI Research. He works on writing and editing blog posts, video scripts, media/PR material, and other content, as well as helping researchers transform their work into publications geared towards a wider (less technical) audience.