Lead Author: Xi Ye
TL;DR: We propose RnG-KBQA, a Rank-and-Generate approach for Question Answering over Knowledge Bases, which enables answering natural language questions over large-scale knowledge bases. Our approach can answer questions about topics never seen in the training data, which makes it generalizable to a broad range of domains. RnG-KBQA exhibits strong zero-shot and compositional generalization capabilities, setting a new state of the art (SoTA) on the most widely used KBQA benchmarks.
Question Answering over Knowledge Bases, or KBQA, is a user-friendly way to interact with large-scale knowledge bases. A typical knowledge base (KB) contains information structured as a set of nodes (where various names, titles, or other entities are stored) connected by links (the relationships among those nodes).
For example, a KB node where the country name “Italy” is stored might have a link for “Capital” that points to a node containing “Rome” - and a link for “Population” might point from “Rome” to the number “2.88 million.” If a user asks, “What is the population of the capital of Italy?”, the Question Answering (QA) process might start at the node for “Italy”, then follow the “Capital” link to the node “Rome”, and then follow Rome’s “Population” link to the stored population number, “2.88 million” - the answer to the question. In other words, the KBQA process starts by matching part of the question to a node in the KB, then proceeds to search the KB’s knowledge graph (the stored network of nodes and links) until the answer is found.
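To make the traversal concrete, here is a minimal sketch in Python using a toy dictionary-based KB. The node and link names are just the ones from the example above; a real KB stores millions of nodes and links.

```python
# A toy knowledge base: each node maps link names to other nodes or values.
kb = {
    "Italy": {"Capital": "Rome"},
    "Rome": {"Population": "2.88 million"},
}

def follow(start, *links):
    """Walk a chain of links from a starting node and return the final value."""
    node = start
    for link in links:
        node = kb[node][link]
    return node

# "What is the population of the capital of Italy?"
answer = follow("Italy", "Capital", "Population")
print(answer)  # -> 2.88 million
```

The question-answering process is exactly this chain: match “Italy” to a node, then follow the “Capital” and “Population” links in turn.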
One concept used in KBQA is the KB schema item, which refers to nodes (entities) and links to other nodes (relations between entities or properties of entities). In the example above, Italy and Rome are entities, Capital is a relation between them, and Population is a property of Rome whose value is 2.88 million. Note that the term only refers to things stored in the KB, so a question is not a KB schema item.
What happens when users want to ask questions about topics that were never seen in a KBQA system’s training data?
Fortunately, KBQA systems can answer questions correctly on topics unseen during training thanks to their generalization power. Generalization involves making the system’s original knowledge go further – applying that knowledge to new questions it may never have encountered before, and answering them by, for example, composing or combining existing KB items in novel ways. This helps to expand the space of answerable questions.
In other words, if a KB system that has undergone training can apply its existing knowledge (set of facts) to answer never-before-seen questions, then that system has, to some degree, the ability to generalize – to go beyond its original knowledge and offer answers over a larger space of potential questions.
For a KBQA system to be truly useful, we would not want its QA ability to be limited just to what it saw during training. First, training sets are always limited to some degree, and cannot cover every single example that might be relevant for answering future questions. Second, there may be many more facts “hidden” in the KB, which just need to be extracted (during the question answering process) with a generalizable approach.
Hence, systems that can generalize are more useful because they can take previous experience (existing knowledge) and apply that to solve problems in new situations (answer new user questions that the system never encountered before). Ultimately, being able to answer never-before-seen questions, through generalization, means the system can cover a larger percentage of the space of all possible answers stored inside the KB – which helps users answer more questions.
Unfortunately, the ability to generalize remains a significant challenge in the KBQA domain. While there have been attempts to solve this generalization problem in KBQA, existing methods have limitations. Ranking-based methods select a logical form from a set of candidates enumerated over the KB, so they generalize well but suffer from a coverage issue: if the correct logical form is not among the enumerated candidates, it can never be chosen. Generation-based methods can compose logical forms freely, but without being grounded in the KB they may produce forms that are invalid or not executable.
At this point, you may be wondering: why can’t we combine the best aspects of ranking-based and generation-based methods into a new approach, designed to reduce or eliminate the aforementioned limitations? The answer is: we can!
To address the limitations of other KBQA systems, and explore techniques for improving the process, we developed RnG-KBQA, a novel framework targeted at generalization problems in the task of Question Answering over Knowledge Bases.
Our approach’s “secret sauce” is combining a ranker with a generator (hence, RnG) when performing KBQA, which addresses the coverage issue in ranking-only approaches while still benefiting from their generalization power.
The result: RnG-KBQA can answer questions related to a broader range of topics than previous strong approaches.
To illustrate how our approach works, let’s look at a detailed example:
Figure 1: Overview of our rank-and-generate (RnG) approach. Given a question, we rank logical form candidates obtained by searching over the KB based on predefined rules. Here, the ground truth logical form is not in the top-ranked candidates, as it is not covered by the rules. We solve this coverage problem using a generation step that produces the correct logical form based on top-ranked candidates. The final logical form is executed over the KB to yield the answer.
As shown in Figure 1, our method uses three main steps to form the best answer to a question: (1) enumerate logical form candidates by searching over the KB with predefined rules; (2) rank the candidates by how well they match the question; and (3) generate the final logical form from the top-ranked candidates and execute it over the KB to yield the answer.
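The overall rank-and-generate pipeline can be sketched in a few lines of Python. Every component below is a trivial stand-in (an assumption for illustration): the real system uses rule-based search over the KB for enumeration, a BERT-based contrastive ranker, and a T5-based generator.

```python
# Toy stand-ins for the three pipeline components.
def enumerate_candidates(question, kb):
    return list(kb)  # pretend every stored query is a candidate

def rank(question, candidates):
    # Score by word overlap with the question (stand-in for the BERT ranker).
    words = set(question.lower().split())
    return sorted(candidates, key=lambda c: -len(words & set(c.lower().split())))

def generate(question, top_candidates):
    return top_candidates[0]  # stand-in for T5: keep the best candidate as-is

def execute(logical_form, kb):
    return kb[logical_form]  # stand-in for running the query over the KB

def rng_kbqa(question, kb):
    candidates = enumerate_candidates(question, kb)  # step 1: enumerate
    ranked = rank(question, candidates)              # step 2: rank
    logical_form = generate(question, ranked[:5])    # step 3: generate...
    return execute(logical_form, kb)                 # ...and execute

toy_kb = {"capital of italy": "Rome", "population of rome": "2.88 million"}
print(rng_kbqa("what is the capital of italy ?", toy_kb))  # -> Rome
```

The point of the sketch is the data flow: the ranker narrows a large candidate set, and the generator sees only the question plus the top-ranked candidates.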
The core idea of our approach is the interplay between the ranker and the generator: the ranker supplies the generator with high-quality logical form candidates grounded in the KB, while the generator closes the coverage gaps that ranking alone cannot.
Figure 2: The ranker that learns from the contrast between the ground truth and negative candidates.
Our ranker is a contrastive ranker that learns to score each logical form candidate by maximizing the similarity between the question and the ground truth logical form while minimizing the similarities between the question and the negative logical forms.
Specifically, we use a BERT-based encoder that takes as input the concatenation of the question and the logical form and outputs a score representing the similarity between them.
The ranker is then optimized to promote the ground truth logical form while penalizing the negative ones via a contrastive objective. Thanks to such an objective, our ranker is more effective in distinguishing the correct logical forms from spurious ones (similar but not equal to the ground truth ones) as opposed to parser-based models used in prior work [2,3] that only leverage supervision from the ground truth.
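A minimal sketch of such a contrastive objective: the loss is the negative log-likelihood of the ground-truth candidate under a softmax over all candidate scores. The `contrastive_loss` helper and the hand-picked score values are illustrative assumptions; in the actual model the scores come from the BERT-based encoder over the concatenated question and logical form.

```python
import math

def contrastive_loss(scores, gold_index):
    """Negative log-likelihood of the ground-truth candidate under a
    softmax over all candidate scores: promoting the gold logical form
    automatically penalizes the negative candidates."""
    max_s = max(scores)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(s - max_s) for s in scores))
    return -((scores[gold_index] - max_s) - log_sum)

# Candidate 0 is the ground truth; a higher gold score yields a lower loss,
# so training pushes the gold score up relative to the negatives.
assert contrastive_loss([5.0, 0.0, 0.0], 0) < contrastive_loss([0.1, 0.0, 0.0], 0)
```

Because the normalization runs over the whole candidate list, spurious candidates that merely resemble the ground truth are explicitly pushed down, not just ignored.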
Figure 3: The generation model conditioned on the question and top-ranked candidates returned by the ranker.
Our generator is a T5-based sequence-to-sequence (seq-to-seq) model that consumes (takes as input) the output of the ranker and makes the final prediction.
Its role is to fuse semantic and structural ingredients found in the top-k candidates to compose the final logical form.
To achieve this, we feed the generator with the question followed by a linearized sequence of the top-k candidates. The generator then distills a refined logical form that will fully reflect the question intent by complementing the missing pieces or discarding the irrelevant parts without having to learn the low-level dynamics.
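This input construction can be sketched as follows. The `build_generator_input` helper, the separator token, and the placeholder candidate strings are illustrative assumptions, not the exact linearization format used in RnG-KBQA.

```python
def build_generator_input(question, ranked_candidates, k=5, sep=" ; "):
    """Concatenate the question with a linearized sequence of the
    top-k ranked logical form candidates, producing a single input
    string for the seq-to-seq generator."""
    return sep.join([question] + ranked_candidates[:k])

# Placeholder strings standing in for real logical form candidates:
inp = build_generator_input(
    "what college did the author attend?",
    ["candidate_lf_1", "candidate_lf_2", "candidate_lf_3"],
    k=2,
)
print(inp)  # -> what college did the author attend? ; candidate_lf_1 ; candidate_lf_2
```

The generator then decodes a single refined logical form from this sequence, free to copy pieces from any of the candidates or to add what they are missing.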
Our model's generalization power comes from two key components: the contrastive ranker, which learns to distinguish correct logical forms from spurious ones, and the generator, which builds on pre-trained language models whose knowledge of primitive concepts carries over to unseen schema items.
Let’s take a closer look at generalization in our approach. Figure 4 below shows two types of generalization that our system can perform to answer questions: compositional generalization and zero-shot generalization.
Figure 4: Two examples of generalization in RnG-KBQA – compositional and zero-shot.
With compositional generalization, the idea is to probe the models’ meta-ability to rank and generate unseen compositions of facts in the knowledge base required to successfully answer novel questions. In other words: if Fact-A and Fact-B are in the KB, but their composition has not been seen in the training data, can the KBQA system still successfully reason about this novel composition A + B?
EXAMPLE: If Steven's age is 70 (Fact A), and Steven directed Jaws (Fact B), then the answer to "What's the age of the person who directed Jaws?" is 70.
In this case, the system would have seen the functions age() and director() in training – but this particular combination is something new. This would be regarded as a compositional generalization: the individual relations were seen before, however the specific combination of those individual relations was not seen in the training data.
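The Jaws example can be mimicked in a few lines of Python: each relation is “known” individually, and only their composition is new at question time.

```python
# Two facts, each involving a relation seen individually in "training".
facts = {
    ("Steven", "age"): 70,              # Fact A
    ("Jaws", "directed_by"): "Steven",  # Fact B
}

def director(film):
    return facts[(film, "directed_by")]

def age(person):
    return facts[(person, "age")]

# Novel composition A + B: "What's the age of the person who directed Jaws?"
print(age(director("Jaws")))  # -> 70
```

Compositional generalization is the ability to chain `age(director(...))` correctly even though that exact chain never appeared in the training data.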
In the zero-shot case, generalization would involve working with new relations that the model may have in its KB but were completely unseen during training.
For instance, in the Figure 4 example, although our model has not seen the relation tv.tv_song as a whole in its training data, it still has knowledge of primitives like tv, song, composition, and lyricist – both from pre-training (the BERT and T5 models we build on are pre-trained) and from potentially similar concepts encountered during training. In other words, thanks to pre-training, the model already knows about certain primitive concepts before training begins.
So, tv.tv_song already exists in the KB, and even though this relation was not seen during training, our model is able to: (1) surface it among the enumerated logical form candidates, since the relation is stored in the KB; (2) rank those candidates appropriately, despite never having seen the relation before; and (3) generate the correct final logical form that uses it.
Step 2 and Step 3 are thanks to our model's generalization power, which is partly due to the underlying model’s knowledge about primitives like tv and song.
Two quick examples to show the benefits of our generation approach:
As suggested by the first example below, the generation model can remedy missing operations (such as ARGMIN) that are not supported during the enumeration of candidates.
In the second example below, the generator is capable of patching the top-ranked candidate with implicit constraints: the (JOIN topic.notable_types college) in this example is not explicitly stated, but our generator is able to add this constraint.
We mainly evaluate our approach on GrailQA, a benchmark that focuses on judging generalization capability. Overall, our approach sets a new state-of-the-art performance and ranks #1 on the GrailQA leaderboard, achieving a 68.8% EM score and a 74.4% F1 score in aggregate. This represents a large margin over other approaches: RnG-KBQA outperforms ReTrack by 10.7 EM points and 8.2 F1 points.
In addition, RnG-KBQA performs well across all three levels of generalization and is particularly strong in the zero-shot setting. Note that ReTrack fails to generalize to unseen KB schema items and performs poorly in the zero-shot setting, whereas our approach generalizes and beats ReTrack by a margin of 16.1 F1 points.
We also test our approach on WebQSP, a popular KBQA benchmark. RnG-KBQA achieves 75.6% F1, surpassing the prior state of the art (QGG) by 1.6% absolute. Our approach also achieves the best EM score, 71.1%, surpassing CBR. These numbers are obtained using ELQ-predicted entity linking, yet our approach still outperforms all prior methods, even those that use oracle entity linking annotations (denoted as * in the figure below).
The results suggest that, in addition to outstanding generalization capability, our approach is equally strong at solving simpler questions in the i.i.d. setting.
Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our website to get regular updates on this and other research projects.
WebQSP: The Value of Semantic Parse Labeling for Knowledge Base Question Answering
GrailQA: Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases
Sempre: Semantic Parsing on Freebase from Question-Answer Pairs
ReTrack: A Flexible and Efficient Framework for Knowledge Base Question Answering
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
CBR: Case-based Reasoning for Natural Language Queries over Knowledge Bases
Xi Ye (lead author) is a Ph.D. student at the University of Texas at Austin. He works on natural language processing, focusing on interpretability and semantic parsing. (Note: the work described in this blog was done during Xi’s internship at Salesforce Research.)
Semih Yavuz is a Lead Research Scientist at Salesforce, conducting AI research to advance state-of-the-art on NLP with particular focus on different aspects of question answering, semantic parsing, and conversational AI while also embedding the resulting technology across Salesforce clouds for customer success.
Donald Rose is a Technical Writer at Salesforce AI Research. Specializing in content creation and editing, Dr. Rose works on multiple projects, including blog posts, video scripts, news articles, media/PR material, social media, writing workshops, and more. He also helps researchers transform their work into publications geared towards a wider audience.
A review of some terms used in our discussion: