RnG-KBQA: Rank-and-Generate Approach for Question Answering Over Knowledge Bases

23 May 2022

Semih Yavuz

Donald Rose

Lead Author: Xi Ye

TL;DR: We propose RnG-KBQA, a Rank-and-Generate Approach for Question Answering over Knowledge Bases, which enables answering natural language questions over large-scale knowledge bases. Our approach is capable of answering questions about topics never seen in the training data, which makes it generalizable to a broad range of domains. RnG-KBQA exhibits strong zero-shot and compositional generalization capabilities, setting new SoTA (state of the art) on most widely used KBQA benchmarks.

Background: What is KBQA?

Question Answering over Knowledge Bases, or KBQA, is a user-friendly way to interact with large-scale knowledge bases. A typical knowledge base (KB) contains information structured as a set of nodes (where various names, titles, or other entities are stored) connected by links (the relationships among those nodes).

For example, a KB node where the country name “Italy” is stored might have a link for “Capital” that points to a node containing “Rome”, and the “Rome” node might in turn have a link for “Population” that points to the number “2.88 million.” If a user asks, “What is the population of the capital of Italy?”, the Question Answering (QA) process might start at the node for “Italy”, follow the “Capital” link to the node “Rome”, and then follow Rome’s “Population” link to the stored population number, “2.88 million”, which is the answer to the question. In other words, the KBQA process starts by matching part of the question to a node in the KB, then proceeds to search the KB’s knowledge graph (the stored network of nodes and links) until the answer is found.
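The walk described above can be sketched in a few lines of Python. This is a toy illustration only: the nested dictionary `KB` and the helper `follow` are hypothetical names introduced here, and real large-scale knowledge bases are stored and queried very differently.

```python
# Toy knowledge graph: each node maps link names to a target node or value.
# The structure and all names here are illustrative assumptions.
KB = {
    "Italy": {"Capital": "Rome"},
    "Rome": {"Population": "2.88 million"},
}


def follow(start, links):
    """Start at a node and follow each link in order, returning where we end up."""
    current = start
    for link in links:
        current = KB[current][link]
    return current


# "What is the population of the capital of Italy?"
answer = follow("Italy", ["Capital", "Population"])
print(answer)  # 2.88 million
```

The question is matched to the starting node ("Italy"), and the list of links plays the role of the path searched through the knowledge graph.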

One concept used in KBQA is the KB schema item, which refers to nodes (entities) and links to other nodes (relations between entities or properties of entities). In the example above, Italy and Rome are entities, Capital is a relation between them, and Population is a property of Rome whose value is 2.88 million. Note that the term only refers to things stored in the KB, so a question is not a KB schema item.

Generalization: Handling Questions Not Seen in Training

What happens when users want to ask questions about topics that were never seen in a KBQA system’s training data?

Fortunately, KBQA systems can answer questions correctly on topics unseen during training thanks to their generalization power. Generalization involves making the system’s original knowledge go further – applying that knowledge to new questions it may never have encountered before, and answering them by, for example, composing or combining existing KB items in novel ways. This helps to expand the space of answerable questions.

In other words, if a KB system that has undergone training can apply its existing knowledge (set of facts) to answer never-before-seen questions, then that system has, to some degree, the ability to generalize – to go beyond its original knowledge and offer answers over a larger space of potential questions.

Why Generalization is Important

For a KBQA system to be truly useful, we would not want its QA ability to be limited just to what it saw during training. First, training sets are always limited to some degree, and cannot cover every single example that might be relevant for answering future questions. Second, there may be many more facts “hidden” in the KB, which just need to be extracted (during the question answering process) with a generalizable approach.

Hence, systems that can generalize are more useful because they can take previous experience (existing knowledge) and apply that to solve problems in new situations (answer new user questions that the system never encountered before). Ultimately, being able to answer never-before-seen questions, through generalization, means the system can cover a larger percentage of the space of all possible answers stored inside the KB – which helps users get answers to more questions.

KBQA Challenges: Generalization and Coverage

Unfortunately, the ability to generalize remains a significant challenge in the KBQA domain. While there have been attempts to solve this generalization problem in KBQA, existing methods have limitations:

Generation-based approaches (such as a seq-to-seq parser) are not effective enough to handle practical generalization scenarios.
- This is due to the difficulty of generating KB schema items that were not seen during training.
- This difficulty limits a KBQA system’s generalization ability.
- Many KBQA systems are limited to reasoning only about items explicitly seen in the training data.
Ranking-based approaches, which first generate a set of candidate logical forms using predefined rules and then select the best-scored one based on the question, have recently shown considerable success on the GrailQA benchmark [2]. However, they suffer from the coverage problem.
- Coverage refers to the ability of a KBQA system to answer (or cover) the largest possible set of potential questions - and some KBQA systems may be limited in their coverage due to their design or the scale (large size) of a KB.
- One of the main reasons for this coverage limitation for ranking-based approaches is that it is often impractical to exhaust all the rules in an attempt to cover the desired logical form of an answer, due to the KB’s scale.

At this point, you may be wondering: why can’t we combine the best aspects of ranking-based and generation-based methods into a new approach, designed to reduce or eliminate the aforementioned limitations? The answer is: we can!

Our Approach: Rank and Generate with RnG-KBQA

To address the limitations of other KBQA systems, and explore techniques for improving the process, we developed RnG-KBQA, a novel framework targeted at generalization problems in the task of Question Answering over Knowledge Bases.

Our approach’s “secret sauce” is combining a ranker with a generator (hence, RnG) when performing KBQA, which addresses the coverage issue in ranking-only approaches while still benefiting from their generalization power.

The result: RnG-KBQA can answer questions related to a broader range of topics than previous strong approaches.

Example

To illustrate how our approach works, let’s look at a detailed example:

Figure 1: Overview of our rank-and-generate (RnG) approach. Given a question, we rank logical form candidates obtained by searching over the KB based on predefined rules. Here, the ground truth logical form is not in the top-ranked candidates, as it is not covered by the rules. We solve this coverage problem using a generation step that produces the correct logical form based on top-ranked candidates. The final logical form is executed over the KB to yield the answer.

As shown in Figure 1, our method uses three main steps to form the best answer to a question:

1. Enumerate Candidates: obtain a pool of candidate logical forms by searching over the KB's knowledge graph.
2. Rank: our ranker selects a set of related logical forms from that pool of candidates.
   - The selected logical forms are not required to exactly cover the correct one, but are semantically coherent and aligned with the underlying intents in the question.
3. Generate: our generator composes the final logical form, based on both the question and the top-k ranked candidates.

The core idea of our approach is the interplay between the ranker and the generator:

The ranker provides the essential ingredients of KB schema items to the generator.
The generator then further refines the top candidates by complementing potentially missing constructions or constraints, and hence allows coverage of a broader range of logical forms (that is, it can virtually extend the logical form search beyond the coverage of the candidate enumeration step).
The generator can distill a refined logical form without having to learn the low-level dynamics, and hence better handles unseen compositions or KB schema items.
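The data flow of the first two steps (enumerate, then rank) can be sketched as follows. This is a minimal toy under stated assumptions, not the RnG-KBQA implementation: the triples, the relation names, and the `(JOIN relation entity)` candidate shape are illustrative, and a simple word-overlap count stands in for the learned ranker. The third step (generation with a seq-to-seq model) is omitted.

```python
import re

# Toy KB as (subject, relation, object) triples; all contents are illustrative.
TRIPLES = [
    ("Samuel Ramey", "music.artist.album", "Album A"),
    ("Samuel Ramey", "people.person.profession", "Singer"),
    ("Album A", "music.album.genre", "Opera"),
]


def enumerate_candidates(entity):
    """Step 1: build one candidate per relation anchored at the entity."""
    relations = sorted({rel for subj, rel, _ in TRIPLES if subj == entity})
    return [f"(JOIN {rel} {entity})" for rel in relations]


def tokens(text):
    """Lowercased alphabetic words in a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(question, candidate):
    """Stand-in for the learned ranker: number of words shared with the question."""
    return len(tokens(question) & tokens(candidate))


def rank(question, candidates, k=2):
    """Step 2: keep the k best-scoring candidates."""
    ordered = sorted(candidates, key=lambda c: overlap_score(question, c), reverse=True)
    return ordered[:k]


question = "Which album is by Samuel Ramey?"
candidates = enumerate_candidates("Samuel Ramey")
top_k = rank(question, candidates, k=1)
print(top_k)
```

In the actual approach the top-k candidates would then be handed to the generator, which may add operations or constraints that the enumeration rules never produce.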

Deep Dive

Ranker

Figure 2: The ranker that learns from the contrast between the ground truth and negative candidates.

Our ranker is a contrastive ranker that learns to score each logical form candidate by maximizing the similarity between the question and the ground truth logical form while minimizing the similarities between the question and the negative logical forms.

Specifically, we use a BERT-based encoder that takes as input the concatenation of the question and the logical form and outputs a score representing the similarity between them.

The ranker is then optimized to promote the ground truth logical form while penalizing the negative ones via a contrastive objective. Thanks to such an objective, our ranker is more effective in distinguishing the correct logical forms from spurious ones (similar but not equal to the ground truth ones) as opposed to parser-based models used in prior work [2,3] that only leverage supervision from the ground truth.
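One common way to write a contrastive objective of this kind is as a softmax cross-entropy over the ground truth and the negatives. The sketch below assumes that form and uses made-up scores; the function name `contrastive_loss` is hypothetical, and in the real system the scores would come from the BERT-based encoder rather than being hard-coded.

```python
import math


def contrastive_loss(positive_score, negative_scores):
    """Negative log-probability of the ground truth under a softmax over all candidates."""
    scores = [positive_score] + list(negative_scores)
    max_score = max(scores)  # subtract the max before exponentiating, for stability
    log_denominator = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_denominator - positive_score


# Ground truth scored far above the negatives: small loss.
well_separated = contrastive_loss(5.0, [0.0, -1.0])

# Ground truth indistinguishable from two negatives: loss equals log(3).
poorly_separated = contrastive_loss(0.0, [0.0, 0.0])

print(well_separated < poorly_separated)  # True
```

Minimizing this quantity pushes the ground truth score up and the negative scores down at the same time, which is the contrast the ranker learns from.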

Generator

Figure 3: The generation model conditioned on the question and top-ranked candidates returned by the ranker.

Our generator is a T5-based [5] seq-to-seq model that consumes (takes as input) the output of the ranker and makes the final prediction.

Its role is to fuse semantic and structural ingredients found in the top-k candidates to compose the final logical form.

To achieve this, we feed the generator with the question followed by a linearized sequence of the top-k candidates. The generator then distills a refined logical form that will fully reflect the question intent by complementing the missing pieces or discarding the irrelevant parts without having to learn the low-level dynamics.
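As a rough illustration of "the question followed by a linearized sequence of the top-k candidates", the helper below builds such an input string. The function name, the `" ; "` separator, and the example candidates are assumptions made for this sketch; the exact format used by the model is not specified in this post.

```python
def build_generator_input(question, ranked_candidates, k=5, separator=" ; "):
    """Concatenate the question with the top-k ranked candidates into one string."""
    return separator.join([question] + list(ranked_candidates[:k]))


ranked = [
    "(JOIN music.artist.album Samuel Ramey)",
    "(JOIN people.person.profession Samuel Ramey)",
    "(JOIN music.album.genre Opera)",
]
generator_input = build_generator_input("Which album is by Samuel Ramey?", ranked, k=2)
print(generator_input)
```

The resulting string would be tokenized and passed to the seq-to-seq model, whose output is the final logical form.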

Our Approach to Generalization

Our model's generalization power comes from two key components:

Model-agnostic enumeration step:
- Fetches all relevant KB links (e.g., “ALBUM”) anchored at the linked entity (e.g., “Samuel Ramey”).
- So, even if the KB link “ALBUM” has not been seen in any of the training examples, this component can still fetch it as soon as we identify the “Samuel Ramey” entity in the KB.
Strong interplay between ranker and generator built off of pre-trained language models (LMs):
- We canonicalize KB items into natural language forms. For example: [music.album -> "music album"], [album.artist -> "album artist"], [recording.length -> "recording length"], and so on.
- Part of the generalization power of the model (both ranker and generator) comes from its ability to still compute a strong representation for KB items even if they are unseen during training thanks to text canonicalization.
- That is because, although these model components have not seen, say, tv.tv_song as a whole in their training data, they still have knowledge of primitives like tv, song, composition, and lyricist – both from pre-training (BERT and T5 are pre-trained models) and from potentially similar concepts seen during training.
- Another major factor in the model’s generalization power is seamlessly disentangling the complex reasoning task into two complementary sub-tasks (that is, ranking and generation), each of which is tackled by a strong model specialized only for solving its own sub-task. This way, the ranker can focus on ranking only the core part of the full logical form, while the generator can focus solely on learning how to distill a refined logical form, directly leveraging such core ingredients along with the question context without having to learn the low-level dynamics. Such interplay between ranker and generator enables the RnG-KBQA model as a whole to have a stronger generalization capability.
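The text canonicalization mentioned above can be sketched as a simple string transformation. The function below reproduces the three examples given in the list; treating underscores like dots (so that tv.tv_song becomes "tv tv song") is an assumption of this sketch rather than something the post states.

```python
def canonicalize(kb_item):
    """Turn a dotted KB identifier into a plain-text phrase a language model can read."""
    return kb_item.replace(".", " ").replace("_", " ")


for item in ["music.album", "album.artist", "recording.length", "tv.tv_song"]:
    print(item, "->", canonicalize(item))
```

Because the output consists of ordinary words, a pre-trained model can form a useful representation of an item even when that exact identifier never appeared in training.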

Two Examples of Generalization

Let’s take a closer look at generalization in our approach. Figure 4 below shows two types of generalization that our system can perform to answer questions: compositional generalization and zero-shot generalization.

Figure 4: Two examples of generalization in RnG-KBQA – compositional and zero-shot.

Compositional generalization

With compositional generalization, the idea is to probe the models’ meta-ability to rank and generate unseen compositions of facts in the knowledge base required to successfully answer novel questions. In other words: if Fact-A and Fact-B are in the KB, but their composition has not been seen in the training data, can the KBQA system still successfully reason about this novel composition A + B?

EXAMPLE: If Steven's age is 70 (Fact A), and Steven directed Jaws (Fact B), then the answer to "What's the age of the person who directed Jaws?" is 70.

In this example, we're answering a question within a question:
- "What is the age of (the director of (Jaws))?" -- or, age(director(Jaws))
The system answers the inner question Qi first, then uses that answer to resolve the outer question Qo:
- Qi = director(Jaws) = Steven
- Qo = age(director(Jaws)) = age(Qi) = age(Steven) = 70.
So, the final answer comes from composing the two steps: the answer to Qi becomes the input to Qo.

In this case, the system would have seen the functions age() and director() in training – but this particular combination is something new. This would be regarded as a compositional generalization: the individual relations were seen before; however, the specific combination of those relations was not seen in the training data.
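The nested question can be written directly as function composition. The lookup tables below hold only the two toy facts from the example; the names `DIRECTOR` and `AGE` are hypothetical and stand in for relations stored in a KB.

```python
# Each relation is a separate lookup table holding the toy facts from the example.
DIRECTOR = {"Jaws": "Steven"}   # Fact B: Steven directed Jaws
AGE = {"Steven": 70}            # Fact A: Steven's age is 70


def director(film):
    return DIRECTOR[film]


def age(person):
    return AGE[person]


inner = director("Jaws")   # Qi: who directed Jaws?
outer = age(inner)         # Qo: what is that person's age?
print(outer)  # 70
```

Neither function knows about the other; the new question is answered purely by chaining them, which is the kind of unseen combination compositional generalization is about.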

Zero-shot generalization

In the zero-shot case, generalization would involve working with new relations that the model may have in its KB but were completely unseen during training.

For instance, in the Figure 4 example, although our model has not seen the relation tv.tv_song as a whole as part of its training data, it still has knowledge of primitives like tv, song, composition, lyricist – both from pre-training (BERT, T5 models are pre-trained models) and also from potentially similar concepts encountered during training. In other words, thanks to pre-training, the model already knows about certain primitive concepts before training begins.

So, tv.tv_song already exists in the KB, and even though this relation was not seen during training, our model is able to:

1. fetch it with the candidate enumeration step
2. rank it high among the candidates
3. generate the final logical form containing this relation.

Steps 2 and 3 are possible thanks to our model's generalization power, which is partly due to the underlying model’s knowledge about primitives like tv and song.

More Examples: Things The Generator Can Solve

Two quick examples to show the benefits of our generation approach:

Fixing Uncovered Operations

As suggested by the first example below, the generation model can remedy some missing operations (like adding in ARGMIN) not supported during enumeration of candidates.

Complementing Implicit Constraints

In the second example below, the generator is capable of patching the top-ranked candidate with implicit constraints: the (JOIN topic.notable_types college) in this example is not explicitly stated, but our generator is able to add this constraint.

Experimental Results

GrailQA

We mainly evaluate our approach on GrailQA [2], a benchmark that focuses on judging generalization capability. Overall, our approach sets a new state of the art and ranks #1 on the GrailQA leaderboard, achieving a 68.8% EM (exact match) score and a 74.4% F1 score overall. This is a large margin over other approaches: RnG-KBQA outperforms ReTrack [4] by 10.7 EM and 8.2 F1 points.

In addition, RnG-KBQA performs well at all three levels of generalization and is particularly strong in the zero-shot setting. Note that ReTrack fails to generalize to unseen KB schema items and performs poorly in the zero-shot setting, whereas our approach generalizes and beats ReTrack by a margin of 16.1 F1 points.
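This post does not define the EM and F1 metrics it reports. As a hedged illustration of how such scores are commonly computed for a single question, the sketch below assumes that EM compares the predicted logical form with the gold one as strings (real evaluation scripts may use a more lenient equivalence check) and that F1 compares the predicted and gold answer sets. Both function names are hypothetical.

```python
def exact_match(predicted_form, gold_form):
    """1 if the predicted logical form equals the gold one (string comparison), else 0."""
    return int(predicted_form.strip() == gold_form.strip())


def answer_f1(predicted_answers, gold_answers):
    """Harmonic mean of precision and recall between two answer sets."""
    predicted, gold = set(predicted_answers), set(gold_answers)
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# One extra, wrong answer lowers precision to 0.5 while recall stays at 1.0.
print(answer_f1({"Rome", "Milan"}, {"Rome"}))
```

Benchmark-level numbers are then typically averages of these per-question values over the test set.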

WebQSP

We also test our approach on WebQSP [1], a popular KBQA benchmark. RnG-KBQA achieves 75.6% F1, surpassing the prior state of the art (QGG) by 1.6% absolute. Our approach also achieves the best EM score of 71.1%, surpassing CBR [6]. These results are obtained using ELQ-predicted entity linking, yet our approach still outperforms all prior methods, even when they use oracle entity linking annotations (denoted as * in the figure below).

The results suggest that, in addition to outstanding generalization capability, our approach is also strong at solving simpler questions in the i.i.d. setting.

The Bottom Line

The goal of KBQA (question answering over knowledge bases) is to search a KB to find answers to questions presented by users. Answers are constructed using facts in the KB. The KBQA system searches the KB for a match (information stored in the KB that matches part of the question), then follows the connected graph (network of links and nodes) from that match to build an answer.
However, existing KBQA systems have shown problems related to generalization - the ability to answer questions on topics unseen during training. The ability to generalize remains a significant challenge in the KBQA domain; while there have been attempts to solve this generalization problem, existing methods have limitations.
To address these limitations, we propose a new framework called RnG-KBQA, which consists of a ranking step and a generation step. RnG-KBQA’s core strength is the interplay between these two steps. The ranker provides essential ingredients of KB schema items to the generator, which allows the generator to distill a refined logical form without having to learn the low-level dynamics and hence better handles unseen compositions or KB schema items.
The RnG method has achieved one of the major goals in KBQA: solving the coverage problem while enabling robust generalization. Our approach enables asking questions on various topics over modern knowledge bases without having to collect training data covering all the topics in those knowledge bases.
Experiments show that RnG-KBQA can generalize better than other methods. Results on two key benchmarks suggest the strong performance of our approach, which is ranked in first place on the GrailQA leaderboard (https://dki-lab.github.io/GrailQA), surpassing the prior state-of-the-art (SoTA) by a large margin. We also set the new SoTA on WebQSP.
We believe our paradigm can be generally useful in other tasks involving handling generalization using generation models – a promising path for future research.

Explore More

Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our website to get regular updates on this and other research projects.

To learn more about our work, check out our research paper: RNG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering
Contact: email Semih Yavuz at syavuz@salesforce.com
Code: https://github.com/salesforce/rng-kbqa

[1] WebQSP: The Value of Semantic Parse Labeling for Knowledge Base Question Answering

[2] GrailQA: Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

[3] Sempre: Semantic Parsing on Freebase from Question-Answer Pairs

[4] ReTrack: A Flexible and Efficient Framework for Knowledge Base Question Answering

[5] T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

[6] CBR: Case-based Reasoning for Natural Language Queries over Knowledge Bases

About the Authors

Xi Ye (lead author) is a Ph.D. student at the University of Texas at Austin. He works on natural language processing, focusing on interpretability and semantic parsing. (Note: the work described in this blog was done during Xi’s internship at Salesforce Research.)

Semih Yavuz is a Lead Research Scientist at Salesforce, conducting AI research to advance state-of-the-art on NLP with particular focus on different aspects of question answering, semantic parsing, and conversational AI while also embedding the resulting technology across Salesforce clouds for customer success.

Donald Rose is a Technical Writer at Salesforce AI Research. Specializing in content creation and editing, Dr. Rose works on multiple projects, including blog posts, video scripts, news articles, media/PR material, social media, writing workshops, and more. He also helps researchers transform their work into publications geared towards a wider audience.

Appendix: Terms and Definitions

A review of some terms used in our discussion:

RnG: Rank-and-Generate
KBQA: Question Answering (QA) Over Knowledge Bases (KB)
Compositional generalization: combine existing data (seen in training) in new, novel ways when forming the answer to a question
Zero-shot generalization: incorporate new relations (that the model may have in its KB, but were unseen during training) when forming the answer to a question
WebQSP, GrailQA: performance benchmarks.
i.i.d.: independent and identically distributed. Example: coin flipping; each new flip is not dependent at all on any previous flips, so each coin flip is independent. Also, each flip has the same chance of getting heads or tails (50/50), so each flip is governed by the same probability distribution. In AI/ML, if all the data points in a training set are i.i.d., then all the data points are generated using the same probability distribution and each data point is independent of all other data points.
Ground truth: Information known to be true or real; knowledge considered to be fundamental; empirical evidence; data based on direct observation or measurement, rather than being inferred from other data.
KB: knowledge base
KB schema: the structure of a knowledge base; defines how information is organized within the KB
KB schema item: refers to nodes (entities) and links to other nodes (relations or properties) in the KB – a subset of the KB’s knowledge graph. In other words, a KB schema item refers to entities, their properties, and relations between entities. (Example: Italy and Rome are entities, Capital is a relation between the two, and Population is a property of Rome, whose value is 2.88 million.) This term always refers to knowledge (entities, their properties, and relations between entities) that is stored in the KB – so a question is not a KB schema item.
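Logical form: an expression (such as a statement or question) that has been abstracted into its essence -- its syntactic structure; an abstraction of content into logical terms. In KBQA, the logical form of a question is a structured query, built from KB schema items and operations such as JOIN or ARGMIN, that can be executed over the KB to yield the answer. Note that several different expressions may share the same logical form.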
Seq-to-seq model: Sequence to Sequence (or seq2seq) models transform one sequence (the input) into another sequence (the output); the two sequences can have different, variable lengths. These models are a family of deep learning methods that have shown success on various kinds of language processing tasks.
Entity linking: the process of linking a name (e.g., Paris) in a sentence (e.g., Paris is the capital of France) to the correct entity (e.g., Paris, France not Paris Hilton).