Salesforce Research at ACL 2022

Conference Overview

This year marks the 60th Annual Meeting of the Association for Computational Linguistics (ACL). ACL is the premier international scientific and professional society for people working on computational problems involving human language, a field often referred to as computational linguistics or natural language processing (NLP).

This meeting features leading research in the world of NLP, highlighting advancements made in the study of language from a computational perspective.

ACL 2022 will take place in a hybrid format, hosting people both virtually and in-person in Dublin, Ireland from May 22nd - 27th, 2022.

Salesforce AI Research Publications at ACL 2022

Salesforce Research is pleased to announce a total of 14 accepted papers from our team of leading researchers.

Our accepted authors will present their work at ACL through pre-recorded talks and in-person poster sessions during the main conference. We look forward to sharing some of our exciting new research, whether virtually or face-to-face in Dublin!

Salesforce Researchers are shown in bold in the publication descriptions below.

Causal-aware Safe Policy Improvement for Task-oriented Dialogue

Govardana Sachithanandam Ramachandran, Kazuma Hashimoto, Caiming Xiong

  • We address the under-specified nature of automatic evaluation metrics when they are used directly as rewards in RL for task-oriented dialogue (ToD). We do this by introducing pairwise causal reward learning, a method that learns a fine-grained per-turn reward by reasoning about the intent of expert utterances. We extend the approach to human-in-the-loop settings to capture the true objective of a ToD system, which biased metrics might fail to capture. We also propose a safe off-policy improvement method for ToD that guarantees performance improvement over a baseline.

Chart-to-Text: A Large-Scale Benchmark for Chart Summarization

Shankar Kantharaj, Rixie Leong, Xiang Lin, Ahmed Masry, Megh Thakkar, Enamul Hoque, and Shafiq Joty

  • We present two benchmarks for chart summarization and provide several strong baselines.

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque

  • We present a chart question-answering dataset and investigate several state-of-the-art models.

Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation

Chengwei Qin and Shafiq Joty

  • Our paper introduces continual few-shot relation learning (CFRL), a challenging yet practical problem, and proposes a novel method that outperforms existing approaches.

DialFact: A Benchmark for Fact-Checking in Dialogue

Prakhar Gupta, Jason Wu, Wenhao Liu, Caiming Xiong

  • We construct DialFact, a testing benchmark of annotated conversational claims paired with evidence from Wikipedia. We introduce three subtasks: verifiable claim detection, evidence retrieval, and claim verification. We find that existing fact-checking models trained on non-dialogue data such as FEVER fail to perform well on our task, so we propose a data augmentation solution that improves fact-checking performance in dialogue.

GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

Bosheng Ding, Junjie Hu, Lidong Bing, Mahani Aljunied, Shafiq Joty, Luo Si, and Chunyan Miao

  • We introduce a novel data curation method that generates GlobalWoZ, a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases of multilingual ToD systems. Our method is based on translating dialogue templates and filling them with local entities from the target-language countries. We also extend the coverage of target languages to 20 languages.
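
The template-then-fill idea behind this kind of data curation can be sketched in a few lines. The templates, placeholder format, and entity values below are hypothetical illustrations, not the actual GlobalWoZ pipeline:

```python
# A minimal sketch of filling translated dialogue templates with local entities.
# Templates, slot names, and entities here are made-up examples.

templates = {
    "en": "I want to book a table at [RESTAURANT].",
    # a translated dialogue template for the target language
    "es": "Quiero reservar una mesa en [RESTAURANT].",
}

local_entities = {
    # an entity local to the target-language country
    "es": {"RESTAURANT": "Casa Botín"},
}

def fill_template(lang, templates, entities):
    """Fill a translated template with entities from the target-language country."""
    text = templates[lang]
    for slot, value in entities[lang].items():
        text = text.replace(f"[{slot}]", value)
    return text

print(fill_template("es", templates, local_entities))
# → Quiero reservar una mesa en Casa Botín.
```

Decoupling the translated templates from the entity lists is what lets the same English source dialogues be "globalized" to many target languages.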

Interpreting the Robustness of Neural NLP Models to Textual Perturbations

Yunxiang Zhang, Liangming Pan, Samson Tan, and Min-Yen Kan

  • We empirically demonstrate the effect of perturbation learnability on both model robustness and robustness gains via data augmentation.

Modeling Multi-hop Question Answering as Single Sequence Prediction

Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Nitish Shirish Keskar, Caiming Xiong

  • We propose PathFID, a new method that solves multi-hop QA via single-sequence generation, explicitly modeling the underlying reasoning process needed to resolve the answer. Our approach leads to strong performance gains on two multi-hop QA datasets: HotpotQA and IIRC. Beyond these gains, PathFID is more interpretable, yielding answers that are more faithfully grounded in the supporting passages and facts.

OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval

Tong Niu, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

  • We present OneAligner, a cross-lingual sentence alignment model that trains on only one language pair and transfers to all other languages.

QAConv: Question Answering on Informative Conversations

Jason Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong

  • We introduce QAConv, a new QA dataset that uses conversations as a knowledge source. We focus on informative conversations, including business emails, panel discussions, and work channels. Experimental results show that state-of-the-art pretrained QA systems have limited zero-shot performance and tend to predict our questions as unanswerable. Our dataset provides a new training and evaluation testbed to facilitate research on QA over conversations.

Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling

Prathyusha Jwalapuram, Shafiq Joty, and Xiang Lin

  • We show empirically that increasing the density of negative samples improves a coherence model, and demonstrate its improved generalizability by testing on multiple downstream tasks.

RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering

Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

  • We propose RnG-KBQA, a Rank-and-Generate approach for question answering over knowledge bases. RnG-KBQA first ranks a set of candidate logical forms obtained via search over the knowledge graph. It then composes the final logical form based on the question and the top-ranked candidates. Our approach sets new state-of-the-art results on GrailQA and WebQSP, substantially outperforming other strong baselines, especially in the zero-shot generalization setting.
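
As a rough illustration of a rank-and-generate pipeline, here is a minimal sketch. The candidate logical forms, the token-overlap scorer, and the pick-the-top "generation" step are hypothetical stand-ins for the paper's learned ranker and seq2seq generator:

```python
# Toy rank-and-generate sketch: rank candidate logical forms against the
# question, then compose a final logical form from the top-ranked ones.
# Everything here is illustrative, not the actual RnG-KBQA models.

def rank_candidates(question, candidates, score):
    """Rank candidate logical forms by a relevance score against the question."""
    return sorted(candidates, key=lambda lf: score(question, lf), reverse=True)

def generate_final(question, top_candidates):
    """Compose the final logical form from the question and top candidates.
    Here we simply pick the best one; the paper uses a seq2seq generator."""
    return top_candidates[0]

def answer(question, candidates, score, k=3):
    ranked = rank_candidates(question, candidates, score)
    return generate_final(question, ranked[:k])

def overlap(q, lf):
    """Naive token-overlap scorer (purely illustrative)."""
    q_tokens = set(q.lower().replace("?", " ").split())
    lf_tokens = set(
        lf.lower().replace("(", " ").replace(")", " ").replace("_", " ").split()
    )
    return len(q_tokens & lf_tokens)

cands = ["(population_of France)", "(capital_of France)"]
print(answer("What is the capital of France?", cands, overlap))
# → (capital_of France)
```

Generating from the top-ranked candidates, rather than from the question alone, is what helps a real system compose logical forms it never saw verbatim during training.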

SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization

Mathieu Ravaut, Nancy Chen, Shafiq Joty

  • We apply a multi-gate mixture-of-experts model for multi-task re-ranking in abstractive summarization.
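
A multi-gate mixture-of-experts scorer for re-ranking can be sketched as follows; the experts, gate logits, and word-count feature below are hypothetical stand-ins for the learned components in the paper:

```python
# Toy multi-gate mixture-of-experts re-ranking sketch: shared experts score a
# candidate, and each task (e.g. each target metric) has its own softmax gate
# over the experts. All components here are made-up illustrations.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_score(features, experts, task_gates):
    """Score one candidate: average the per-task gated mixtures of expert outputs."""
    expert_outs = [expert(features) for expert in experts]
    per_task = []
    for gate_logits in task_gates:  # one gate per task/metric
        gates = softmax(gate_logits)
        per_task.append(sum(g * o for g, o in zip(gates, expert_outs)))
    return sum(per_task) / len(per_task)

def rerank(candidates, featurize, experts, task_gates):
    """Return the candidate with the highest mixture-of-experts score."""
    return max(candidates, key=lambda c: moe_score(featurize(c), experts, task_gates))

# Illustrative usage: a word-count feature, two toy experts, two task gates
# that both favor the first expert, so the longer candidate wins.
featurize = lambda c: [len(c.split())]
experts = [lambda f: f[0], lambda f: -f[0]]
task_gates = [[2.0, 0.0], [1.0, 0.0]]
best = rerank(["a short draft", "a longer and more detailed candidate summary"],
              featurize, experts, task_gates)
print(best)
```

The "multi-gate" part is the key design choice: sharing experts across tasks while giving each target metric its own gate lets one model re-rank candidates for several objectives at once.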

ConTinTin: Continual Learning from Task Instructions

Wenpeng Yin, Jia Li, Caiming Xiong

  • Prior AI research focuses on solving a particular task given a set of labeled examples. This work introduces a novel learning problem: continual learning from textual task instructions. The goal is to explore the potential of existing pretrained language models to solve new tasks under supervision from instructions rather than labeled examples. With our data and a well-performing system, we pave the way for future studies of this complex problem in the community.