Salesforce Research at ICLR 2022

4 min read

Conference Overview

This year marks the Tenth International Conference on Learning Representations (ICLR), one of the premier academic conferences dedicated to advancing research in representation learning, a branch of machine learning also referred to as feature learning or deep learning.

ICLR features the latest advancements in cutting-edge deep learning research used in artificial intelligence, statistics, and data science, as well as application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, robotics, and more. The conference draws a wide range of participants including academic and industrial researchers, entrepreneurs, engineers, graduate students, and postdocs.

ICLR 2022 will take place in a fully virtual format from April 25th to April 29th.

Salesforce AI Research Publications at ICLR 2022

Salesforce Research is pleased to announce a total of 7 accepted papers from our team of leading researchers.

Our accepted authors will present their work at ICLR through pre-recorded talks and slides during the main conference. We look forward to sharing some of our exciting new research with you!

Salesforce Researchers are shown in bold in the publication descriptions below.

CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting

Gerald Woo, Chenghao Liu, Doyen Sahoo, Akshat Kumar, Steven Hoi

  • Motivated by the success of representation learning in computer vision and NLP, we believe a more promising paradigm for time series forecasting is to first learn disentangled feature representations, followed by a simple regression fine-tuning step. Hence, we propose CoST, a new time series representation learning framework for forecasting that applies contrastive learning methods to learn disentangled seasonal-trend representations. Experiments show CoST outperforms state-of-the-art (SOTA) methods, achieving a 21.3% improvement on multivariate benchmarks.
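CoST's actual objectives operate separately on trend and seasonal components; as a rough illustration of the generic contrastive building block such frameworks rest on, here is a minimal NumPy InfoNCE-style loss. The batch size, dimensions, and temperature are illustrative, not CoST's configuration.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: each anchor should match its own
    positive against all other positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # diagonal entries are the positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z)                    # identical views: easy positives
loss_random = info_nce(z, rng.normal(size=(8, 16)))
print(loss_aligned, loss_random)
```

Aligned views produce a much lower loss than random pairings, which is the signal a contrastive encoder is trained to amplify.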

Continual Normalization: Rethinking Batch Normalization for Online Continual Learning

Quang Pham, Chenghao Liu, Steven Hoi

  • Batch Normalization (BN) negatively affects online continual learning, leading to increased catastrophic forgetting. We propose Continual Normalization (CN), a simple yet effective method that facilitates training similarly to BN while mitigating this limitation. Experiments on different continual learning algorithms and online scenarios show that CN is a direct replacement for BN and provides substantial performance improvements.
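Part of BN's problem in the continual setting comes from its running statistics, which are overwritten as the data distribution shifts from task to task. A toy NumPy sketch of that drift (the momentum and task means are made-up values, not from the paper):

```python
import numpy as np

def bn_running_mean(stream, momentum=0.1):
    """Track BN's exponential-moving-average running mean over a stream of mini-batches."""
    mu = 0.0
    for batch in stream:
        mu = (1 - momentum) * mu + momentum * batch.mean()
    return mu

rng = np.random.default_rng(0)
task_a = [rng.normal(loc=0.0, size=32) for _ in range(100)]  # first task, mean 0
task_b = [rng.normal(loc=5.0, size=32) for _ in range(100)]  # second task, mean 5

mu = bn_running_mean(task_a + task_b)
print(mu)  # ends near task B's mean: the statistics fit for task A are gone
```

At evaluation time, task A's inputs would be normalized with task B's statistics, which is one mechanism behind the forgetting that CN is designed to counteract.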

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty

  • Modern unsupervised machine translation systems mostly train their models by generating synthetic parallel training data from large unlabeled monolingual corpora of different languages through various methods. However, a small amount of genuinely parallel data may lie hidden in this sea of unlabeled data, and it has gone unexploited. We develop a new fine-tuning objective, the Language-Agnostic Constraint for the SwAV loss, which enables a pre-trained model to extract pseudo-parallel data from monolingual corpora in a fully unsupervised manner. We then propose an effective strategy to utilize the obtained synthetic data to augment unsupervised machine translation. Our method achieves SOTA performance on bilingual unsupervised translation tasks.
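The SwAV-based constraint itself is beyond a short snippet, but the downstream idea of mining pseudo-parallel pairs can be sketched with cosine similarity over language-agnostic sentence embeddings. The embeddings, sizes, and threshold below are invented for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical language-agnostic sentence embeddings for two monolingual corpora.
src = rng.normal(size=(6, 8))
tgt = np.vstack([src[:3] + rng.normal(scale=0.05, size=(3, 8)),  # 3 hidden translations
                 rng.normal(size=(4, 8))])                       # unrelated sentences

def mine_pairs(a, b, threshold=0.9):
    """Keep (i, j) pairs whose best cosine match exceeds the threshold."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a_n @ b_n.T
    pairs = []
    for i in range(len(a)):
        j = int(np.argmax(sim[i]))
        if sim[i, j] >= threshold:   # keep only confident matches
            pairs.append((i, j))
    return pairs

pairs = mine_pairs(src, tgt)
print(pairs)
```

The three planted translations surface as confident matches, while unrelated sentences rarely clear the threshold; the mined pairs then serve as synthetic supervision.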

Efficient and Differentiable Conformal Prediction with General Function Classes

Yu Bai, Song Mei, Huan Wang, Yingbo Zhou, Caiming Xiong

  • We propose generalizing conformal prediction to multiple learnable parameters, by considering the constrained empirical risk minimization problem of finding the most efficient prediction set subject to valid empirical coverage. This meta-algorithm generalizes existing conformal prediction algorithms, and we show that it achieves approximately valid population coverage and near-optimal efficiency within its class, when the function class in the conformalization step is low-capacity.
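The paper generalizes conformal prediction beyond a single learnable threshold; as background, here is a minimal split-conformal sketch on toy data (least-squares predictor, made-up noise level) showing how calibration yields a set with the target coverage.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data: y = 2x + noise.
x = rng.uniform(-1, 1, size=1000)
y = 2 * x + rng.normal(scale=0.3, size=1000)

# Split: half to fit a predictor, half to calibrate the set size.
x_fit, y_fit = x[:500], y[:500]
x_cal, y_cal = x[500:], y[500:]

# "Fit" a least-squares line through the origin as the base predictor.
slope = np.sum(x_fit * y_fit) / np.sum(x_fit ** 2)
predict = lambda t: slope * t

# Calibrate: a quantile of absolute residuals gives the interval half-width q.
alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

# Prediction set for a new point: [f(x) - q, f(x) + q].
x_new = 0.5
lo, hi = predict(x_new) - q, predict(x_new) + q
print(f"~90% prediction interval at x={x_new}: [{lo:.2f}, {hi:.2f}]")
```

Here the only "parameter" chosen at conformalization time is the scalar q; the paper's framework instead optimizes over a richer function class while keeping the coverage constraint.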

LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5

Chengwei Qin and Shafiq Joty

  • Our work addresses the limitations of Lifelong Language Learning, focusing on a more challenging yet practical problem where the model needs to generalize well on new few-shot tasks without forgetting previous ones. We call this Lifelong Few-shot Language Learning (LFLL) and investigate three kinds of tasks: sequence labeling, text classification, and text generation. We propose a unified LFLL framework based on prompt tuning (PT) of T5, called LFPT5, which takes advantage of PT's strong few-shot learning ability and simultaneously trains the model as a task solver and a data generator. Experiments show LFPT5 significantly outperforms previous methods.
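Prompt tuning freezes the pre-trained model and trains only a small set of continuous prompt embeddings prepended to the input. A shape-level NumPy sketch of that mechanism (the dimensions are illustrative, not T5's):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt = 32, 4   # hypothetical hidden size and prompt length

# Soft prompt: the only trainable parameters; the backbone stays frozen.
soft_prompt = rng.normal(scale=0.5, size=(n_prompt, d_model))

def with_prompt(token_embeddings):
    """Prepend the learned continuous prompt to the input token embeddings."""
    return np.concatenate([soft_prompt, token_embeddings], axis=0)

tokens = rng.normal(size=(10, d_model))   # embeddings for a 10-token input
extended = with_prompt(tokens)
print(extended.shape)                     # prompt tokens prepended to the sequence
```

Because only `soft_prompt` is updated per task, storing one prompt per task is cheap, which is what makes PT attractive for the lifelong few-shot setting.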

Robustly Extracting Factual Information from Language Models with Diverse Prompts

Benjamin Newman, Prafulla Kumar Choubey, Nazneen Rajani

  • The quality of factual information extracted from Large Language Models (LLMs) depends on the prompts used to query them. Different users querying LLMs for the same information using different wording should receive the same accurate responses. Our work addresses this by introducing P-Adapters: lightweight models that sit between the embedding layer and the first attention layer of LLMs. They take LLM embeddings as input and output continuous prompts used to query the LLM. We also investigate Mixture of Experts (MoE) models that learn continuous prompts ("experts") and select one to query the LLM. P-Adapters perform comparably to the more complex MoE models in extracting factual information from BERT and RoBERTa while eliminating the need for additional annotations.
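Mechanically, a P-Adapter can be pictured as a small network that rewrites the frozen LLM's input embeddings into a continuous prompt before the first attention layer. The bottleneck-MLP-with-residual shape below is our assumption for illustration, not the paper's exact architecture, and the sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 32    # hypothetical LLM hidden size
d_hidden = 16   # adapter bottleneck width

# Assumed adapter parameters: a small down/up projection pair.
W_down = rng.normal(scale=0.1, size=(d_model, d_hidden))
W_up = rng.normal(scale=0.1, size=(d_hidden, d_model))

def p_adapter(token_embeddings):
    """Map frozen-LLM token embeddings to a continuous prompt (sketch)."""
    h = np.maximum(token_embeddings @ W_down, 0.0)   # ReLU bottleneck
    return token_embeddings + h @ W_up               # residual keeps the original signal

seq = rng.normal(size=(5, d_model))   # embeddings for a 5-token query
prompt = p_adapter(seq)
print(prompt.shape)                   # same shape as the input embeddings
```

The output replaces the raw query embeddings, so differently worded queries can be mapped toward a shared, more reliable continuous prompt.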

When Can We Learn General-Sum Markov Games with Large Number of Players Sample-Efficiently?

Ziang Song, Song Mei, Yu Bai

  • Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially with the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of m-player general-sum Markov games. Our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.

Explore More

To learn more about these and other research projects, please visit our website.