Conversational AI Programming with CodeGen: Let AI Write Code For You


Links: Research Paper, GitHub


Can you imagine a machine writing an app for you, just by telling it what you want?

As futuristic as this scenario sounds, it’s actually here today.

Salesforce AI Research outlines conversational AI programming as a new paradigm that’s making this vision a reality, thanks to an AI system that writes software with you, in a conversation.

Introducing CodeGen: Turning Prompts Into Programs

The first step towards this vision is now here in the form of our large-scale language model, CodeGen, which turns simple English prompts into executable code. You don’t write any code yourself; instead, you describe what the code should do, in natural language -- and the machine writes it for you.

For a quick look at how it works, let’s ask CodeGen to solve the two-sum problem: find two numbers in a list that add up to a given target value.

To begin, we simply prompt the model, in plain English, to solve the two-sum problem. As you can see in the brief video below, our CodeGen model generates functioning code that solves the problem correctly.
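The exact code varies from run to run, but a straightforward solution of the kind shown in the video looks roughly like the following sketch (an illustration, not CodeGen’s verbatim output):

    # Illustrative two-sum solution: check every pair of numbers and
    # return the indices of the two values that add up to the target.
    def two_sum(nums, target):
        for i in range(len(nums)):
            for j in range(i + 1, len(nums)):
                if nums[i] + nums[j] == target:
                    return i, j
        return None

    print(two_sum([2, 7, 11, 15], 9))  # prints (0, 1)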


Some Quick Background: Terms, Definitions, Concepts

Before proceeding further, let’s define some of the terms and ideas used in this blog:

Programming or Coding: A multi-step process designed to get a machine to achieve a goal:

  • Translate a problem into a series of steps that solve it (the algorithm)
  • Translate that algorithm into a computer language (the program)
  • Run that program to see if it works (the test)
  • Find out which parts of the program did not work properly (debugging)
  • Revise the program (adjust for errors) and run it again (re-test)
  • Continue the run-debug-revise cycle until the program works (i.e., runs successfully and solves the problem).

Conversational AI: Technologies enabling natural interactions between a human and a computer, via a conversation conducted in the human's native language.

  • Chatbots, voice assistants, and virtual agents are examples of conversational AI.

Types of Computer Programming

While software engineering concepts and methodologies have evolved considerably over the past few decades (new programming languages, web services, cloud computing, and so forth), the classical paradigm in which one writes the code (the underlying building block of software) has remained mostly untouched. Since our research proposes a new way to program, it’s instructive to see how it compares to other ways of programming:

  • Classical programming: Traditionally, a programmer decomposes a problem into smaller sub-problems, defines a requirement, then drafts a piece of code, which is then revised until it solves the given problem.
    • In 1945, this is how the ENIAC, one of the first programmable electronic computers, was programmed using plugboard wiring.
    • Today, this is how programs are written using formal languages with higher abstraction such as C, Python, or Java.
    • The classical, fundamental paradigm of specifying a problem in natural language and iteratively refining a solution in a formal or programming language until the specification is satisfied remains the predominant method of programming today.
  • Automatic programming: Humans write code at a high level of abstraction, and a method is then used to auto-generate a computer program from the higher-level language.
    • Most of today’s popular computer languages are like this; coders write in a higher-level language, and a compiler generates low-level code; this saves time and effort, since we humans don't have to worry about all the low-level details.
  • Interactive programming: Coding a program (or parts of a program) on-the-fly, while that program is running.
  • Conversational AI programming: The advent of machine learning urges us to rethink the classical paradigm. Instead of a human doing the programming, can a machine learn to program itself, with the human providing high-level guidance? Can human and machine establish an interactive discourse to write a program? The answer, as our research reveals, is a resounding Yes.
    • Since it combines conversational AI (interactive human-to-machine dialogue) and automatic programming (the system automatically creates the program based on a higher-level language: your conversation!), we call what CodeGen does conversational AI programming.

A Different Kind of Coding Problem: (Learning a New Language) = [Barrier]

Up till now, we have had two ways to get computers to do useful work:

  • use pre-existing computer programs that do what you want the machine to do
  • write a new program to do it.

Option 1 is great, when the computer programs you need are available.

But Option 2 has a built-in barrier: if the type of program you need does not exist, the task of creating it has always been limited to those who can speak the computer’s language; you must learn at least one programming language and apply that knowledge to write programs. In other words, to get new programs, you have to know how to translate what you want into computerese, so the computer will understand what you want it to do. This bottleneck applies not only when you want to create a program for yourself, but also when you want to create programs for others - in coding jobs, for instance.

Here are three of the major limitations of the current programming paradigm:

  • Time-consuming: one must learn a programming language and apply the knowledge correctly
  • Difficult: some find the process of learning this new language to be an arduous task, and some cannot get through the training successfully
  • Expensive: have you seen the cost of coding schools?

These factors often hinder or discourage the education and development of new programmers, especially among people in historically disadvantaged groups. In other words, traditional programming often presents people with a different kind of "coding problem" -- not one given on a test, but rather a formidable real-world obstacle that many simply cannot solve.

However, the good news is that there is another way.

The CodeGen Solution: Make Coding as Easy as Talking

What if you could just tell a machine the kind of program you need - just use your native language to describe your needs to a computer, and it would generate the code that does what you wanted? That’s the amazing promise of conversational AI programming: CodeGen makes programming as easy as talking.

Here’s an analogy to help illustrate the concept. When you order dinner in a restaurant, instead of having to know the right ingredients and cook the dish yourself, you just tell the server what you want, and the kitchen prepares it and brings it to you. Describe the dish in a short sentence, and you get it without being involved in its creation - no need to specify ingredients or cooking steps, and no special culinary terms required. The restaurant acts like an intelligent system, translating your plain-English request (order) into a sequence of steps that turns basic ingredients into the outcome (cooked dish) you asked for. Now imagine you’re “ordering” computer code instead of a meal, and you have the basic idea behind CodeGen.

Our implementation of conversational AI programming provides a glimpse into the future of democratizing software engineering for the masses. An “AI assistant” translates English descriptions into functional, executable Python code - allowing anyone to write code, even someone who knows nothing about programming. The underlying language model, CodeGen, enables this conversational paradigm and will be made available as open source to accelerate research.

The Full Vision: Interactive Conversation with Computer Creates Code

This new paradigm in programming takes the form of a simple yet highly intelligent dialogue. In the concept’s full implementation (our vision of how it would work in its ultimate form), a typical fully interactive conversation about your desired code would flow as follows:

Human: “I would like to create a red button.”
Machine: “Where do you want to position the button?”
Human: “In the center.”
Machine: “What happens if the button is pressed?”
Human: “Calculate a measure of center for Bitcoin's price over the past 10 days.”
Machine: “Mean, median, or mode?”
Human: “Mean. Please show me the code.”
Machine: <shows code it generated>
Human: “In line <X>, I meant to <state a correction, revision, or new approach> … could you revise this accordingly, please?”
Machine: <shows revised code it generated>
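To ground the dialogue, here is a rough sketch of the kind of Python the machine might produce at the end of such an exchange. The GUI toolkit (tkinter), the placeholder price values, and the function and variable names are assumptions made for illustration, since the conversation does not specify a data source or user interface library:

    import tkinter as tk
    from statistics import mean

    # Placeholder data: the dialogue does not name a price source, so we
    # use ten hypothetical daily Bitcoin closing prices (in USD).
    btc_prices_last_10_days = [42100, 41850, 43200, 44050, 43700,
                               42900, 43500, 44800, 45100, 44600]

    def show_mean_price():
        # Compute the mean price over the past 10 days and display it.
        label.config(text=f"Mean price: ${mean(btc_prices_last_10_days):,.2f}")

    root = tk.Tk()
    root.geometry("320x200")

    label = tk.Label(root, text="")
    label.pack(side="bottom")

    # A red button positioned in the center of the window.
    button = tk.Button(root, text="Mean BTC price", bg="red",
                       command=show_mean_price)
    button.place(relx=0.5, rely=0.5, anchor="center")

    root.mainloop()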

The Current State: Conversational AI Programming with CodeGen

While the above (fictional) conversation example helps illustrate the full conceptual vision, let’s turn to some real-world examples - the concrete realization of the concept as it exists today in CodeGen. Let's start by revisiting the two-sum problem we introduced at the start of this blog:

Note that this time, we don’t stop once CodeGen generates working code to solve the problem - we ask the model to try again, and solve the problem using a hash map. This example illustrates some of the groundbreaking capabilities of our system: we can continue our conversation, refer back to “the problem” (a backreference, which CodeGen understands), and request that the model try a new approach (the hash map), in the hopes of getting an even better solution. (In our restaurant analogy, this would be like giving additional instructions to the server about your order, like “use egg whites only” or “use margarine instead of butter.”)

And it works: CodeGen succeeds in generating new code that uses a hash map, and the new solution runs in linear time - O(n) - much faster than the original solution’s O(n²).
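For reference, a hash-map version along these lines (again an illustration, not CodeGen’s verbatim output) replaces the nested loop with a single pass over the list:

    # Illustrative hash-map solution: remember each value's index as we
    # scan the list once, giving O(n) running time instead of O(n**2).
    def two_sum(nums, target):
        seen = {}  # maps value -> index
        for i, num in enumerate(nums):
            complement = target - num
            if complement in seen:
                return seen[complement], i
            seen[num] = i
        return None

    print(two_sum([2, 7, 11, 15], 9))  # prints (0, 1)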

The Two Sides of CodeGen: For Non-Coders and Programmers Alike

The above “hash map” example illustrates a key aspect of CodeGen: while anyone, even a non-coder, can use CodeGen to build software from scratch, it does help to have some programming knowledge in certain cases. For example, knowing coding concepts can help you think of follow-up commands to give CodeGen, suggesting new avenues to explore while building the code (like using hash maps or recursion - or avoiding those techniques).

While the vision is to create optimal programs for any problem by just telling the machine what you want, without needing any coding knowledge, the reality is that some programming knowledge often helps in guiding CodeGen to a good solution. This is especially true for more complex problems, where having the user suggest different approaches may help the system find a working solution - or a more efficient one.

Still, even for experienced coders, CodeGen makes getting to a functioning solution faster and easier, and allows rapid exploration of alternate methods. In other words, CodeGen is beneficial for all levels of programmers.

The Details: An In-Depth Look at How CodeGen Works

Approach. Salesforce AI Research trained CodeGen, a 16-billion-parameter auto-regressive language model, on a large corpus of natural and programming languages. Two aspects are of particular interest: (1) sampling executable code by scaling the size of the model and dataset, and (2) the emergence of conversational capabilities.

Scaling. The model’s large size is motivated by the empirical observation that scaling the number of model parameters in proportion to the number of training samples appears to strictly improve the model’s performance. This phenomenon is known as a scaling law. We leverage it to learn a model that can translate a natural language (English) into a programming language (code) with high accuracy. That is, the model is capable of generating not only reasonable code but executable code: code of such high quality that it can be run immediately, without revision by a programmer, which allows even a non-professional audience to “write” code.

To train such a model, Salesforce collaborated closely with Google on the TPU platform, custom ASIC hardware designed specifically for neural network machine learning. Leveraging scaling laws requires vast amounts of both training data and compute, so for our CodeGen models Salesforce used Google’s TPU v4 hardware with the recent TPU-VM architecture and JAX as a high-performance autograd library. While scaling models up to 16B parameters on traditional GPU stacks can pose quite a technical challenge, the TPU ecosystem allows for scaling (up to TPU-v4-512 in our setting) and model parallelism with the first-class pjit() operator, a natural means of expressing distributed computation. This combination of sheer compute, coupled with fast interconnect and made accessible through JAX, allows for efficient training of billion-parameter-scale models.
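As a rough illustration of what expressing model parallelism with pjit() looks like, the sketch below shards a single dense layer across a device mesh. This is not CodeGen’s actual training code; it assumes the 2022-era JAX interface (which has since evolved) and a machine with 8 accelerator devices:

    import numpy as np
    import jax
    import jax.numpy as jnp
    from jax.experimental import maps, PartitionSpec as P
    from jax.experimental.pjit import pjit

    # Arrange the available accelerators into a 2D mesh: one axis for data
    # parallelism, one for model (tensor) parallelism. Assumes 8 devices.
    devices = np.array(jax.devices()).reshape(2, 4)
    mesh = maps.Mesh(devices, ("data", "model"))

    def dense_layer(w, x):
        # A single dense layer standing in for one block of a large model.
        return jnp.dot(x, w)

    # Declare how inputs and outputs are partitioned over the mesh: the
    # weight matrix is split along the "model" axis, the batch along "data".
    sharded_layer = pjit(
        dense_layer,
        in_axis_resources=(P(None, "model"), P("data", None)),
        out_axis_resources=P("data", "model"),
    )

    with mesh:
        w = jnp.ones((512, 2048))
        x = jnp.ones((64, 512))
        y = sharded_layer(w, x)  # executes distributed across the mesh

The same principle, applied block by block at far larger scale, is what makes training a 16B-parameter model on TPU-v4 pods tractable.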

Conversation. Holding a conversation appears to be a rather trivial task for humans. We implicitly keep track (a memory) of the past conversation, resolve references to previously mentioned elements, and incrementally build a mental picture of the discourse. For machines, holding a realistic conversation is one of the grand challenges of our time. Testing whether a machine possesses human-level conversational ability, or can fool a human into believing they are conversing with another human being, is known as the Turing Test. While in this first iteration of our research the model replies in a formal language (i.e., the programming language) rather than a natural language, later incarnations will take the form of a multi-turn discourse in natural language, so that the model can resolve ambiguities by asking questions such as “May I solve this problem with algorithm A, B, or C?”. Surprisingly, modeling such conversation in conjunction with the scaling laws turned out to be quite simple, and simplicity is a desirable property (see Rich Sutton’s “The Bitter Lesson”).

Specifically, a conversation consisting of several consecutive questions (posed by the human in natural language) and answers (given by the machine in a programming language) is concatenated into a single long sequence. Given this context of the past conversation, an auto-regressive decoder model samples the next response, conditioned on the past pairs of questions and answers. The fact that conversational capabilities emerge from such a naively simple approach (given sufficient data and model size) was surprising.

Recall the example shown earlier, in which a problem is first stated and the specification is subsequently refined:

“Solve the two sum problem”

“Solve the problem using a hash map”

While solving the first request can be understood as a form of pseudo-retrieval of examples seen in the training data (think of a database query), the second request involves resolving the backreference of “the problem” to “the two sum problem”, and requires at least a shallow understanding of the previously generated code in order to rewrite it using a hash map. This phenomenon is crucial, as the underlying model was never specifically trained to hold a conversation or revise code. These conversational and problem-solving capabilities “emerged” naturally.
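To make this mechanism concrete, here is a hedged sketch of multi-turn sampling using the Hugging Face transformers library and one of the CodeGen checkpoints being open-sourced. The checkpoint name, the comment-style prompt format, and the sampling settings are assumptions made for illustration, not the exact setup used in our experiments:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # A small CodeGen checkpoint keeps the sketch runnable on a laptop.
    checkpoint = "Salesforce/codegen-350M-mono"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    def generate(prompt, max_new_tokens=128):
        # Sample a continuation conditioned on the full prompt so far.
        inputs = tokenizer(prompt, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                 do_sample=True, top_p=0.95, temperature=0.2,
                                 pad_token_id=tokenizer.eos_token_id)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Turn 1: the first request, phrased as a comment-style prompt.
    answer_1 = generate("# Solve the two sum problem\n")

    # Turn 2: concatenate the past question/answer pair with the new request,
    # so the model conditions on the whole conversation so far.
    answer_2 = generate(answer_1 + "\n# Solve the problem using a hash map\n")

    print(answer_2)

The essential point is that the second call conditions on the entire concatenated history, which is all the model needs to resolve the backreference and revise its earlier answer.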

Societal Benefits and Impact: Why Conversational AI Programming is Important


Benefits for Next-Gen Software Development: Programs of the Future Need This

While programming is a useful skill today, in the next decade programming will be a necessity in many tech jobs, including at Salesforce. The world needs more and more code, in every aspect of society, and these programs are getting increasingly complex. Hence, systems like CodeGen (which help speed up the programming process while making it easier and more manageable) should play an integral role in completing increasingly large coding projects, as well as bringing a whole new generation of programmers into the world of coding to achieve these goals.

But there is another issue on the horizon: what happens when future programming needs become so complex that the skills required to create these programs outstrip human capabilities? Digital ecosystems are evolving toward ever-increasing functional complexity, and at some point that complexity may grow beyond our capacity to understand these systems, let alone build them. We may soon reach the point where projects require technology such as conversational AI programming in order to create the mega-complex software systems of the future -- both at the massive scale that will be required, and within timeframes that teams of human programmers could not meet on their own.

In short, rapidly increasing code complexity requires a new paradigm. Next-gen programming needs, both at Salesforce and at other organizations, seem destined to make conversational AI programming systems like CodeGen essential to our future.

Benefits for Society: CodeGen Democratizes Programming

A major part of Salesforce’s mission is to develop technology that can help all of society, not just the company, and that is exactly what this research does. Many groups will benefit from the conversational AI programming revolution that CodeGen represents. Here are some examples.

Enhancing equality and equity. Opening up coding to all - democratizing access to the creation of programs - will help bring traditionally disadvantaged groups into the world of programming, leading to increased career opportunities and incomes for those groups.

Education/teaching/learning. Kids will learn to program interactively with “AI teachers” as their companions, creating worlds and games through a dialogue in their native language while absorbing how to translate their ideas into programming languages.

Software professionals: engineers, data scientists, developers. Software engineers will use “AI assistants” to understand the architecture and design patterns of legacy systems and to summarize their critical paths. An artificial pair-programmer will support analysis of time and space complexity, security vulnerabilities, design patterns, refactorings, and test generation.

Non-software professionals. Business analysts will integrate complex external data sources and systems, correlate and normalize data, perform exploratory analysis, and visualize findings in conjunction with “AI analysts”.

In general, the democratization of coding should reap society-wide rewards.

  • The new paradigm of conversational AI programming will lead to a disruptive transformation in software engineering. Today, computer science education requirements act as a barrier to the world of programming, but tomorrow the floodgates will be opened for everyone to transform an idea into code - no programming knowledge required.
  • We expect this disruption to be on the scale of other tech-based breakthroughs such as autonomous vehicles, or even the printing press.
  • Reducing or eliminating barriers to coding will benefit the entire economy, analogous to how the printing press had an enormous economic impact in all areas of society.
    • Just as the printing press revolutionized the world by bringing publishing and reading books to the masses, so too will creating programs using natural language revolutionize the world, by letting anyone make programs that will benefit themselves, or others - and do so at much faster speed.
    • After the printing press: reading was accessible to everyone.
      • Democratization of knowledge, speeding up creation of new knowledge
    • After conversational AI programming: coding, coding jobs, app creation, and software-assisted problem solving will be accessible to everyone.
      • Democratization of programming, speeding up creation of new programs.

The Bottom Line

  • The paradigm of conversational AI programming (coding by talking) turns the notion of writing code for a machine on its head. Rather than requiring a human to write code for a machine, the machine generates code for the human (automatic programming) via a discourse between human and machine (conversational AI). The human merely needs to explain the desired functionality in plain English, while the machine writes the code based on that description.
  • CodeGen can be applied to both simple and complex problems, using natural language.
    • Most users can solve relatively simple coding problems with little or no prior programming knowledge.
    • More complex cases may require some knowledge of programming or basic computer science concepts, in order to help guide the system as it searches for a solution (i.e., working code that solves the stated problem).
    • Still, even for experienced coders, CodeGen makes getting to a functioning solution faster and easier, and allows rapid exploration of alternate methods.
  • Benefits to society: The shift to this new kind of programming - creating code from natural conversations - will lead to a disruptive transformation of software engineering and open up coding to new segments of the population.
    • This new approach democratizes access to the world of writing software, allowing anyone to develop apps in conjunction with an “AI assistant” or “teacher” without the need to learn programming in the traditional way.
    • Opening up coding to all will help bring traditionally disadvantaged groups into the programming world, leading to more career opportunities and higher incomes for such groups.
  • The vision of conversational AI, once just a cinematic dream, is now becoming real. Over half a century ago, the AI character HAL 9000 (antagonist in Arthur C. Clarke’s Space Odyssey series) captured our imagination, but machines that could converse with us and understand our intentions were pure fiction. Today, a machine that understands our goals in plain language and helps us achieve them is finally within our grasp. What society once could only dream of, Salesforce AI Research is making a reality.
    • Our research is one of the first steps towards implementing the wider vision: enabling machines to leverage in-depth natural language conversations with humans to write code - and do it faster, with fewer errors, in a manner that makes it easy for all. There will be challenges along the way, but these efforts will yield a wide range of new applications as we pursue this exciting research direction.
  • We will open source the code. Part of the mission of Salesforce is to create and publish important research that others may benefit from, so our work on conversational AI programming will be available as open source code.

Explore More

Salesforce AI Research invites you to dive deeper into the concepts discussed in this blog post (links below). Connect with us on social media and our website to get regular updates on this and other research projects.

About the Authors

Erik Nijkamp is a Research Scientist at Salesforce AI Research. His research emphasis is on large-scale generative models and representation learning with applications in NLP and computer vision. Prior to Salesforce, he was a PhD student under Prof. Song-Chun Zhu and Prof. Ying Nian Wu at UCLA.

Donald Rose is a Technical Writer at Salesforce AI Research. He works on writing and editing blog posts, video scripts, media/PR material, and other content, as well as helping researchers transform their work into publications geared towards a wider (less technical) audience.