Our work fits within a larger context of recent advances in RL. RL has been used to train AIs to win competitive games, such as Go, Dota, and StarCraft. In those settings, the RL objective is inherently adversarial (“beat-the-other-team”). Machine learning has also been used for the design of auction rules. In this work, we instead focus on the opportunity to use AI to promote social welfare through the design of optimal tax policies in dynamic economies.
Many studies have shown that high income inequality can negatively impact economic growth and economic opportunity. Taxes can help reduce inequality, but it is hard to find the optimal tax policy. Economic theory cannot fully model the complexities of the real world. Instead, tax theory relies on simplifying assumptions that are hard to validate, for example, about the effect of taxes on how much people work. Moreover, real-world experimentation with taxes is almost impossible.
Classic tax theory focuses on people who earn income by performing labor. A worker gains utility from income but incurs the cost of labor effort. At some point, the extra utility from additional income does not outweigh the cost of additional effort.
For instance, working on weekends might earn you more money, but the effort might not be worth it to you.
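The trade-off between income and effort can be made concrete with a small sketch. The functional form below (utility that grows with the square root of income, minus a cost proportional to hours worked) is an assumption chosen for illustration and is not the utility function used in the simulation; the names `utility`, `best_hours`, `wage`, and `effort_cost` are hypothetical.

```python
import math


def utility(hours: float, wage: float, effort_cost: float) -> float:
    """Illustrative utility: concave in income, linear in labor effort.

    This functional form is an assumption for illustration only.
    """
    income = wage * hours
    return math.sqrt(income) - effort_cost * hours


def best_hours(wage: float, effort_cost: float, max_hours: int = 80) -> int:
    """Return the whole number of hours (0..max_hours) with the highest utility."""
    return max(range(max_hours + 1), key=lambda h: utility(h, wage, effort_cost))


if __name__ == "__main__":
    # Beyond the best number of hours, extra income no longer covers the extra effort.
    for wage in (16.0, 64.0):
        hours = best_hours(wage, effort_cost=0.5)
        print(f"wage={wage}: best hours={hours}, utility={utility(hours, wage, 0.5):.2f}")
```

Under these assumed numbers, a worker with a higher hourly wage chooses to work more hours, because each extra hour buys more utility for the same effort.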
A key assumption is that people differ in their skill level. Low-skilled workers receive a lower hourly wage, and so earn less money than high-skilled workers for the same amount of labor. This leads to inequality.
As a policy goal, a government may prefer to tax and redistribute income in order to improve equality. However, higher taxation can discourage work and may particularly affect high-skilled workers. An optimal tax policy optimizes this balance between equality and productivity.
A prominent tax framework, proposed by Emmanuel Saez, derives a simple optimal tax formula. However, this formula requires knowing how labor responds to changes in tax rates (“elasticity”) and makes strong assumptions, for example, that the economy is static and workers do not gain new skills. Other work has studied dynamic economic systems, but needs simplifying assumptions in order to attain analytical solutions. For an extended overview of related work, see our technical paper.
The AI Economist is a purely simulation- and data-driven approach to the design of optimal tax policies. It uses a principled economic simulation with both workers and a policy maker (a "planner" in the economics literature), all of whom are collectively learning using reinforcement learning.
The simulation uses a two-dimensional world. There are two types of resources: wood and stone. Resources are scarce: they appear in the world at a limited rate. Workers move around, gather and trade resources, and earn income by building houses (this costs stone and wood). Houses block access: workers cannot move through the houses built by others. The simulation runs this economy over the course of an episode, which is analogous to a “working career.”
A key feature is that workers have different skills. Higher-skilled workers earn more income for building houses, and thus more utility. Building houses also takes effort, which lowers utility. Workers also pay income taxes and the collected tax is redistributed evenly among the workers. Together, these economic factors and the various competitive drivers mean that workers need to be strategic in order to maximize their utility.
Our economic simulation produces rich behavior when AI agents (the agents are the workers in the economy) learn to maximize their utility. A salient feature is specialization: AI agents with lower skill become gatherer-and-sellers and earn income by collecting and selling stone and wood. Agents with higher skill specialize as buyer-and-builders and purchase stone and wood in order to more quickly build houses.
We do not impose such roles or behaviors directly. Rather, specialization emerges because differently skilled workers learn to balance their income and effort. This demonstrates the richness of the economic simulation and builds trust that agents respond to economic drivers. Complex emergent economic behavior has been previously studied in economics through agent-based modeling, but this has largely proceeded without the benefit of recent advances in AI.
Reinforcement learning is a powerful framework in which agents learn from experience collected through trial-and-error. We use model-free RL, in which agents do not use any prior world knowledge or modeling assumptions. Another benefit of RL is that agents can optimize for any objective.
In our setting, this means that a tax policy can be learned that optimizes any social objective, and without knowledge of workers’ utility functions or skills.
Finding optimal taxes when both workers and the policy maker are learning poses a challenging two-level RL problem:
This two-level learning problem poses a technical challenge, as the simultaneous behavioral changes of agents and changes to the tax policy can lead to unstable learning behavior. We have found that a combination of techniques, including the use of learning curricula and entropy regularization, enables stable convergence. These are described in our technical paper.
Our reinforcement learning approach produces dynamic tax policies that yield a substantially better trade-off between equality and productivity than baseline methods.
We compared the AI Economist with
All tax policies make use of seven income brackets, following the framework of the US Federal Income Tax schedule, but varying in their tax rates. The total tax is calculated by summing up the tax for each bracket in which there is income.
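As a minimal sketch of the bracket calculation described above: the function below sums, for each bracket that contains some income, the bracket's rate times the income falling inside it. The cutoffs and rates in `CUTOFFS` and `RATES` are hypothetical placeholders, not the values used in the experiments, and `bracket_tax` is an illustrative name.

```python
from typing import Sequence

# Hypothetical lower bounds and marginal rates for seven brackets (illustration only).
CUTOFFS = [0, 10, 40, 85, 160, 205, 510]
RATES = [0.10, 0.12, 0.22, 0.24, 0.32, 0.35, 0.37]


def bracket_tax(
    income: float,
    cutoffs: Sequence[float] = CUTOFFS,
    rates: Sequence[float] = RATES,
) -> float:
    """Sum the tax owed in every bracket that contains part of the income."""
    # The top bracket has no upper bound.
    uppers = list(cutoffs[1:]) + [float("inf")]
    tax = 0.0
    for lower, upper, rate in zip(cutoffs, uppers, rates):
        if income > lower:
            # Only the slice of income inside this bracket is taxed at its rate.
            tax += rate * (min(income, upper) - lower)
    return tax


if __name__ == "__main__":
    for income in (5, 50, 600):
        print(f"income={income}: tax={bracket_tax(income):.2f}")
```

Because each rate applies only to the slice of income inside its bracket, changing one bracket's rate leaves the tax owed on income in the other brackets unchanged.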
Episodes are divided into ten tax periods of equal length. Throughout each tax period agents interact with the environment to earn income and, at the end of the period, incomes are taxed according to the period's tax schedule and redistributed evenly to workers. The tax policy of the AI Economist allows the tax schedule to vary across periods.
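The end-of-period settlement can be sketched as follows, assuming each worker's net income is their income minus their tax plus an equal share of the total tax collected. The helper name `settle_period` and the flat 25% example rate are hypothetical and serve only to keep the arithmetic easy to follow.

```python
from typing import Callable, List


def settle_period(incomes: List[float], tax_fn: Callable[[float], float]) -> List[float]:
    """Tax each income with tax_fn, then hand the total back in equal shares."""
    taxes = [tax_fn(z) for z in incomes]
    rebate = sum(taxes) / len(incomes)  # even redistribution among workers
    return [z - t + rebate for z, t in zip(incomes, taxes)]


if __name__ == "__main__":
    # Hypothetical flat 25% tax, used only for illustration.
    incomes = [20.0, 60.0, 160.0]
    net = settle_period(incomes, lambda z: 0.25 * z)
    print(net)  # the lowest earner ends the period with more than they earned
```

In this example the total income is unchanged by the settlement, while the lowest earner receives a net subsidy and the highest earner pays a net tax.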
We set up the economic environment such that the fraction of worker incomes per income bracket is in rough alignment with that in the US economy.
Our experiments show that the AI Economist achieves at least a 16% gain in the trade-off between equality and productivity compared to the next best baseline, the Saez framework. The AI Economist improves equality by 47% compared to the free market at only an 11% decrease in productivity.
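This post does not spell out how the trade-off is scored. One plausible scoring, assumed here for illustration, measures productivity as total income, equality as one minus a normalized Gini index, and the trade-off as their product. The function names below are hypothetical, and the technical paper remains the reference for the exact definition.

```python
from typing import Sequence


def gini(incomes: Sequence[float]) -> float:
    """Gini index: mean absolute difference between all pairs, scaled by mean income."""
    n = len(incomes)
    total = sum(incomes)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return diff_sum / (2 * n * total)


def equality(incomes: Sequence[float]) -> float:
    """Assumed equality score: 1 when incomes are equal, 0 when one worker has it all."""
    n = len(incomes)
    if n < 2:
        return 1.0
    return 1.0 - gini(incomes) * n / (n - 1)


def tradeoff(incomes: Sequence[float]) -> float:
    """Assumed trade-off score: equality multiplied by total income (productivity)."""
    return equality(incomes) * sum(incomes)


if __name__ == "__main__":
    for incomes in ([5, 5, 5, 5], [2, 4, 6, 12], [0, 0, 0, 20]):
        print(incomes, round(equality(incomes), 3), round(tradeoff(incomes), 3))
```

A product of this kind rewards a policy only if it keeps both quantities high: perfect equality with no income scores zero, as does high income held by a single worker.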
Compared with the baselines, the AI Economist features a more idiosyncratic structure: a blend of progressive and regressive schedules. In particular, it sets a higher top tax rate (income above 510), a lower tax rate for incomes between 160 and 510, and both higher and lower tax rates on incomes below 160.
The collected taxes are redistributed evenly among the agents. In effect, the lower-income agents receive a net subsidy, even though their tax rates are higher (before subsidies). In other words, under the AI Economist, the lowest incomes have a lower tax burden compared to baselines.
We observed that under the Saez framework, the gatherer-and-sellers collect fewer resources than with the AI Economist. This forces the buyer-and-builders to spend more time to collect resources themselves, which lowers their productivity. At the same time, the Saez framework yields less equality through redistribution as its tax schedule is more regressive. In sum, this yields a worse balance between productivity and equality.
Finding optimal taxes can be challenging because AI agents can learn to "game" tax schemes. In our simulation, agents learn that they can lower their average effective tax by alternating between earning high and low incomes, rather than earning a smooth income across periods. This tax gaming occurs for the Saez tax and AI Economist due to their regressive tax rates (higher income brackets have lower tax rates). The performance of the AI Economist demonstrates that it is effective even in the face of strategic agent behavior, and the emergence of this behavior underscores the richness of the simulation-based framework.
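A small worked example shows why alternating incomes can pay off under a regressive schedule. The two-bracket schedule below (40% on income up to 100, 10% above) is hypothetical and far simpler than the schedules in the experiments; `two_bracket_tax` is an illustrative name.

```python
from typing import Sequence


def two_bracket_tax(
    income: float,
    cutoff: float = 100.0,
    low_rate: float = 0.4,
    high_rate: float = 0.1,
) -> float:
    """Tax under a hypothetical two-bracket schedule (regressive by default)."""
    return low_rate * min(income, cutoff) + high_rate * max(income - cutoff, 0.0)


def total_tax(incomes: Sequence[float], **schedule: float) -> float:
    """Total tax paid over several tax periods."""
    return sum(two_bracket_tax(z, **schedule) for z in incomes)


if __name__ == "__main__":
    smooth = [100.0, 100.0]      # same income in both periods
    alternating = [0.0, 200.0]   # same total income, bunched into one period
    print("regressive, smooth:     ", total_tax(smooth))
    print("regressive, alternating:", total_tax(alternating))
    # With the rates swapped (a progressive schedule), bunching costs more instead.
    print("progressive, smooth:     ", total_tax(smooth, low_rate=0.1, high_rate=0.4))
    print("progressive, alternating:", total_tax(alternating, low_rate=0.1, high_rate=0.4))
```

Both income patterns earn the same total, but bunching income into one period pushes more of it into the lower-rate top bracket when the schedule is regressive.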
We also explored whether the AI Economist is effective in experiments with human participants. These experiments used a simpler ruleset to provide better usability, for instance, removing the ability to trade. However, the same economic drivers and trade-offs applied. Participants were paid real money for the utility they gained from building houses. Hence, participants were incentivized to build the number of houses that would maximize their utility. The stakes were sufficiently high: participants were paid at an average rate of at least twice the US minimum wage.
We tested all methods in a zero-shot transfer learning setting, by using the tax rates from the AI-only world in the human setting without retraining. This is an interesting evaluation, because retraining a tax policy might require a large amount of human data. The only modification was to scale down all income brackets by a factor of three to account for lower human productivity compared to AI agents. The full details are in our technical paper.
For the experiments with human participants, we selected from the set of trained AI-driven policies a tax schedule shaped like a camelback. We compared this schedule with baselines in experiments with participants recruited on Amazon Mechanical Turk.
In 125 games with 100+ US-based participants, the camelback schedule achieved an equality-vs-productivity trade-off that is significantly better than the free market and competitive with other baselines. Participants were paid more than $20/hour on average. Compared with AI agents, people were more prone to suboptimal adversarial behaviors, such as blocking other workers. This significantly increased the variance in productivity.
Interestingly, the camelback schedule is qualitatively different from the baselines. However, the relative performance of the camelback versus the baselines is consistent across the experiments with only AI and with only people.
The camelback tax also statistically significantly outperformed all baselines in regard to an alternative, established social welfare metric that weights the utility of lower income workers more than higher income ones.
The strong zero-shot transfer performance on human play is surprising and encouraging. The camelback tax was competitive with, or outperformed, baselines, without recalibration and while being applied with different rulesets and worker behaviors. As such, these results suggest promise in the use of the AI Economist as a tool for finding good tax policies for real economies.
AI-based economic simulations still have limitations. They do not yet model human-behavioral factors and interactions between people, including social considerations, and they consider a relatively small economy. However, these kinds of simulations provide a transparent and objective view on the economic consequences of different tax policies. Moreover, this simulation- and data-driven approach can be used together with any social objective in order to automatically find a tax policy with strong performance. Future simulations could improve the fidelity of economic agents using real-world data, while advances in large-scale RL and engineering could increase the scope of economic simulations.
We believe that this kind of research has great potential for increasing equality and productivity in real economies, helping to promote more just and healthy societies. We also hope that the AI Economist can foster transparency, reproducibility, and an open and fact-based discussion about applying machine learning to economic decision-making through our public research publications and open-source code. As such, our hope is that future economic AI models can robustly and transparently augment real-world economic policy-making and, in doing so, improve social welfare.
Ethics, trust, and transparency are an integral part of Salesforce’s approach to AI research. While the current version of the AI Economist is a limited representation of the real world, and is not a tool that could currently be used with malicious intent to reconfigure tax policy, we recognize that it could be possible to manipulate future, large-scale iterations of the AI Economist to increase inequality and hide this action behind the results of an AI system.
Furthermore, bad training data, whether it results from ignorance or malice, may lead to biased recommendations, particularly in cases where users train the tool using their own data. For instance, the under-representation of communities and segments of the workforce in the training data could lead to bias in AI-driven tax policies. This work also opens up the possibility of using richer, observational data to set individual taxation, an area where we anticipate a strong need for robust debate.
We encourage anyone utilizing the AI Economist to publish a model card or data sheet that describes the ethical considerations of AI-driven tax schedules in order to increase transparency, and by extension, trust, in the system.
In order to responsibly publish this research, we have taken the following measures:
With these mitigation strategies and other considerations in place, we believe this research is safe to publish.
Paper: https://arxiv.org/abs/2004.13332
This work was a joint effort with contributions from:
Stephan Zheng, Alex Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, Kathy Baxter, David Parkes, and Richard Socher.
We thank Lofred Madzou, Simon Chesterman, Rob Reich, Mia de Kuijper, Scott Kominers, Gabriel Kriendler, Stefanie Stantcheva, and Thomas Piketty for valuable discussions.
We also thank the following people for their valuable support: