Abhijit Banerjee, Esther Duflo, and Michael Kremer are three of the economists who have most influenced effective altruism. They are responsible in large part for the rise of impact evaluations in development economics. I give a synoptic survey of their work.
I reproduce the first third of the article below:
Abhijit Banerjee, Esther Duflo, and Michael Kremer are most famous, justly, for their work on randomized controlled trials, or RCTs, for which they were awarded the Nobel Prize in 2019. An RCT, for those who do not know, has the same set-up as a medical trial: we give the treatment to a group chosen at random, and compare their outcomes to those of an untreated group. In the hands of Banerjee, Duflo, and Kremer, and their many collaborators at the Abdul Latif Jameel Poverty Action Lab, that method is used to test policy. How should we provide public health services? How should we conduct education? How do we get an efficient government? How do we disseminate information? The policies informed by their experiments have affected the lives of over 600 million people; a conservative estimate of lives saved would run into the millions.
Their work stretches far beyond the administration and organization of experiments, though. They are also first-rate economists who have produced work of incredible depth and insight using a variety of methods, in theory and in practice, and they have been enormously influential both on me and on the field at large.
Of the husband and wife team of Banerjee and Duflo, Banerjee is the older of the two by 11 years, and in fact supervised Duflo’s dissertation. It is surprising that two of his three most cited works are actually not related to RCTs at all – “A Simple Model of Herd Behavior”, from 1992, and “Occupational Choice and the Process of Development”, from 1993 (with Andrew Newman). Most of his early research focused on theoretical questions of how information spreads, which he would later put into practice in RCTs such as “Using Gossips to Spread Information”, in the context of vaccination, and “When Less is More”, on communicating the details of India’s 2016 demonetisation.
His most cited paper, “A Simple Model of Herd Behavior”, is indeed simple, and quite elegant. In the model, people make decisions sequentially, and can observe what other people choose. Because people possess some private information, we are best served by incorporating their signals into our decisions; but since they are also incorporating information from others, we may converge upon outcomes which are wrong. He gives a simple example of choosing between two restaurants, A and B. Everyone starts with the prior that there’s a 51% chance restaurant A is better, and a 49% chance restaurant B is, and then receives a private signal favoring one or the other. 99 out of 100 people might receive signals that restaurant B is better – but if the first person received a (possibly inaccurate) signal that restaurant A is better, then everyone will choose A, each rationally setting aside their own signal in favor of what the crowd’s choices imply. Everyone would be at least as well off if they had no information from other people’s actions at all. Thus, we can get bubbles and other such inaccurate behavior.
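To make the mechanism concrete, here is a minimal simulation of a cascade in this spirit. The 51% prior is carried over from the example; the 70%-accurate signal is an invented parameter, and the log-odds bookkeeping is my own way of formalizing the updating, not necessarily the paper’s presentation.

```python
import math
import random

# A minimal information cascade in the spirit of "A Simple Model of
# Herd Behavior". The 51% prior comes from the restaurant example;
# the 70% signal accuracy is an invented illustrative parameter.

def simulate_cascade(n_people=100, prior_a=0.51, accuracy=0.7,
                     truth="B", seed=1):
    rng = random.Random(seed)
    # Public log-odds in favor of A: the prior plus every signal that
    # earlier diners' choices have revealed.
    public = math.log(prior_a / (1 - prior_a))
    step = math.log(accuracy / (1 - accuracy))  # weight of one signal
    choices = []
    for _ in range(n_people):
        # Each private signal points at the truth with prob. `accuracy`.
        says_a = (truth == "A") == (rng.random() < accuracy)
        private = step if says_a else -step
        choices.append("A" if public + private > 0 else "B")
        # Later diners can infer this signal only if the choice depended
        # on it. Once |public| >= step, everyone acts the same way
        # regardless of their own signal, and no new information leaks.
        if abs(public) < step:
            public += private
    return choices

print("".join(simulate_cascade()))
# With some seeds the first signal favors A and the whole crowd herds
# into A, even though B is truly better.
```

The key line is the last one inside the loop: once the public evidence outweighs a single private signal, choices stop depending on signals, so they stop revealing anything, and the herd locks in.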
I see it as a sort of reverse “Bayesian Persuasion”. There, we get sub-optimal outcomes because the actions which the receiver can take are non-linear, and if you push someone over a threshold you get a big change in response. Here we get inefficiencies because the signals which people send are discrete. We can only go to one restaurant or the other, not some percentage of both. Perfect efficiency is really only possible in a continuous world.
“The Economics of Rumours” would construct a reason why a bubble wouldn’t be bought into by everyone. There are a number of investors who might invest in a project, the returns of which are either a or b, with some probability assigned to each. Investors face different costs, which for simplicity are “high” and “low”, with the payoffs such that low-cost investors always want to invest, and high-cost investors want to invest only if the returns are a, not b. Some small portion of the population hears about the opportunity at the outset, and knows the returns – everyone else knows only that other people have invested. If hearing about the opportunity is a function of whether other people have invested, then the optimal decision rule for high-cost investors is to invest if they hear of it early, and not to invest if they hear of it late. The reason is simple – in state b only the low-cost investors are investing, so the rumour spreads slowly, and hearing of it late is evidence that the state is b.
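The decision rule falls out of a back-of-the-envelope Bayes calculation. Suppose, with invented numbers, that the per-period chance of hearing the rumour is 0.5 in state a (everyone invests and spreads it) and 0.2 in state b (only low-cost investors do):

```python
# Back-of-the-envelope Bayes calculation for the rumour-timing logic.
# The per-period hearing probabilities (0.5 in state a, 0.2 in state b)
# and the 50/50 prior are invented numbers, not the paper's.

def posterior_a(t, p_a=0.5, p_b=0.2):
    """P(state is a | you first heard the rumour in period t),
    with geometric waiting times and a 50/50 prior."""
    like_a = (1 - p_a) ** (t - 1) * p_a
    like_b = (1 - p_b) ** (t - 1) * p_b
    return like_a / (like_a + like_b)

for t in (1, 2, 5, 10):
    print(t, round(posterior_a(t), 2))
# 1 0.71, 2 0.61, 5 0.28, 10 0.04 -- hearing late signals state b,
# so a high-cost investor who hears late should stay out.
```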
He also has an intriguing, if entirely too long, paper with Eric Maskin giving an account of why money exists. Money famously avoids the need for a “double coincidence of wants” – if I wish for apples, and have bananas, money lets me get apples even if the apple-seller doesn’t want bananas. The trouble is that, once you allow for durable goods, the goods themselves can be money. Standard goods, like rice or salt, do often take the place of fiat money in developing economies. Note that I say standard goods, though. According to Banerjee and Maskin, the good that serves as money must be one whose quality is extremely easy to judge – otherwise people would pass off low-quality units of it to get something out of the other party. Thus metals serve as money, for their purity and size can be easily ascertained.
I do not think the jump from theory to empirical work was as much of a break as it might seem. His early papers were largely practically minded models, seeking simply to explain a commonplace observation. His paper on herding, for example, came from noticing how people waiting for the train at Princeton would often form long lines for the wrong train. His motivation for RCTs often lies not simply in obtaining an unbiased estimate, but in directly testing a theory with a treatment that lines up one-to-one with the theoretical hypothesis.
One of his early papers, with Andrew Newman, explored the possibility of poverty traps. Can countries have a large divergence in outcomes from a small difference in initial conditions? Can initial conditions be changed to start the process of development? They assume that there are credit frictions of some kind. (Were there none, the first welfare theorem would hold and we’d be maximally efficient – but that’s not interesting, now is it!) Poor people choose to become self-employed, to enter into employment contracts, or to become entrepreneurs. If people are too poor, or insufficiently unequal, then society never takes off into business, and it remains a nation of cottage industry. They also point out, in an AEA P&P, that being closer to the lower bound of possible utility makes it harder to enforce the repayment of loans, and so the poor will naturally find it harder to borrow.
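The flavor of the trap is easy to convey with a toy wealth dynamic – my own invented numbers, far cruder than the actual Banerjee–Newman model, in which occupational choice is the engine:

```python
# A stylized poverty trap, much cruder than the Banerjee-Newman model;
# all numbers are invented for illustration.

def step(wealth, threshold=2.0, savings_rate=0.4, carryover=0.5):
    # Credit frictions: only households above the wealth threshold can
    # run the high-return technology (think: post collateral to become
    # an entrepreneur); everyone else earns subsistence income.
    income = 4.0 if wealth >= threshold else 1.0
    return savings_rate * income + carryover * wealth

for w0 in (1.9, 2.1):  # two lineages starting a hair apart
    w = w0
    for _ in range(30):
        w = step(w)
    print(w0, "->", round(w, 2))
# 1.9 -> 0.8 (stuck at subsistence), 2.1 -> 3.2 (prosperous forever)
```

A small difference in initial conditions, plus a borrowing constraint, is all it takes for permanent divergence.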
Duflo would come to MIT in 1995 on the advice of Thomas Piketty – though she was nearly rejected by, of all people, Abhijit Banerjee. I see Duflo’s work as relentlessly optimistic. She does not think that poverty is inevitable. There is no fundamental difference between the developed and developing world. Rather, poverty is the result of particular problems, each of which can be overcome by determined action, and it is her job – no, her duty – to find the solution. She writes in her biographical statement upon accepting the Nobel, “I felt that the only way I could ever repay this huge cosmic debt to the world was first, to nourish and exploit my own unremarkable talent, and second, to play some role in helping others get the opportunity to find and nurture their talents.”
Banerjee and Duflo’s first paper together is on the limits of contracting. Indian courts are notoriously ineffective, and so firms cannot rely upon them to settle disputes. This makes building a good reputation of utmost importance, especially when the products (software packages) are much too complicated to even explain to the court. How burdensome is being unable to make the precise contracts you want?
They have a few strong and testable predictions. Firms without a good reputation will need to take on fixed-cost contracts, so that the burden of cost overruns falls upon them; once they build up reputations for honorable dealing and for paying when they are at fault, they can agree to contracts which allow for adjustments in the case of adverse contingencies.
I think we can see their strengths combined in this. It marries clean theorizing about a model with dogged empirical work. (They conducted 125 interviews with software CEOs in three months!) It allows them to very cleanly test the question that Banerjee had in mind, an extension of the Grossman–Hart–Moore work on the theory of the firm. And it gives a sympathetic account of the actions of people in the poorest parts of the world – they have never been ones to consider anyone lesser than themselves.
Duflo is much more than simply an organizer of experiments, of course. Her most-cited paper, with Marianne Bertrand and Sendhil Mullainathan, merely completely changed the way we handle difference-in-differences. Difference-in-differences is a statistical technique for inferring the effect of something from observational data. Suppose, as in the example they give in the paper, we are interested in the effect of legislation on the wages of women. Difference-in-differences is predicated on the assumption that women’s wages in different states were following the same trend before the passage of the law, so that any divergence after the law can be attributed to it. Somebody interested in studying this might get data from the Current Population Survey (CPS) on a panel of women, observe their wages each year, and look at how they change after a law is passed.
The trouble with this is that the wages of a particular person are correlated over time – what is called serial correlation. They do not vary randomly from year to year! This means that each year is not an independent observation. The extra years add to our nominal sample size, but provide little new information, so conventional standard errors come out far too small. We will reject the null hypothesis, and conclude that the legislation changed wages, far more often than we should.
Bertrand, Duflo and Mullainathan demonstrate this by taking the CPS data and randomly assigning placebo legislative changes to years and states. At a 5% significance level, we should reject the null hypothesis 5% of the time. Instead, they can reject it 45% of the time. It is still somewhat baffling to me why this took so long for people to figure out. As they say in the paper, serial correlation was well understood in theory, and these were not the first fumbling explorations of a new technique either. In six journals alone, they counted 92 difference-in-differences papers over 10 years! Nowadays you cannot use one of the most common statistical techniques in modern empirical work without taking her criticism, and proposed corrections, into account, and I think the entire profession is better off for it.
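To see the problem in miniature, here is a placebo exercise in the spirit of theirs, on synthetic data rather than the CPS, with all parameters invented:

```python
import numpy as np

# Placebo exercise in the spirit of Bertrand-Duflo-Mullainathan, on
# synthetic data (the paper uses actual CPS wages). No true effect
# exists, so a valid 5%-level test should reject ~5% of the time.

rng = np.random.default_rng(0)
n_states, n_years, rho, n_sims = 50, 21, 0.8, 2000

def demean2(m):
    # Two-way fixed effects (state and year) via double demeaning.
    return m - m.mean(1, keepdims=True) - m.mean(0, keepdims=True) + m.mean()

rejections = 0
for _ in range(n_sims):
    # Serially correlated outcomes within each state, no real effect.
    y = np.zeros((n_states, n_years))
    y[:, 0] = rng.standard_normal(n_states)
    for t in range(1, n_years):
        y[:, t] = rho * y[:, t - 1] + rng.standard_normal(n_states)
    # Placebo "law": a random half of states, from a random year onward.
    treated = rng.permutation(n_states) < n_states // 2
    start = rng.integers(5, n_years - 5)
    d = np.outer(treated, np.arange(n_years) >= start).astype(float)
    # OLS t-test with conventional (non-clustered) standard errors.
    x, yy = demean2(d).ravel(), demean2(y).ravel()
    beta = (x @ yy) / (x @ x)
    resid = yy - beta * x
    se = np.sqrt(resid @ resid / (resid.size - 2) / (x @ x))
    rejections += abs(beta / se) > 1.96
print(rejections / n_sims)  # far above the nominal 0.05
```

Clustering the standard errors at the state level – one of the corrections they discuss – brings the rejection rate back toward the nominal 5%.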
Her doctoral dissertation was a difference-in-differences study itself; I suspect that that is what started her thinking on the topic. In 1973, the Indonesian government embarked on a massive school-building campaign, constructing over 61,000 primary schools in six years. Enrollment greatly increased. The placement of schools was non-random – very obviously non-random – but it is still possible to make use of the data. Exposure to the program varied across both regions and birth cohorts: children young enough to attend the new schools, in regions where many were built, received more education. If you treat the effect of schooling as additive and linear, comparing the cohort difference in wages across regions with different program intensities gives you an estimate of the effect of schooling on wages. She estimates that a year of schooling has a rate of return between 6.8 and 10.6 percent, making education well worthwhile. These effects would get revised down by her later work – she considers how changes in schooling change which skills are rewarded in the economy, which lowers the wages of prior generations; and in any event physical capital accumulation did not keep pace with human capital accumulation – but even that work still regards education as plainly worthwhile. This was a big improvement over prior work on education, which was certainly trying hard, but was largely very bad.1
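On fake data, the design reduces to a Wald ratio: the young-versus-old wage gap, compared across high- and low-construction regions, divided by the same double difference in schooling. This is a simplification of her actual specification, and every number below is invented:

```python
import numpy as np

# A toy version of the cohort-by-region design in Duflo's dissertation
# work, on fake data. Young cohorts in high-construction regions get
# extra schooling; the Wald ratio of the two double differences
# recovers the assumed 8% return. All numbers are invented.

rng = np.random.default_rng(0)
n = 100_000
high = rng.integers(0, 2, n)    # 1 if region built many schools
young = rng.integers(0, 2, n)   # 1 if young enough to attend them
schooling = 6 + 2 * high * young + rng.normal(0, 1, n)
log_wage = (0.08 * schooling + 0.3 * high + 0.2 * young
            + rng.normal(0, 0.5, n))

def double_diff(v):
    # (young - old) gap in high regions minus the same gap elsewhere.
    gap = lambda g: v[(young == 1) & g].mean() - v[(young == 0) & g].mean()
    return gap(high == 1) - gap(high == 0)

print(double_diff(log_wage) / double_diff(schooling))  # ~0.08
```

Note that the region and cohort main effects (the 0.3 and 0.2 terms) drop out of the double difference, which is the whole point of the design.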
Duflo would continue to do exceptional work with non-RCT methods. “Dams”, with Rohini Pande, is one of my favorites; it introduced a completely novel (and brilliant) instrumental-variable strategy that we still see used today. Water, as is well known in the literature, flows downhill. If a dam is built at an elevation of a thousand feet above sea level, it can easily irrigate downstream places at an elevation of 950 feet, but it cannot irrigate the places upstream at 1,050 feet. In addition, you cannot simply build a dam anywhere: the river gradient matters. Dams for irrigation can’t be built where the gradient is too steep, while for dams for power generation, the steeper the better. With river gradient providing plausibly exogenous variation in dam placement, and otherwise similar places upstream and downstream, you can infer the effect of dams.
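A bare-bones two-stage least squares sketch, again on fake data: gradient-based suitability shifts where dams get built, an unobserved confounder pushes dams and output in opposite directions, and instrumenting recovers the true effect where naive OLS does not. The variable names and numbers are my inventions; the paper’s actual strategy is richer.

```python
import numpy as np

# Bare-bones 2SLS in the spirit of the "Dams" strategy, on fake data.
# Gradient-based suitability shifts dam construction (first stage);
# an unobserved confounder biases naive OLS. Numbers are invented.

rng = np.random.default_rng(0)
n = 50_000
gradient = rng.uniform(0, 6, n)                    # river gradient, %
suitable = ((gradient > 1.5) & (gradient < 3.0)).astype(float)
confound = rng.normal(0, 1, n)                     # unobserved quality
dams = 2 * suitable + 0.5 * confound + rng.normal(0, 1, n)
output = 0.3 * dams - 0.8 * confound + rng.normal(0, 1, n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Naive OLS: biased, because `confound` moves dams and output together.
print(ols(np.column_stack([ones, dams]), output)[1])      # well below 0.3
# 2SLS: regress dams on the instrument, then output on fitted dams.
Z = np.column_stack([ones, suitable])
dams_hat = Z @ ols(Z, dams)
print(ols(np.column_stack([ones, dams_hat]), output)[1])  # ~0.3
```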
According to Duflo and Pande, dams benefit the places downstream, but actually make the districts upstream worse off. These effects are mediated through institutions: places where landlords were responsible for tax collection saw bigger increases in poverty, even when the productivity shock was the same. They build off a paper by Banerjee and Lakshmi Iyer, who exploited differences in colonial-era institutions arising from British takeovers during the princely-state era. People would later use this method to study other things, notably the effect of sewers.
Still, they wanted to do more to help the world. They wanted to go out and test the theories. For that, I think we need to turn to their frequent coauthor, and co-laureate, Michael Kremer.