CRAWFORD SCHOOL OF PUBLIC POLICY
AUSTRALIAN NATIONAL UNIVERSITY, 13 NOVEMBER 2018
In 1958, psychologist David Weikart took up the job of director of special education in Ypsilanti, Michigan. At that time, schools were segregated, and all the African-American students in the town attended one primary school – the Perry School. Weikart noticed that the school was run down. Instead of a playground, it had a field filled with thistles. Many of the African-American students ended up repeating grades, entering special education or leaving school early.
Yet when Weikart gave a presentation to school principals about these problems, they responded defensively. One sat with arms tightly folded; others stood by the window smoking; a few left the room. When he pressed them to act, they said there was nothing they could do. Black students were just born that way. So Weikart came up with an alternative solution: ‘Because I couldn’t change the schools . . . well, obviously you do it before school.’
In the late 1950s the only institutions that looked anything like preschools were nursery schools, focused purely on play. By contrast, Weikart was interested in the work of psychologists such as Jean Piaget, which suggested that young children’s minds are actively developing from the moment they are born. But when it came to early intervention, Weikart noted, ‘There was no evidence that it would be helpful. There wasn’t data.’
So he decided to put Piaget’s theories to their first rigorous test. In 1962 the Perry Preschool opened, for children aged three and four. About 100 children applied to enrol. Half were admitted, while half remained as a control group. The selection was random – literally made by the toss of a coin.
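The random selection described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the Perry researchers actually followed: the applicant identifiers, the fixed seed and the function name `assign_groups` are all assumptions, and a shuffle stands in for individual coin tosses so that the two groups come out the same size.

```python
import random


def assign_groups(applicants, seed=None):
    """Randomly split applicants into a treatment and a control group.

    Hypothetical helper: shuffles a copy of the list and cuts it in half,
    so chance alone decides who lands in which group.
    """
    rng = random.Random(seed)      # seeded generator makes the draw reproducible
    shuffled = list(applicants)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Illustrative identifiers only; the text says about 100 children applied.
applicants = [f"child_{i:03d}" for i in range(1, 101)]
treatment, control = assign_groups(applicants, seed=1962)
print(len(treatment), len(control))
```

Because assignment depends only on the random draw, a later difference in outcomes between the two groups can be attributed to the program rather than to who chose to enrol.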
Former Perry Preschool teacher Evelyn Moore remembers how the program pushed back against the prevailing wisdom that a child’s intelligence was fixed, and that many of the children in the community were ‘retarded’. She saw something different – these children knew the names of baseball players. They recalled the words to songs. And their parents had hope. When Moore visited the families at home, she saw that almost all had pictures on the wall of two men – John F. Kennedy and Martin Luther King.
The preschool curriculum was highly verbal. Children visited a farm, a fire station and an apple orchard, where they picked apples and cooked them into apple sauce. Months later, in winter, they went back to the orchard to see the seasonal change. When Evelyn Moore asked the children where the apples had gone, one child reflexively replied, ‘Teacher, I didn’t take ’em.’
The Perry Preschool program lasted only two years, but over the coming decades researchers tracked the outcomes for those who had participated, and for the randomly selected control group. By the time they were in their twenties, those who had been to preschool were more likely to own a car, own a home and have a steady job. They were also less likely to use drugs and less likely to be on welfare. By age forty, a quarter of those in the preschool group had been to jail, compared with half of the control group.
The leading economic analysis of the program estimates that for every $1 spent on Perry Preschool, the community gained between $7 and $12. By far the biggest benefit came from reduced crime, showing that if you target early intervention at people with a fifty-fifty chance of going to prison, you can change the lives of participants at a reasonable cost to the broader community.
But while randomised evaluations have underpinned significant intervention in early years programs, they have also shown that it’s not ‘game over’ after the first 1000 days of a child’s life. Schools matter – indeed, great schools can transform lives. One randomised evaluation looked at schooling in New York’s Harlem district. Outcomes for young people in Harlem were dreadful: a study once found that life expectancy for young men born in Harlem was lower than for those born in Bangladesh. Cocaine, guns, unemployment and family breakdown created an environment where disadvantage was perpetuated from one generation to the next.
Founded in 2004, Harlem’s Promise Academy is no ordinary school. It has an extended school day, with classes starting at 8 am, and after-school activities often continuing until 7 pm. There are remedial classes on Saturdays, and the summer break is shorter than in most schools. The school operates on a ‘no excuses’ model, emphasising grit and perseverance. It is assumed that every child will go on to university. Both students and teachers are heavily monitored, with a strong focus on test score gains. With up to twenty applicants per place, the Promise Academy uses lotteries to allocate spots, an approach that allows researchers to compare outcomes across the two groups.
What difference did they find? One way to benchmark the impact is to note that the average black high school student in the United States is two to four years behind his or her white counterparts. Yet the mostly black students who won a lottery to attend the Promise Academy improved their performance by enough to close the black–white test score gap. As the lead researcher pointed out, this overturns the fatalistic view that poverty is entrenched, and schools are incapable of making a transformational difference. He claims that the achievements of the Harlem Children’s Zone are ‘the equivalent of curing cancer for these kids’.
The randomistas are also endeavouring to improve teaching. For example, the Bill and Melinda Gates Foundation recently conducted a randomised trial of coaching programs for teachers. Each month, teachers sent videos of their lessons to an expert coach, who worked with them to eliminate bad habits and try new techniques. By the end of the year, teachers in the coaching program had seen gains in their classroom equivalent to several additional months of learning.
The British Education Endowment Foundation has so far commissioned over a hundred evaluations, many of them randomised, to test what works in the classroom. Among those randomised evaluations that produced positive results are personal academic coaching, individual reading assistance, a Singaporean-designed mathematics teaching program, and a philosophy-based intervention encouraging students to become more engaged in classroom discussion.
With so many evaluations, they can readily compare the size of the results. To get a one-month improvement for one student, personal academic coaching cost £280, individual reading assistance cost £209, the mathematics teaching program cost £60, and the philosophy-based intervention cost £8. So while all the programs ‘worked’, some were a whopping thirty-five times more cost-effective than others.
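The comparison above comes down to simple division, sketched here in Python. The costs are the per-student, per-month figures quoted in the text, with the philosophy-based intervention taken as £8 (an assumption, chosen because it matches the thirty-five-fold ratio); the dictionary keys and variable names are illustrative only.

```python
# Cost (in pounds) of one extra month of progress for one student,
# as quoted in the text. The 8-pound figure is assumed from the 35x ratio.
cost_per_month = {
    "personal academic coaching": 280,
    "individual reading assistance": 209,
    "mathematics teaching program": 60,
    "philosophy-based intervention": 8,
}

cheapest = min(cost_per_month.values())

# How many times more expensive each program is than the cheapest one.
relative_cost = {name: cost / cheapest for name, cost in cost_per_month.items()}

for name, ratio in sorted(relative_cost.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x the cost of the cheapest program")
```

Dividing each cost by the cheapest one puts all four programs on a common scale, which is what makes the ‘thirty-five times’ claim easy to check.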
In some cases, the Education Endowment Foundation trialled programs that sounded promising, but failed to deliver. The Chatterbooks program was created for children who were falling behind in English. Hosted by libraries on a Saturday morning and led by trained reading instructors, the program gave primary school students a chance to read and discuss a new children’s book. Chatterbooks is the kind of program that warms the cockles of your heart. Alas, a randomised trial found that it produced zero improvement in reading abilities.
Another Education Endowment Foundation trial tested the claim that learning music makes you smarter. Students were randomly assigned either to music or drama classes, and then tested for literacy and numeracy. The researchers found no difference between the two groups, suggesting either that learning music isn’t as good for your brain as we’d thought, or that drama lessons are equally beneficial.
In a similar vein, a recent randomised trial of free school breakfast programs in New Zealand schools found that they reduced hunger rates (by 8.6 units on the ‘Freddy satiety scale’, in case you’re curious). However, free breakfasts did not improve school attendance or academic achievement for low-income children. Educational randomistas are even evaluating how to get more low-income children to university.
In Ohio and North Carolina, researchers worked with tax preparation company H&R Block to identify low-income families with a child just about to finish high school. Half of these families were randomly offered assistance in completing a university financial aid application, a process that took about eight minutes. Two years later, the children of those who had received help applying for financial aid were one-quarter more likely to be enrolled at university.
Because children whose parents did not attend university often lack basic information about the college application process, modest interventions can have large impacts. In Ontario, a three-hour workshop for Year 12 students raised college attendance rates by one-fifth, relative to a randomised control group. In regional Massachusetts, peer support provided by text message raised the odds that Year 12 students would enrol in college.
* * *
For the most affluent, it doesn’t matter much whether government works. They can rely on private healthcare, private education, and private security. They are less likely to be unemployed, and have family resources to draw upon in hard times. For the top 1 per cent, dysfunctional government is annoying, but not life-threatening.
But for the most vulnerable, government can mean the difference between getting a good education or struggling through life unable to read and write. Those who depend on government depend on knowing that the programs government is delivering actually work.
In Melbourne, the Sacred Heart Mission has been working closely with long-term homeless people since 1982. A few years ago, the organisation proposed to trial a new intensive casework program, targeted at people who had been sleeping rough for at least a year. When they pitched the idea to their philanthropic partners, one donor urged that it be evaluated through a randomised trial.
Guy Johnson, who worked in community housing and would eventually help conduct the research, was pretty sceptical at first. People in the community sector, he told me, ‘freak out at the word experimental’, and prefer to select participants based on need, not chance. But Johnson came to regard randomisation not only as the most rigorous method for evaluating the program, but also the fairest way of deciding who got the service.
The ‘Journey to Social Inclusion’ experiment was Australia’s first randomised trial of a homelessness program. For the forty or so people in the treatment group, it provided intensive support from a social worker, who was responsible for only four clients. This caseworker might help them find housing, improve their health, reconnect with family and access job training. Another forty people in the control group did not receive any extra support.
What might we expect from the program? If you’re like me, you’d have hoped that three years of intensive support would see all participants healthy, clean and employed. But by and large, that’s not what the program found. Those who were randomly selected into the program were indeed more likely to have housing, and less likely to be in physical pain. But Journey to Social Inclusion had no impact on reducing drug use or improving mental health. In fact, those who received intensive support were more likely to be charged with a crime. At the end of three years, just two people in the treatment group had a job – the same number as in the control group.
While it’s disappointing that the program didn’t bring most participants back into mainstream society, it’s less surprising once you begin to learn about the people it seeks to assist. In many cases, they were abused in childhood (the mother of one participant used to put Valium in the child’s breakfast cereal). Most had used drugs for decades, and they were used to sleeping rough. Few had completed school or possessed the skills to hold down a regular job. If they had children of their own, more often than not they had been taken away by child protection services.
The Journey to Social Inclusion program is a reminder of how hard it is to turn around the living standards of the most disadvantaged. If you’ve been doing drugs for decades, your best hope is probably a stable methadone program. If you’re in your late forties with no qualifications and no job history, a stable volunteering position is a more realistic prospect than a steady paycheck.
Unless we properly evaluate programs designed to help the long-term homeless, there’s a risk that people of goodwill – social workers, public servants and philanthropists – will fall into the trap of thinking it’s easy to change lives. There are plenty of evaluations of Australian homelessness programs that have produced better results than this one. But because none of those evaluations was as rigorously conducted as this one, there’s a good chance they’re overstating their achievements.
Blockbuster movies are filled with white knights and magic bullets, moon shots and miracles. Yet in reality most positive change doesn’t happen suddenly. From social reforms to economic change, our best systems have evolved gradually. Randomised trials put science, business and government on a steady path to improvement. Like a healthy diet, the approach succeeds little by little, through a series of good choices.
The incremental approach won’t remake the world overnight, but it will over a generation.
Randomised trials flourish where modesty meets numeracy. As British randomista David Halpern puts it: ‘We need to turn public policy from an art to a science.’ This means paying more attention to measurement, and admitting that our intuition might be wrong. One of the big thinkers of US social policy, Senator Daniel Patrick Moynihan, recognised that evaluations can often produce results which are solid rather than stunning. When faced with a proposed new program, Moynihan was fond of quoting Rossi’s Law (named after sociologist Peter Rossi), which states: ‘The better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.’ Rossi’s Law does not mean we should give up hope of changing the world for the better. But we ought to be sceptical of anyone peddling panaceas. The belief that some social programs are flawed should lead to more rigorous evaluation and patient sifting through the evidence until we find a program that works.
The best randomistas are passionate about solving a social problem, yet sceptical about the ability of any particular program to achieve its goals. Launching an evaluation of her organisation’s flagship program, Read India, Rukmini Banerji told the audience: ‘And of course [the researchers] may find that it doesn’t work. But if it doesn’t work, we need to know that. We owe it to ourselves and the communities we work with not to waste their and our time and resources on a program that does not help children learn. If we find that this program isn’t working, we will go and develop something that will.’
* * *
Randomised trials don’t have to be expensive or time-consuming. One firm in the United States offered employees up to $750 if they could quit smoking for a year. Those randomly chosen for the program were 10 percentage points more likely to quit. An effect this large means the program would be worth offering for firms with plenty of smokers, even if they did not care about the health of their employees. That’s because smokers take more breaks during the day, and more days off during the year.
Another simple randomised trial was conducted by the German government in 2010. They posted out a cheerful blue brochure to over 10,000 people who had recently lost their jobs. ‘Bleiben Sie aktiv!’ (‘Stay active!’), the leaflet urged unemployed people. The leaflet boosted employment rates among those who received it. Each leaflet cost less than €1 to print and post, but boosted earnings among the target group by an average of €450. If you know another government intervention with a payoff ratio of 450 to 1, I want to hear about it.
In 2013 the Obama White House, working with a number of major foundations, announced a competition for low-cost randomised trials. The aim was to show that it was possible to evaluate social programs without spending millions of dollars. From over fifty entries, the three winners included a federal government department planning to carry out unexpected workplace health and safety inspections, and a Boston non-profit providing intensive counselling to low-income youth hoping to be the first in their family to graduate from college. Each evaluation cost less than $200,000. The competition continues to operate through the Laura and John Arnold Foundation, which has announced that it will fund all proposals that receive a high rating from its review panel.
* * *
What is holding us back from conducting more randomised trials? When parliamentarians are probed on their misgivings, the chief concern is fairness. Half of Australian politicians and one-third of British politicians worry that randomised trials are unfair. As medical writer Ben Goldacre points out: ‘We need to get better at helping them to learn more about how randomised controlled trials work . . . Many members of parliament say they’re worried that randomised controlled trials are “unfair”, because people are chosen at random to receive a new policy intervention: but this is exactly what already happens with “pilot studies”, which have the added disadvantage of failing to produce good quality evidence on what works, and what does harm.’
Rejecting randomised trials on the grounds of unfairness also seems at odds with the fact that lotteries have been used in advanced countries to allocate school places, housing vouchers and health insurance, to determine ballot order, and to decide who gets conscripted to fight in war.
One way of thinking about the ethical issue in randomisation is that it turns on what we know about a program’s effectiveness. Adam Gamoran, a sociologist at the University of Wisconsin–Madison, agrees that if you are confident that a program works, then it is unethical to conduct a randomised trial. But if you are ignorant about whether the program works, and a randomised trial is feasible, Gamoran argues that it is unethical not to conduct one.
The problem is that we live in a world in which failure is surprisingly common. In medicine, only one in ten drugs that looks promising in lab tests ends up getting approval. In education, only one-tenth of the randomised trials commissioned by the US What Works Clearinghouse produced positive effects. In business, just one-fifth of Google’s randomised experiments helped them improve the product. Rigorous social policy experiments find that only a quarter of programs have a strong positive effect. Once you raise the evidence bar, a consistent finding emerges: most ideas that sound good don’t actually work in practice.
How do we institutionalise randomised trials?
In 2010, the British government became the first to establish a so-called ‘Nudge Unit’, to bring the principles of psychology and behavioural economics into policymaking. The interventions were mostly low-cost – such as tweaking existing mailings – and were tested through randomised trials wherever possible. In some cases they took only a few weeks. Since its creation, the tiny Nudge Unit has carried out more randomised experiments than the British government had conducted in that country’s history. Following the British model, Nudge Units have been established by governments in Australia, Germany, Israel, the Netherlands, Singapore and the United States, and are being actively considered in Canada, Finland, France, Italy, Portugal and the United Arab Emirates.
* * *
But we can do more. Over recent years, a range of Australian reports from the Auditor-General, the Productivity Commission, the Australian Institute of Health and Welfare, the former COAG Reform Council and the Grattan Institute have highlighted the need for better evaluation of government programs.
Last year, the House of Representatives Standing Committee on Tax and Revenue brought down a bipartisan report recommending that the tax office ‘make greater use of behavioural insights techniques, such as randomised controlled trials, before full implementation of new initiatives to determine if such changes are indeed better than current practices, and if so, which changes are the most effective.’
There have been productive discussions in the public service around improving the quality of evaluation, including in the Office of Development Effectiveness at the Department of Foreign Affairs and Trade, the Indigenous Evaluation Committee in the Department of the Prime Minister and Cabinet, and the BETA unit, also in PM&C.
But at present, the evaluation conversation is too fragmented and ad hoc. That’s why I am pleased to announce today that a Shorten Labor Government will create an office of ‘Evaluator-General’ within the Treasury.
The mandate of the Evaluator-General will be to work with departments across the government to conduct high-quality evaluations of government programs – preferably randomised trials.
The Evaluator-General will collaborate with existing evaluation bodies such as BETA and the Office of Development Effectiveness. It will also work with the Evidence Institute for Schools, a body that Labor has announced we will create within the Department of Education and Training. However, while the Evidence Institute for Schools will both synthesise existing research and produce fresh findings, the focus of the Evaluator-General will be on conducting new evaluations.
I acknowledge the work that Nicholas Gruen has done in this area. As he points out, there is considerable value from creating a body in which people are able to develop a true expertise in evaluation, and which offers career progression. But while Dr Gruen has proposed a model in which the Evaluator-General sits outside government, perhaps with a status akin to the Auditor-General, our approach will be for the Evaluator-General to be located within government. In our view, this is likely to produce better results for the community, because it creates a more collaborative relationship between program experts and evaluation experts.
Unlike auditing, good evaluation is very hard to do afterwards. This is particularly true of randomised trials, which must be set up before a program is rolled out. We also see the Evaluator-General as being better able to encourage departments to make effective use of administrative data if it takes a collaborative approach rather than purely playing an oversight role.
The Evaluator-General will be funded with $5 million per year, starting in 2019-20.
* * *
Over the course of the twentieth century, randomised trials have turned health care from a profession that relied on ‘eminence-based medicine’ into one grounded in ‘evidence-based medicine’. Companies like Netflix, Coles, United Airlines, Amazon and Google have built randomised trials into their business model. Intuit founder Scott Cook aims to create a company that’s ‘buzzing with experiments’. Whatever happens, Cook tells his staff, ‘you’re doing right because you’ve created evidence, which is better than anyone’s intuition’. If you used the internet today, it’s likely you were part of a randomised trial.
Yet when it comes to social policy, the vast majority of programs designed to help the most vulnerable are grounded more in greybeard beliefs than empirical evidence. The alternative to rigorous evaluation is often to ask the HiPPO – the highest paid person’s opinion.
As Australia faces challenges such as inequality, climate change, and Indigenous disadvantage, it’s time we raised the evidence bar. If governments want to boost innovation and raise productivity, it’s vital that we have the best tools for the job. At a time when government budgets are under pressure, there’s no excuse for continuing to fund programs that don’t work.
Conducting more randomised evaluations isn’t an excuse to give up on the problem.
We don’t abandon the search for a cure for cancer just because most cancer drugs to emerge from the laboratory don’t make it through clinical trials. Similarly, the goals of cutting crime, raising test scores, or achieving full employment should be pursued even if a specific program comes up short.
The more we ask the question ‘What’s your evidence?’, the more likely we are to find out what works – and what does not. By evaluating social policies, discarding those that don’t work, and boosting those that do, government can have a far greater impact on reducing poverty. An experimenting society is likely to become a fairer society.
Scepticism isn’t the enemy of optimism: it’s the channel through which our desire to solve big problems translates into real results.
Given the chance, randomistas can deliver a better world, one coin toss at a time.
Authorised by Noah Carroll ALP Canberra