Randomised Trials Motion

I moved a motion today on randomised trials:

DR LEIGH: To move—That this house:

(1) reaffirms this Government’s commitment to evidence-based policy making;

(2) notes that:

(a) the Productivity Commission has highlighted the importance of rigorous evaluation in assessing the impact of social, educational, employment and economic programs; and

(b) randomised policy trials are increasingly being used as an evaluation tool in developed and developing nations; and

(3) supports measures to increase the quality of evaluations, and calls on the Government to consider whether randomised policy trials may be implemented to evaluate future Government policies.

Here's what I had to say:

Evidence-Based Policy – Private Members’ Motion
28 February 2011

No government has been more committed to evidence-based policy than ours. In areas from water reform to climate change, from foreign aid to schools reform, activity-based health funding to fiscal stimulus, Labor has drawn on the best knowledge of experts in the field. What drives those of us on this side of the House is not a love of particular programs, but a hope that our time in public life will help leave Australia more prosperous and more tolerant, with a cleaner environment and jobs for the future.

To achieve these goals, we need to keep finding better ways to evaluate our policies. As a former economics professor, I can assure the House that this is particularly hard in the case of social policies, which cannot be tested as cleanly as scientific experiments. We don’t always get the right answer from simple before/after evaluations, nor from comparisons of those who opted in with those who opted out.

A great advantage of randomised trials is that participants are allocated to the treatment or control group by the toss of a coin. The beauty of randomisation is that, with a sufficiently large sample, the two groups are very likely to be identical, both on observable characteristics and on unobservable characteristics. The only difference between the treatment and control groups is the intervention itself. So if we observe statistically significant differences between the two groups, we can be sure that they are due to the treatment and not to some other confounding factor.
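The contrast between coin-toss allocation and opt-in comparisons can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis of any real program: the "motivation" trait, the true effect of 2.0, the sample size and the outcome formula are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate(n=20000, true_effect=2.0, seed=0):
    """Compare a coin-toss trial with an opt-in comparison (hypothetical data)."""
    rng = random.Random(seed)

    # Assumption: each person has an unobserved trait ("motivation")
    # that raises their outcome whether or not they are treated.
    motivation = [rng.gauss(0, 1) for _ in range(n)]

    def outcome(m, treated):
        # Hypothetical outcome: baseline + trait + treatment effect + noise.
        return 10 + 3 * m + (true_effect if treated else 0.0) + rng.gauss(0, 1)

    def difference_in_means(assignment):
        treated = [outcome(m, True) for m, t in zip(motivation, assignment) if t]
        control = [outcome(m, False) for m, t in zip(motivation, assignment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    # Randomised trial: allocation by coin toss, unrelated to motivation.
    coin_toss = [rng.random() < 0.5 for _ in range(n)]

    # Opt-in comparison: more motivated people are more likely to sign up.
    opt_in = [m + rng.gauss(0, 1) > 0 for m in motivation]

    return difference_in_means(coin_toss), difference_in_means(opt_in)


randomised_estimate, opt_in_estimate = simulate()
print(f"true effect:          2.00")
print(f"randomised estimate:  {randomised_estimate:.2f}")
print(f"opt-in estimate:      {opt_in_estimate:.2f}")
```

Under these assumptions, the coin-toss estimate should land close to the true effect, while the opt-in estimate is inflated because the people who sign up differ from those who do not, even before the program begins.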

In Australia, our farmers have used randomised evaluations for over a century, and our medical researchers have used randomised evaluations for over half a century. Yet randomised evaluations of social policy are much rarer.

One exception is the New South Wales Drug Court trial, conducted in 1999–2000. Offenders were referred to the Drug Court from local or district courts, underwent a detoxification program and were then dealt with by the Drug Court instead of a traditional judicial process. At the time it was established, the number of places in detoxification was limited, so participants in the evaluation were randomly assigned either to the treatment group or the control group. They were then matched to court records in order to compare reoffending rates over the next year or more. The evaluation found that the Drug Court was effective in reducing the rate of recidivism, and that while it was more expensive than the traditional judicial process, it more than paid for itself.

In the case of the Drug Court, many of us probably had an expectation that the policy would reduce crime. But high-quality evaluations do not always produce the expected result. Staying for a moment with criminal justice interventions, take the example of ‘Scared Straight’, a program in which delinquent youth visit jails to be taught by prison staff and prisoners about life behind bars. The idea of the program — originally inspired by the 1978 Academy Award winning documentary of the same name — is to use exposure to prison to frighten young people away from a life of crime. In the 1980s and 1990s, several US states adopted Scared Straight programs.

Low-quality evaluations of Scared Straight, which simply compared participants with a non-random control group, had concluded in the past that such programs worked, reducing crime by up to 50 percent. Yet, after a while, some US states began carrying out rigorous randomised evaluations of Scared Straight. The startling finding was that Scared Straight actually increased crime, perhaps because youths discovered jail was actually not as bad as they had thought. It was not until policy makers moved from second-rate evidence to first-rate evidence that they learned the program was harming the very people it was intended to help.

Being surprised by policy findings is perfectly healthy. Indeed, we should be deeply suspicious of anyone who claims that they know what works based only on theory or small-scale observation. As economist John Maynard Keynes once put it when asked why he had changed his position on monetary policy during the Great Depression: ‘When the facts change, I change my mind. What do you do, sir?’

One common argument made against randomised trials is that they are unethical. Critics say: when you have a program that you think is effective, how can you toss a coin to decide who receives it? The simplest answer to this is that the reason we are doing the trial is precisely because we do not know whether the program works. The great benefit of a randomised trial is that it gives us solid evidence on effectiveness, and allows us to shift resources from less effective to more effective social programs.

We should not lightly dismiss ethical concerns about randomised trials, but they are often overplayed. Medical researchers, having used randomised trials for several decades longer than social scientists, have now grown relatively comfortable with the ethics of randomised trials. Certain medical protocols could be adapted in social policy, such as the principle that a trial should be stopped early if there is clear evidence of harm, or the common practice of testing new drugs against the best available alternative.

One example, again from New South Wales, helps to illustrate this. Since 2005, an NRMA CareFlight team, led by Alan Garner, has been running the Head Injury Retrieval Trial (HIRT), which aims to answer two important questions: Are victims of serious head injuries more likely to recover if we can get a trauma physician onto the scene instead of a paramedic? And can society justify the extra expense of sending out a physician, or would the money be better spent in other parts of the health system?

To answer these questions, Garner’s team is running a randomised trial. In effect, when a Sydney 000 operator receives a report of a serious head injury, a coin is tossed. Heads, you get an ambulance and a paramedic. Tails, you get a helicopter and a trauma physician. Once 500 head injury patients have gone through the study, the experiment will cease and the results will be analysed.

When writing a newspaper article about the trial, I spoke with Alan Garner, who told me that, although he has spent over a decade working on it, even he does not know what to expect from the results. ‘We think this will work’, he told me in a phone conversation, ‘but so far, we’ve only got data from cohort studies.’ Indeed, he even said, ‘Like any medical intervention, there is even a possibility that sending a doctor will make things worse. I don’t think that’s the case, but [until HIRT ends] I don’t have good evidence either way.’

What is striking about Garner is his willingness to run a rigorous randomised trial, and listen to the evidence. Underlying HIRT is a passionate desire to help head injury patients, a firm commitment to the data and a modesty about the extent of our current knowledge. High-quality evaluations help drive out dogma. As US judge Learned Hand famously said, ‘The spirit of liberty is the spirit which is not too sure that it is right’.

Naturally, randomised trials have their limitations. Not all questions are amenable to randomisation. Like the kinds of pilot programs that we run all the time, randomised trials do not necessarily tell us how the program will work when it is scaled up, and they’re not very good at measuring spillover and displacement effects.

Because of these limitations, it is unlikely that we would ever want 100 per cent of government evaluations to be randomised trials. Most likely, the marginal benefit of each new randomised trial is a little lower than that of the previous one. At some point, it is indeed theoretically possible that we could end up doing more randomised trials than is socially optimal.

However, this is unlikely to ever occur, at least in my lifetime. My best estimate is that less than 1 per cent of all government evaluations are randomised trials (excluding health and traffic evaluations, the proportion is probably less than 0.1 per cent). Another way to put this is that, to a first approximation, Australia currently does no randomised policy trials. Governments throughout Australia could safely embark on a massive expansion of randomised policy trials before we come close to the point where the costs exceed the benefits.

Finally, one way that we might expand randomised policy trials is to learn from the US, where federal legislation sometimes sets aside funding for states to conduct randomised evaluations. The Second Chance Act for rehabilitating prisoners, the No Child Left Behind school reform law, and legislation to improve child development via home visits are just some of the US laws in which the federal government explicitly puts aside a portion of program funds for states to run random assignment evaluations.

What we need in Australian policy today is not more ideologues, convinced that their prescriptions are the answer, but modest reformers willing to try new solutions, and discover whether they actually deliver results.

Update, 14/3: Thanks to Ross Gittins for a generous mention in his column.

Andrew Leigh
