Great potential in careful use of big data - Speech, House of Representatives

HOUSE OF REPRESENTATIVES, 30 MARCH 2022

Fifty-five years ago, in 1967, Edward Gough Whitlam gave the budget reply speech. In that speech he said, 'One of the problems in discussing health policy in Australia is the lack of reliable official information.' Fifty-five years on, not as much has changed as we might have liked. As my co-author Philip Clarke, the director of the Health Economics Research Centre at the Nuffield Department of Population Health at Oxford University, notes, 'There is still great potential to learn more from using administrative data on improving health efficiency.' In Philip's paper in Health Policy—co-authored with Xinyang Hua, Guido Erreygers, John Chalmers and Tracey-Lea Laba—the authors used linked Medicare data and found that, in the first year of life, Medicare spending was actually regressive. Their paper notes that analysis of out-of-pocket expenditure could be much more detailed if there were better access to linked administrative data, and it suggests a number of important ways forward.

Yet the use of health data by researchers has been plagued by the incompetence of this government when it comes to releasing those data. An article in the University of Melbourne's magazine Pursuit in 2017, by Vanessa Teague, Chris Culnane and Ben Rubinstein, is titled 'The simple process of re-identifying patients in public health records'. It notes that in September 2016 the government put out an MBS/PBS sample dataset and the researchers found that the encryption of supplier IDs was easily reversed. They informed the department and the dataset was taken offline. Then in December 2016 the same researchers found that patients could be re-identified by using known information about the person to find their record. The researchers found unique records matching three current or former members of parliament and a number of other prominent Australians. They informed the department in December 2016 and, again, the dataset was taken down.

But rather than acknowledge the honesty of the cryptographers who made clear to the department the flaws in the encryption methodology, this government set about attacking those researchers—attacking the very researchers who'd noted the flaws in what the government put online. The then Attorney-General, George Brandis, announced plans to criminalise the act of re-identifying previously de-identified data, as though somehow it was alright for the government to put out a dataset in which individuals could be re-identified; they'd just make it a crime to do so. Fortunately, that legislation didn't pass, but the ongoing attacks on Vanessa Teague ultimately led to her having to quit the University of Melbourne.

There is great potential to be had from the careful use of big data. But we on this side of the House recognise the importance of maintaining strict privacy protections. Thanks to the work of the member for Maribyrnong, Bill Shorten, a number of troubling aspects of this bill have been removed or overcome, including overcoming the lack of privacy protections and safeguards, removing the scope of the scheme to extend to foreign organisations, removing the scope of the scheme to extend to the private sector, and introducing a three-year review and a five-year sunset clause. That reflects the strong views of those of us on this side of the House that big data can make a difference in improving people's lives but that it must be done with care and appropriate scrutiny.

That, of course, was the big flaw in the RoboDebt scandal, which had its genesis in the belief by those opposite that it was possible to simply use big data analytics to chase down welfare recipients who'd allegedly done the wrong thing. RoboDebt was not the first time that big data analytics had been used to detect fraud. What made it different was removing the human element from the system. As Cathy O'Neil's terrific book Weapons of Math Destruction has made clear, algorithms can be dangerous. They can replicate existing biases in court sentencing processes. Algorithms, if misused in the assessment of teachers, can produce results which are more strongly driven by noise than signal. Indeed, the recent book by Daniel Kahneman, Olivier Sibony and Cass Sunstein, Noise: A Flaw in Human Judgment, has made clear that we need to be very careful about the use of big data in a way that allows noise to dominate signal.

We have, in public policy, too many examples of algorithms in big data being misused. And yet, when they're used right, they can be incredibly helpful. Georgia State University is currently using predictive analytics to spot students in danger of dropping out, looking not just at their overall GPA but at their scores in particular classes. They have increased their four-year graduation rate as a result. We have seen careful work being done by Raj Chetty and his co-authors, including Nathaniel Hendren and John Friedman, in using big data analytics to look at mobility across the United States—for example, noting that your chances of going from the bottom fifth to the top fifth of the income distribution are three times greater if you grow up in San Jose, California than they are if you grow up in Charlotte, North Carolina. A similar analysis by Sarah Merchant using tax records covering almost the entire US population from 1989 to 2015 has discovered that the income gap between blacks and whites persists for generations and is driven entirely by differences in wages and employment between black and white men rather than women, and found that it's smaller for black boys who grow up in neighbourhoods with lower poverty rates.

In New Zealand, analysis using the Integrated Data Infrastructure has found that the probability of moving from low pay to high pay is not as high when the analysis is done using detailed monthly income records as when carried out using patchy surveys. New Zealand has also used its Integrated Data Infrastructure to analyse the potential spread of COVID and to identify those areas that are most at risk. So, we do need to make sure, as Martin Kurzweil has pointed out, that 'algorithm is not destiny' and that human judgement is never removed from the process. But, if we do that and we're careful about our use of big data, there is great potential for government to partner with academic institutions in order to improve lives for all Australians and to better fulfil that goal that Gough Whitlam laid out in his budget reply speech of 1967.

Andrew Leigh
