
Melbourne researchers uncover privacy lapses in transport dataset

A team of University of Melbourne researchers has been able to re-identify individuals from a public transport dataset, raising serious privacy, safety and security issues

Researchers at the University of Melbourne have managed to re-identify individuals from a public transport dataset that was released as part of a data science competition.

The dataset comprises several data points – such as boarding and alighting locations, as well as card types – collected from 15 million Myki transport cards used by commuters who had travelled on trains, trams and buses in Melbourne and other parts of Victoria between 2015 and 2018.

Although Myki card numbers, which were mapped to card IDs in the dataset, were not included, the researchers were able to identify themselves, a co-traveller and a member of parliament (MP) in Victoria, along with details of their daily routines.

To identify the MP, the researchers correlated data from the Myki dataset, including train station locations and the use of a state parliamentarian card, with the MP’s tweets about train travel.
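The general idea behind this kind of linkage can be illustrated with a minimal sketch. It is not the researchers' code: the records, station names, times and the `candidate_cards` helper below are all hypothetical, and the sketch simply assumes that each row in a released dataset carries a pseudonymous card ID, a stop, a touch-on time and a card type. Each externally known event narrows the set of cards that could belong to the person.

```python
from datetime import datetime, timedelta

# Hypothetical, made-up records: (card_id, stop, touch-on time, card_type)
records = [
    ("c1", "Station A", datetime(2016, 3, 1, 8, 2), "full fare"),
    ("c2", "Station A", datetime(2016, 3, 1, 8, 3), "parliamentarian"),
    ("c2", "Station B", datetime(2016, 3, 4, 17, 40), "parliamentarian"),
    ("c3", "Station B", datetime(2016, 3, 4, 17, 41), "full fare"),
    ("c1", "Station C", datetime(2016, 3, 4, 17, 45), "full fare"),
]

# Hypothetical outside knowledge, e.g. taken from public posts about a trip
known_events = [
    ("Station A", datetime(2016, 3, 1, 8, 0)),
    ("Station B", datetime(2016, 3, 4, 17, 45)),
]


def candidate_cards(records, known_events, card_type=None,
                    tolerance=timedelta(minutes=10)):
    """Return card IDs that have a matching touch-on for every known event."""
    candidates = None
    for stop, when in known_events:
        matches = {
            cid
            for cid, rstop, rtime, rtype in records
            if rstop == stop
            and abs(rtime - when) <= tolerance
            and (card_type is None or rtype == card_type)
        }
        # Keep only cards consistent with all events seen so far
        candidates = matches if candidates is None else candidates & matches
    return candidates or set()


# One known event leaves two candidate cards; two events leave one
print(candidate_cards(records, known_events[:1]))
print(candidate_cards(records, known_events))
```

Once a single card ID remains, every other trip recorded against that ID is exposed, which is why a pseudonymous ID alone does not protect unit-level travel histories.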

“With just a handful of pieces of information about where someone boards or exits public transport, it is possible to get an indication of where they live or work, their regular travel patterns, who they travel with, or if they travel alone – for example, children heading home from school alone,” said Chris Culnane, lead researcher from the University of Melbourne’s school of computing and information systems.

“Our analysis raises serious privacy, safety and security issues. It is easy to imagine how information like this could be used by people who might want to cause harm,” he added.

Culnane suggested the data release could have been done better by using frameworks such as differential privacy, which makes it possible to collect and share aggregate information about user habits while maintaining user privacy.

For example, the Melbourne researchers pointed to a similar dataset released by Transport for New South Wales that included only the total number of “touch-on” or “touch-off” events at each location and time.

“Even if you know several precise events for a person, there is no way of retrieving other events on the same card,” they wrote in a research paper. “These totals are aggregated into quarter-hour time blocks, and then mechanisms from differential privacy are applied to obscure the exact totals.”
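A simplified sketch of that approach, under stated assumptions, might look as follows. It is not the method Transport for New South Wales used: the event data, the `epsilon` value and the helper names are hypothetical, the Laplace mechanism is one common differential privacy mechanism chosen here for illustration, and the sketch assumes each event changes exactly one count by one. A real release would also need to account for a single card contributing many events and for time blocks with no events at all.

```python
import random
from collections import Counter
from datetime import datetime


def quarter_hour(ts):
    """Round a timestamp down to the start of its quarter-hour block."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def aggregate(events):
    """Count events per (stop, quarter-hour block); card IDs are never used."""
    return Counter((stop, quarter_hour(ts)) for stop, ts in events)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_totals(counts, epsilon, rng):
    """Add Laplace noise to each total (assumed sensitivity of 1 per event)."""
    scale = 1.0 / epsilon
    return {key: count + laplace_noise(scale, rng) for key, count in counts.items()}


# Hypothetical touch-on events: (stop, time)
events = [
    ("Station A", datetime(2016, 3, 1, 8, 2)),
    ("Station A", datetime(2016, 3, 1, 8, 14)),
    ("Station A", datetime(2016, 3, 1, 8, 16)),
    ("Station B", datetime(2016, 3, 1, 8, 5)),
]

counts = aggregate(events)
released = noisy_totals(counts, epsilon=0.5, rng=random.Random(0))
for (stop, block), total in sorted(released.items()):
    print(stop, block.strftime("%H:%M"), round(total, 1))
```

Because the output contains only noisy per-location totals, there is no card identifier left to link one event to another, which is also why journey-level analysis is no longer possible on such a release.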


But the researchers acknowledged that such privacy protection techniques remove detail and connections, which would make it difficult to undertake trip or journey analysis at the level that was possible with the Myki data.

In response to the researchers’ findings, the Office of the Victorian Information Commissioner (OVIC) released a report today, noting that “deficiencies in governance and risk management in relation to data can undermine the protection of privacy, even where the project is well-intentioned”.

Victorian information commissioner Sven Bluemmel wrote in a foreword to the OVIC report: “The report also highlights that some of the assumptions made about data de-identification and release several years ago need to be revisited.

“Where a dataset contains unit-level data about individuals, especially where it contains longitudinal unit-level data about behaviour, more recent research indicates that such material may not be suitable for open release, even where extensive attempts have been made to de-identify it.”

To prevent similar lapses in protecting the privacy of individuals in open data initiatives, the OVIC recommended developing policies for data release decisions, building data capability across Victoria’s public sector and implementing a data governance programme, among other measures.
