Panama Papers technology shows data journalism at its best

The open source data analytics and visualisation software of the Panama Papers project empowers a new kind of journalism, says the International Consortium of Investigative Journalists

A battery of mostly open source data analysis software made possible the investigative journalism operation that revealed how rich and powerful people hide their money.

Mar Cabra, head of the data and research unit at the International Consortium of Investigative Journalists (ICIJ), says that the Panama Papers project has “made me believe in the human race, seeing how journalists could share so much without knowing each other. And without the technology we would not be able to see that.”

A leak of 11.5 million files, amounting to 2.6TB of data, from Panamanian law firm Mossack Fonseca to the German newspaper Süddeutsche Zeitung hit the headlines in April 2016. The BBC and The Guardian also mined the document trove, but the technology linchpin of the project was the Washington-based ICIJ.

The organisation was founded in 1997 as a project of the Center for Public Integrity, an investigative news organisation founded by journalism professor Charles Lewis in 1989. The project’s specific mission has been cross-border investigations, and its recent focus on the offshore economy has been shared with UK media outlets The Guardian and the BBC.

The original Mossack Fonseca leak – from a person whose identity is still unknown – was to Süddeutsche Zeitung’s investigative reporter Bastian Obermayer, who took the story forward with his colleague Frederik Obermaier.

Graphed and visualised

The ICIJ has stored the material in a graph database, donated pro bono by Neo4j, rendering it visual by using data visualisation software from Linkurious.

Mar Cabra describes how the 370-plus journalists from more than 100 media outfits in 80 countries have mined the Panama Papers material and shared leads with each other.

As a general rule, she says: “We tell them what the story is, and they decide whether they want to put resource into it. All the journalists work together. It is a shared cost and investigation from the beginning.”

The platforms

The ICIJ processes the data – from this project and others – and “enables online platforms in the cloud, so the journalists can share leads”. In the Panama Papers instance there are three platforms, Cabra says: “our own ‘Facebook’ – the global iHub, which is based on what was originally the Oxwall dating platform; Apache Solr for indexing and search, together with Project Blacklight, which is a user front-end for document sharing; and finally, Linkurious working with data in Neo4j.

“We try to use open source technologies because we can improve and adapt,” says Cabra. “The dating social networking tool is a good example of that; Project Blacklight, too, which is used in university libraries.”
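The “indexing and search” role that Cabra assigns to Apache Solr can be illustrated with a toy inverted index, the general technique behind full-text search engines. What follows is only a minimal sketch in plain Python, not Solr’s API and not the ICIJ’s setup; the document IDs, the sample texts and the function names are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample documents; not material from the leak.
DOCUMENTS = {
    "doc-1": "Certificate of incorporation for Example Holdings Ltd",
    "doc-2": "Email about the Example Holdings bank account",
    "doc-3": "Passport scan for a company director",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(DOCUMENTS)
print(search(index, "example holdings"))  # ['doc-1', 'doc-2']
```

Because the index is built once and each query is answered by intersecting small sets, a lookup does not need to re-read the documents themselves. A production search engine adds ranking, text extraction and much else on top of this basic idea.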

Cabra argues that the technology platforms developed by the organisation will help with any future investigative project. “We live in an electronic world, and I can’t imagine any future project that would not involve documents. We have a process and a model whereby we can scale up and have hundreds of journalists working together; it is just a matter of adding users. But we won’t always deal with leaks of millions of documents.”

Connections made visible

Cabra describes Linkurious and Neo4j as an indispensable combination. “Our brains are not wired to see connections visually in an easy way. But the world is connected. And just by clicking on dots on a screen you can see connections you would otherwise not have been able to see.”

She gives the example of the Offshore Leaks database that the ICIJ hosts. “In the first month of publication we got five million visitors. Within 24 hours of publication [on 11 May 2016], The Times of London found a story about Emma Watson [the Harry Potter actress] and her ownership of a house under the name of an offshore company [registered in the British Virgin Islands], Falling Leaves. You could never have done that without a graph database, or it would have taken a lot longer.”
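The kind of question a graph database answers quickly, namely how one name connects to another through companies and addresses, can be sketched with a breadth-first search over a small in-memory graph. This is an illustrative sketch only: it uses plain Python rather than Neo4j or Linkurious, and the people, companies, addresses and relationship labels are entirely made up.

```python
from collections import deque

# Hypothetical, made-up records; not data from the Offshore Leaks database.
EDGES = [
    ("Person A", "SHAREHOLDER_OF", "Company X Ltd"),
    ("Company X Ltd", "REGISTERED_AT", "Address 1"),
    ("Company Y Ltd", "REGISTERED_AT", "Address 1"),
    ("Person B", "DIRECTOR_OF", "Company Y Ltd"),
]


def build_graph(edges):
    """Build an undirected adjacency list from (source, relation, target) triples."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, []).append((relation, source))
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of nodes, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(EDGES)
path = shortest_path(graph, "Person A", "Person B")
print(" -> ".join(path))
# Person A -> Company X Ltd -> Address 1 -> Company Y Ltd -> Person B
```

In this toy data the two people never appear in the same record; the link only emerges by following relationships through a shared address, which is the sort of connection that is hard to spot when reading documents one at a time.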

She adds: “Look at the Pentagon Papers [about the American expansion of the Vietnam War with bombing campaigns in Cambodia and Laos] in the US, published by The New York Times [in 1971]. That entailed going through boxes of documents over a long period of time.”

When they looked at their first big offshore leak in 2012, the team was drawing lines on paper and in Word documents, she recalls. It was a Linkurious blog post about that first leak that put her and her team onto such data visualisation software in the first place.

Journalism goes viral

Cabra concludes that the ICIJ has, with its publication of offshore economy documents and the wrapping up of those in the graph database and visualisation technologies, “opened up the power of investigation to the crowd. We have received 35 million plus page views, and we are getting so many leads that it is all about more than what the journalists are doing.”

The two Süddeutsche Zeitung journalists who originally got the story have now published a book about the Panama Papers, and there is to be a film. “It is interesting that it is a collaborative book. The Germans have written the book, but there is one chapter on each country by the local journalists,” says Cabra, whose own contribution covers Spain.