
Mystery surrounds leak of four billion user records

Threat researchers uncover four billion user records on a wide-open Elasticsearch server, but who left them there is a mystery

Personal data relating to over a billion people, including email addresses, phone numbers, and LinkedIn and Facebook profile information, has been leaked online via an open and unsecured Elasticsearch server, but its actual source is shrouded in mystery.

The data was uncovered on 16 October 2019 by researchers Bob Diachenko and Vinny Troia of threat intelligence platform Data Viper. Diachenko and Troia accessed and downloaded the data via a web browser without any password or authentication needed.

In a blog post detailing the disclosure, Troia said that he uncovered four terabytes of data spanning four separate indexes, labelled “PDL” and “OXY”.

The first dataset contained, among other things, data on 1.5 billion unique individuals, a billion personal email addresses including work emails for millions of decision makers in Canada, the UK and the US, 420 million LinkedIn URLs, a billion Facebook URLs and IDs, over 400 million phone numbers and 200 million valid US mobile phone numbers. The second dataset contained scraped data from LinkedIn profiles, including information on recruiters.

Troia said his analysis led him to believe that the data originated at two data aggregation companies, People Data Labs and OxyData.io. However, according to Wired, which first reported the story, both companies told Troia that the server in question did not belong to them.

Following further investigations, Troia revealed that he was unable to find any evidence to contradict these denials, even though he was able to determine that the two datasets matched data held by the two firms. In particular, a crucial piece of evidence that would seem to exonerate People Data Labs was the fact that its API appears to use AWS, while the unsecured Elasticsearch server was found in Google Cloud.

“This is an incredibly tricky and unusual situation,” wrote Troia. “The lion’s share of the data is marked as ‘PDL’, indicating that it originated from People Data Labs. However, as far as we can tell, the server that leaked the data is not associated with PDL.

“How did this mystery organisation get the data? Are they a current or former customer? If so, the data discovered on the server indicates that this company is a customer of both People Data Labs and OxyData.”

Troia theorised that if it was indeed a customer with legitimate access to both datasets, that would indicate the data had been misused rather than stolen, although he acknowledged this would be scant comfort to anybody involved.

More worryingly, he wrote, the technicalities of the data leak made it very hard to identify who was responsible. Google, for example, would not share information on its customers, and while law enforcement can request this information, they have no authority to force disclosure.

Both originating companies would also be able to argue the mystery server owner was responsible, even though there was a strong case that morally, they should notify those involved.

“Due to the sheer amount of personal information included, combined with the complexities [of] identifying the data owner, this has the potential [to] raise questions on the effectiveness of our current privacy and breach notification laws,” wrote Troia.

Leak a big deal

While the leak lacks the sort of personal information – such as passwords or credit card details – that would render it valuable to cyber criminals, the fact that it exposes email addresses, phone numbers and social media profiles is still a big deal, according to CyberArk’s senior vice-president of EMEA, Rich Turner.

“[This] makes a phishing expedition or an attempt to otherwise find, profile and compromise high-value targets – individuals or organisations – that much easier,” he said.

“The vast amount of data in the repository contained enough intelligence and detail to launch a well-targeted campaign which would allow a motivated group or individuals to obtain access, credentials and other highly valued information.”

“Over the years, hundreds of billions of online accounts have been exposed, meaning that personal information on every human on the face of the earth has been stolen 20 times or more,” said Cybereason chief security officer Sam Curry.

“This latest exposure is like astronomy: billions and billions ceases to be personal or mean anything. In reality, this data breach is a stark reminder that consumers need to rethink their own security hygiene. Today, everyone should assume their private information has been stolen numerous times and will continue to be accessible to a growing number of threat actors.”
