A Data Leak for the Record Books

Some 4 terabytes of personally identifiable information, contained in 1.2 billion records, were found exposed online on an unsecured server. The number of records reportedly makes this leak the second-largest data exposure incident of all time.

What Did the 1.2 Billion Records Contain?

The unsecured server contained personal information of hundreds of millions of people. Among the exposed data were 622 million email addresses and 50 million unique home and mobile phone numbers. The data also contained social media profiles for Facebook, Twitter, LinkedIn and GitHub. Additionally, it held work histories, possibly pulled from LinkedIn, including employers, locations and job titles. What is more, the data was associated with names, which made it personally identifiable.

The sheer volume of data that was exposed in this leak is staggering, not to mention alarming. Luckily, the data did not contain sensitive information like passwords, credit card details or social security numbers. Nonetheless, this much information in one place would be a gold mine for identity thieves, phishers and other online scammers.

As Vinny Troia, an independent researcher, said: “This is the first time I’ve seen all these social media profiles collected and merged with user profile information into a single database on this scale. From the perspective of an attacker, if the goal is to impersonate people or hijack their accounts, you have names, phone numbers, and associated account URLs. That’s a lot of information in one place to get you started.”

Furthermore, if the information has fallen into the wrong hands, malicious actors could buy and sell it on the dark web and use it for more comprehensive identity exploits in the years ahead.

How Was the Data Leak Discovered?

Two independent researchers, Vinny Troia and Bob Diachenko, discovered the unsecured server while scanning for exposures. The server was an open Elasticsearch server hosted on Google Cloud.

The information on the unsecured server was contained in a database. Astonishingly, the database could be accessed without requiring a password or any other form of authentication.
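To illustrate what "no authentication" means in practice, here is a minimal Python sketch that administrators could point at their own servers. It assumes that an Elasticsearch root endpoint answers an anonymous HTTP request with a JSON document containing a "version" field, and that a protected server answers with status 401 or 403. The function names and the example URL are hypothetical, and this is not the method the researchers used.

```python
import json
import urllib.error
import urllib.request


def classify_response(status: int, body: str) -> str:
    """Classify an anonymous HTTP response from a suspected Elasticsearch endpoint."""
    if status in (401, 403):
        # The server asked for credentials or refused anonymous access.
        return "auth-required"
    if status == 200:
        try:
            info = json.loads(body)
        except ValueError:
            return "unknown"
        # Assumption: an Elasticsearch root document includes a "version" object.
        if isinstance(info, dict) and "version" in info:
            return "open"
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one unauthenticated GET request and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses.
        return classify_response(err.code, "")
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Calling probe("http://localhost:9200/") against a server you operate would return "open" if the database answers without credentials, which is the condition described above.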

It is unclear whether malicious parties downloaded this data before Troia and Diachenko discovered it and alerted the FBI. Although that uncertainty is worrying, there have so far been no reports of breaches stemming from this data leak.

Interestingly, the unsecured server was taken down within hours of the researchers notifying the FBI.

Where Did the Leaked Data Come From?

The data held on the unsecured server was found to have originated from People Data Labs (PDL) and OxyData. Both companies are data brokers that sell data to customers, who use it to build products, power predictive modelling and enrich person profiles. To enrich person profiles, companies merge third-party data acquired from data brokers like PDL with data they already possess about their customers. The company then uses this enriched data to make more informed customer-related decisions.
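The enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are plain dictionaries matched on email address, and every field name and value below is hypothetical rather than taken from PDL, OxyData or the leaked data.

```python
def enrich(customers, broker_records):
    """Merge broker-supplied fields into first-party customer records by email."""
    # Index the third-party records by lower-cased email for quick lookup.
    by_email = {rec["email"].lower(): rec for rec in broker_records}
    enriched = []
    for customer in customers:
        extra = by_email.get(customer["email"].lower(), {})
        # First-party fields are applied last, so they win on any conflict.
        enriched.append({**extra, **customer})
    return enriched


# Hypothetical example data.
customers = [{"email": "Jane@example.com", "name": "Jane Doe"}]
broker_records = [
    {"email": "jane@example.com", "job_title": "Engineer", "employer": "Example Corp"}
]
profiles = enrich(customers, broker_records)
```

The result is a single profile that combines the name the company already held with the job title and employer bought from the broker, which is why a merged database of this kind is so useful to an attacker.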

Who Owned the Server?

The biggest question of all is who owned the server that left this massive amount of data exposed. Unfortunately, Troia and Diachenko were unable to determine who the server belonged to. Most of the data on the server apparently originated from PDL, but PDL did not own the server.

The most likely scenario is that a PDL customer stored purchased PDL data in a database on the server and then failed to properly secure the database and the server.

Information technology expert
Grace is an information technology expert who joined the VPNoverview team in 2019, writing cybersecurity and internet privacy-based news articles. Due to her IT background in legal firms, these subjects have always been of great interest to her.