How Shazam handles its data reserves

Chris Kammermann, senior infrastructure engineer at Shazam, explains how adopting the Splunk platform has helped the firm deal with legacy systems and data analysis

Sound identification firm Shazam is a staple application on many people’s phones, most commonly used to identify catchy background music on TV, the radio or during a night out.

Shazam collects customer usage data every day – but how does it handle such huge amounts of information?

Many people think of Shazam as a startup, but in fact it has been around for more than 15 years. Like other companies that have always collected data, it holds a lot of information on how customers are using its app, but has not always been able to make beneficial use of this data.

The brand has adopted Splunk to make more use of the data by analysing how consumers are using the app.

“We are sitting on a treasure trove of data and some of it is really cool and insightful, also at a social, demographic and cultural level,” says Chris Kammermann, senior infrastructure engineer at Shazam.

Shazam has been available as an app for years, but the need to analyse large amounts of data was not always so pressing.

Kammermann says the release of the first iPhone in 2007 caused a sudden jump in downloads, and the number of people using the app increased as smartphones became the norm.

As user numbers grew, so did Shazam’s data requirements, and after a few years it was unable to continue handling and processing data using its legacy infrastructure.

“It was just the scale of the data and it was also the way we wanted to change the data that was being generated,” says Kammermann. “We wanted to change the format of the data, change the structure of the data and our old IT systems just couldn’t cope with the rate of change.”

Not flexible enough

Kammermann says Shazam’s original combination of a “traditional tier-one hardware supplier” and “traditional tier-one software supplier” operating on SQL was not flexible enough, and could not scale at the same rate as Shazam’s data.

“To keep it up and running 24/7, we needed three or more people full time, and we needed to pay a significant amount of money to the hardware and software suppliers to get their support and make sure it was running all the time,” he says.

The previous system used batch processing to load data into its databases in large dumps at various points in the day, sometimes with day-long processing times, making it difficult to query data in real time.

There was also no capability to make ad-hoc queries, and any changes to queries had to be developed by an engineering team.

Shazam began to adopt Splunk in 2011, and it took two years to shed its legacy systems.

Using Splunk Enterprise, Shazam can collect data in any structure or format and create and run new queries in real time without the support of an engineering team.

Legacy data

After the transition to Splunk, Shazam’s legacy data was moved to Amazon Web Services (AWS) cloud storage, but Kammermann says there is no reason to query it.

The volumes of data collected now and in the early days of the app are so different that comparing the two would not yield the most accurate results, and many usage comparisons go back only a year at the most.

“Generally, we compare week to week and month to month because that’s what we’re interested in,” says Kammermann.

Most of these comparisons are used to assess user reaction to changes in the app, but these insights can also be used to determine consumer trends.

Kammermann uses David Bowie as an example. Shazam’s data showed more people were “Shazaming” Bowie after his death than before.

The firm could also make predictions about the Eurovision Song Contest based on how many Shazams a song received.

Shazam has also experimented with its data in a retail setting to fully utilise the app and the data it can collect, says Kammermann.

For example, the firm found customers have more positive brand engagement with shops that are playing “cool music”. So, in the future, Shazam hopes to work with brands to determine insights, such as where customers go after Shazaming the music in a particular store, whether they then buy something and how best to turn these behaviours into profit.

“This is the kind of stuff we do as a company,” says Kammermann. “We engage with brands and retailers to figure something out and make it more profitable.

“Shazam is a magic app, and if you think about it, there’s a lot you can do with it. That’s what makes it so special, and that’s what makes it very interesting to work for. I’m loving the data. I enjoy it.”

Moving beyond music

When demonstrating some of Shazam’s new features, Kammermann plays an ultrasonic sound, which the Shazam app identifies as a pre-programmed indicator for his meeting with Computer Weekly.

The app also features heavily in some TV adverts, encouraging viewers to Shazam the song used in an advert in return for deals or product information.

Images can also be recognised by the app, driving further engagement. “The point is, we’re more than a music recognition app,” says Kammermann. “We recognise merchandise such as Coke cans, McDonald’s trays, parts of Condé Nast – you can Shazam pages out of magazines and it recognises where you are.”

In the future, the company’s aim is to let users Shazam anything they want. But to do this, it may have to become the default app that many of the retailers or brands it works with use to deliver deals or build customer loyalty, says Kammermann.

He points out that most people have up to 50 apps on their smartphone, but regularly use only 10 of them.

“One of the problems with having your own branded app, or any app in today’s world, is that everybody has limited app real estate,” says Kammermann. “There is less point in developing your own app nowadays because real estate on phones is hard to come by.”

In the future, brands could use the Shazam app to dish out product information or deals to consumers without having to develop their own app, making it easier to engage with consumers.

“Shazam is everywhere and has been downloaded millions of times,” says Kammermann. “We can leverage that, so brands don’t have to generate their own app. If you are out on the street and you want to engage with a brand, you don’t have to get your phone out, download the app over 3G, wait two minutes and then click a button. It’s already there through Shazam.”

The right skills

There is a distinct lack of big data skills in the UK, and industry trade body Tech UK predicts that the current rate of training will not fill the projected 157,000 additional big data roles expected to appear in the next five years.

Kammermann says people with knowledge of data are “hard to come by” and adds: “We are selling data as a product, so we’re transitioning from recognising a particular song to recognising a Bluetooth beacon, supersonic radio frequency or Coke can. We are now also looking to sell data.”

This increased emphasis on data means Shazam needs data-savvy employees, which Kammermann says has been a “surprising challenge”.

But he says a level of inquisitiveness is more important than having data skills straight off the bat, because there are tools available to help train people to discover data insights.

“People who deal with the data don’t necessarily come from a data science background,” says Kammermann. “You can make the change if you’re interested.”
