How to tackle big data from a security point of view

Before leaping into big data, companies must be clear about what they are trying to achieve; otherwise their investment will be wasted

Big Data is an immensely popular talking point, but what are we really discussing? From a security perspective, there are two distinct issues: securing the organisation and its customers’ information in a Big Data context; and using Big Data techniques to analyse, and even predict, security incidents.

Securing Your Big Data

Many businesses already use Big Data for marketing and research, yet may not have the fundamentals right – particularly from a security perspective. As with all new technologies, security seems to be an afterthought at best. 

Big Data breaches will be big too, with the potential for even more serious reputational damage and legal repercussions than at present.

A growing number of companies are using the technology to store and analyse petabytes of data including web logs, click stream data and social media content to gain better insights about their customers and their business.

As a result, information classification becomes even more critical, and information ownership must be addressed before any meaningful classification can take place.

Most organisations already struggle with implementing these concepts, making this a significant challenge. We will need to identify owners for the outputs of Big Data processes, as well as for the raw data. Thus data ownership will be distinct from information ownership – perhaps with IT owning the raw data and business units taking responsibility for the outputs.

Very few organisations are likely to build a Big Data environment in-house, so cloud and Big Data will be inextricably linked. As many businesses are aware, storing data in the cloud does not remove their responsibility for protecting it - from both a regulatory and a commercial perspective.

Techniques such as attribute-based encryption may be necessary to protect sensitive data and to apply access controls that travel with the data itself, rather than being enforced by the environment in which it is stored. Many of these concepts are foreign to businesses today.
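
As a simplified illustration of that idea, the Python sketch below (with invented attribute names) releases a decryption key only when a user's attributes satisfy a policy carried with the data. It is not a genuine attribute-based encryption scheme, where the policy is enforced cryptographically within the ciphertext itself, but it shows how access decisions can follow attributes of the data rather than the environment storing it.

```python
# Simplified illustration of attribute-based access to encrypted data.
# NOT real attribute-based encryption: the policy check here is ordinary
# application code, whereas true ABE binds the policy into the ciphertext.
from cryptography.fernet import Fernet

# The record carries its own classification attributes and access policy.
record = {
    "policy": {"department": "fraud-analytics", "clearance": "confidential"},
    "key": Fernet.generate_key(),
}
record["ciphertext"] = Fernet(record["key"]).encrypt(b"customer transaction history")

def read_record(record, user_attributes):
    """Release the plaintext only if the user's attributes satisfy the policy."""
    if all(user_attributes.get(k) == v for k, v in record["policy"].items()):
        return Fernet(record["key"]).decrypt(record["ciphertext"])
    raise PermissionError("user attributes do not satisfy the data's policy")

analyst = {"department": "fraud-analytics", "clearance": "confidential"}
print(read_record(record, analyst))  # plaintext released for a matching user
```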

Deploying Big Data for Security

The deployment of Big Data for fraud detection, and in place of security information and event management (SIEM) systems, is attractive to many organisations. The overheads of managing the output of traditional SIEM and logging systems are proving too much for most IT departments, and Big Data is seen as a potential saviour. There are commercial replacements available for existing log management systems, or the technology can be deployed to provide a single data store for security event management and enrichment.
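
As a rough sketch of what such a consolidated store might look like, the Python below (with invented field names, log formats and asset inventory) normalises events from two different systems into a common structure and enriches them with asset context before writing them to a single event store.

```python
# Sketch: normalise events from different sources into one schema and
# enrich them with asset context before writing to a single store.
# Field names, log formats and the asset inventory are illustrative only.
import json

ASSET_INVENTORY = {"10.0.0.5": {"owner": "finance", "criticality": "high"}}

def normalise_firewall(line):
    ts, src, dst, action = line.split(",")
    return {"source": "firewall", "time": ts, "src_ip": src,
            "dst_ip": dst, "outcome": action}

def normalise_webserver(line):
    ts, ip, status = line.split()
    return {"source": "webserver", "time": ts, "src_ip": ip,
            "dst_ip": None, "outcome": status}

def enrich(event):
    """Attach asset ownership and criticality to each normalised event."""
    event["asset"] = ASSET_INVENTORY.get(event.get("dst_ip"), {})
    return event

with open("events.jsonl", "a") as store:          # the "single data store"
    for raw in ["2013-05-01T10:00:00,192.0.2.7,10.0.0.5,DENY"]:
        store.write(json.dumps(enrich(normalise_firewall(raw))) + "\n")
```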

Taking the idea a step further, the challenge of detecting and preventing advanced persistent threats may be answered by using Big Data-style analysis. These techniques could play a key role in helping detect threats at an early stage, using more sophisticated pattern analysis, and combining and analysing multiple data sources. There is also the potential for anomaly identification using feature extraction.
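
A minimal sketch of that approach, assuming per-user activity counts have already been extracted from the raw logs, might feed the features into an off-the-shelf anomaly detector such as scikit-learn's IsolationForest to flag users whose behaviour differs markedly from the rest:

```python
# Sketch: anomaly identification from extracted features.
# Each row is one user's daily activity: [logins, failed_logins,
# megabytes_transferred, distinct_hosts_contacted]. Values are invented.
import numpy as np
from sklearn.ensemble import IsolationForest

features = np.array([
    [12,  1,   40,  3],
    [10,  0,   35,  2],
    [11,  2,   50,  4],
    [ 9,  1,   30,  3],
    [14, 60, 9000, 45],   # unusually many failures, data moved, hosts touched
])

model = IsolationForest(contamination=0.2, random_state=0).fit(features)
labels = model.predict(features)        # -1 marks an anomaly, 1 marks normal
print(labels)                           # e.g. [ 1  1  1  1 -1]
```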

Today, logs are often ignored unless an incident occurs. Big Data provides the opportunity to consolidate and analyse logs automatically from multiple sources rather than in isolation. This could provide insight that individual logs cannot, and potentially enhance intrusion detection systems (IDS) and intrusion prevention systems (IPS) through continual adjustment and effectively learning “good” and “bad” behaviours.

Integrating information from physical security systems, such as building access controls and even CCTV, could also significantly enhance IDS and IPS to a point where insider attacks and social engineering are factored in to the detection process. This presents the possibility of significantly more advanced detection of fraud and criminal activities.
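
A hedged sketch of that kind of cross-checking is shown below: it correlates (invented) building access records with remote VPN logins and flags any user who appears to be inside the building while also logging in remotely.

```python
# Sketch: correlate physical access records with network logins.
# A user badged into the building but simultaneously logging in over the
# VPN is flagged for investigation. All records here are invented.
from datetime import datetime, timedelta

badge_in = {"alice": datetime(2013, 5, 1, 9, 2), "bob": datetime(2013, 5, 1, 8, 45)}
vpn_logins = [("alice", datetime(2013, 5, 1, 9, 30)),
              ("carol", datetime(2013, 5, 1, 10, 15))]

WORKDAY = timedelta(hours=9)   # assume a badge-in is valid for one working day

for user, login_time in vpn_logins:
    entered = badge_in.get(user)
    if entered and entered <= login_time <= entered + WORKDAY:
        print(f"ALERT: {user} badged into the office at {entered} "
              f"but logged in remotely at {login_time}")
```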

We know that organisational silos often reduce the effectiveness of security systems, so businesses must recognise that Big Data-style analysis will be similarly diluted unless those silos are addressed.

At the very least, Big Data could result in far more practical and successful SIEM, IDS and IPS implementations.

Big Data Technologies and Risks

The risks associated with Big Data technologies include:

  • This is a new technology for most organisations. Any technology that is not well understood will introduce new vulnerabilities.
  • Big Data implementations typically include open source code, with the potential for unrecognised back doors and default credentials.
  • The attack surface of the nodes in a cluster may not have been reviewed, nor the servers adequately hardened.
  • User authentication and access to data from multiple locations may not be sufficiently controlled.
  • Regulatory requirements may not be fulfilled, with access to logs and audit trails problematic.
  • Malicious data input is a significant risk, especially where data validation is inadequate.

If you research the term Big Data, you will invariably encounter Hadoop. Traditional data warehouses and relational databases process structured data and can store massive amounts of data, but the requirement for structure restricts the type of data that can be processed. Hadoop is designed to process large amounts of data, regardless of its structure.

The core of Hadoop is the MapReduce framework, a programming model originally developed at Google to build its web search indexes. MapReduce distributes a computation over multiple nodes, thus solving the problem of data that is too large to fit onto a single machine. Combining this technique with clusters of commodity Linux servers presents a cost-effective alternative to massive computing arrays.
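
As an informal illustration of the MapReduce idea (using an assumed space-separated web-log format), the sketch below counts requests per client IP: the map step emits a count of one for each IP and the reduce step sums the counts. In a real Hadoop job the two steps would run as separate tasks distributed across the cluster; here they run locally just to show the flow.

```python
# Sketch of the MapReduce idea applied to a web log: count requests per
# client IP. Hadoop would distribute the map and reduce phases across
# many nodes; this local version only demonstrates the data flow.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Emit (client_ip, 1) for every log line (assumed space-separated)."""
    for line in lines:
        ip = line.split()[0]
        yield ip, 1

def reduce_phase(pairs):
    """Sum the counts for each key; Hadoop sorts pairs by key before this step."""
    for ip, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield ip, sum(count for _, count in group)

log = ["198.51.100.4 GET /login", "203.0.113.9 GET /index",
       "198.51.100.4 POST /login"]
print(dict(reduce_phase(map_phase(log))))   # {'198.51.100.4': 2, '203.0.113.9': 1}
```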

The Hadoop Distributed File System (HDFS) permits individual servers in a cluster to fail without aborting the computation, because data is replicated across the cluster. There are no restrictions on the data that HDFS stores - it can be unstructured and schema-less.

In contrast, relational databases require that data be structured and schemas be defined before storing the data. With HDFS, making sense of the data is the responsibility of the developer’s code.
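
A brief sketch of that "schema on read" approach: the raw line sits in HDFS as plain text, and the developer's code imposes whatever structure it needs at processing time (the log format below is assumed).

```python
# Sketch of schema-on-read: the file holds raw text; structure is imposed
# by the reading code, not by the storage layer. The log format is assumed.
import re

LINE = '203.0.113.9 - - [01/May/2013:10:02:11 +0000] "GET /account HTTP/1.1" 403 512'

PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)

record = PATTERN.match(LINE).groupdict()   # structure is created only when read
print(record["ip"], record["status"])      # 203.0.113.9 403
```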

Specialist Skills

In reality, Big Data is more about the processing techniques and outputs than the size of the data set itself, so specific skills are required to use Big Data effectively. There is a general shortage of specialist skills for Big Data analysis, in particular when it comes to using some of the less mature technologies.

The growing use of Hadoop and related technologies is driving demand for staff with very specific skills. People with backgrounds in multivariate statistical analysis, data mining, predictive modelling, natural language processing, content analysis, text analysis and social network analysis are all in demand. These analysts and scientists work with structured and unstructured data to deliver new insights and intelligence to the business. Platform management professionals are also needed to implement, secure, manage and optimise Hadoop clusters.

Suppliers such as Cloudera, MapR, Hortonworks and IBM offer training courses in Hadoop, giving organisations the opportunity to build the in-house skills needed to address Big Data challenges.

Before leaping into this brave new world, companies must be clear about what they are actually trying to achieve; otherwise their investment will be wasted.

In summary, Big Data expands the boundaries of existing information security responsibilities and introduces significant new risks and challenges.


Peter Wood is chief executive officer, First Base Technologies and member of the ISACA London Chapter Security Advisory Group
