How the internet of things and edge data impact data storage

Remote device data poses challenges for the datacentre, but edge processing, analytics and the cloud can help businesses profit from the promise of the internet of things

We’re witnessing the rise of data at the edge, and the challenges it presents to businesses are significant. The internet of things (IoT) – the broad-brush term for a varied collection of sensors, devices and smart technologies – is defined by equipment such as surveillance hardware and manufacturing plant that is becoming smarter and connected.

That connection allows organisations to extract valuable and sometimes real-time data from their infrastructure.

Elsewhere, new sensors allow companies in industries as diverse as farming, transport and aviation to monitor operations, and improve performance and safety with low-cost devices. Typically, these are mobile or operate remotely, away from corporate datacentres and IT infrastructure. Therein lies the core challenge for IT.

So far, we have only scratched the surface of the IoT and edge devices. Market researcher IDC expects spending on the IoT to reach US$1.2tn by 2022.

Gartner, another technology research firm, calculates that 8.4 billion IoT devices are in use today, and expects that number to rise to 20.4 billion by 2020.

Although consumer IoT devices, such as connected fridges, have captured the popular imagination, industrial applications and monitoring of facilities, utilities and transport infrastructure will account for the majority of that growth.

IoT challenges from data diversity

So far, the growth of the internet of things has had only a limited impact on the way enterprises design IT systems. In part, this is because the IoT is still a new and evolving technology. It is also precisely because the new devices exist on the edge, away from core IT systems.

But as businesses look to exploit the data generated by their systems, IT departments will need to make network links to devices on the edge more resilient, or come up with workarounds for limited or intermittent connectivity. They also need to understand the diversity of data produced by edge systems, and how that data will be used in the business.

According to IDC analyst Natalya Yezhkova, 87 exabytes of storage capacity will be shipped for IoT workloads by 2021.

But that total masks a wide range of data volumes and types. They can range from a single alert of a few kilobytes for a temperature spike, to potentially terabytes of data for surveillance video, facial recognition or behavioural analytics in transport or retail.

IoT challenges to data retention

IT and the business also need to discuss data retention policies. Some data, such as alerts, might only need to be retained for days or hours. Surveillance data, and any information that will be analysed later, could have a working life of months, with archiving measured in years.
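A retention policy of this kind can be written down as a simple lookup from data class to lifetime. The following is a minimal sketch in Python, not a description of any particular product: the class names, the `storage_tier` function and every retention period shown are illustrative assumptions, and real values would come from the business and its regulators.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    working_life: timedelta  # time the data stays on primary storage
    archive_life: timedelta  # further time it is kept in the archive


# Hypothetical periods, chosen only to illustrate "days" versus "months and years".
POLICIES = {
    "alert": RetentionPolicy(timedelta(days=2), timedelta(0)),
    "surveillance": RetentionPolicy(timedelta(days=90), timedelta(days=3 * 365)),
}


def storage_tier(data_class: str, created: datetime, now: datetime) -> str:
    """Return 'primary', 'archive' or 'delete' for a record of the given class."""
    policy = POLICIES[data_class]
    age = now - created
    if age <= policy.working_life:
        return "primary"
    if age <= policy.working_life + policy.archive_life:
        return "archive"
    return "delete"
```

Keeping the periods in one table means IT and the business can review and change them without touching the logic that moves or deletes data.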

Meanwhile, an aircraft manufacturer will typically keep production data for several decades after the last flight of an aircraft type. With a modern aircraft – such as the Boeing 787 or the Airbus A350 – producing up to half a terabyte of data per flight, IoT storage demands could be vast.

Other industries face similar volumes. Smart thermostat company Quby (see case study below), for example, captures 100GB of compressed data each day.

Then there are the kinds of data businesses capture from the edge. Some information, such as sensor readings, is structured and can reside in a database.

Other systems, however, create vast quantities of unstructured data, primarily video and photography, but also audio files. IT departments will need to separate raw storage and long-term archiving needs from the need for quick access to metadata so that users can find imagery in the event of a systems failure or security incident.

“Video surveillance is a good example of data that seems simple but has a larger file size and needs to be stored for a long time,” says Yann Lepant, managing director of Accenture Technology.

“CCTV, for instance, has become much cheaper and more prevalent, to the point where it’s now often bundled up with cloud storage. Cameras tend to produce very high volumes of data, and whilst cloud storage might come cheap, it could certainly rack up.”

Processing at the edge

But, rather than buy ever more storage, businesses are starting to become smarter about the way they use IoT and edge data.

Part of the answer is to store – and process – more data at the edge. Another is to focus more on data analysis, and push analysis to the edge too.

Manufacturers are increasingly fitting IoT devices with their own local storage. Western Digital, for example, now makes SD cards optimised for surveillance and for localised data processing. This, according to surveillance product marketing director Brian Mallari, allows more data to stay on the device, as well as ensuring that recording continues if the network fails.

As edge devices become more powerful and reliable, businesses could opt just to pull metadata from the edge, and only harvest raw data if an event occurs.
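The idea of pulling only metadata and harvesting raw data when an event occurs can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any supplier's implementation: the `summarise_window` function, the field names and the simple threshold test for an "event" are all hypothetical, and the window of readings is assumed to be non-empty.

```python
from statistics import mean


def summarise_window(readings, threshold):
    """Reduce a non-empty window of sensor readings to a small metadata record.

    The raw readings are attached only when a value exceeds the threshold,
    which stands in here for whatever counts as an event.
    """
    summary = {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "event": max(readings) > threshold,
    }
    if summary["event"]:
        summary["raw"] = list(readings)  # harvest raw data only on an event
    return summary
```

In normal operation, only the few summary fields cross the network; the full window is sent upstream only when something worth investigating has happened.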

“Edge computing alleviates [network issues] by performing data processing at the edge of the network, near the source of the data,” says Johan Paulsson, chief technology officer at Axis, a maker of surveillance equipment.

“Doing so significantly reduces the bandwidth needed between sensors and devices and the datacentre,” he says.

This, in turn, means thinking about the internet of things and edge computing in terms of data analysis, rather than data storage.

Even the most conservative predictions of IoT growth imply a serious strain on the ability of organisations to transfer and store data, so they will need to be selective.

“Much as people talk about the data, it is actually the insight that is important,” says John Hickin, a technology expert at PA Consulting. “You need the right data in the right place at the right time. That informs how you understand edge technology.”

He suggests developments in computing, miniaturisation and power consumption are opening the door to powerful analytics processing at the edge itself. Edge analytics reduces the need for central storage, and should allow for quicker decision-making.

“For sensor data, the most useful thing is to process it at the edge,” he says. “That tells people on-site what is happening, and makes it more useful for business operations.”

Case study: Quby

Quby uses data generated by smart thermostats to give consumers insights into their energy use and energy efficiency.

Each day, the company takes in about 100GB of compressed data – spanning consumers’ boilers, white goods, solar panels and smart plugs.

The company uses Apache Spark-based analytics platform Databricks and Amazon S3 storage to run the service.

“The main requirement was that it was stored somewhere that did not impose any hard limits on data size and was easily accessible,” says Quby data engineer Telmo Oliveira.

“You have to ask yourself as a company if you are going to become an expert in data storage or if it’s better to use a managed service and leverage the expertise that’s available.”

At present, Quby uses a centralised, cloud-based model for its storage and analysis. But Oliveira expects that to change as technology matures.

“The internet of things can potentially make our lives better on its own, but things get much more interesting when you can proactively help your customers by using data – that’s where the machine learning and our data journey begins,” he says.

Case study: Hall Hunter Partnership

Hall Hunter Partnership (HHP) grows fruit for stores including Marks & Spencer, Waitrose and Tesco. The company employs 2,000 people in southern England.

The business collects data from sensors on its farms, as well as from harvesting and packaging machinery. Currently, HHP has 105 networked soil moisture sensors that collect data every 15 minutes.

When the company first began collecting sensor data, it could only store 30 days’ worth of information and it took 20 minutes to update a report.

To fix this, Hall Hunter Partnership built a data warehouse using technology from supplier Wherescape. This has cut the response time for reports to seconds.

“Data is extracted and compiled into our data warehouse to measure growing degree hours, monitor trends and compare results easily between fields,” says HHP business analyst Alex Gooi. “We’ve been doing this since January 2016, amassing a total of 963,346 distinct measurements or records with a storage capacity of 128MB.”

HHP could have consolidated the data into just 96 distinct measurements, but the team prefers to have more granular data to work with.

Although the instinct for IoT projects is often to add more physical storage, Hall Hunter Partnership found it could work with existing infrastructure and save summaries from the real-time analysis, rather than the original transactional tables. That approach also gives the fruit producer scope to sample more climate data in the future.
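Saving summaries rather than raw rows might look something like the Python sketch below, which collapses 15-minute readings into one mean value per sensor per hour. It is an illustration only: the `hourly_means` function, the row layout of sensor ID, timestamp and value, and the choice of hourly buckets are assumptions, not HHP's actual schema or Wherescape's tooling.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(readings):
    """Collapse (sensor_id, timestamp, value) rows into one mean per sensor per hour."""
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in readings:
        # Truncate the timestamp to the start of its hour to form the bucket key.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(sensor_id, hour)].append(value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}
```

With readings every 15 minutes, hourly buckets would store one value where the raw table holds four. The bucket size is a trade-off against the granularity that analysts want to keep.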
