IWM uses StorCycle to free capacity for all-flash SAN upgrade

Imperial War Museums is set to embark on an unstructured data archiving project that will shift cold data to tape and free up capacity for a move to an IntelliFlash all-flash SAN

Imperial War Museums (IWM) is set to deploy Spectra Logic StorCycle to help it archive unstructured data that comprises around 70% of capacity on existing storage area network (SAN) and network-attached storage (NAS) infrastructure.

Freeing up that capacity will allow it to retire its seven-year-old 3PAR SAN storage infrastructure and replace it with an all-flash system from IntelliFlash.

All of this forms part of a circa £100,000 project to refresh the server estate with HPE Synergy compute, replacing that company’s legacy blade hardware.

IWM began to digitise its media assets around four years ago, migrating tens of thousands of hours of moving image footage and millions of pictures to a new digital asset management (DAM) system archive based on two Spectra Logic tape libraries (LTO-7 and TS1150), with a BlackPearl LTFS-based object storage and NAS front end.

IWM has five sites across the UK – IWM London, IWM North, IWM Duxford, the Churchill War Rooms and HMS Belfast – and is home to approximately 750,000 digital assets, representing a total of 1.5PB (petabytes) of data as uncompressed files.

New scans in the museum’s film collection generate an additional 10TB (terabytes) of data per month, with its videotape scanning project expected to create more than 900TB of data over the next four years.

The new StorCycle-based infrastructure will offload unstructured data from the 200TB hybrid flash 3PAR SAN and an estate of (mostly siloed) Synology NAS boxes with around 1PB of capacity.
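As a rough illustration of what that offload could mean in capacity terms, the figures quoted in this article can be combined in a few lines of Python. This is a back-of-envelope sketch only: it assumes the 70% figure applies evenly across the SAN and the NAS estate, treats the quoted capacities as data actually held, and takes 1PB as 1,000TB. The function name `split_capacity` is made up for the example.

```python
# Figures quoted in the article. Treating nominal capacity as data held,
# and applying 70% evenly across SAN and NAS, are assumptions.
SAN_TB = 200          # hybrid flash 3PAR SAN
NAS_TB = 1000         # "around 1PB" of Synology NAS, taking 1PB = 1,000TB
OFFLOAD_PERCENT = 70  # share of capacity expected to go to tape


def split_capacity(total_tb, offload_percent):
    """Return (TB moved to tape, TB left on primary storage)."""
    to_tape = total_tb * offload_percent / 100
    return to_tape, total_tb - to_tape


to_tape, remaining = split_capacity(SAN_TB + NAS_TB, OFFLOAD_PERCENT)
print(f"To tape: {to_tape:.0f}TB, left on primary storage: {remaining:.0f}TB")
```

On those assumptions, roughly 840TB would move to tape and about 360TB would remain to be carried over to new primary storage.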

CIO Ian Crawford said: “DAM is nicely tucked away. This is about data coming in from, for example, bodycam footage from the Ministry of Defence, data we create ourselves from media materials, museum projects, and so on.”

“We always had an issue dealing with a lot of unstructured data. It’s not out of control, but it’s a lot to manage. People don’t housekeep and want to keep data on drives forever. We’re generating a lot of data that we don’t want to go into DAM.”

“StorCycle will allow us to heavily reduce the amount of data we have to move from one storage platform to the other when we deploy IntelliFlash. We hope to offload about 70% to tape.”

StorCycle has only just been deployed and is expected to process its first data this week.

For Crawford, the big benefits are a simpler infrastructure and the cost savings that will result.

“Clearly, if we can go and buy a new SAN and we need 70% less storage, that’s a huge saving. What we’ve just bought is much less costly than the 3PAR SAN we bought seven years ago. Also, tape is low cost in terms of power consumption.”

The entire project has been delayed by the difficulties of getting staff on-site because of Covid restrictions, but it is hoped the IntelliFlash and Synergy go-live can be carried out in time for the opening of new World War Two and Holocaust galleries at IWM’s main London site in September.

StorCycle runs on a Windows server. It crawls storage for data that matches characteristics set by user policy and migrates it to other tiers of storage. In IWM’s case, that is a new Spectra Logic BlackPearl tape infrastructure, built on a T120 tape library and kept separate from DAM.

Data can be scanned according to a variety of schedules and criteria, such as size or last-active date, and moved or copied to the bulk storage layer in one-off or recurring operations. Migrated files then remain visible to users as HTML links.
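The general idea of a policy-driven scan can be sketched in Python. This is not StorCycle code and does not use any Spectra Logic API. It is a generic illustration that walks a directory tree and selects files that are both large enough and idle for long enough. The names `ArchivePolicy` and `find_candidates` are hypothetical, and "last active" is approximated here as the later of a file's access and modification times.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ArchivePolicy:
    """Hypothetical policy: archive files at least this big and this idle."""
    min_size_bytes: int
    min_idle_days: float


def find_candidates(root, policy, now=None):
    """Return a sorted list of files under root that match the policy."""
    now = time.time() if now is None else now
    cutoff = now - policy.min_idle_days * 86400  # seconds per day
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip links, which may be stubs for archived files
            info = path.stat()
            # Assumption: "last active" is the later of atime and mtime.
            last_active = max(info.st_atime, info.st_mtime)
            if info.st_size >= policy.min_size_bytes and last_active <= cutoff:
                matches.append(path)
    return sorted(matches)
```

A call such as `find_candidates("/some/share", ArchivePolicy(100 * 1024**2, 365))`, where the path is a placeholder, would list files of 100MB or more that have not been touched for a year. Running it on a schedule would approximate the recurring operations described above.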

When users want to restore, they are provided with information on how long it will take – given that data may be on tape or in the cloud – and taken through a set of options, including the ability to restore related project files.

Spectra allows migrated files to be represented as symbolic links and HTML links. Amazon S3 and Amazon Glacier can be targets, as can Azure, with the addition of Google Cloud Platform planned.
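To show what leaving a link behind means in practice, the following Python sketch moves a file to an archive location and replaces it with either a symbolic link or a small HTML file that points to the new location. It is a simplified stand-in, not Spectra Logic's implementation: the function name `migrate_with_stub` is invented for the example, the "archive" is just another directory, and name collisions in that directory are not handled.

```python
import html
import shutil
from pathlib import Path


def migrate_with_stub(source, archive_dir, use_symlink=False):
    """Move source into archive_dir and leave a link behind.

    Returns the path of the stub left in the original location.
    """
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    target = archive_dir / source.name
    shutil.move(str(source), str(target))

    if use_symlink:
        # Creating symlinks may need extra privileges on Windows.
        source.symlink_to(target.resolve())
        return source

    # Otherwise write a small HTML page linking to the archived copy.
    stub = source.with_name(source.name + ".html")
    uri = target.resolve().as_uri()
    stub.write_text(
        f'<html><body><a href="{html.escape(uri, quote=True)}">'
        f"{html.escape(source.name)} (archived)</a></body></html>\n",
        encoding="utf-8",
    )
    return stub
```

In a real deployment, the target would be a tape or cloud tier rather than a local directory, and following the link would start a restore rather than open the file directly.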

Other items on the roadmap include clustering of StorCycle hardware nodes and the ability to run in the cloud. Support for S3 as a source will come in the forthcoming 3.4.0 release.
