Software firm saves storage array costs with Nutanix hyper-converged

Banking SaaS provider Vialink needed more IOPS when its NetApp arrays slowed up, but Nutanix hyper-converged proved a better package than Pure Storage and Dell EMC, too

Banking software supplier Vialink needed a high-performance storage array, but deployed Nutanix hyper-converged infrastructure (HCI) instead. It didn’t start out looking for the server functionality that comes with HCI, but it came in cheaper than alternatives that only offered storage. And, Vialink realised that choosing hyper-converged would also radically simplify the work of its technical teams.

“Before we made this choice there were only three of us managing the infrastructure. Our days were pretty stressful, faced with a storage array that couldn’t support our peaks of activity after barely two years,” said systems and networks head Emmanuel Helfenstein. “When you face this type of situation, it becomes possible to radically change the infrastructure.”

Vialink’s software as a service digitises regulatory processes, mainly those of banking customers (BPCE is a notable customer), but also in real estate where it supplies an electronic signature solution to the Citya group.

The company’s flagship product is KYC, which uses OCR to process scanned documents from new banking clients, then connects to third-party services to verify them and assign a confidence score.

Because of the regulated nature of its work, Vialink doesn’t use the cloud, except to train data models on Google Cloud Platform virtual GPUs. Everything else is handled in its own datacentres.

“In normal conditions, we don’t have an enormous need for bandwidth,” said Helfenstein. “So, in 2016, when we virtualised all our servers, we chose storage infrastructure to suit that. That was a NetApp array with 48 SAS drives at each of our two sites.”

“At the time, that worked well, but since we started to handle more demanding operations, like batch runs on databases or receiving large requests from customers, performance collapsed.”

Helfenstein said 7,000 IOPS was the limit beyond which the NetApp arrays failed to respond. “I don’t think the disks were the problem, but rather the CPU in the array, which wasn’t up to the job.”

Initially, the IT team tried stopping services that made big demands on processing, said Helfenstein. They blocked deduplication and data compression, and regained some IOPS. “After a short while we realised this hardware wouldn’t produce a miracle,” he said. “It would have been useless to add more disk shelves.”

Helfenstein and his co-workers were “suffering”, he said. They contacted suppliers to look for an alternative: NetApp, Pure Storage, Dell EMC (which had already offered a switch to its hyper-converged VxRail) and then Nutanix.

Nutanix: For the price, and its global console

“The big thing that stood out for us from Nutanix was that it offered the same compute functionality as Dell EMC’s VxRail,” said Helfenstein. “But at the same price as solutions from NetApp and Pure Storage, which lacked the server part.”

All these products contain processors. In NetApp and Pure arrays, those processors deliver storage functionality only. With VxRail, they can also run virtual machines, but that requires a supplementary ESXi licence. Meanwhile, Nutanix’s AHV hypervisor is free.

“The cost of the VMware licence would be €50,000 in the first year, to manage 16 cores and deploy two vCenter consoles,” said Helfenstein. “Plus €20,000 per year for maintenance. It’s that saving there that we made by choosing Nutanix.”
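The saving Helfenstein describes can be tallied over several years with a short calculation. The sketch below uses only the two figures he quotes. Whether maintenance is charged from the first or the second year is not stated, so the `maintenance_from_year` parameter is an assumption (as are the function and constant names), and the result is indicative only.

```python
# Figures quoted by Helfenstein, in euros.
FIRST_YEAR_LICENCE = 50_000
ANNUAL_MAINTENANCE = 20_000


def cumulative_vmware_cost(years: int, maintenance_from_year: int = 2) -> int:
    """Return the cumulative VMware licence cost avoided after `years` years.

    `maintenance_from_year` is an assumption: the article does not say
    whether the annual maintenance fee applies from year 1 or year 2.
    """
    if years < 1:
        return 0
    # Number of years in which the maintenance fee is charged.
    maintenance_years = max(0, years - maintenance_from_year + 1)
    return FIRST_YEAR_LICENCE + ANNUAL_MAINTENANCE * maintenance_years


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(years, cumulative_vmware_cost(years))
```

Under the default assumption, three years come to €90,000. If maintenance is billed from the first year, the same period comes to €110,000.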

The idea of putting virtual machines and storage in one box goes beyond the simple savings Vialink could make by not having to buy servers to go with its drive arrays. “When you are a small IT team, you don’t want to manage 50 consoles – Nutanix puts everything on one screen,” said Helfenstein.

“Nutanix’s management software handles, for example, firmware updates on the motherboard, controller cards and the SSDs,” he said. “And it does it transparently, without any human intervention. Previously, it required complex supplementary operations on the Dell servers that accessed our NetApp arrays.”

A transparent migration

In 2019, Vialink bought two Nutanix clusters. Each comprises four SuperMicro nodes with two sockets, 512GB of RAM and 38TB of storage on 12 3.84TB SSDs. The company opted for two options: file sharing via SMB with other servers, and encryption, which is obligatory for suppliers in the banking sector.

Each cluster is in one of Vialink’s datacentres. “Each datacentre is the disaster recovery mirror of the other,” said Helfenstein. “In other words, they run different applications but synchronise all their data. In that way, we share the daily load between our two sites, but if one goes down the other can handle 100% of production.” Synchronisation of stored data is handled by Nutanix over 1Gbps dark fibre.

To migrate data and virtual machines from the existing system to the new one, Vialink used Move, a tool also supplied by Nutanix, which converted VMware virtual machines from ESXi format to AHV on the fly.

“The migration took place transparently with applications continuing to function during the copy,” said Helfenstein. “But at some point you have to restart them on the new clusters, so we carried all this out at weekends.

“We migrated 300 VMs like that in three months, 10 different VMs at a time, so that we didn’t block everything if there was a problem.”
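The batching approach Helfenstein describes, moving a small group of VMs at a time so that a failure affects only that group, can be sketched in a few lines. This is an illustration of the planning arithmetic only, not of how Nutanix Move works. The VM names and the `batches` helper are hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical inventory: 300 VMs, as in the article.
vm_names = [f"vm-{i:03d}" for i in range(1, 301)]

# Groups of 10, so a failed cutover blocks at most 10 VMs.
plan = list(batches(vm_names, 10))

if __name__ == "__main__":
    print(len(plan))   # number of migration batches
    print(plan[0])     # first batch to cut over
```

With 300 VMs in groups of 10, the plan contains 30 batches, which spread over three months of weekends works out to two or three batches per weekend.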

As it happens, there was a problem with four VMs. “The conversion of our applications from VMware to AHV format didn’t pose us a problem because all their VMs run on a Linux Debian system that includes all the drivers needed for either cluster,” said Helfenstein. “On the other hand, we had virtual network appliances that we had bought pre-configured for the old Dell server clusters that we had to adapt by hand for the new Nutanix cluster.” 

Benefits: Nodes that handle the workload, and more 

The Nutanix clusters weren’t in for long when Helfenstein and his team noticed that the IOPS count “went to 10,000, 20,000 ... then 30,000 IOPS. It held without failing and we had proved that it could support one million IOPS. We achieved a kind of serenity.”

Support was another area of satisfaction. “Nutanix had encouraged contact with its support service at the least concern,” said Helfenstein. “We took them at their word. We opened a ticket when we had to carry out an update or change any settings. They were very reactive and always responded helpfully.”

In 2020, an update went wrong, with the result that one of the nodes disappeared. “We opened a high-priority ticket,” said Helfenstein. “Someone from Nutanix contacted us quickly via Zoom. She hit three commands and the system was restarted within the hour, with no effect felt by the users.”

Having said that, the Prism console is sufficiently easy to use for the Vialink team to handle most incidents. On one occasion, the 1Gbps link between datacentres was saturated, cutting normal communications between applications. A simple intervention via Prism to configure synchronisation bandwidth between the two clusters sufficed to fix the problem in seconds.

After a year passed without further incident, Vialink decided to migrate its Kubernetes containers onto Nutanix.

“Nutanix suggested its Kubernetes orchestrator, Karbon,” said Helfenstein. “It comes free with the product anyway. And not only that, we also had the advantage of being able to manage our containers from the same Prism console we use to administer everything else. Previously, we had used a dedicated Kubernetes console.”

The extra workload carried by these containers, notably Java applications, meant the addition of memory for each node to take it to 768GB. Helfenstein said they bought the memory cards themselves and installed them, which was not a problem for Nutanix. Now, the clusters run around 600 virtual instances.

Administer databases without a DBA

With regard to simplifying administration, the IT team were set to get another nice surprise.

“In 2021, our developers asked us to support MongoDB and Postgres databases on the Nutanix clusters,” said Helfenstein. “The problem was that we didn’t have a DBA in the team. So, Nutanix suggested we deploy Era, a tool that automates database management, handles availability and allows deployment of test and working copies in one click.”

Vialink has also invested in backup software Hycu, which specialises in protecting Nutanix clusters. This, however, is not integrated into Prism. Access is via its own console with backups stored on 300TB of Caringo object storage.

“Nutanix also offers object storage, but we didn’t go for it because we’re dealing with data protection and we didn’t want to put all our eggs in one basket,” he said.

Helfenstein also plans to invest in the Nutanix optional module that will allow real-time synchronisation between clusters, which only happens at intervals with the base system.
