Scylla’s ‘scale-up & scale-out’ mantra for monster database power
There are databases. Then there are big data databases. Then there are super-performance, high-speed big data databases.
And finally, there are real-time big data databases that run at hyper-speed, built around an efficiency mantra that champions scale-up (and then…) scale-out engineering.
This is the spin (the server hard disk kind, not the marketing kind) that Scylla wants to put forward to describe its performance of millions of operations per second (MOPS) on a single node.
The company claims that independent tests show a cluster of Scylla servers reading 1 billion rows per second (RPS) – plus (and here’s the point of all of this) the organisation says that this performance ranks ‘far beyond’ the capabilities of a database using persistent storage.
“Everyone thought you’d need an in-memory database to hit [MOPS] numbers like that,” said Dor Laor, CEO of ScyllaDB. “It shows the power of Scylla’s close-to-the-metal architecture. For 99.9% of applications, Scylla delivers all the power [a customer] will ever need, on workloads other NoSQL databases can’t touch and at a fraction of the cost of an in-memory solution. Scylla delivers real-time performance on infrastructure organisations are already using.”
NOTE: As TechTarget reminds us, NoSQL (not only SQL) is an approach to database design that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats – over and above ‘traditional’ relational databases (where data sits in tables and the data schema is designed before the database is built). NoSQL databases are useful for working with large sets of distributed data.
Close-to-the-metal
But let’s stop a moment. Scylla CEO Laor said ‘close-to-the-metal’… so what did he mean by that? We’ve detailed a complete definition commentary piece here, but essentially it’s all about database software that works in close proximity to and with knowledge of the actual instruction set and RAM addresses of the hardware system that it is built to run on.
In the company’s benchmark example, Scylla says the test involved scanning three months of data (some served from cache and some from storage), which resulted in speeds of 1.2 billion data points per second.
NOTE: Data points per second (DPPS) refers to the number of individual records inside any given architecture (or type) of database schema that the database query engine and management plane can accurately read, in one second.
Scanning a full year’s data, all from persistent storage (with no caching), Scylla scanned the entire database at an average rate of 969 million data points per second.
With one day of data, scanning everything from memory, Scylla achieved 1.5 billion data points per second.
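To put those three figures side by side, here is a minimal Python sketch that expresses each reported rate as a multiple of the storage-only rate. The rates are the ones quoted above; the function names and the helper that turns a rate into a scan time are our own illustrations, not anything Scylla has published.

```python
# Rates as reported in the benchmark, in data points per second.
reported_rates = {
    "one day, all from memory": 1.5e9,
    "three months, cache plus storage": 1.2e9,
    "one year, all from persistent storage": 969e6,
}


def relative_to(baseline_key, rates):
    """Return each rate as a multiple of the chosen baseline rate."""
    baseline = rates[baseline_key]
    return {name: rate / baseline for name, rate in rates.items()}


def seconds_to_scan(data_points, rate):
    """Time to scan a given number of data points at a steady rate.

    The data_points argument is supplied by the caller; the article
    does not state how many data points each test actually held.
    """
    return data_points / rate


ratios = relative_to("one year, all from persistent storage", reported_rates)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the storage-only rate")
```

On these numbers, the all-in-memory scan is roughly one and a half times faster than the scan served entirely from persistent storage, which is the gap the company is pointing to when it compares itself with in-memory databases.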
Scylla uses the power of modern hardware such as bare metal servers, with its shared-nothing, lock-free and shard-per-core architecture, which allows it to scale up with additional cores, RAM and I/O devices.
NOTE: A shared-nothing architecture is an approach to distributed computing architecture in which each update request is satisfied by a single node (where a node itself can represent a processor, block of memory or storage unit) so that there is (hopefully) no contention among nodes. In shared-everything, any data task (processing, memory or storage) can be served by ‘arbitrary combinations of nodes’, so there could be traffic and the potential for collisions.
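For readers who think better in code, the shared-nothing idea can be sketched as a toy in Python. This is emphatically not Scylla’s implementation: the class names are hypothetical, the CRC32 hash is a stand-in chosen only because it ships with Python, and everything runs in a single thread. In a real shard-per-core system each shard would be tied to its own CPU core. The point is simply that every key has exactly one owner, so no two shards ever touch the same data and no locks are needed.

```python
import zlib


class Shard:
    """One shard: owns its slice of the data outright and shares nothing."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.rows = {}  # private to this shard; no other shard reads or writes it

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class ShardedStore:
    """Routes every request to the single shard that owns the key."""

    def __init__(self, num_shards):
        self.shards = [Shard(i) for i in range(num_shards)]

    def owner_of(self, key):
        # Stand-in hash (an assumption for this sketch): the same key
        # always maps to the same shard, so ownership never overlaps.
        index = zlib.crc32(key.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def write(self, key, value):
        self.owner_of(key).write(key, value)

    def read(self, key):
        return self.owner_of(key).read(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.write(f"sensor-{i}", i)

for shard in store.shards:
    print(f"shard {shard.shard_id} holds {len(shard.rows)} rows")
```

Adding shards here is the toy equivalent of adding cores: each new shard takes a slice of the keys and works on it independently, which is the property the company credits for its ability to scale up.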
Scylla says that it stands in ‘contrast’ to Cassandra, because “Cassandra’s ability to scale-up is limited, by its reliance on the Java Virtual Machine, which keeps Cassandra from interacting with server hardware. Where Cassandra’s threads and locks slow down as the core count grows, Scylla can take full advantage of all of a server’s resources.”
The company claims that the performance Scylla demonstrated in these (above noted) benchmarks has implications for real-world applications. For example, analytics jobs that previously took all day to complete can now be run continuously to provide ‘intraday’ (i.e. inside one day) reports.
As with any Scylla story, it’s a bit like drinking from a firehose and the company presents itself with a degree of unashamed swagger and confidence. There’s also a lot to cross-reference and learn (hence the three clarification notes above and the separate close-to-the-metal explanatory story) in order to take all this in.
Scylla (the company) takes its name from the Greek sea monster of the same name in Homer’s Odyssey… let’s hope this stuff is no fairy tale.