Amazon Neptune: a shot in the arm for the graph database?

Freeform Dynamics

Amazon Web Services (AWS) announced its entry into the graph database market at its AWS re:Invent conference in Las Vegas in November last year. The announcement was notable for two reasons. First, Neptune is the company's first graph database, adding to the range of relational and NoSQL databases it already offers as a service. Second, it shone a rather bright light on a database category that has often been considered niche, complex and expensive.

Neptune is currently in preview, but we expect it to reach general availability soon. So should you be bothered?

A graph database uses graph structures to represent, store and query data. Nodes represent data items (such as people, products or accounts), edges represent the relationships between them, and properties hold the attributes of nodes and edges. The key concept is that the graph records the relationships between data items directly. Because related objects are linked to one another, they can often be retrieved in a single operation.
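To make the node, edge and property idea a little more concrete, here is a minimal in-memory sketch in Python. It is a toy illustration under stated assumptions, not how Neptune stores data: the `PropertyGraph` class and its `add_node`, `add_edge` and `neighbours` methods are hypothetical names invented for this example. The point it shows is that each relationship is kept alongside the node it starts from, so finding related items is a direct lookup rather than a search across another table.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical toy property graph, for illustration only (not Neptune's API)."""

    def __init__(self):
        self.nodes = {}                      # node id -> dict of node properties
        self.out_edges = defaultdict(list)   # node id -> list of (label, target id, edge properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        # The relationship is recorded directly on the source node's adjacency list.
        self.out_edges[source].append((label, target, props))

    def neighbours(self, node_id, label=None):
        # A single lookup returns every related node, optionally filtered by edge label.
        return [target for (edge_label, target, _) in self.out_edges[node_id]
                if label is None or edge_label == label]


g = PropertyGraph()
g.add_node("alice", kind="person", name="Alice")
g.add_node("bob", kind="person", name="Bob")
g.add_node("acme", kind="company", name="Acme")
g.add_edge("alice", "knows", "bob", since=2015)   # edges can carry properties too
g.add_edge("alice", "works_at", "acme")

print(g.neighbours("alice"))           # ['bob', 'acme']
print(g.neighbours("alice", "knows"))  # ['bob']
```

Real graph databases add persistence, indexing and a query language on top, but the underlying idea of storing relationships as first-class, directly navigable data is the same.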

In a relational database, data is stored in rows and columns, and there are no such direct connections between related objects: a relationship is expressed only through matching key values in different tables. To bring related records together, developers must write a ‘join’. Queries that need many joins can become unwieldy and can hurt database performance.
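For comparison, the sketch below shows the relational approach, using Python's built-in `sqlite3` module as a stand-in relational database. The `people` and `knows` tables and their columns are hypothetical names chosen for illustration. Finding "friends of Alice's friends" means joining the relationship table to itself once per hop, so every additional hop in the question adds another join to the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Entities and relationships live in separate tables; the link is only a matching key value.
cur.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE knows (person_id INTEGER, friend_id INTEGER)")
cur.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")])
cur.executemany("INSERT INTO knows VALUES (?, ?)",
                [(1, 2), (2, 3), (3, 4)])

# Friends of Alice's friends: each extra hop needs another self-join on `knows`.
query = """
SELECT DISTINCT p.name
FROM people AS a
JOIN knows AS k1 ON k1.person_id = a.id
JOIN knows AS k2 ON k2.person_id = k1.friend_id
JOIN people AS p ON p.id = k2.friend_id
WHERE a.name = ?
"""
rows = cur.execute(query, ("Alice",)).fetchall()
print(rows)  # [('Carol',)]
conn.close()
```

With a handful of rows this is trivial, but as the number of hops and the size of the tables grow, queries like this become harder to write and, depending on the database and its indexes, more expensive to run.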

These characteristics make it simple and fast to retrieve complex hierarchical structures that would be hard, or even prohibitively time-consuming, to model and query in a relational database.
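As an illustration of retrieving a hierarchy of unknown depth, the Python sketch below walks a reporting structure breadth-first, following edges until it runs out of nodes. The `reports_to_me` data and the `all_descendants` function are hypothetical examples, not any product's API. The traversal does not need to know in advance how deep the hierarchy goes, whereas in SQL you would typically need either a fixed number of joins or a recursive query, where the database supports one.

```python
from collections import deque

# Hypothetical reporting hierarchy: manager -> direct reports (edges point downwards).
reports_to_me = {
    "ceo": ["cto", "cfo"],
    "cto": ["eng_lead"],
    "cfo": [],
    "eng_lead": ["dev1", "dev2"],
}


def all_descendants(graph, root):
    """Breadth-first traversal returning every node reachable from `root`, at any depth."""
    seen = set()
    order = []
    queue = deque(graph.get(root, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # skip nodes already visited (guards against cycles in general graphs)
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))  # follow this node's edges one level further
    return order


print(all_descendants(reports_to_me, "ceo"))
# ['cto', 'cfo', 'eng_lead', 'dev1', 'dev2']
```

The same pattern applies to other hierarchies mentioned in graph use cases, such as parts lists or supply chains: the query follows relationships rather than assembling them from tables.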

The slight drawback with graph databases is that they cannot easily be queried with Structured Query Language (SQL), the de facto query language for relational databases. Nor is there yet an equivalent de facto query language in the graph database world. Several standard languages compete, and there is likely to be a shakeout of some of them as graph databases become more popular, with a clear winner possibly emerging.

Amazon says it built Neptune specifically for the cloud, which has its pluses and minuses. The drawback is that there isn’t an on-premises version. The advantage, though, is that AWS’s economies of scale tend to let it offer good-value subscriptions. As with other AWS managed services, Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across AWS Availability Zones.

It can store billions of relationships, and the graph can be queried with millisecond latency. Neptune supports encryption at rest and in transit. As for that thorny issue of which query languages to support, AWS has hedged its bets by offering both Apache TinkerPop Gremlin, for property graphs, and the W3C’s SPARQL, for RDF data. (Microsoft’s cloud graph offering, Azure Cosmos DB, supports Gremlin and can also be used with Apache Spark GraphX.)

I would have liked to see both services add support for Cypher, a language developed by graph database pioneer Neo4j, as we believe it has very widespread adoption. Neo4j donated Cypher to the openCypher project in 2015, and as well as in Neo4j it is supported in SAP HANA Graph, Redis (via the RedisGraph module) and AgensGraph.

Use cases and early adopters

Early adopters of Neptune are likely to be existing AWS users who already have some or all of their data in the AWS cloud, for example in one of its relational or NoSQL database services.

Amazon envisages that Neptune will power graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security. Security is probably the most common area where graph databases have been pressed into action, but they are also used in logistics, supply chain management, master data management, life sciences, e-commerce and even the hospitality industry.
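To show why recommendation engines are a natural fit for graphs, here is a minimal "customers who bought this also bought" sketch in Python. It assumes a simple hypothetical purchase graph in which customers are linked to the products they bought; the data and the `also_bought` function are invented for illustration and say nothing about how Neptune or any retailer actually implements recommendations. The recommendation is a two-hop traversal: from a product to the customers who bought it, then on to the other products those customers bought, ranked by how often they co-occur.

```python
from collections import Counter, defaultdict

# Hypothetical purchase graph: customer -> set of products bought.
bought = {
    "ann": {"kettle", "toaster", "mugs"},
    "ben": {"kettle", "mugs"},
    "cat": {"kettle", "teapot"},
    "dan": {"toaster"},
}

# Reverse edges (product -> customers), so a traversal can start from a product node.
bought_by = defaultdict(set)
for customer, items in bought.items():
    for item in items:
        bought_by[item].add(customer)


def also_bought(product, top_n=3):
    """Two hops: product -> its buyers -> their other products, ranked by co-purchase count."""
    counts = Counter()
    for customer in bought_by.get(product, set()):       # hop 1: who bought this product
        counts.update(bought[customer] - {product})      # hop 2: what else they bought
    # Highest count first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


print(also_bought("kettle"))  # ['mugs', 'teapot', 'toaster']
```

Fraud detection often follows a similar pattern, looking for accounts that are linked through shared attributes a few hops apart, though production systems would use far richer scoring than a simple count.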

Companies having a play with Neptune in preview include AstraZeneca, Thomson Reuters, Siemens, and the Financial Industry Regulatory Authority (FINRA). Amazon itself has been looking into how it can use Neptune to improve its Alexa system.

I believe AWS’s move into the graph database space is significant for the sector. It will make it simpler than ever for people to try out a graph database inexpensively. Because Neptune is a managed service, you don’t need to worry about hardware provisioning, software patching, setup, configuration, or backups.

There are other graph-as-a-service offerings, but few have quite the reach of AWS. With so many companies already holding at least some of their data on AWS, this is an opportunity to see what a graph database can do for you.

There are too many graph databases to mention them all here, but here is a selection of vendors and products, large and small (in alphabetical order), to add to those mentioned above. Most offer some kind of pre-production free trial, so you can kick the tyres before you jump right in.

AllegroGraph
ArangoDB
Graph Base
Graph Story
HypergraphDB
IBM
Ontotext
Oracle
OrientDB
Teradata
Titan

Do you have any experience of using graph databases? I’d be interested to hear your thoughts in the comments section.