Neo4j CTO: GQL is here: the evolution from Cypher & openCypher

This is a guest post for the Computer Weekly Developer Network written in full by Philip Rathle, CTO, Neo4j.

Neo4j offers developer services to power applications with knowledge graphs, backed by a graph database with vector search.

Rathle writes as follows… 

As I covered in my previous article, this year saw a big milestone for the database space with ISO publishing a new database query language for the first time in 37 years: ISO GQL.

As a developer, your initial reaction may be, “Great, another new language to get my head around”, right? But the good news is that GQL largely borrows from existing languages that are already well established.

If you’re already using Cypher or openCypher (which you likely are if you’re already using graph databases, as it is today the de facto standard), then you’re already 95% there. If you’re using SQL, you do have a new data model to learn, but the language itself is not that far off. The committee behind GQL is the same one that’s also responsible for SQL. They (we) made sure to employ existing SQL constructs wherever it made sense: keywords, datatypes and so on. This provides benefits not only with respect to skills, but also for existing tooling and compatibility across the stack.

Coming back to Cypher, there are a couple of reasons GQL looks a lot like Cypher. One is that Cypher was a major input into the GQL standard. The second is that the team behind Cypher and openCypher evolved Cypher to converge into GQL as the standard evolved. This ended up being a powerful advantage of having members of the Cypher team join ISO and participate in both initiatives. All this together means that today’s Cypher is already highly aligned with GQL.

Neo4j and other openCypher vendors have declared they are committed to GQL and to a smooth transition where Cypher converges into GQL. Here is a quick rundown of how GQL will impact your existing Cypher queries, the origin of Cypher and how the openCypher project came into the world in 2015.

The origins of Cypher…

Cypher is a property graph query language and is undoubtedly the current de facto standard for property graph query languages. The overwhelming majority of graph database users write queries in Cypher.

The Cypher language emerged in 2011, during the early halcyon days of NoSQL, starting with an idea from Neo4j’s Andres Taylor.

Cypher was declarative; unlike most other graph database query languages at the time, it was modelled after SQL, where you describe an outcome and let the database do the work of finding the right results. Cypher also strove to reuse wherever possible and innovate only when necessary.

… and how GQL impacts it

GQL has built upon Cypher’s strengths, incorporating tweaks to better align with SQL to ensure its long-term viability as a database language. And we believe organically evolving the Cypher language toward GQL compliance is the best way to smooth your transition. Yes, there are features in Cypher that did not make it into the standard and may or may not come up in a future standard release. But those Cypher features will remain available and continue to be fully supported as part of our overall commitment to supporting Cypher. The GQL standard allows for vendor extensions, so in a fashion many of those features are GQL friendly.

The GQL standard includes both mandatory and optional features and the long-term expectation is that most GQL implementations will support not only the mandatory features, but also most of the optional ones. In summary, Cypher GQL compliance will not stop any existing Cypher query from working and will allow Cypher to keep evolving to satisfy users’ demands.

Same same, but different

In practice, one could say that GQL and Cypher are not unlike different pronunciations of the same language. GQL shares with Cypher the query execution model based on linear composition. It also shares the pattern-matching syntax that is at the heart of Cypher, as well as many of the Cypher keywords. Variable bindings are passed between statements to enable the chaining of multiple data fetching and updating operations. And since most of the statements are the same, many Cypher queries are also GQL queries.
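To make the shared ground concrete, here is a minimal sketch of a query using only constructs that, to the best of my knowledge, read the same in Cypher and in GQL. The graph itself is hypothetical: the Person and Company labels, the KNOWS and WORKS_AT relationship types and the name property are assumptions made purely for illustration, not taken from any real dataset.

```cypher
// First statement: bind `friend` to everyone that Alice knows.
MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
// Second statement: reuses the `friend` binding produced by the first MATCH.
MATCH (friend)-[:WORKS_AT]->(c:Company)
// Project the bindings into a result table.
RETURN friend.name AS friendName, c.name AS companyName
ORDER BY friendName
```

Each statement consumes the variable bindings produced by the one before it, which is the linear composition described above. The parenthesised nodes and arrow-style relationships are the pattern-matching syntax the two languages have in common.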

That said, some changes will ask a bit more of Cypher users. A few GQL features might modify aspects of existing queries’ behaviour (e.g., different error codes). Rest assured, we’ll classify these GQL features as possible breaking changes and are working hard to introduce these GQL changes in the Neo4j product in the least disruptive way possible. These are great improvements to the language and we’re excited about the positive impact they will have.

The birth of openCypher…

By 2015, Cypher had gained a lot of maturity and evolved for the better, thanks to real-world hard knocks and community feedback. Yet as time progressed, the graph query languages kept coming—still none of them with anything close to Cypher’s success. If this kept up, the graph database space would continue to accumulate new languages, making it more and more confusing.

At Neo4j, we realised that if we cared about solving this problem, we needed to open up Cypher.

So in October 2015, Neo4j launched a new open initiative called openCypher. openCypher not only made the Cypher language available to the ecosystem (including and especially competitors!), it also included documentation, tests and code artefacts to help implementers incorporate Cypher into their products. Last but not least, it was run as a collaboration with fellow members of the graph database ecosystem, very much in keeping with Neo4j’s open source ethos. All of which started a new chapter in the graph database saga: one of convergence.

openCypher proved a huge success. More than a dozen graph databases now support Cypher, dozens of tools & connectors also support it and there are tens of thousands of projects using it.

…and GQL as its offspring

Ultimately, it was the launch of openCypher that led to the creation of GQL. We approached other vendors about collaborating on a formal standard, participated in a multi-vendor and academic research project to build a graph query language from scratch on paper and eventually joined ISO. Momentum reached a crescendo in 2018, when, just ahead of a critical ISO vote, we polled the database community with an open letter to vendors, asking the community if we database vendors should work out our differences and settle on one language, rather than minting new ones every few months. Not surprisingly, the answer was a resounding yes.

In 2019, the International Organization for Standardization (ISO) announced a new project to create a standard graph query language – what is now GQL.

But let us be absolutely clear: the openCypher project will continue for the foreseeable future. The idea is to use the openCypher project to help Cypher database and tooling vendors get to GQL. openCypher provides tools beyond what’s in the ISO standard (which is a language specification), which actually makes it potentially useful even to new vendors headed straight to GQL. This works because all openCypher implementers, and all their users, start the road to GQL from a similar starting point, and a very good one, given the similarities between Cypher and GQL.

Bright future for GQL… with openCypher

openCypher has fulfilled its initial purpose, serving as the basis for a graph database lingua franca across much of the industry. It is heartwarming for the team that has been invested in curating openCypher to think that, now that GQL is finally here, openCypher can still have a different but useful role in ramping implementers and users onto GQL. Our dream is to see all openCypher implementations becoming GQL-conformant implementations, after which we will all be speaking GQL! Let’s make it happen.