Will GQL Be The SQL Of Graphs?

The world of graph databases has been a tremendous cornucopia of innovation. Now that graphs are becoming a mainstream part of computing infrastructure for all different types of companies, from Silicon Valley leaders to the defense department to your IT department, it’s clearly time for the industry to make sense of itself and create some efficiencies through standardization. While it’s all well and good to let a thousand flowers bloom, at some point it makes sense to agree on a common way of doing things.

That’s the motivation behind the GQL manifesto, created by Alastair Green, who works on graph query languages at the graph technology company Neo4j, which I’ve profiled before and whose GraphTour I spoke at earlier this year. In the manifesto, Green argues that it’s time for a single query language for graphs. He lays out a vision for merging the best properties of Cypher, PGQL, and G-CORE into a new, comprehensive language expressly designed for graphs.

At the heart of this vision is the inherent value in using graphs in the first place. Green points out the intuitive nature of graphs and how they’re already helping companies find relationships and build better algorithms. He’s also right to assert that adoption of graphs is increasing fast across industries.

Because of this rapid adoption, Green thinks now is the time to standardize a language. He makes a comparison to SQL and argues that the graph world would benefit from having a similar declarative language. SQL itself, however, can’t be this language, because not all graph technology is tied to relational systems or tables.

Not to draw on an overused allegory, but the problem with not having such a standard language is that of the Tower of Babel. If every graph technology uses its own language, we’re all at a disadvantage because each tool has to operate in its own silo. We can never achieve the benefits of having a single way to use and query graphs, regardless of the product or tool we’re using.

Green calls this new language GQL — an acronym I’m sure you can figure out. He writes that “a common query language focuses support around data modeling, ETL and visualization tools for graph data, and portable queries mitigate vendor lock-in. But GQL also needs to be tuned and agile to meet the needs of the expanding property graph data industry. It should work with SQL, but it should not be confined by SQL. GQL would be a language that complements the traversal API of Apache Tinkerpop’s Gremlin as well as SPARQL for RDF triple-stores. This results in better choices for developers, data engineers, data scientists, CIOs and CDOs alike.”

I agree with Green’s assertions in large part because without such standardization, there will soon be too much graph technology to choose from. Right now, we already have the Cypher and PGQL languages, each of which is tied to a different technology vendor, as well as G-CORE, a research language that was created to provide inspiration for graph languages in general.

Taken together, these languages have many similarities and each has its shortcomings, which Green lays out in his piece. But even though Cypher is a creation of his company, Neo4j, Green doesn’t make an argument for it above and beyond the other two. Instead, he’s advocating for a way to combine the best features of all three into an industry standard that meets the needs of most users. Given that Cypher and PGQL (started by Oracle) already have a huge number of users, he believes that if the industry came together around this idea, a universal standard language could be developed relatively quickly. He ends his manifesto with an open call for others to join him in working on such a language and with a survey which, though unscientific, shows 95% of respondents supporting the idea of a standard GQL.

Recently, I spoke with Green about the current progress towards creating GQL. Green was optimistic and told me that there are essentially two tracks of discussions currently taking place about the creation of a standard graph language, both of which are focused on the best way to make it happen. One is a set of informal conversations among proponents of a standard language for graph queries, most likely built around the current ideas for GQL. They are working to support the formation of a proposal that can go before the International Organization for Standardization (ISO) in June 2019. The other is a more formal effort through an ad hoc subcommittee of the American committee that deals with SQL. That subcommittee has been focused on creating a property graph extension to SQL and has been holding conversations about creating a standard graph language for the past 18 months. It has had its charter extended to explicitly look at the creation of GQL.

Green said that those two efforts are going to have to come together to create a level of documentation that would persuade the ISO that this is a viable project and that people are committed enough to make it a reality. That process would put this project on a par with SQL.

Green’s hope is that the underlying foundation of SQL and its fundamental characteristics of data types, expressions, and predicates could be used in GQL to make the latter’s implementation easier and the language familiar and well specified to SQL users. Then, GQL could be built up with more specific graph features, including new ideas about pattern matching. He also sees the possibility of using Oracle’s PGQL language and openCypher as the basis, extending their capabilities and embedding them in a GQL context, including incorporating elements of the research language, G-CORE. Finally, he sees a chance for Gremlin to play a role as a procedural language within GQL. His perspective is that all those who are working on both tracks of the GQL effort have an interest in making the new, standardized language open to ensure it has good relationships with adjacent languages like GraphQL, Gremlin and SPARQL.

Given the opportunities for advancement in the graph world that would occur with GQL, I will monitor the progress of the move towards a standard language over the next year and report back with any updates.

originally posted on Forbes.com by Dan Woods