Image Courtesy- Pikrepo
Gentle introduction to Graph database
Ever wondered how Spiderman sees the world? The same way we do, of course. No mystery there. But the way he interprets the world is vastly different from ours. Yes, I am talking about the spidey sense. As developers, we also tend to develop a spidey sense over a few projects: an instinct for what can go wrong and what is likely to threaten the software we build.
In that sense, spidey sense is a ticket to success. Successful business houses tend to have developed it. The typical pattern is one individual developing this spidey sense and leading a team to success. The elusive target has always been to spread the spidey sense across a group of individuals. A team that gains spidey sense is the ultimate achievement a business can have.
For ages, business houses have depended on insights derived from data to acquire the spidey sense. The non-superhero way of putting this is that business acumen is developed by using data. Thus, to put it in perspective –
Being Spiderman requires one to be able to build insights from data
This is why databases have been around for ages. Processing capability has always been coupled with the capability to manage stored data. There have been multiple waves of technology trends in data management, but most of them have stayed put for a long time on the scale of computing trends. Relational databases have been around since the late 1960s, and were dominant from the 1970s until around 2009, a stretch of roughly 40 years. Then came an avalanche of new databases, of which the one we focus on in this dispatch is the graph database.
A graph database is popularly visualized as a web of interconnected dots, not unlike Spiderman’s web. The best part of this web is that many insights can be drawn from it more easily than with other contemporary styles of managing data. One way to look at this style of managing data is –
Relational databases store data and insights are born by processing the data in a particular manner
Whereas for the graph database
Graph databases store data along with insights, which can be queried right away, making it easier for people to get to them.
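The contrast can be sketched in a few lines of Python. This is only an analogy with made-up data (the names, the KNOWS relationship and both helper functions are invented for illustration), not a description of how any database engine stores things. In the relational style the answer to "who does Peter know?" appears only after joining tables; in the graph style the relationship sits next to the data and is simply followed.

```python
# Hypothetical data, invented for illustration only.

# Relational style: facts live in separate tables; the insight
# ("who does Peter know?") appears only after joining them.
people = [
    {"id": 1, "name": "Peter"},
    {"id": 2, "name": "MJ"},
    {"id": 3, "name": "Harry"},
]
knows = [  # join table of (person_id, friend_id)
    (1, 2),
    (1, 3),
]


def friends_relational(name):
    """Resolve friends by joining people -> knows -> people."""
    ids = [p["id"] for p in people if p["name"] == name]
    by_id = {p["id"]: p["name"] for p in people}
    return sorted(by_id[f] for (p, f) in knows if p in ids)


# Graph style: the relationship is stored with the data itself,
# so the same question is a direct lookup from the starting node.
graph = {
    "Peter": {"KNOWS": ["MJ", "Harry"]},
    "MJ": {"KNOWS": []},
    "Harry": {"KNOWS": []},
}


def friends_graph(name):
    """Follow the KNOWS relationships straight from the node."""
    return sorted(graph[name]["KNOWS"])


print(friends_relational("Peter"))
print(friends_graph("Peter"))
```

Both functions return the same friends; the difference is where the work happens, at query time in the first case and at modelling time in the second.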
Without further delay let us familiarize ourselves with this modern style of storing data and managing knowledge.
Central concepts of Graph
In our industry we learn and unlearn many things over a career. The best way to learn something quickly is to grasp the central concepts before diving deeper, no matter how tempting the Hello World program is. Similarly, the best way to unlearn, which should always accompany learning a new concept, is to drop the biases in our mind and read the text as if we were coming across it for the first time.
With this background, let us visit the central concepts of a graph database –
Node is the pivotal concept of a graph database. Without nodes, much of the other two dimensions would not exist.
Property decorates a node or relationship with additional information, stored as key-value pairs. The modeller is not constrained to a fixed set of properties and can change them later without much difficulty.
Relationship adds insight to the graph database. It connects two nodes and describes the nature of the connection, which is typically the insight we compared to the spidey sense earlier.
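The three concepts can be sketched as plain Python data structures. This is a minimal illustration, not how any graph database is implemented: the class names, the example characters and the "since" property are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node: a label plus free-form key-value properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data.
peter = Node("Character", {"name": "Peter Parker"})
mj = Node("Character", {"name": "Mary Jane"})
knows = Relationship(peter, "KNOWS", mj, {"since": 2002})

# Properties are not bound to a fixed schema: adding one later
# needs no change to the Node definition.
peter.properties["alias"] = "Spiderman"


def describe(rel):
    """Render a relationship in an arrow notation similar to query patterns."""
    return f'({rel.start.properties["name"]})-[:{rel.type}]->({rel.end.properties["name"]})'


print(describe(knows))
```

The relationship object carries the insight here: the fact that one character knows another is stored as data, not computed.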
A mnemonic many of us know well as developers is the one used to find classes in Object Oriented Programming: given a paragraph of requirements, the nouns in the text represent the classes. There are more such mnemonics in the programming world. The underlying style, if you notice, is related to the grammar of the text.
Graph databases take an altogether different approach to managing data. The fundamental storage structure of the database changed with this kind of database, and it is best visualized as a web of connected points; hence the Spiderman analogy. What stays central is the parallel we draw from the literary world. If we attempt a similar mnemonic for the central concepts of a graph database, we end up with something like this –
With an example, it looks like this –
This is the foundation on which we build our understanding of the different vendor offerings of graph databases.
Multiple vendors offer a graph database. A full list of them would not serve the purpose of development all that well, but it is good to know a few names.
AgensGraph, Blazegraph, Cosmos DB, Grakn.AI and of course Neo4j are a few popular ones to know.
We will focus on the Neo4j offering, which is easy to get started with and has a lot of community support. This vendor’s graph database is more of an OLTP-style database, whereas Grakn.AI is tuned specifically for distributed hyper-relationships and knowledge mining.
Cosmos DB, on the other hand, follows a polyglot style of persistence: it is a Platform as a Service offering from Azure that supports RDBMS, object and, of course, graph storage.
Beyond the databases themselves, several surrounding aspects need discussion for practical application. Among them are –
1. Query language
2. Computing framework support
3. Deployment models
To stay focused, we will concentrate on the query language. Armed with a query language, you can experience the graph database for yourself. Popular graph databases support the following query languages –
CYPHER and GREMLIN are the most popular and most commonly supported query languages. SPARQL is more exotic and is typically supported by databases built on triple-store storage.
You can get started by downloading the community version or by trying the sandbox version in the cloud. Let us use the sandbox for this dispatch. We assume you are comfortable with the terms ACID and CAP, and we use them without expansion in the rest of the article. Neo4j describes itself as an ACID-compliant transactional database. This means it has made the same choices as its relational counterparts. The applicability of CAP to Neo4j is questionable, however, since it is not really a distributed database per se.
Neo4j supports CYPHER right out of the box; GREMLIN support can be added via a plugin. To log in to the sandbox, you can use one of the ID providers it supports. On your first login you will be greeted with a project selection, where each project comes with some preloaded data in the graph database. You can pick one of these or start with a blank graph database.
Each project lasts for 3 days and is launched within a Docker container. We selected the graph algorithms project for this dispatch, but you are free to select any of the listed projects. Once the instance is up and running, click the “Open Neo4j Browser” button –
The Browser is a web-based query execution window, equivalent to TOAD, SQL Server Management Studio, MySQL Query Browser and the like. Within the Browser, a few things will help you get started and experience the graph database right up front. For example, the following command will have been pre-executed in the Browser –
On navigating further right, you can click on a few starter CYPHER queries to explore the graph database. The first of these queries is more of a management command –
Since the graph is already loaded with data, you will see something like this in the window –
This tells us that there is essentially only one type of node, called Character, which relates to itself through a few relationships.
This management command thus shows the bare essentials of the database. However, notice that it gives no view of the properties of the nodes or the relationships. To explore those, run the following query –
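To make the idea of such a summary concrete, here is a minimal Python sketch that collapses individual relationships into the distinct node types and connection patterns they contain. The relationship type names INTERACTS and ALLIED_WITH and the helper summarize_schema are assumptions invented for illustration; they are not taken from the sandbox data, and this is not how Neo4j computes its own view.

```python
# Hypothetical relationships: (start label, relationship type, end label).
relationships = [
    ("Character", "INTERACTS", "Character"),
    ("Character", "INTERACTS", "Character"),
    ("Character", "ALLIED_WITH", "Character"),
]


def summarize_schema(rels):
    """Collapse individual relationships into distinct schema-level facts."""
    # Every label that appears at either end of a relationship.
    labels = sorted({lbl for start, _, end in rels for lbl in (start, end)})
    # Each distinct (start, type, end) pattern, listed once.
    patterns = sorted(set(rels))
    return labels, patterns


labels, patterns = summarize_schema(relationships)
print(labels)
for start, rel_type, end in patterns:
    print(f"(:{start})-[:{rel_type}]->(:{end})")
```

With a single label and every pattern starting and ending on it, the summary shows one node type pointing back at itself, which is the shape described above.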
MATCH (n) RETURN n LIMIT 25
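Read as plain logic, the query matches every node, since the pattern (n) carries no label or property filter, returns each one, and stops after 25 results. A minimal Python analogue follows. It uses made-up in-memory nodes rather than the sandbox data, the helper match_all is hypothetical, and it says nothing about how Neo4j actually executes the query.

```python
from itertools import islice

# Hypothetical in-memory nodes standing in for the sandbox data.
nodes = [{"label": "Character", "name": f"character-{i}"} for i in range(100)]


def match_all(nodes, limit):
    """Rough analogue of MATCH (n) RETURN n LIMIT <limit>."""
    # No label or property filter: every node matches the pattern (n).
    # islice stops as soon as `limit` nodes have been taken.
    return list(islice(iter(nodes), limit))


result = match_all(nodes, 25)
print(len(result))
print(result[0])
```

The sketch returns the first 25 nodes of a list, whereas a CYPHER query without an ORDER BY clause does not promise any particular order, so which 25 nodes come back from the database may differ.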
You will get a result of this sort –
For now, we will leave you here, with dreams of possibilities. If right now you feel like Spiderman when he has just discovered his strengths and wonders what he could do with them, you are in the sweet spot. We will resume this in the next dispatch with more details on queries a