Image Courtesy – Pexels.com
A continuation of our gentle introduction to graph databases
In the previous dispatch we acclimatized ourselves to the notion of a graph database. Having worked with relational databases for more than four decades, we find this novel. Suddenly much of what we know about data warehouses and their principles is shifted left in the data lifecycle. The concepts central to graph databases are vastly different from those of relational databases. Graph databases model the world as objects and relationships, or rather, as a web of relationships.
A quick recap of the central concepts:
Node: represents an entity, a noun, or the concept of an object
Label: classifies and organizes these objects
Relationship: establishes the connection between two objects (nouns)
Property: decorates an object (noun) or a relationship with adjectives
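To make the four concepts concrete, here is a minimal sketch of a property graph as plain Python data classes. The class and field names are hypothetical and are not Neo4j's API. They only mirror the vocabulary above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Labels classify and organize the node, e.g. {"PERSON"}
    labels: set[str]
    # Properties decorate the node with key-value "adjectives"
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node  # the node the relationship leaves from
    type: str    # the relationship's classification, e.g. "INTERACTS"
    end: Node    # the node the relationship points to
    properties: dict[str, object] = field(default_factory=dict)


alice = Node({"PERSON"}, {"name": "Alice"})
bob = Node({"PERSON"}, {"name": "Bob"})
met = Relationship(alice, "INTERACTS", bob, {"since": 2020})

print(met.start.properties["name"], met.type, met.end.properties["name"])  # Alice INTERACTS Bob
```

Relationships carry properties too (here `since`). That is the "property" in property graph.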
We whetted our appetite with CYPHER, a query language for graph databases, and we will dive deeper into it in this dispatch. Before we do, we will look at the different ways a wannabe Spider-Man can tap into a graph database, i.e. the different ways a developer can connect to one.
Tapping into data
A database needs two central elements to function. The first is a process that runs as a daemon, accepts connections, and answers the queries that clients ask. The second is storage, which is responsible for (obviously) storing data on a persistent medium that the process uses to answer queries. Of these two, the process is the one that mediates all interactions with external systems, and one such system is the client.
Clients typically connect to the database process over a binary protocol that runs on top of TCP. Such protocols are laborious for clients to code against, so database providers have traditionally offered drivers that ease connectivity. These drivers are code libraries that spare clients the pain of writing code specific to the binary protocol.
Graph databases are no different and offer a similar solution. In modern times, however, text-based protocols over HTTP are popular as well, so popular graph databases also offer HTTP-based connectivity.
In the last dispatch we picked Neo4j as our graph database vendor, so let us use Neo4j's specifics for this topic as well.
Neo4j uses Bolt as its binary protocol and also exposes HTTP and HTTPS endpoints. With the default ports, the addresses a client connects to look like this:
bolt://localhost:7687
http://localhost:7474
https://localhost:7473
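A small sketch, using only Python's standard library, of how such addresses break down. The scheme picks the protocol and the port picks the listener. The ports are Neo4j's defaults. `describe` is a hypothetical helper, and a real client would hand the URI to an official driver instead of parsing it.

```python
from urllib.parse import urlsplit

# Neo4j's default endpoints; adjust the ports if your server is configured differently.
ENDPOINTS = [
    "bolt://localhost:7687",
    "http://localhost:7474",
    "https://localhost:7473",
]


def describe(uri: str) -> tuple[str, str, int]:
    """Split a connection address into (scheme, host, port)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.port


for uri in ENDPOINTS:
    scheme, host, port = describe(uri)
    # The scheme tells the client which protocol to speak: binary Bolt or text-based HTTP(S).
    kind = "binary" if scheme == "bolt" else "text"
    print(scheme, host, port, kind)
```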
Because Bolt is binary, talking to it requires language-specific code. Neo4j therefore releases driver libraries for popular programming languages such as .NET, Python, JavaScript, and Java.
We include this section as a bookmark and will pick it up again in a later dispatch. As promised, we now continue learning how to build insights from data. For that we need to think and talk like Spidey, who can instinctively discover, interpret, and act on these insights.
Thinking like Spidey
Having looked at how clients connect, we need to learn how to talk to the storage powerhouse. CYPHER is how we do that. We have already peeked at what it looks like. Now let us take a step back and learn the principles behind learning a new language.
CYPHER is a query language. Specifically, it is a language that works on property graphs. You may ask what a property graph is and whether it differs from a graph database. The answer is short and sweet: a property graph is one way of realizing a graph database, in which nodes and relationships are classified and carry key-value properties. That makes CYPHER one of the query languages for graph databases, and the property-graph approach leaves its mark on the query language as well.
So what the pure graph concept calls a vertex is known as a node in CYPHER. Similarly, what the pure graph concept calls an edge is called a relationship in CYPHER. Some query languages do use the vertex and edge vocabulary, but CYPHER does not.
Every query language since the days of relational databases has had principles central to it. We realize that by now you may be fed up with principles. We are taking the same approach as before: we started with the principles of graph databases, followed up with Neo4j, and now we do the same for the query language. In our defence, these principles make the concepts easy to understand without dragging you into vendor-specific nuances. So although we picked Neo4j as the vendor for our examples, we use principles to lay out the concepts of the query language.
Coming back to the principles of a query language, they are:
Selection
Filtering
Projection
Rename
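To see the four principles outside any database, here is a sketch over a hypothetical list of rows. It follows this article's split between selection (choosing what to look at) and filtering (narrowing it down by a condition). Classical relational algebra folds both into its single selection operator.

```python
# Hypothetical rows standing in for a table of people.
people = [
    {"name": "Alice", "age": 34, "city": "Pune"},
    {"name": "Bob", "age": 27, "city": "Delhi"},
    {"name": "Carol", "age": 41, "city": "Pune"},
]

# Selection: choose the collection of records to work with.
selected = list(people)
# Filtering: keep only the records that satisfy a predicate.
filtered = [row for row in selected if row["city"] == "Pune"]
# Projection: keep only the attributes we want to show.
projected = [{"name": row["name"], "age": row["age"]} for row in filtered]
# Rename: give an attribute a new name in the result.
renamed = [{"person": row["name"], "age": row["age"]} for row in projected]

print(renamed)  # [{'person': 'Alice', 'age': 34}, {'person': 'Carol', 'age': 41}]
```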
Readers with a mathematical background will quickly connect these with relational algebra. The question that will linger for every reader is how principles of relational algebra can apply to graph databases, given that we started out by setting graph databases apart as a different type of data management and storage system.
The answer is simple. Even though graph databases differ in how they manage and store data, graph and relational databases have much in common in how data is retrieved and worked with. In addition to the principles that resemble relational algebra, graph query languages add a critical element of their own:
Steps
Traversals
These two principles are specific to graph databases. By nature, a traversal is made up of steps, so for all practical purposes you can consider them one principle.
Let us map these principles to actual developer constructs, that is, code:
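A traversal is a chain of steps, and each step follows one relationship. The sketch below shows this over hypothetical INTERACTS data held in an adjacency list. Following exactly three steps is loosely what a fixed-length Cypher pattern such as (a)-[:INTERACTS*3]->(b) asks for. The `traverse` helper is illustrative, not a Neo4j API.

```python
from collections import defaultdict

# Hypothetical INTERACTS relationships as (start, end) pairs.
interacts = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave")]

# Adjacency list: node -> nodes reachable in one step.
out_edges = defaultdict(list)
for start, end in interacts:
    out_edges[start].append(end)


def traverse(start: str, steps: int) -> list[list[str]]:
    """Return every path that follows exactly `steps` outgoing relationships."""
    paths = [[start]]
    for _ in range(steps):
        # One step: extend each path by one outgoing relationship.
        paths = [path + [nxt] for path in paths for nxt in out_edges[path[-1]]]
    return paths


print(traverse("Alice", 1))  # [['Alice', 'Bob']]
print(traverse("Alice", 3))  # [['Alice', 'Bob', 'Carol', 'Dave']]
```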
Selection is performed using the MATCH clause in Cypher
Filtering is performed using the WHERE clause in Cypher, although in more complex scenarios filtering happens even without a WHERE clause
Projection is performed using the RETURN clause in Cypher
Traversals and steps are expressed as paths in a graph database. In terms of keywords, they map to relationship patterns drawn as ASCII art, such as -[:INTERACTS]->, used inside MATCH or OPTIONAL MATCH
Rename maps to the same keyword used in relational databases (SQL), i.e. AS
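The mapping above can be mirrored in Python over a hypothetical in-memory graph. Each step in the loop is commented with the Cypher clause it roughly corresponds to. This is a rough analogue of how the clauses divide the work, not how Neo4j executes a query.

```python
# Hypothetical in-memory nodes: each has a set of labels and a dict of properties.
nodes = [
    {"labels": {"PERSON"}, "props": {"name": "Alice", "age": 34}},
    {"labels": {"PERSON"}, "props": {"name": "Bob", "age": 27}},
    {"labels": {"CITY"}, "props": {"name": "Pune"}},
]

# Roughly: MATCH (p:PERSON) WHERE p.age > 30 RETURN p.name AS personName
rows = []
for p in nodes:
    if "PERSON" not in p["labels"]:  # MATCH (p:PERSON): selection
        continue
    if not p["props"]["age"] > 30:  # WHERE p.age > 30: filtering
        continue
    # RETURN p.name AS personName: projection plus rename
    rows.append({"personName": p["props"]["name"]})

print(rows)  # [{'personName': 'Alice'}]
```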
Now that we know these classifications and some meta-information about CYPHER, let us pick one query from our earlier dispatch and interpret it in the context of these clauses:
MATCH (n) RETURN n LIMIT 25
Here is how we interpret the query in terms of the principles:
MATCH => Selection
(n) => Variable (not a principle; think of it as the x in an equation)
RETURN => Projection
LIMIT => Projection Constraint
25 => Constant.
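Here is a rough Python analogue of MATCH (n) RETURN n LIMIT 25 over a hypothetical list of 30 nodes. In Neo4j, without an ORDER BY clause, which 25 rows come back is not guaranteed. The sketch simply takes the first 25 in list order. The function name is illustrative only.

```python
from itertools import islice

# Hypothetical graph with 30 nodes of mixed labels.
nodes = [{"id": i, "labels": {"PERSON"} if i % 2 else {"CITY"}} for i in range(30)]


def match_all_return_limit(graph_nodes, limit):
    matched = (n for n in graph_nodes)     # MATCH (n): every node, bound to n
    projected = (n for n in matched)       # RETURN n: project the whole node
    return list(islice(projected, limit))  # LIMIT 25: cap the number of rows


result = match_all_return_limit(nodes, 25)
print(len(result))  # 25
```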
Here is another way of interpreting this query, as a template:
SELECTION [PLACEHOLDER] PROJECTION WITH CONSTRAINT [CONSTRAINT PARAMETER]
In simpler English, this can be stated as:
Select [something] and project it (the selected [something]) for display.
You might be curious how relationships are represented and queried. Let us take a simple query from the database we used in the previous dispatch.
MATCH (n:PERSON)-[:INTERACTS]->()
RETURN n
Here is how we interpret this query:
MATCH => Selection
(n:PERSON) => We have talked about n earlier; consider it the equivalent of an x in an equation or the i in your for loop. The syntax :PERSON, however, is new. :PERSON is a label, which we talked about when we introduced the concept of graph databases. A label is prefixed with a colon (:) and groups nodes into sets.
-[:INTERACTS]-> => This is the ASCII-art way of depicting a relationship, and its components can be interpreted separately. The leading - attaches the relationship to the starting node n. It carries no arrowhead (<-), so the relationship does not point back at n. The trailing -> points to the ending node, which in our example is the unnamed node ( ) and cannot be projected because it has no variable. In between, [:INTERACTS] is the relationship type, the relationship's counterpart of a node label. So relationships can be classified too, and CYPHER can query on that classification.
() => The node, which can be any node to which a person node has an INTERACTS relationship. In our example we do not bother naming it or storing it in a variable.
RETURN => Projection, where we project the variable, in our case n.
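Direction matters when matching the ASCII-art pattern. The sketch below uses hypothetical (start, type, end) triples in which every node is a person. It contrasts binding n to the start of each relationship, as in (n:PERSON)-[:INTERACTS]->(), with binding n to the end, as in (n:PERSON)<-[:INTERACTS]-(). Only relationships of the requested type match, mirroring how [:INTERACTS] restricts the relationship type.

```python
# Hypothetical relationships as (start, type, end) triples; every node is a PERSON.
relationships = [
    ("Alice", "INTERACTS", "Bob"),
    ("Dave", "INTERACTS", "Alice"),
    ("Bob", "KNOWS", "Carol"),
]


def outgoing(rel_type: str) -> list[str]:
    """Like (n:PERSON)-[:REL_TYPE]->(): bind n to the start of each matching relationship."""
    return [start for start, t, _end in relationships if t == rel_type]


def incoming(rel_type: str) -> list[str]:
    """Like (n:PERSON)<-[:REL_TYPE]-(): bind n to the end of each matching relationship."""
    return [end for _start, t, end in relationships if t == rel_type]


print(outgoing("INTERACTS"))  # ['Alice', 'Dave']
print(incoming("INTERACTS"))  # ['Bob', 'Alice']
```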
In simpler English, we are attempting to:
Fetch all persons who interact with any other person.
Although we did not give the target node the PERSON label, we loaded only person data, so we will not get anything other than persons.
As we did earlier for the simpler query, we can write this using the terms from the principles:
Select [all persons], filter them, and project them (the selected persons) for display.
You might wonder how we can claim to have filtered without using the WHERE clause. This is an implicit filter that uses a traversal and a step. The pattern -[ ]-> is a traversal mechanism consisting of a single step, and only persons that have such a relationship match it.
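The implicit filter can be contrasted with an explicit one in a short sketch over hypothetical data. The first list behaves roughly like MATCH (n:PERSON)-[:INTERACTS]->() RETURN n, which yields one row per matching relationship. The second behaves roughly like MATCH (n:PERSON) WHERE EXISTS { (n)-[:INTERACTS]->() } RETURN n, which yields one row per person. In both, Carol is filtered out because she has no outgoing INTERACTS relationship.

```python
# Hypothetical data: the persons, and who interacts with whom.
persons = ["Alice", "Bob", "Carol"]
interacts = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]

# Implicit filter: the traversal itself decides which persons survive,
# producing one row per matching relationship.
implicit = [start for start, _end in interacts if start in persons]

# Explicit filter: test each person for an outgoing INTERACTS relationship,
# producing one row per person.
explicit = [p for p in persons if any(start == p for start, _end in interacts)]

print(implicit)  # ['Alice', 'Alice', 'Bob']
print(explicit)  # ['Alice', 'Bob']
```

Alice appears twice in the implicit result because she has two outgoing INTERACTS relationships. In Cypher, RETURN DISTINCT n collapses such duplicates.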
Let us stop here for this dispatch and give you some time to ponder this style of thinking about data and relationships. In the next dispatch we will take up a live scenario and apply graph-style thinking to show you the insights in the data.