Graph databases are emerging as a fashionable way to store data. If you haven’t caught up yet, now is a good time to catch up with the latest trends. As with many new approaches, the one thing that daunts newcomers to this space is modelling a domain in graph parlance. You can relate to that challenge if you recollect how we struggled when we first sat down to understand object-oriented programming in C++. Wasn’t that quite a hurdle to cross? How can code written on a computer create a Car, and when we call Honk, will it emit a honking sound? Forget the concepts of OOP; the very fact that real-life examples were used led to confusion of various kinds. This is just one type; I am sure you, our readers, have experienced many other kinds of confusion.
To overcome this seemingly insurmountable challenge, we will take an example. We will be using Azure Cosmos DB. We have used the Graph API and created a graph with the name property as the partition key. We use our local developer machine to connect to the remote server, so, needless to say, we have installed the Apache TinkerPop Gremlin console on the local machine.
We hope you have configured the console to work with the remote server by following the Microsoft documentation on that topic.
In any IT services environment, executing workflows is a bread-and-butter activity for engineers. They execute a workflow to accomplish a client’s goal and thereby earn.
Many IT companies standardize certain workflows as run books, which are to be followed by every engineer who wishes to accomplish a specific goal.
A workflow is an interesting topic to model in a graph database as well. It is sequential, with branches possible at any stage. If you understand the base concepts of a graph database, namely Vertex and Edge, you will have intuitively generated a model by now.
Let us go with the intuition. Each stage is a Vertex, and the transition between stages is modelled as an Edge. To make it concrete, let us model a workflow that is easily accessible to you.
The workflow is the documentation of how to connect to an AWS EC2 Linux server from a Windows client. We will model two workflows so that we run into the tough decisions that a database modeller encounters: connecting via an SSH client and connecting via WSL from a Windows client.
The sequence of steps to connect using the SSH client flows like this:
Compared to the documentation, we have skipped the prerequisite of installing the SSH client on the Windows client. We feel it is obvious that the required software should be present before this workflow is followed. That doesn’t mean the AWS docs should omit it: an explicit mention is needed from an educational perspective, but in our context it is more or less a given, so we skip it.
The sequence of steps to connect using WSL flows like this:
You will have noticed right away that many of the steps are the same. There are a couple of things that we should take note of in these workflows at this stage.
Armed with those observations, we will now go about creating the graph using the Gremlin language.
The graph
g.addV('WORKFLOW').property('id', 'ConnectUsingSSH').property('name', 'Connect to Linux EC2 instance from windows machine using SSH client')
From a modelling perspective, we leveraged the context as much as possible. This is a workflow, so why not label the vertex as such? We do that in the section g.addV('WORKFLOW'). We follow a convention that labels be capitalized, and we use '-' to separate words in case two words are required to label a vertex. The remaining sections of the line above are self-explanatory.
One thing that we want to call out: since we are using Cosmos DB, we have ready access to the graph through the static variable g. This will not be the case if you are connecting to a local instance of a graph database that implements the TinkerPop framework; you will have to initialize the variable yourself in that case. But let us continue with Cosmos DB for now and not bother much about a local TinkerPop implementation.
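For readers who do want to try a local instance, here is a minimal sketch of initializing g by hand. It assumes you are inside the Gremlin console and using the in-memory TinkerGraph that ships with it; neither is part of our Cosmos DB setup, and the variable name graph is our own choice.

```groovy
// Assumption: Gremlin console with the bundled in-memory TinkerGraph
graph = TinkerGraph.open()   // create an empty in-memory graph
g = graph.traversal()        // obtain the traversal source that the rest of this post calls g
g.V().count()                // a fresh graph has no vertices yet
```

Other TinkerPop implementations have their own way of opening a graph, so treat this only as an illustration of the idea.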
Also, since you are working with the Gremlin console, the moment you press Enter the action gets executed on the server.
We have now created the head of the workflow. It acts rather like the title would, had you documented this workflow in a document. Let us speed up the process and add all the steps for this workflow:
g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep1').property('name', 'Check instance status')
g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep2').property('name', 'Get public DNS name and user name to connect to instance')
g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step1').property('name', 'Issue SSH command using public DNS name').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name')
g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step11').property('name', 'Issue SSH command using IPv6 address').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-IPv6-address')
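If you would like to confirm that the vertices landed as intended, a quick check along these lines may help. It is only a sketch that reuses the labels and property names from the statements above and assumes they all executed successfully; the comment lines are for the reader and can be left out when typing into the console.

```groovy
// Show the name and command stored on every step vertex
g.V().hasLabel('WORKFLOW-STEP').valueMap('name', 'command')

// Count the prerequisite vertices created so far
g.V().hasLabel('WORKFLOW-PREREQUISITE').count()

// Fetch a single vertex by its id and read its name
g.V('CTL-SSH-Windows-PreStep1').values('name')
```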
One thing that might stand out first, if you are picky about names, is the repetition of the word WORKFLOW. However, the prefix reassures us that PREREQUISITE will not be interpreted as a prerequisite for something else in the graph as it evolves. The other interesting thing is how we capture the essence of the information on the website: we have added a property holding the command to be executed to the Vertex for Step#1. Modelling this command as a separate node would require one additional hop, and separating the title (the description of what happens) from the action (the command) would be superfluous.
If you squint a little further, you will notice we do not have a Step#2. This follows from the observation we made earlier: Step#1 and Step#2 will never be executed together or in sequence. Thus, we call it Step11 instead of Step2. We have our vertices ready. The fun of linking them begins now:
g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1'))
g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2'))
g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1')).property('workflow', 'ConnectUsingSSH')
g.V('CTL-SSH-Windows-Step1').addE('ALTERNATE-STEP').to(g.V('CTL-SSH-Windows-Step11')).property('workflow', 'ConnectUsingSSH')
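As a quick sanity check on the edges, you could walk a hop or two from the vertices just linked. This is a hedged sketch that uses only the ids, edge labels, and the workflow property from the statements above, and assumes the edges were created successfully; the comment lines are for the reader.

```groovy
// Names of the stages reachable in one hop from the first prerequisite
g.V('CTL-SSH-Windows-PreStep1').out('NEXT-STEP').values('name')

// The command offered as an alternative to Step1
g.V('CTL-SSH-Windows-Step1').out('ALTERNATE-STEP').values('command')

// Number of edges tagged as belonging to this workflow
g.E().has('workflow', 'ConnectUsingSSH').count()
```

Because PreStep1 and PreStep2 point at each other, any longer traversal that repeats over NEXT-STEP would need a guard against cycles.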
There is a bit of standard Gremlin here which we will skip narrating. You can read about creating edges in its documentation. From a modelling perspective, the interesting things that stand out are:
For now, one part of our graph is ready. We will create the next graph in our next post, and show you the wonders of retrieving data.
image courtesy- PxHere