Workflow meta in a graph database

Graph databases are emerging as a popular choice for storing data. If you haven’t caught up yet, now is a good time to catch up with the latest trends. As with many new approaches, the one thing that daunts newcomers to this space is modelling a domain in graph parlance. You can relate to that challenge if you recollect how we struggled the first time we sat down to understand Object Oriented Programming in C++. Wasn’t that quite a hurdle to cross? How can code written on a computer create a Car, and when we call Honk, will it emit a honk sound? Forget the concepts of OOP; the very fact that real-life examples were used led to confusion of various kinds. This is just one type; I am sure you, our readers, have experienced many other kinds of confusion.

To overcome this seemingly insurmountable challenge, we will work through an example. We will be using Azure Cosmos DB. We used its graph (Gremlin) API and created a graph with the name property as the partition key. We connect to the remote server from our local developer machine, so, needless to say, we have installed the Apache TinkerPop Gremlin Console on the local machine.

We assume you have configured the console to work with the remote server by following the Microsoft documentation on that topic.

The workflow

In any IT services environment, executing workflows is a bread-and-butter activity for engineers. They execute a workflow to accomplish a client’s goal and thereby earn revenue.

Many IT companies standardize certain workflows as run books, which every engineer who wishes to accomplish a specific goal is expected to follow.

A workflow is also an interesting thing to model in a graph database. It is sequential, with branches possible at any stage. If you understand the base concepts of a graph database, namely Vertex and Edge, you will have intuitively come up with a model by now.

Let us go with the intuition: each stage is a Vertex, and the transition between stages is modelled as an Edge. To make it concrete, let us model a workflow that is easily accessible to you.
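Before we do, here is a minimal in-memory sketch in Python of the intuition that stages are vertices and transitions are edges. It is only an illustration under our own assumptions: the Graph, Vertex and Edge classes and the add_v, add_e and out helpers are hypothetical, loosely named after Gremlin's addV, addE and out steps, and are not part of Cosmos DB or TinkerPop.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str          # unique id, like the 'id' property we set in Cosmos DB
    label: str       # e.g. 'WORKFLOW-STEP'
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str       # e.g. 'NEXT-STEP'
    out_v: str       # id of the stage the transition starts from
    in_v: str        # id of the stage the transition leads to
    properties: dict = field(default_factory=dict)


@dataclass
class Graph:
    """Hypothetical toy property graph, for illustration only."""
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_v(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        return self.vertices[vid]

    def add_e(self, label, out_v, in_v, **props):
        # Refuse dangling transitions: both stages must already exist.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        edge = Edge(label, out_v, in_v, props)
        self.edges.append(edge)
        return edge

    def out(self, vid, label):
        """Ids of vertices one edge away from vid over edges with this label."""
        return [e.in_v for e in self.edges if e.out_v == vid and e.label == label]


g = Graph()
g.add_v("stage-a", "STAGE", name="First stage")
g.add_v("stage-b", "STAGE", name="Second stage")
g.add_e("NEXT-STEP", "stage-a", "stage-b")
print(g.out("stage-a", "NEXT-STEP"))
```

Keeping edges as first-class objects with their own properties mirrors the property-graph model that Gremlin works with, which matters later when we attach a workflow property to edges.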

Take the AWS documentation on connecting to a Linux EC2 instance from a Windows client. We will model two workflows to expose the tough decisions that a database modeller faces: connecting via an SSH client and connecting via WSL, both from a Windows client.

The sequence of steps to connect using an SSH client flows like this:

1. Pre-requisite #1: Check the status of the instance
2. Pre-requisite #2: Get the public DNS name and user name to connect to the instance
3. Step #1: Issue an SSH command that uses the public DNS name to connect
4. Step #2: Issue an SSH command that uses the public IP to connect to the instance

Compared with the documentation, we have skipped the prerequisite of installing the SSH client on the Windows client. We feel it is obvious that the required software should be present before this workflow is followed. That doesn’t mean it is unnecessary in the AWS docs: from an educational perspective an explicit mention is required, but in our context it is a given, so we skip it.

The sequence of steps to connect using WSL flows like this:

1. Pre-requisite #1: Check the status of the instance
2. Pre-requisite #2: Get the public DNS name and user name to connect to the instance
3. Pre-requisite #3: Copy the private key from Windows to WSL
4. Step #1: Issue an SSH command that uses the public DNS name to connect
5. Step #2: Issue an SSH command that uses the public IP to connect to the instance
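Comparing the two lists mechanically makes the overlap visible. The sketch below is a plain Python comparison of the step names (shortened by us), not a graph operation.

```python
# Step lists copied, in shortened form, from the two workflows above.
SSH_CLIENT_STEPS = [
    "Check instance status",
    "Get public DNS and user name",
    "SSH using public DNS",
    "SSH using public IP",
]
WSL_STEPS = [
    "Check instance status",
    "Get public DNS and user name",
    "Copy private key from Windows to WSL",
    "SSH using public DNS",
    "SSH using public IP",
]

# Steps that appear in both workflows, kept in WSL order.
shared = [step for step in WSL_STEPS if step in SSH_CLIENT_STEPS]
# Steps that only the WSL workflow needs.
wsl_only = [step for step in WSL_STEPS if step not in SSH_CLIENT_STEPS]

print(shared)
print(wsl_only)  # ['Copy private key from Windows to WSL']
```

Four of the five WSL steps are shared, which is what later motivates re-using vertices across workflows.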

You will have noticed right away that most of the steps are the same. There are a couple of things we should take note of in these workflows at this stage.

1. Compared to connecting from an SSH client, the WSL workflow has only one additional step, i.e., copying the private key from Windows to WSL.
2. If you have referred to the AWS documentation, you will also notice that the command to connect is the same.
3. Step #1 and Step #2 will never be executed together or one after the other. Either of them will accomplish our goal.
4. Pre-requisites #1 and #2 can be carried out independently of each other, in any order.
5. It is quite possible that either pre-requisite is skipped because the developer has preserved the value or performed it earlier.
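To check that observations 3 to 5 hang together, here is a small Python sketch that enumerates every run of the SSH client workflow they allow. It encodes our reading of those observations (prerequisites optional and in any order, exactly one connect step at the end); the shortened step names and the valid_runs helper are our own, not part of the graph.

```python
from itertools import permutations

# Shortened names for the SSH client workflow.
PREREQUISITES = ["Check instance status", "Get public DNS and user name"]
CONNECT_STEPS = ["SSH using public DNS", "SSH using public IP"]


def valid_runs(prerequisites, connect_steps):
    """Every step sequence allowed by our reading of observations 3 to 5."""
    runs = []
    # Observation 5: any prerequisite may be skipped, so try every subset size.
    for size in range(len(prerequisites) + 1):
        # Observation 4: the chosen prerequisites may run in any order.
        for order in permutations(prerequisites, size):
            # Observation 3: exactly one connect step finishes the run.
            for step in connect_steps:
                runs.append(list(order) + [step])
    return runs


runs = valid_runs(PREREQUISITES, CONNECT_STEPS)
print(len(runs))  # 10
```

Ten runs is small enough to list by hand, but the count grows quickly with more optional steps, which is one reason to let the graph's edges express the allowed transitions instead of spelling out every sequence.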

Armed with those observations, we will now go about creating the graph using the Gremlin language.

The graph

g.addV('WORKFLOW').property('id', 'ConnectUsingSSH').property('name', 'Connect to Linux EC2 instance from windows machine using SSH client')

From a modelling perspective, we leveraged the context as much as possible. This is a workflow, so why not label the vertex as one? We do that in the section g.addV('WORKFLOW'). Our convention is that labels are capitalized, and we use '-' to separate words when a label needs two or more words. The rest of the sections in the line above are self-explanatory.
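If you want to enforce the labelling convention before sending statements to the server, a check like the following Python sketch would do. The is_conventional_label helper and its regular expression are our own assumptions about the convention (upper-case letters only, single hyphens between words); neither Cosmos DB nor Gremlin enforces anything like it.

```python
import re

# Our reading of the convention: one or more upper-case words,
# joined by a single hyphen when a label needs more than one word.
LABEL_PATTERN = re.compile(r"[A-Z]+(?:-[A-Z]+)*")


def is_conventional_label(label: str) -> bool:
    """True if the label follows the capitalized, hyphen-separated convention."""
    return LABEL_PATTERN.fullmatch(label) is not None


for label in ["WORKFLOW", "WORKFLOW-STEP", "Workflow", "WORKFLOW_STEP"]:
    print(label, is_conventional_label(label))
```

Note that this pattern rejects digits, so a label such as STEP2 would fail; loosen the character class if your labels need numbers.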

One thing we want to call out: since we are using Cosmos DB, we have ready access to the graph traversal source in the variable g. This will not be the case if you are connecting to a local instance of a graph database that implements the TinkerPop framework; there you will have to initialize the variable yourself, for example with graph = TinkerGraph.open() followed by g = graph.traversal() for an in-memory TinkerGraph. But let us continue with Cosmos DB for now and not bother much about a local TinkerPop implementation.

Also, since you are working in the Gremlin Console connected to the remote server, each statement is executed on the server the moment you press Enter.

With that, we have created the head of the workflow. It acts as a kind of title, much as it would if you had written this workflow up in a document. Let us speed up the process and add all the steps for this workflow:

g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep1').property('name', 'Check instance status')

g.addV('WORKFLOW-PREREQUISITE').property('id', 'CTL-SSH-Windows-PreStep2').property('name', 'Get public DNS name and user name to connect to instance')

g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step1').property('name', 'Issue SSH command using public DNS name').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name')

g.addV('WORKFLOW-STEP').property('id', 'CTL-SSH-Windows-Step11').property('name', 'Issue SSH command using IPv6 address').property('command', 'ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-IPv6-address')

If you are picky about names, the first thing that might stand out is the repetition of the word WORKFLOW. However, it reassures us that PREREQUISITE will not be misread as a prerequisite for something else in the graph as it evolves. The other interesting choice is how we captured the essence of the information on the AWS website: we added a command property to the vertex for Step #1, holding the command that needs to be executed. Modelling this command as a separate node would require one additional hop, and separating the title (the description of what happens) from the action (the command) would be superfluous.
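The extra hop is easy to see side by side. In the Python sketch below, option A stores the command as a property of the step vertex, as we did; option B is a hypothetical alternative in which the command lives in its own vertex. The dict layout, the COMMAND label, the HAS-COMMAND edge and the cmd-step1 id are all made up for this illustration; the Gremlin in the comments shows the roughly equivalent traversal for each option.

```python
COMMAND = "ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name"

# Option A, as in the article: the command is a property of the step vertex.
vertices_a = {
    "CTL-SSH-Windows-Step1": {
        "label": "WORKFLOW-STEP",
        "name": "Issue SSH command using public DNS name",
        "command": COMMAND,
    },
}


def command_as_property(vertices, step_id):
    # Zero hops: read the property straight off the step vertex.
    # Gremlin: g.V(step_id).values('command')
    return vertices[step_id]["command"]


# Option B, hypothetical: the command is its own COMMAND vertex,
# linked from the step by a HAS-COMMAND edge (names invented for this sketch).
vertices_b = {
    "CTL-SSH-Windows-Step1": {
        "label": "WORKFLOW-STEP",
        "name": "Issue SSH command using public DNS name",
    },
    "cmd-step1": {"label": "COMMAND", "text": COMMAND},
}
edges_b = [("HAS-COMMAND", "CTL-SSH-Windows-Step1", "cmd-step1")]


def command_via_edge(vertices, edges, step_id):
    # One extra hop: follow HAS-COMMAND to the command vertex, then read it.
    # Gremlin: g.V(step_id).out('HAS-COMMAND').values('text')
    target = next(dst for (label, src, dst) in edges
                  if label == "HAS-COMMAND" and src == step_id)
    return vertices[target]["text"]


print(command_as_property(vertices_a, "CTL-SSH-Windows-Step1"))
```

Both options return the same string; option B simply pays for an extra vertex, an extra edge and an extra hop to get there.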

If you squint a little further, you will notice that we do not have a Step2. This follows from the observation we made earlier: Step #1 and Step #2 will never be executed together or in sequence. Thus, we call it Step11 instead of Step2. Our vertices are ready. The fun of linking them begins now:

g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1'))

g.V('ConnectUsingSSH').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2'))

g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')

g.V('CTL-SSH-Windows-PreStep1').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep2')).property('workflow', 'ConnectUsingSSH')

g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-Step1')).property('workflow', 'ConnectUsingSSH')

g.V('CTL-SSH-Windows-PreStep2').addE('NEXT-STEP').to(g.V('CTL-SSH-Windows-PreStep1')).property('workflow', 'ConnectUsingSSH')

g.V('CTL-SSH-Windows-Step1').addE('ALTERNATE-STEP').to(g.V('CTL-SSH-Windows-Step11')).property('workflow', 'ConnectUsingSSH')
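To see which orders of execution these edges permit, the Python sketch below copies them into (label, from, to) tuples and enumerates every cycle-free path from the head vertex to Step1 along NEXT-STEP edges. The tuple representation and the simple_paths helper are our own illustration, not a Cosmos DB feature; in the database you would express the same walk as a Gremlin traversal. Skipping vertices already on the path matters here because PreStep1 and PreStep2 point at each other.

```python
# The edges created above, as (label, from_id, to_id) tuples.
EDGES = [
    ("NEXT-STEP", "ConnectUsingSSH", "CTL-SSH-Windows-PreStep1"),
    ("NEXT-STEP", "ConnectUsingSSH", "CTL-SSH-Windows-PreStep2"),
    ("NEXT-STEP", "CTL-SSH-Windows-PreStep1", "CTL-SSH-Windows-Step1"),
    ("NEXT-STEP", "CTL-SSH-Windows-PreStep1", "CTL-SSH-Windows-PreStep2"),
    ("NEXT-STEP", "CTL-SSH-Windows-PreStep2", "CTL-SSH-Windows-Step1"),
    ("NEXT-STEP", "CTL-SSH-Windows-PreStep2", "CTL-SSH-Windows-PreStep1"),
    ("ALTERNATE-STEP", "CTL-SSH-Windows-Step1", "CTL-SSH-Windows-Step11"),
]


def simple_paths(edges, start, goal, label="NEXT-STEP"):
    """Every cycle-free path from start to goal over edges with the given label."""
    found = []

    def walk(node, path):
        if node == goal:
            found.append(path)
            return
        for edge_label, src, dst in edges:
            # Never revisit a vertex: PreStep1 and PreStep2 point at each other.
            if edge_label == label and src == node and dst not in path:
                walk(dst, path + [dst])

    walk(start, [start])
    return found


for path in simple_paths(EDGES, "ConnectUsingSSH", "CTL-SSH-Windows-Step1"):
    print(" -> ".join(path))
```

It finds four paths: through either prerequisite on its own, and through both prerequisites in either order.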

There is a bit of Gremlin syntax here that we will skip narrating; you can read about creating edges in the Gremlin documentation. From a modelling perspective, the interesting things that stand out are:

1. We use the same label, NEXT-STEP, for the edges that connect subsequent steps of the workflow.
2. Recalling our earlier observation that a developer need not execute both prerequisites, we have spanned the workflow so that it can begin at either prerequisite, and a developer who needs both can execute pre-requisites #1 and #2 in either order. Thus, the title vertex, ConnectUsingSSH, is connected to both prerequisites.
3. By the same rationale, Step #1 can be reached from either prerequisite.
4. We observed earlier that many steps are the same. Thus, it is prudent to re-use the vertices instead of recreating them. But when vertices and edges are re-used, it can be confusing which workflow’s steps are being followed. So, we embedded the workflow information in the edge by adding a property called workflow to it, whose value is the ID of the workflow. This will come in handy when we create the next workflow, for connecting via WSL: we will use almost the same vertices but create new edges. We will ruminate on the practice of re-creating edges instead of vertices when we create those edges.
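The payoff of the workflow property shows up once two workflows share vertices. In the Python sketch below, two of the SSH client edges sit next to two hypothetical WSL edges that re-use PreStep1 and Step1; the ids ConnectUsingWSL and CTL-WSL-Windows-PreStep3 are placeholders of ours, since that workflow is not built yet. The next_steps helper is also our own; its docstring notes the Gremlin traversal it roughly corresponds to.

```python
from collections import namedtuple

# One edge: label, source id, target id and the workflow property.
Edge = namedtuple("Edge", ["label", "out_v", "in_v", "workflow"])

EDGES = [
    # Two of the SSH client edges created above.
    Edge("NEXT-STEP", "CTL-SSH-Windows-PreStep1", "CTL-SSH-Windows-Step1", "ConnectUsingSSH"),
    Edge("NEXT-STEP", "CTL-SSH-Windows-PreStep1", "CTL-SSH-Windows-PreStep2", "ConnectUsingSSH"),
    # Hypothetical WSL edges re-using PreStep1 and Step1; the WSL ids are placeholders.
    Edge("NEXT-STEP", "CTL-SSH-Windows-PreStep1", "CTL-WSL-Windows-PreStep3", "ConnectUsingWSL"),
    Edge("NEXT-STEP", "CTL-WSL-Windows-PreStep3", "CTL-SSH-Windows-Step1", "ConnectUsingWSL"),
]


def next_steps(edges, vertex_id, workflow=None):
    """Targets of NEXT-STEP edges leaving vertex_id, optionally for one workflow.

    Roughly g.V(vertex_id).outE('NEXT-STEP').has('workflow', workflow).inV()
    in Gremlin.
    """
    return [
        e.in_v
        for e in edges
        if e.label == "NEXT-STEP"
        and e.out_v == vertex_id
        and (workflow is None or e.workflow == workflow)
    ]


print(next_steps(EDGES, "CTL-SSH-Windows-PreStep1"))  # ambiguous: 3 targets
print(next_steps(EDGES, "CTL-SSH-Windows-PreStep1", "ConnectUsingSSH"))
print(next_steps(EDGES, "CTL-SSH-Windows-PreStep1", "ConnectUsingWSL"))
```

Without the filter, a traversal from the shared PreStep1 vertex cannot tell which workflow it is following; with it, each workflow sees only its own transitions.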

For now, one part of our graph is ready. We will create the next workflow in our next post and show you the wonders of retrieving data.

Image courtesy: PxHere