Picture courtesy: Noun Project

XML is a word of the past; JSON represents the present and the future. Several other formats, such as RDF/XML, RSS and Atom, were built on XML, but today no data format is more popular than JSON. One reason we feel it gained that foothold, and continues to hold it, is simplicity. JSON carries very little that could be called semantics. XML, on the other hand, has semantics so rich and diverse that simple things sometimes become harder to do in it.

JSON-LD is a natural evolution of JSON's steady and high popularity. As more and more data is exchanged and stored as JSON, the relationships between these fragments carry more importance, and that has compelled the industry to define the bare minimum delta needed to make JSON's semantics a bit richer.

JSON-LD stands for JSON for Linking Data. The format adds metadata and links to the data represented by, for example, REST endpoints. Version 1.1 became a W3C Recommendation in July 2020. In this dispatch we take a stab at the format and at how to process it in C#. Libraries exist for other popular languages as well.

Potential use cases

 

A simple CRUD REST endpoint dealing with a JSON structure might gain only a slight edge from JSON-LD, and a developer might complain that it adds more complexity than value over time. However, there are compelling situations where the value this format generates clearly outweighs the complexity it adds.

Knowledge graphs thrive on rich metadata. The effectiveness of a knowledge graph is not a single value measured at one instant; it is a metric that needs to be curated and monitored over a span of time (typically months and years). JSON-LD can be the fuel here: it is readable by both machines and humans, and it takes comparatively little space on disk.

The efficiency and effectiveness (again!) of search engines depend on the quality of the metadata on a web page. How compactly and eloquently the metadata, i.e., the information about the information on the page, is expressed matters a great deal for these use cases. Google recommends JSON-LD for structured data, and markup along the following lines can help a search engine understand a page:

 

<html>
    <head>
       <title>Disinfection services – Chennai</title>
       <script type="application/ld+json">
           {
               "@context": "https://schema.org",
               "@type": "LocalBusiness",
               "name": "Clean Home Services Pvt. Ltd.",
               "url": "https://cleanhomes.co.in",
               "openingHours": "Mo-Su 10:00-21:00",
               "description": "We offer cleaning services to home including disinfection and deep cleaning. This page describes the offering detail. Interested customers can also contact our Cleanliness Experts to learn more."
           }
       </script>
    </head>
</html>

 

Document databases could enrich their feature set and match some of the insightful features typically offered by relational databases. Many other such use cases become possible with the standardization of JSON-LD.

 

The library

Among the different strata of developers, this article focuses on the one that demonstrates value to the business by writing code. Developers in this stratum do not have the time and energy to think about the nuances of a format and its tradeoffs unless forced to.

For such developers writing code in C#, there is dotNetRDF. If you are taken aback by the name of the library, hang in there! RDF and JSON-LD are deeply related concepts, though we will not venture into the philosophical similarities between the formats. The library includes a JSON-LD API that is 1.1 compliant. Another library, json-ld.net, is endorsed by the official JSON-LD website; it is, however, compliant with version 1.0. Which library to use is your choice. For now, let us expand a bit more on dotNetRDF.

Installing the library takes a single command:

 

dotnet add package dotNetRDF

 

Once the package is installed, the next step is to write code. There are numerous tutorials and a well-documented website to get you rolling. We will focus on one scenario and put the library to use in it, to help you connect the dots and kindle your imagination on how to use this format and library.

The scenario

We extend the local business scenario above. The business is at its humble beginning. It has a public-facing website which, as for many other businesses, serves to spread the word and gives customers an interface to reach out for services. There is no online shopping cart yet, but the website does have an appointment booking feature. That is about it for the background. This being a business of the uber age of technology, the developer contracted to build the website has already used JSON-LD to mark up metadata for its content. The site is built with plain HTML, CSS and JS. However, the developer did not document much about the tags and values used for the important JSON-LD elements. The business now wants to understand what has gone in and how it could be improved, because it appears low in Google search results.

You, as a developer, are now tasked with summarizing the JSON-LD tags and values used by the contracted developer. You could open all the HTML pages, copy the relevant values and send them across to the business. Alternatively, you could write a program that reads these HTML files and gets those values more quickly.

We are going to expand on the second approach. Regular expressions might look like an easy way to do it, but you would have to wrestle with the patterns, and getting to what you want would be very difficult.

The code

We will skip the standard processing activities such as listing all files, reading the file content and extracting the JSON-LD segment from the page. We will start with processing the JSON-LD segment and try to get the data that the business wants.

We have already installed the library. We will also need some associated libraries, HtmlAgilityPack and Newtonsoft.Json, to work with HTML and JSON. We will skip the part where we parse the HTML using HtmlAgilityPack. Processing JSON-LD starts with parsing JSON, and the amazing Newtonsoft.Json library does this for us. In fact, dotNetRDF accepts parameters whose types are defined in Newtonsoft.Json.

 

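using Newtonsoft.Json.Linq;

// scriptObject is the <script type="application/ld+json"> node found with HtmlAgilityPack.
string jsonString = scriptObject.InnerText;
JObject parsedMeta = JObject.Parse(jsonString);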

 

We have taken a leap in the code: scriptObject represents the HtmlNode of the document that is a script tag with its type specified as "application/ld+json". Once the node is available, we read its InnerText. The InnerHtml property would yield a similar result in this case, but it is better to get the text directly than to get HTML markup and then extract the value from it.
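For readers who want to see the skipped step, here is a minimal sketch of how scriptObject might be obtained. It assumes the HtmlAgilityPack and Newtonsoft.Json packages are referenced; the class name JsonLdExtractor and the method name ExtractJsonLd are ours, purely for illustration, and not part of either library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper (not part of any library): collects the JSON-LD blocks of a page.
public static class JsonLdExtractor
{
    // Returns every JSON-LD object embedded in the given HTML string.
    // Blocks that are not valid JSON, or whose top level is not an object, are skipped.
    public static List<JObject> ExtractJsonLd(string html)
    {
        var results = new List<JObject>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var scripts = doc.DocumentNode.SelectNodes("//script[@type='application/ld+json']");
        if (scripts == null)
        {
            return results;
        }

        foreach (HtmlNode scriptObject in scripts)
        {
            try
            {
                results.Add(JObject.Parse(scriptObject.InnerText));
            }
            catch (JsonReaderException)
            {
                // Malformed JSON: worth reporting to the business, but skipped in this sketch.
            }
        }
        return results;
    }
}
```

Each JObject in the returned list plays the role of parsedMeta in the rest of this article. A page may carry more than one JSON-LD block, which is why the sketch returns a list rather than a single object.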

Next, we can either guess or sample iteratively. What we mean is that we need to know how the JSON-LD is written across all the documents. Hopefully the developer has written it in a uniform manner; that is a good assumption to start with, and any deviations will show up after our first run. The first run is a simple invocation of a static method on JsonLdProcessor.

 

using VDS.RDF.JsonLd;

JObject parsedObject = JsonLdProcessor.Compact(parsedMeta, null, new JsonLdProcessorOptions());

 

This is all that is required to get a JSON object representation of the metadata present in the file. However, you might have a revelation here, and probably spot a shortcut as well: if all we get is a JObject, why not stop at the earlier statement, i.e., at parsedMeta? Wouldn't that be easier?

Yes, that would have given us what we wanted, but only if the context were absent. The line of code above passes null as the second parameter, and with null there the call is just excess work. However, if we change the code slightly, we can cover ourselves for many more scenarios:

 

 

using VDS.RDF.JsonLd;

JObject parsedObject = JsonLdProcessor.Compact(parsedMeta, parsedMeta, new JsonLdProcessorOptions());

 

As the documentation describes, the first parameter is the JSON-LD document and the second is the context. You can explicitly look up the context in the JSON body of parsedMeta, or you can pass the object itself again and let the library extract the context.
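The explicit lookup could be sketched as follows, using only Newtonsoft.Json. The ContextHelper class and the ExtractContext method are hypothetical names introduced for this illustration; they are not part of dotNetRDF.

```csharp
using Newtonsoft.Json.Linq;

// Hypothetical helper (not part of dotNetRDF): isolates the context of a JSON-LD document.
public static class ContextHelper
{
    // Returns a new object holding only the document's "@context" entry,
    // or null when the document declares no context at all.
    public static JObject ExtractContext(JObject document)
    {
        JToken context = document["@context"];
        if (context == null)
        {
            return null;
        }
        // DeepClone keeps the original document untouched.
        return new JObject { ["@context"] = context.DeepClone() };
    }
}
```

The result can then be passed as the second argument, as in JsonLdProcessor.Compact(parsedMeta, ContextHelper.ExtractContext(parsedMeta), new JsonLdProcessorOptions()). Note that a context given as a URL, such as "https://schema.org", is a remote context, so the processor may need network access to resolve it.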

The reason for doing this is to avoid assuming which form of JSON-LD the developer has written into the HTML document. If the document is in expanded form, the call to Compact makes the tags simpler and shorter, so that you can export them to Excel or CSV in a form a business user can connect with more easily.

It serves another purpose: that one line can also indicate whether search engines would be able to parse the JSON-LD metadata written by the developer. If JsonLdProcessor fails to compact the metadata of a page, there is a good chance that a search engine will fail on it too. This information could help the business take a fresh look at the page instead of making small corrections to incorrect JSON-LD metadata. None of this would have been possible had we merely stopped at parsing the text as a JSON object.
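One way to turn that check into code is to wrap the call in a try/catch, as in the sketch below. JsonLdValidator and TryCompact are hypothetical names of ours. The sketch catches the general Exception type on purpose, as an assumption-free fallback, because we do not want to rely here on the exact exception types dotNetRDF throws.

```csharp
using System;
using Newtonsoft.Json.Linq;
using VDS.RDF.JsonLd;

// Hypothetical helper (not part of dotNetRDF): reports whether a document can be compacted.
public static class JsonLdValidator
{
    // Tries to compact the document against its own context.
    // Returns true and the compacted object on success; false and an error message otherwise.
    public static bool TryCompact(JObject parsedMeta, out JObject compacted, out string error)
    {
        try
        {
            compacted = JsonLdProcessor.Compact(parsedMeta, parsedMeta, new JsonLdProcessorOptions());
            error = null;
            return true;
        }
        catch (Exception ex)
        {
            // Any failure (invalid context, unreachable remote context, ...) marks the page for review.
            compacted = null;
            error = ex.Message;
            return false;
        }
    }
}
```

A failure here does not always mean the markup is wrong: a remote context that cannot be downloaded at the time of the run would also end up in the catch branch, so the error message is worth keeping in the report.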

Using parsedObject, each property can be enumerated in a foreach loop, within which the Excel or CSV file can be created.
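As a rough illustration of that loop, the sketch below turns the top-level properties of a compacted object into CSV rows. The class JsonLdCsvExporter, the method ToCsvRows and the three-column layout (page, property, value) are assumptions made for this example; adapt them to whatever the business wants to see.

```csharp
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper (not part of any library): flattens a JSON-LD object into CSV rows.
public static class JsonLdCsvExporter
{
    // Quotes a field and doubles embedded quotes, following common CSV conventions.
    private static string Escape(string field)
    {
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Produces one "page,property,value" row per top-level property of parsedObject.
    public static string ToCsvRows(string pageName, JObject parsedObject)
    {
        var csv = new StringBuilder();
        foreach (JProperty property in parsedObject.Properties())
        {
            // Scalars are written as plain text; nested objects and arrays as compact JSON.
            string value = property.Value is JValue scalar
                ? scalar.ToString()
                : property.Value.ToString(Formatting.None);

            csv.Append(Escape(pageName)).Append(',')
               .Append(Escape(property.Name)).Append(',')
               .Append(Escape(value)).Append('\n');
        }
        return csv.ToString();
    }
}
```

Calling ToCsvRows once per page and appending the results to a single file, after a header row, gives a summary that opens directly in Excel.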

Wish you a great time ahead in solving your own problems with code!