Staying connected during tough times

Image courtesy: Getty Images

WebRTC: the backbone of modern communications

Present times, needless to say, are challenging. As human beings we are witnessing a disruption of this magnitude for the first time, and as developers we can solve some parts of the challenge to lessen its impact. One of the dominant problems is the need to stay connected and have more face-to-face interaction. Video calls in a multitude of social apps address this problem. Did you ever wonder what is involved under the hood to solve that kind of a problem?

The social apps available in the market certainly solve the problem, but there is always room to raise the bar. To do just that, we need to know the basics, and in this dispatch we will introduce them. Your creativity can then catapult you far beyond this basic knowledge.

Introduction

WebRTC is a standard that is fundamental to much of the video call capability in social apps. Google has socialized this technology, working with the IETF and the W3C, and browser makers picked up on the standard, with browsers from Mozilla to Microsoft complying with it. Ericsson shipped the first implementation of the technology in May 2011. Many practitioners observe that it has three pivotal components –

  1. Real Time Communication, abbreviated as RTC, which contributes to the name of the technology as well!
  2. Plugin-free audio and video transmission
  3. Data channel to share data beyond audio and video

Google describes the pivotal components of the technology as two factors –

  1. Media capture devices
  2. Peer to Peer communications

No matter which perspective we take on the pivotal pieces of this technology, it is pragmatic to put our use case at the centre and see which component of the technology contributes towards fulfilling it. Generally, we have observed that this approach maximises the application of the technology.

From our earlier dispatches you might recall that when we say something is a standard, we are essentially talking about that technology's API, which can be leveraged to build client applications. We will be looking at those APIs in this dispatch. Let us remind ourselves that this is basic knowledge; advanced use cases can be built upon it.

The use case

Earlier in the introduction we pointed out the importance of a use case for appreciating a first experiment with a technology, so we present our use case here –

An app that uses the browser to initiate a video call with a friend.

We will keep it simple so that we can discuss the purpose of the APIs and the possibilities they open up.

WebRTC for video call

This use case requires two major things – 

  1. Peer to Peer connection
  2. Video + Audio device interaction

The majority of the APIs defined for WebRTC are organized around two areas: media capture (navigator.mediaDevices.getUserMedia) and peer connectivity (RTCPeerConnection). For our use case we will first capture the media content, then use RTCPeerConnection to establish connectivity between the two friends. This is exciting; now we know what to use, and the how-to is the next point we will address here. We can start simply by launching a favourite text editor.
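To make the relationship between those two pieces concrete, here is a minimal sketch; the names sketchConnection, localStream and pc are our own, and the signalling needed to actually reach the friend is deliberately left out for a later dispatch –

const sketchConnection = async () => {
  // Capture the local media first...
  const localStream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });

  // ...then hand every captured track to an RTCPeerConnection, which will
  // carry it to the remote friend once signalling is in place.
  const pc = new RTCPeerConnection();
  localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
};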

Capturing the friend’s media

Let us create an HTML document which will be the canvas on which the video captured from our device is displayed.

<body>

  <!-- Video element that plays back the locally captured stream -->
  <video id="loopbackContainer" autoplay></video>

  <script type="text/javascript">

    // Ask the browser for access to the camera and microphone,
    // applying whatever constraints the caller passes in.
    const openLocalCamera = async (mediaConstraints) => {
      return await navigator.mediaDevices.getUserMedia(mediaConstraints);
    };

    // Immediately Invoked Function Expression (IIFE) so we can use
    // await at page load without any framework scaffolding.
    (async () => {
      try {
        const stream = await openLocalCamera({ video: true, audio: true });
        const loopbackHost = document.querySelector('video#loopbackContainer');
        // Feed the captured stream straight back into the video element.
        loopbackHost.srcObject = stream;
      } catch (error) {
        console.error('Error accessing local camera or selecting the video control', error);
      }
    })();

  </script>

</body>

Needless to say, we have left out the boilerplate HTML markup in this listing.

There are a couple of interesting things happening here. First, let us look at the JavaScript. After indulging in so many advanced frameworks, we have lost touch with a few gems of programming in the browser. What we mean is that with so many advanced frameworks we transpile and webpack the output for the browser so much that we no longer hand-stitch the JavaScript that runs inside it. The gems we are talking about are the IIFE (Immediately Invoked Function Expression) and the plain old DOM methods used to select an element.

Many of the HTML5 gems are asynchronous methods which can be invoked either with the async/await keyword combination or handled via promises with the then and catch combo. We chose the async/await combo here.
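For contrast, here is roughly how the same capture would look with the then and catch combo, assuming the same loopbackContainer video element from the listing above –

navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then((stream) => {
    // Same loopback playback, expressed as a promise chain.
    document.querySelector('video#loopbackContainer').srcObject = stream;
  })
  .catch((error) => {
    console.error('Error accessing local camera', error);
  });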

The reason we touched upon the pollution of multiple frameworks is that, while putting together the code for this tutorial, we struggled for a moment to figure out how to call an async function (openLocalCamera) from the browser so that it executes on page load. It was that lapse that made us nostalgic for the hand-assembling days of the past. At the pace of software delivery demanded by modern times we cannot keep hand-stitching code in this manner, but for learning we believe we always should.

The very important piece of the standard that serves two purposes –

  1. Plugin-free access to the client's media devices
  2. Video + audio device interaction

– comes from one line of code: navigator.mediaDevices.getUserMedia(…). The parameter to this invocation is the set of constraints that the device should apply to the stream of data captured from the camera attached to the device.
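The constraints do not have to be plain booleans. As a rough illustration (the values here are only examples; the browser will pick the closest match it can satisfy, and the snippet is assumed to run inside the async IIFE from the earlier listing), we could ask for a preferred resolution and the front-facing camera, reusing the openLocalCamera helper –

const richerConstraints = {
  audio: true,
  video: {
    width: { ideal: 1280 },   // preferred, not mandatory
    height: { ideal: 720 },
    facingMode: 'user'        // front-facing camera where one exists
  }
};
const hdStream = await openLocalCamera(richerConstraints);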

Much of the heavy lifting required for interacting with the camera is handled by the browser; it is in this sense that the standard supports a plugin-free experience. Now, let us take this one level further and add controls to stop and resume the video feed –

In the HTML we will have controls like these –

<video id="loopbackContainer" autoplay></video>

<div>
  <input type="button" id="btnStop" value="Stop video+audio" />
  <input type="button" id="btnResume" value="Resume video+audio" />
</div>

This should be straightforward; we are adding two buttons. The event handlers for these controls are added in the JavaScript as below –

(async () => {
  try {
    // stream is declared with let because the resume handler reassigns it.
    let stream = await openLocalCamera({ video: true, audio: true });
    const loopbackHost = document.querySelector('video#loopbackContainer');
    loopbackHost.srcObject = stream;

    const stopButton = document.querySelector('input#btnStop');
    const resumeButton = document.querySelector('input#btnResume');

    // Stopping every track releases the camera and microphone.
    stopButton.addEventListener('click', () => {
      stream.getTracks().forEach((tr) => {
        tr.stop();
      });
    });

    // Resuming means asking for a fresh stream, so the handler is async.
    resumeButton.addEventListener('click', async () => {
      stream = await openLocalCamera({ video: true, audio: true });
      loopbackHost.srcObject = stream;
    });
  } catch (error) {
    // …
  }
})();

There are a few subtle changes to the code we already created. We declared stream with let instead of const, because the resume handler reassigns it with a freshly captured stream; loopbackHost can stay const since we only change its srcObject property. Note also that the resume handler is marked async so that it can await the new stream.

Beyond the subtle changes, we have introduced event handlers for the buttons. For the resume button the code is similar to what runs when the page loads in the browser. For the stop button we introduce ourselves to a new concept – MediaStreamTrack.

When we capture a stream of data from the camera, we get the object as multiple tracks. These tracks give the developer finer control over a specific aspect of the data; for example, we could pause only the audio or only the video by filtering the tracks by kind.
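As a small sketch of that idea, reusing the stream variable from the listing above, muting only the audio while the video keeps playing could look like this –

// Disable only the audio tracks; the video keeps rendering.
stream.getAudioTracks().forEach((track) => {
  track.enabled = false;
});

// Later, flip the flag back to resume the audio.
stream.getAudioTracks().forEach((track) => {
  track.enabled = true;
});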

So far, we have captured media and played it back on the local device; a kind of proof that we can do this in the browser. In the next dispatch we will send this to a remote device, capture the stream there and display it. Till then, kindle your creativity on what could be built on top of what we have here.