VoIP Blog

The sad state of VoIP from browsers

Apr 30 2016

There are three popular ways to make phone calls over the internet:
  • using a hardware phone: a dedicated device on your desk, connected to the internet over an Ethernet cable or Wi-Fi
  • using a softphone: an application you download for your OS that offers VoIP call capabilities
  • using your browser: everything is being migrated to browsers nowadays, so it should be normal to make phone calls straight from your preferred browser
In this blog post I try to describe the "VoIP from browser" problem, especially when interconnection with traditional telecommunication infrastructure (SIP, PSTN) is required.

We can talk about VoIP as a mainstream technology from around 1996, when H.323 was born. At that time average internet bandwidth had barely reached the minimum requirement for VoIP (around 10 kbit/s, although there are also 2 kbit/s codecs with decent voice quality). For a VoIP link, however, quality matters even more than bandwidth: packet loss, latency, and variation in latency (jitter). The basics of the SIP protocol were developed at the same time as H.323, but its first RFC (2543) was published only in 1999. Today SIP is the de facto standard for VoIP (despite its often unnecessary complexity, it has a clear advantage over H.323).
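The bandwidth figures quoted above are codec bit rates. On the wire every voice packet also carries IP, UDP and RTP headers, so the real requirement is higher. Here is a rough sketch of that arithmetic, assuming plain IPv4 without options, no header compression, no link-layer framing, and a 20 ms packet interval (a common choice, but an assumption here). The function name `wireKbps` is made up for this example.

```typescript
// IPv4 (20) + UDP (8) + RTP (12) header bytes added to every voice packet.
// Assumption: no IP options, no header compression, link-layer framing ignored.
const HEADER_BYTES = 20 + 8 + 12;

// Approximate one-way IP bandwidth of a call, in kbit/s.
function wireKbps(codecKbps: number, packetMs: number): number {
  // kbit/s multiplied by ms gives bits per packet; divide by 8 for bytes.
  const payloadBytes = (codecKbps * packetMs) / 8;
  const packetsPerSecond = 1000 / packetMs;
  return ((payloadBytes + HEADER_BYTES) * 8 * packetsPerSecond) / 1000;
}

console.log(wireKbps(64, 20)); // G.711 at 64 kbit/s
console.log(wireKbps(8, 20));  // G.729 at 8 kbit/s
console.log(wireKbps(2, 20));  // a 2 kbit/s codec
```

With these assumptions G.711 needs about 80 kbit/s, G.729 about 24 kbit/s, and even a 2 kbit/s codec about 18 kbit/s in each direction, because the 40 header bytes dominate small payloads.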

In the beginning there was Internet Explorer. And thanks to Microsoft we had the much loved ActiveX controls :) If you are not familiar with ActiveX: it means native code (C++ and others) running in the browser, built on the complicated COM/OLE. The Windows audio layer was quality software from the beginning; Linux and the current market leader Android are way behind with their crappy APIs and audio delays. ActiveX was not an easy technology to use, but for those who mastered the COM/OLE world it was easy to write an almost perfect H.323 or SIP client embeddable in the browser. 99% of VoIP users were on Windows/IE at the time, so there was no reason to talk about other platforms.

As the internet spread around the world and malicious software became a serious threat, ActiveX quickly became a target for hackers and thus a pain for end users. IE also got competition, and ActiveX was used less and less. Fortunately, browser plugins (other than ActiveX, for example NPAPI based) became a trend, and these plugins allowed access to the native OS APIs, making up for the lack of ActiveX. Everybody was happy again, at the cost of a little more work to port their software to different browsers.

In the meantime, we had a completely separate way to create cross-platform VoIP applications for browsers: the Java Applet. It was a challenge not to oversize the application with all the bloated Java packages (more suitable for the server side), but it was possible. Another big headache with Java was its audio library: a total mess with lots of bugs, inconsistencies, missing capabilities and delay. However, those who were persistent enough could implement the audio layer in native C, which was accessible to Java through JNI.
I have seen only very few Java Applet based VoIP clients, but it was a nice concept for those who managed to implement it.

In 2005 Adobe acquired Macromedia and, with it, Flash. It was a huge success. More importantly, Flash supported RTMP, which could be used for VoIP. Sadly, from the quality point of view it was far behind native clients, maybe because of the lack of UDP transport. A few years later, however, RTMP got a major upgrade (RTMFP) with much better quality and UDP support. One big drawback was that it required a separate application server to convert the media and the signaling to traditional VoIP (SIP/RTP with codecs such as G.711 and G.729).

Around 2008 Microsoft began to push Silverlight, which was a promising technology for web browsers and included everything needed for a VoIP client. Unfortunately, it came too late, and Microsoft was unable to beat Google on the web.

In 2014 the big browser vendors suddenly decided to kill all of these. HTML5 is the new hype. Poor developers now need to rewrite everything in HTML5 to fulfill the needs of the giants, who are playing a game that nobody understands. (The browser as an OS, re-implementing everything, but now based on poor JavaScript?) Fortunately, they also implemented something for VoIP and called it WebRTC (real-time communications for the web). WebRTC is actually a black box in the browser, covering only audio/video recording, playback and streaming with a few codecs. The other part of a VoIP client, the signaling, can be implemented (hacked) separately via a new technology called WebSocket. Both of these are very inconsistent at the moment: support differs between browsers and keeps changing as new bugs are introduced.
Ah, and I forgot to mention the other hype: encryption. WebRTC has a nice encryption layer (DTLS/SRTP), for which I haven't yet seen a stable server-side solution due to its implementation complexity. It also requires secure HTTP (HTTPS is enforced in Chrome), which makes it more cumbersome to use for novices or in internal networks with no internet access.
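To give an idea of what the signaling half looks like, here is a minimal sketch of building a SIP REGISTER request for the WebSocket transport. It assumes SIP over WebSocket as described in RFC 7118, which is only one possible signaling choice. Every name here (`buildRegister`, `TextChannel`, `example.com`, the tag, branch and Call-ID values) is a hypothetical placeholder, and authentication, retransmission and response handling are left out.

```typescript
// Minimal channel interface so the sketch does not depend on browser typings.
// Assumption: a browser WebSocket opened with the "sip" subprotocol is passed in.
interface TextChannel {
  send(data: string): void;
}

interface RegisterParams {
  user: string;    // SIP user name (placeholder)
  domain: string;  // SIP domain / registrar (placeholder)
  viaHost: string; // ".invalid" host name, since a browser has no reachable address
  callId: string;
  tag: string;
  branch: string;
}

// Build a bare REGISTER request. SIP lines end with CRLF and the
// header section is terminated by an empty line.
function buildRegister(p: RegisterParams): string {
  const aor = `sip:${p.user}@${p.domain}`;
  return [
    `REGISTER sip:${p.domain} SIP/2.0`,
    `Via: SIP/2.0/WSS ${p.viaHost};branch=z9hG4bK${p.branch}`,
    `Max-Forwards: 70`,
    `From: <${aor}>;tag=${p.tag}`,
    `To: <${aor}>`,
    `Call-ID: ${p.callId}`,
    `CSeq: 1 REGISTER`,
    `Contact: <sip:${p.user}@${p.viaHost};transport=ws>`,
    `Content-Length: 0`,
    ``,
    ``,
  ].join("\r\n");
}

// Send the request over an already open channel.
function sendRegister(channel: TextChannel, p: RegisterParams): void {
  channel.send(buildRegister(p));
}
```

In a browser the channel could be something like `new WebSocket("wss://sip.example.com", "sip")` (hypothetical URL), with `sendRegister` called from its `onopen` handler. A real registrar will normally answer with an authentication challenge, which this sketch does not handle, and the media side (WebRTC) is a completely separate piece.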

After all, the whole browser ecosystem is exposed to the business interests of the giants and vendors:
  • Google wants to see everything in browsers (except on the platform they own: Android)
  • Mozilla also gets its money from the web through its browser, so it is in their interest to have a decent browser; however, their business relationships and goals are too complicated and often depend on where the money is
  • Microsoft already has Skype, so their interest is to delay WebRTC as much as possible (thankfully they can't do this forever, because IE/Edge is not in that position anymore)
  • Apple's main business is not the web, but they can't fall too far behind with Safari, so they must follow the others
Welcome to the future: forced onto one single technology, with quality surpassed by any VoIP software released in 1996. Forget 10 years of research and development (for example the G.729 and G.723 codecs, which are perfect for narrowband) and start everything again on the new super hyped HTML5, which will give us all the goods we missed in the last years. You are free now. Free to forget everything learned in the past decade, free to invest in new "sexy" technologies and free to throw everything out again a few years later when the global politics of these software giants force you to do so.
Hopefully VoIP from the browser is not so much against the giants' interests, and sooner or later we will have a better WebRTC implementation across all browsers. Until then, we are stuck with this mixed, always-changing environment.

To summarize, WebRTC has the following disadvantages:
  • not native VoIP: extra software layers are required to convert from TLS/WebSocket/DTLS/SRTP to the simple UDP based SIP/RTP used today in most VoIP networks
  • it is a limited black box in the browser: you get a small API suitable only for the basics (no way to interact with the media streams, for example to add filters or call recording)
  • too complex: to implement it correctly, you need to be familiar with JavaScript, WebSocket, the HTML5 WebRTC API, Web Audio, SIP, SDP, RTP, TLS/DTLS, SRTP, ICE, STUN and TURN
  • doesn't work from corporate networks where UDP is blocked or only ports 80 and 443 are allowed; Chrome and Firefox only recently started to add better support for media over TURN TCP/TLS
  • weak codec support: you are forced to use G.711, G.722 or Opus. Forget about the good old G.729, G.723, iLBC and others. G.711 doesn't offer compression and Opus can't be routed directly to current telecom carriers
  • mandatory encryption: it has good intentions, but it requires more CPU (TLS, DTLS and SRTP processing) and adds complications (at the software layer, and users have to set up an SSL certificate for WebSocket) which would be unnecessary in many circumstances
  • too many bugs and compatibility problems: while browser vendors are disabling the old, well-known technologies, they have failed to provide a replacement suitable for production usage (everything is in "draft" or "beta" stage)
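The codec point is easy to check for yourself, because the browser lists what it is willing to send in its SDP offer. Below is a small sketch that pulls the audio codec names out of an SDP text. The function name `audioCodecs` and the sample SDP are made up for illustration (the sample is hand-written and heavily shortened, not captured from a real browser).

```typescript
// Return the codec names from the a=rtpmap lines of the audio section.
function audioCodecs(sdp: string): string[] {
  const codecs: string[] = [];
  let inAudio = false;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("m=") === 0) {
      // A new media section starts; track whether it is the audio one.
      inAudio = line.indexOf("m=audio") === 0;
      continue;
    }
    // Example line: "a=rtpmap:111 opus/48000/2"
    const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(line);
    if (inAudio && match && codecs.indexOf(match[1]) < 0) {
      codecs.push(match[1]);
    }
  }
  return codecs;
}

// Hand-written, shortened sample for illustration only.
const sampleSdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111 9 0 8",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:9 G722/8000",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

console.log(audioCodecs(sampleSdp).join(", "));
```

PCMU and PCMA are the two variants of G.711 (mu-law and A-law). In a browser you would pass the `sdp` field of the offer returned by `RTCPeerConnection.createOffer()` instead of the sample text.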
In the next posts I will describe how we intend to solve this complex problem with our webphone, which aims to be an always-working, robust, yet simple-to-use solution.