Things I learned by creating a web app with text and voice channels

I recently built a web app (with React and Node) that has text and voice channels, and I learned a lot along the way. I have condensed those learnings in this blog and added more details about the app at the end. This blog gives a glimpse of how live chat applications like WhatsApp and multimedia streaming applications like Google Meet and Discord work.

From HTTP to WebSockets

"Can't we use simple HTTP requests to create a chat application ❓", I wondered when I saw most blogs online recommending WebSockets for chat applications.

HTTP uses the client-server model: the client initiates a request and the server answers it. Even when the connection is kept alive between requests (the default since HTTP/1.1), the server can't push data that the client has not requested, which is exactly what a chat application needs. One solution is to request the server every few seconds using XHR or the fetch API. This is called polling, but it is resource-intensive on both sides. A similar method is long polling, where the server holds each request open and responds only when data is available to send.
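
To make long polling a bit more concrete, here is a minimal sketch of the idea as an in-memory model. There is no real HTTP involved, the class name `LongPollInbox` and its methods are made up for illustration, and it only tracks a single client.

```javascript
// Hypothetical in-memory model of long polling for ONE client (no real HTTP).
class LongPollInbox {
  constructor() {
    this.held = null;  // the request currently being held open, if any
    this.backlog = []; // messages that arrived while no request was held
  }

  // A client "request": answered at once if data exists, otherwise held open.
  poll(respond) {
    if (this.backlog.length > 0) {
      respond(this.backlog.splice(0)); // flush everything that was queued
    } else {
      this.held = respond;
    }
  }

  // New data reaches the server: answer the held request, or queue the data.
  publish(message) {
    if (this.held === null) {
      this.backlog.push(message);
      return;
    }
    const respond = this.held;
    this.held = null; // one response ends the held request
    respond([message]);
  }
}

const inbox = new LongPollInbox();
const received = [];

inbox.poll((messages) => received.push(...messages)); // held: nothing to send yet
inbox.publish("hello");                               // answers the held request
inbox.publish("are you there?");                      // no request held, so it is queued
inbox.poll((messages) => received.push(...messages)); // answered immediately
```

Notice that the client has to send a new request after every response, which is the overhead WebSockets avoid.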


For chat applications, we need a low-latency, bi-directional channel where both the client and the server can push data at their convenience. WebSockets serve this exact purpose 😃. After the initial handshake, each message carries much less header overhead than an HTTP request, and the connection is stateful: it stays open until either endpoint closes it.


How chat applications work? 🧐

[GIF: websockets]

The above gif demonstrates how messages are sent in a group chat. When the server receives a message from a client, it broadcasts the message to all other connections.


SocketIO is a JavaScript library that simplifies the usage of WebSockets (it isn't purely WebSocket based and can fall back to HTTP long polling for reliability). Handling group chat becomes easy because of SocketIO rooms.
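
The broadcast-to-a-room idea can be sketched without any library. This is only an in-memory model of what a room does, not the SocketIO API itself. The names `RoomHub` and `makeConnection` are made up for illustration, and the "connections" are plain objects that record what they receive.

```javascript
// Hypothetical in-memory model of chat rooms; not the SocketIO API itself.
class RoomHub {
  constructor() {
    this.rooms = new Map(); // room name -> Set of connections in that room
  }

  join(room, conn) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Set());
    this.rooms.get(room).add(conn);
  }

  leave(room, conn) {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(conn);
    if (members.size === 0) this.rooms.delete(room); // drop empty rooms
  }

  // Deliver a message to everyone in the room except the sender.
  broadcast(room, sender, message) {
    const members = this.rooms.get(room) || [];
    for (const conn of members) {
      if (conn !== sender) conn.send(message);
    }
  }
}

// A fake connection that just records what it was sent.
function makeConnection(name) {
  return { name, inbox: [], send(message) { this.inbox.push(message); } };
}

const hub = new RoomHub();
const alice = makeConnection("alice");
const bob = makeConnection("bob");
const carol = makeConnection("carol");

hub.join("physics", alice);
hub.join("physics", bob);
hub.join("maths", carol);

// Only bob receives this: alice is the sender and carol is in another room.
hub.broadcast("physics", alice, "Anyone solved problem 3?");
```

In SocketIO itself, the same idea is expressed on the server with `socket.join(room)` and `socket.to(room).emit(event, payload)`, which sends to everyone in the room except the sender.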

The need for WebRTC

Applications that share multimedia, like Google Meet and Discord, use WebRTC (Web Real-Time Communication). The WebSocket protocol and its APIs aren't optimized for multimedia streaming, and the server sitting between the two clients can add lag. WebRTC works best when we want real-time sharing of audio and video with minimal lag. It allows peer to peer teleconferencing between two browsers and typically uses UDP at the transport layer. Peer to peer means there is no intermediate server between the two computers, i.e., the two client browsers connect directly using the WebRTC API, which is not the case with WebSockets. I have used WebRTC for the audio channel in my application.

Establishing a peer to peer connection is really difficult because of the NAT devices in between. WebRTC tries to achieve this peer to peer connection with the help of a STUN server.


NAT stands for Network Address Translation. Firewalls and most routers have NAT functionality built in. When we connect to the internet, it's highly probable that there's a NAT device in between. A NAT device talks to the internet on your behalf. This means all the requests from/to your computer's IP and port actually pass through the NAT's IP and port. It basically replaces the source IP and port of each packet you send with its own public IP and port. An external device talks to the NAT's IP and port and feels like it's talking to you.
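
The address rewriting described above can be sketched as a small translation table. This is a simplified model I made up for illustration: the class `Nat`, its methods, the starting port 40000, and the example addresses are all assumptions, and a real NAT also tracks protocols, timeouts, and more.

```javascript
// Hypothetical model of a NAT device; real NATs track much more state.
class Nat {
  constructor(publicIp, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.outbound = new Map(); // "privateIp:port" -> public port
    this.inbound = new Map();  // public port -> { ip, port } of the private host
  }

  // Outgoing packet: replace the private source with the NAT's public IP and port.
  send(packet) {
    const key = `${packet.srcIp}:${packet.srcPort}`;
    if (!this.outbound.has(key)) {
      const publicPort = this.nextPort++;
      this.outbound.set(key, publicPort);
      this.inbound.set(publicPort, { ip: packet.srcIp, port: packet.srcPort });
    }
    return { ...packet, srcIp: this.publicIp, srcPort: this.outbound.get(key) };
  }

  // Incoming packet: map the public port back to the private host, if known.
  receive(packet) {
    const host = this.inbound.get(packet.dstPort);
    if (!host) return null; // no mapping, so the packet is dropped
    return { ...packet, dstIp: host.ip, dstPort: host.port };
  }
}

const nat = new Nat("203.0.113.7");

// Your computer sends a packet; the outside world sees the NAT's address.
const sent = nat.send({
  srcIp: "192.168.1.10", srcPort: 5000,
  dstIp: "198.51.100.20", dstPort: 443,
  data: "hi",
});

// The external device replies to the NAT, which forwards it back to you.
const reply = nat.receive({
  srcIp: "198.51.100.20", srcPort: 443,
  dstIp: sent.srcIp, dstPort: sent.srcPort,
  data: "hello back",
});
```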

[GIF: p2p]

NATs add a layer of network security because devices outside can't see your private address. Since your computer doesn't know the public IP address of the NAT that is talking to the internet on its behalf, it takes the help of a STUN server to discover this external/public IP address. Similarly, another computer can connect to yours once it gets your public IP address, which each side discovers using the STUN server. Smart, isn't it? 😎 I have not shown it in the above gif, but there's a NAT between the computers and the STUN server.
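
Here is a tiny sketch of what the STUN step boils down to: the server simply tells you which source address it saw on your request. Both functions and all addresses below are hypothetical stand-ins, and no real network traffic or STUN message format is involved.

```javascript
// Hypothetical stand-ins for a NAT and a STUN server; no real network traffic.
function natRewrite(packet, publicIp, publicPort) {
  // The NAT swaps the private source address for its own public one.
  return { ...packet, srcIp: publicIp, srcPort: publicPort };
}

function stunRespond(request) {
  // The STUN server replies with the source address it observed on the request.
  return { mappedIp: request.srcIp, mappedPort: request.srcPort };
}

// The request leaves your computer with a private source address...
const request = { srcIp: "192.168.1.10", srcPort: 5000, type: "binding-request" };

// ...but the STUN server only ever sees the NAT's public address.
const seenByStun = natRewrite(request, "203.0.113.7", 40000);
const answer = stunRespond(seenByStun);
// `answer` now holds the public address that can be shared with the other peer.
```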

If there's a symmetric NAT (the most restrictive kind) between two peers, it's almost impossible to establish a peer to peer connection. In that case, the data is relayed via a TURN server, much like messages pass through the server with WebSockets. All of these connection establishment rules come under the ICE (Interactive Connectivity Establishment) protocol in WebRTC.
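
In the browser, the STUN and TURN servers are handed to WebRTC as a configuration object. Below is a sketch of what that might look like. The hostnames, username, and credential are placeholders I made up, not real servers and not the ones my app uses.

```javascript
// Hypothetical server addresses and credentials; replace them with your own.
const rtcConfig = {
  iceServers: [
    // STUN: used to discover the public address behind the NAT
    { urls: "stun:stun.example.org:3478" },
    // TURN: relay used when a direct peer to peer connection can't be established
    {
      urls: "turn:turn.example.org:3478",
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
};

// In a browser, this object would be passed to the constructor:
// const peer = new RTCPeerConnection(rtcConfig);
```

ICE then tries the candidates it gathers and falls back to the TURN relay only when nothing more direct works.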

I have kept the explanation of NATs and STUN short and will probably write a separate blog on them 🤔. It's explained here and here in detail.

About my web app Study Room ☺️


Looks like the blog has got really long 😵. So, I'll end this by telling you about the app I have been working on. It's called Study Room. Basically, students can join rooms and discuss together in text and voice channels. It's made with ReactJs and ExpressJs. For live chat, I have used the SocketIO library. For the voice channel, I have used the PeerJS library. The voice channel uses the mesh architecture in WebRTC. It's the easiest to implement, but it isn't scalable. The scalable option is an MCU (Multipoint Control Unit).
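
A bit of arithmetic shows why mesh doesn't scale: every participant connects to every other participant, so the number of links grows quadratically. The helper names below are made up for illustration, and the central-server count assumes each participant keeps a single link to the server.

```javascript
// Total peer to peer links in a full mesh of n participants.
function meshLinks(n) {
  return (n * (n - 1)) / 2;
}

// Streams each participant has to upload in a mesh (one per other participant).
function meshUploadsPerPeer(n) {
  return Math.max(n - 1, 0);
}

// With a central media server, each participant keeps one link to the server.
function serverLinks(n) {
  return n;
}

const participants = 10;
const mesh = meshLinks(participants);             // 45 links in total
const uploads = meshUploadsPerPeer(participants); // 9 uploads per participant
const central = serverLinks(participants);        // 10 links in total
```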


Thanks for reading the blog 😇. I am sure I have missed a lot of things, so please comment on this blog if you have anything to share.