WhatsApp System Design

I will try to explain WhatsApp's system design in a sensible, engineering-minded way.



Once you are able to design this, you should be able to design any chat-based application.

Note: Key features are marked with *.

Overall Architecture



Each User Equipment (UE) is connected to a gateway in the cloud. The gateway talks to clients using an external protocol, while the internal services communicate with each other using a different, internal protocol. All the security mechanisms are taken care of by the gateway.

We also need to keep track of the mapping between each user's equipment and the gateway it is connected to. Storing this mapping in the gateway itself would be quite expensive.

 User Equipment  ====>  Gateway
 UE 1            ====>  Gateway 1
 UE 2            ====>  Gateway 2

Maintaining a TCP session in the gateway is already very costly. Moreover, if the gateways also held this mapping, it would have to be replicated across all the boxes. The information is also transient.

Since gateways are always starved for memory, we need a separate microservice to keep these sessions (the user-to-gateway mapping). This information is decoupled from the gateways and stored in a "sessions" microservice. The sessions service is effectively just another router.

There will also be multiple instances of this service to avoid a single point of failure.
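To make the idea concrete, here is a minimal Python sketch of the sessions service as an in-memory map from user ID to gateway ID. The class and method names (SessionService, connect, disconnect, gateway_for) are hypothetical, and a plain dict stands in for whatever replicated store a real deployment would use.

```python
from typing import Dict, Optional


class SessionService:
    """Hypothetical sessions service: user ID -> gateway ID.

    A real deployment would run several replicas backed by a shared store
    to avoid a single point of failure; a dict stands in for that here.
    """

    def __init__(self) -> None:
        self._user_to_gateway: Dict[str, str] = {}

    def connect(self, user_id: str, gateway_id: str) -> None:
        # Called when a user's device opens a connection on some gateway.
        self._user_to_gateway[user_id] = gateway_id

    def disconnect(self, user_id: str) -> None:
        # The mapping is transient: drop it when the connection closes.
        self._user_to_gateway.pop(user_id, None)

    def gateway_for(self, user_id: str) -> Optional[str]:
        # None means the user is not currently connected to any gateway.
        return self._user_to_gateway.get(user_id)


sessions = SessionService()
sessions.connect("UE1", "gateway-1")
sessions.connect("UE2", "gateway-2")
print(sessions.gateway_for("UE2"))  # gateway-2
sessions.disconnect("UE2")
print(sessions.gateway_for("UE2"))  # None
```

Because the mapping is transient, disconnect simply drops the entry; a lookup that returns None is how a router would learn that the user is offline.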




Sending a Message from One User to Another

Now, with the sessions service in place, let's start sending messages.

User A sends a message to the gateway they are connected to (Gateway 1), but this gateway is pretty dumb and doesn't know what to do with the message. So it just forwards the message to the sessions service, which is effectively another router. The sessions service searches its database to figure out where User B is, that is, which gateway B is connected to. It then routes the message to Gateway 2, which sends it to User B.
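The routing step can be sketched like this, assuming messages are plain Python objects and that a gateway's only job is to push a message onto a connection it already holds. Gateway, SessionRouter and Message are illustrative names, not WhatsApp's actual components.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    sender: str
    recipient: str
    text: str


@dataclass
class Gateway:
    """Hypothetical gateway: it only knows how to push to its own connections."""
    name: str
    delivered: List[Message] = field(default_factory=list)

    def push(self, msg: Message) -> None:
        # Stand-in for writing the message onto the recipient's open connection.
        self.delivered.append(msg)


class SessionRouter:
    """Hypothetical sessions service acting as a router between gateways."""

    def __init__(self, gateways: Dict[str, Gateway]) -> None:
        self.gateways = gateways
        self.user_to_gateway: Dict[str, str] = {}

    def route(self, msg: Message) -> bool:
        # Look up which gateway the recipient is connected to.
        gateway_name = self.user_to_gateway.get(msg.recipient)
        if gateway_name is None:
            return False  # recipient offline; a real system would store and retry
        self.gateways[gateway_name].push(msg)
        return True


g1, g2 = Gateway("gateway-1"), Gateway("gateway-2")
router = SessionRouter({"gateway-1": g1, "gateway-2": g2})
router.user_to_gateway.update({"A": "gateway-1", "B": "gateway-2"})

# Gateway 1 receives A's message and simply forwards it to the router.
ok = router.route(Message("A", "B", "Hi B!"))
print(ok, [m.text for m in g2.delivered])  # True ['Hi B!']
```

Note that Gateway 1 never needs to know where B is; all the routing knowledge lives in the sessions service.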


Now, how does B actually receive the message from A? This transfer can't be done over plain HTTP, because HTTP is a client-to-server protocol: the client sends a request and the server sends back a response, so the server cannot push a message to B on its own. In my socket programming post, I've also mentioned polling, where the server runs all the time and processes each read/write request whenever it receives one.




We can't use plain HTTP for real-time delivery; instead, we need WebSockets over TCP. A WebSocket is a persistent, full-duplex connection between the client and the server: once it is open, either side can send a message at any time, without the client first making a request. With this, the server can push a message to client B.


Sent + Delivered + Read Receipts *

When the sessions service sends the message to User B, it also, in parallel, sends an acknowledgement to A's gateway, which in turn passes it on to User A.

           Sent ✅

Once B receives the message, it has been delivered.

           Delivered 

B must then send an acknowledgement to Gateway 2, which forwards it to the sessions service. The sessions service looks up in its database which gateway A is connected to and sends the acknowledgement message below to User A. A has now received a delivery receipt.

           Delivered ✅
 
Message:
           From: "User B"
           To:   "User A"

When User B opens the chat and sees the message, another acknowledgement is sent from User B to User A. This is the read receipt.
 
           Read ✅

It is a lot to digest, so read it once more to get a good insight.
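One way to track these receipts on A's side is a small state machine keyed by message ID. This is a sketch under my own assumptions: each message carries an ID, every receipt is an acknowledgement for that ID, and receipts may arrive out of order, so the status is only ever allowed to move forward (Sent, then Delivered, then Read).

```python
from enum import IntEnum
from typing import Dict


class Status(IntEnum):
    # Ordered so that a receipt can only move the status forward.
    SENT = 1       # the sessions service accepted the message
    DELIVERED = 2  # B's device acknowledged receipt
    READ = 3       # B opened the chat and saw the message


class ReceiptTracker:
    """Hypothetical per-message status as seen by the sender (User A)."""

    def __init__(self) -> None:
        self.status: Dict[str, Status] = {}

    def on_ack(self, message_id: str, new_status: Status) -> Status:
        current = self.status.get(message_id)
        # Acks may arrive out of order (e.g. READ before DELIVERED); never go backwards.
        if current is None or new_status > current:
            self.status[message_id] = new_status
        return self.status[message_id]


tracker = ReceiptTracker()
tracker.on_ack("m1", Status.SENT)
tracker.on_ack("m1", Status.READ)
print(tracker.on_ack("m1", Status.DELIVERED).name)  # READ
```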

Last Seen & Online

If User B wants to know when A was last online, it would be pretty dumb to ask A itself. Instead, the server maintains a database where all this information is stored.

Whenever A does some activity in WhatsApp, the time is stored in the database.

User    Timestamp             Gateway
A       Nov 21, 2020 05:00    1
B       Jan 21, 2021 16:00    2

We also need a threshold, and its value can vary. For example, if User A was online 3 seconds ago, we shouldn't show "last seen 3 seconds ago"; instead, A should be shown as online. If the threshold is set to 10 seconds, only after 10 seconds of inactivity will A's status change to "last seen 10 seconds ago".

This last-seen information can be tracked by another microservice.
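A minimal sketch of such a last-seen microservice, using the 10-second threshold from the example above. The class name and the optional now parameter (added so the behaviour is easy to test) are hypothetical.

```python
import time
from typing import Dict, Optional

ONLINE_THRESHOLD_SECONDS = 10  # the example threshold from the text


class LastSeenService:
    """Hypothetical last-seen microservice: user ID -> time of last activity."""

    def __init__(self) -> None:
        self.last_activity: Dict[str, float] = {}

    def record_activity(self, user_id: str, now: Optional[float] = None) -> None:
        # Called whenever the user does something in the app.
        self.last_activity[user_id] = time.time() if now is None else now

    def status(self, user_id: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        seen = self.last_activity.get(user_id)
        if seen is None:
            return "unknown"
        elapsed = now - seen
        # Within the threshold the user is simply shown as online.
        if elapsed < ONLINE_THRESHOLD_SECONDS:
            return "online"
        return f"last seen {int(elapsed)} seconds ago"


svc = LastSeenService()
svc.record_activity("A", now=1000.0)
print(svc.status("A", now=1003.0))  # online
print(svc.status("A", now=1010.0))  # last seen 10 seconds ago
```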

Load Balancer

Service Discovery and Heartbeat Maintenance 

Group Messaging *

In the overall architecture picture, there are 5 users and two groups, one shown in red and the other in green. Two users are connected to Gateway 1 and one user is connected to Gateway 2. It is too complicated for the sessions service to handle all this information, so we decouple it into a separate 'Group Service' microservice.


When User 1 (who is in the red group) sends a message to the group chat, the sessions service asks the 'Group Service' microservice who the other group members are. The Group Service replies with the members' IDs (say, 10 users in this group), and the sessions service looks up in its own database which gateway each of them is connected to. With this information, it routes the message to each of the users one by one.

If the group has too many members, this becomes too complicated. In such scenarios we would need batch processing, but a chat application is a real-time app, so instead we limit the number of members in a group. The sessions service can then handle the WebSockets that send these messages to the relevant users.
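The group fan-out can be sketched as follows. GroupService, fan_out and the cap of 256 members are illustrative assumptions; the cap is an arbitrary placeholder for "a limited group size". As a design choice, the sketch groups recipients by gateway, so each gateway receives one batch for the users connected to it instead of one call per user.

```python
from typing import Dict, List

MAX_GROUP_SIZE = 256  # hypothetical cap on group membership


class GroupService:
    """Hypothetical group service: group ID -> member user IDs."""

    def __init__(self) -> None:
        self.members: Dict[str, List[str]] = {}

    def add_member(self, group_id: str, user_id: str) -> None:
        group = self.members.setdefault(group_id, [])
        if len(group) >= MAX_GROUP_SIZE:
            raise ValueError("group is full")
        group.append(user_id)


def fan_out(sender: str, group_id: str, groups: GroupService,
            user_to_gateway: Dict[str, str]) -> Dict[str, List[str]]:
    """Return gateway -> recipients for one group message (offline users skipped)."""
    plan: Dict[str, List[str]] = {}
    for user in groups.members.get(group_id, []):
        if user == sender:
            continue  # don't echo the message back to the sender
        gateway = user_to_gateway.get(user)
        if gateway is not None:
            plan.setdefault(gateway, []).append(user)
    return plan


groups = GroupService()
for u in ["U1", "U2", "U3"]:
    groups.add_member("red", u)
sessions = {"U1": "gateway-1", "U2": "gateway-1", "U3": "gateway-2"}
print(fan_out("U1", "red", groups, sessions))
# {'gateway-1': ['U2'], 'gateway-2': ['U3']}
```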

Delivered and read receipts for groups are expensive, so we don't want to support them.

Though we now have a bare-bones structure, the details are important.

The design also takes advantage of consistent hashing and message queues, covered below.

Parser Microservice

If we want to do smart things while parsing a message, such as checking whether it is authenticated, we should push all these responsibilities away from the gateway. The smart way to hand an unparsed message to any service is through a parser microservice.

The parser microservice sits between the gateway and the sessions service. The message can be passed to it in two ways:

1. Via HTTP as a JSON object 

2. Over TCP using websockets 

The parser converts the unparsed message into a sensible, structured message. This is done mainly to avoid any overhead in the gateway.

This greatly reduces the gateway's work, as well as the cost of developing its software.
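Here is a sketch of what the parser might do when the message arrives as a JSON object over HTTP. The field names from, to and text are an assumed wire format, not WhatsApp's real one, and parse_message is a hypothetical name; an authentication check would slot in before the structured message is returned.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    sender: str
    recipient: str
    text: str


REQUIRED_FIELDS = ("from", "to", "text")  # hypothetical wire format


def parse_message(raw: bytes) -> ChatMessage:
    """Turn the raw bytes forwarded by the gateway into a structured message.

    Raises ValueError for anything malformed, so the sessions service only
    ever sees well-formed messages.
    """
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"unparseable message: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if not isinstance(payload.get(f), str)]
    if missing:
        raise ValueError(f"missing or invalid fields: {missing}")
    return ChatMessage(payload["from"], payload["to"], payload["text"])


msg = parse_message(b'{"from": "A", "to": "B", "text": "Hi"}')
print(msg)  # ChatMessage(sender='A', recipient='B', text='Hi')
```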

Consistent hashing

Each group ID maps to a list of user IDs. This is a one-to-many mapping, because one group can have many users. To avoid duplicating this information on every server, we use consistent hashing. It reduces the memory footprint across the servers by assigning each group's information to only some of the boxes.

It also lets us route a request (by its group ID) to the right box.
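Below is a minimal consistent-hash ring in Python that maps group IDs to boxes. It uses MD5 only as a stable, well-spread hash and places 100 virtual nodes per box to even out the load; both choices, and the box names, are assumptions for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # A stable hash (unlike Python's built-in hash(), which is salted per process).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping group IDs to boxes, with virtual nodes."""

    def __init__(self, boxes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []
        for box in boxes:
            self.add_box(box)

    def add_box(self, box: str) -> None:
        # Each box owns several points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{box}#{i}"), box))

    def remove_box(self, box: str) -> None:
        self._ring = [(h, b) for h, b in self._ring if b != box]

    def box_for(self, group_id: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        h = _hash(group_id)
        idx = bisect.bisect(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["box-1", "box-2", "box-3"])
groups = [f"group-{i}" for i in range(1000)]
before = {g: ring.box_for(g) for g in groups}
ring.remove_box("box-3")
after = {g: ring.box_for(g) for g in groups}
moved = [g for g in groups if before[g] != after[g]]
print(all(before[g] == "box-3" for g in moved))  # True
```

The check at the end shows the key property: when a box is removed, only the groups that lived on it move to other boxes, while every other group stays where it was.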

Message Queues

If sending a message fails, we retry it. But a retry is only possible if we remember which request still needs to be sent.

The advantage of a message queue is that once a message is queued, the queue takes responsibility for sending it. It is sent after a particular delay, and this delay is configurable. If sending fails, the queue retries it up to n times, and n is also configurable.

If the message still cannot be sent, the queue sends an acknowledgement to the user saying that it has failed.
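A simplified, single-process sketch of that retry behaviour, with the delay and the number of retries n as configurable parameters. RetryQueue, send and on_failure are hypothetical names; a real message queue would schedule retries asynchronously rather than sleeping in a loop.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class QueuedMessage:
    message_id: str
    attempts: int = 0  # number of failed sends so far


class RetryQueue:
    """Hypothetical delivery queue with a configurable delay and retry count."""

    def __init__(self, send: Callable[[str], bool], on_failure: Callable[[str], None],
                 max_retries: int = 3, retry_delay_s: float = 1.0) -> None:
        self.send = send              # returns True on success
        self.on_failure = on_failure  # notifies the user that delivery failed
        self.max_retries = max_retries
        self.retry_delay_s = retry_delay_s
        self.pending: Deque[QueuedMessage] = deque()

    def enqueue(self, message_id: str) -> None:
        # Once queued, the queue (not the caller) owns delivery of the message.
        self.pending.append(QueuedMessage(message_id))

    def drain(self) -> None:
        while self.pending:
            item = self.pending.popleft()
            if self.send(item.message_id):
                continue
            item.attempts += 1
            if item.attempts > self.max_retries:
                self.on_failure(item.message_id)  # gave up after n retries
            else:
                time.sleep(self.retry_delay_s)  # a real queue would schedule, not block
                self.pending.append(item)


calls = {"m1": 0, "m2": 0}


def flaky_send(message_id: str) -> bool:
    calls[message_id] += 1
    # m1 succeeds on its second try; m2 never gets through.
    return message_id == "m1" and calls[message_id] >= 2


failed: List[str] = []
q = RetryQueue(flaky_send, failed.append, max_retries=2, retry_delay_s=0.0)
q.enqueue("m1")
q.enqueue("m2")
q.drain()
print(calls, failed)  # {'m1': 2, 'm2': 3} ['m2']
```

With max_retries=2, m2 is attempted once plus two retries before the failure acknowledgement goes out.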

Deprioritising Messages

During festival seasons, everyone wishes each other, which puts a lot of load on the system, and sometimes we might end up dropping messages. Instead of dropping them, we can deprioritise the less important messages. Deprioritising keeps the system healthy.
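Deprioritising can be sketched with a priority queue: with a fixed sending capacity per tick, the most important items go out first and the rest wait instead of being dropped. Which kinds of traffic count as less important (here, receipts and last-seen updates) is my assumption for the example, as are the names PriorityDispatcher, submit and dispatch.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority levels: lower number = more important.
CHAT_MESSAGE = 0
RECEIPT = 1
LAST_SEEN_UPDATE = 2


class PriorityDispatcher:
    """Under load, send the most important work first instead of dropping."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority

    def submit(self, priority: int, payload: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def dispatch(self, capacity: int) -> List[str]:
        """Send at most `capacity` items now; the rest wait for the next tick."""
        sent: List[str] = []
        while self._heap and len(sent) < capacity:
            _, _, payload = heapq.heappop(self._heap)
            sent.append(payload)
        return sent

    def backlog(self) -> int:
        return len(self._heap)


d = PriorityDispatcher()
d.submit(LAST_SEEN_UPDATE, "A last seen")
d.submit(CHAT_MESSAGE, "Happy holidays!")
d.submit(RECEIPT, "delivered m1")
d.submit(CHAT_MESSAGE, "Happy New Year!")
first = d.dispatch(capacity=2)
print(first)        # ['Happy holidays!', 'Happy New Year!']
print(d.backlog())  # 2
```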

Retries, idempotency and ordering are covered in another post.

Image Sharing 

Image storage and retrieval are explained in the Tinder video.

Chats are Temporary/Permanent 

If chats are stored only on the user's device, this gives a lot of privacy and reduces storage issues on the service provider's end.

Most chat-based applications store chats on a temporary basis. If I delete my WhatsApp and my friend also deletes theirs, those chat messages are lost forever.

Single Point of Failure

Will be updated.

Profile/Image/Email/SMS Service

It is not relevant here.

Auth Service 

It is quite simple.

Service Discovery/Heartbeat Maintenance

Will be updated. 


