Dot Net For All

The Unique Pillars of Distributed Systems

Distributed systems and system design are among the most sought-after interview topics these days, and with good reason. If you want to be a good software engineer, you need to be familiar with these concepts. In this article I will walk you through the six unique pillars of distributed systems.

What is a distributed system?

A distributed system is composed of nodes that cooperate to achieve a task by exchanging messages over communication links. A node can refer generically to a physical machine or a software process.

A distributed system is typically expected to be highly available and resilient to single-node failures.

Some systems are distributed because the amount of data and the workload they handle are too big for a single node or machine. The Google search engine is one example.

The pillars of distributed systems

Communication

Wherever there is a distributed system, there is communication as well.

A client can interact with a server in multiple ways, for example through HTTP, SOAP or gRPC.

There are two types of communication: synchronous and asynchronous. In synchronous communication, the caller sends a request and waits for the server’s response before continuing.

Asynchronous communication occurs between two services when the caller posts a message to the receiver and, once it gets an acknowledgement that the message will be processed, returns to its work. If the request requires a response value, the receiver can use a reply queue to notify the caller of the result.
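The reply-queue idea can be sketched in a few lines of Python. This is only an in-process illustration, assuming threads and the standard `queue` module stand in for two services and a message broker. The names `receiver`, `send_async` and `demo` are made up for this example.

```python
import queue
import threading


def receiver(requests: queue.Queue) -> None:
    """Process messages until a None sentinel arrives."""
    while True:
        message = requests.get()
        if message is None:
            break
        payload, reply_queue = message
        # Do the work, then notify the caller through its reply queue.
        reply_queue.put(payload.upper())


def send_async(requests: queue.Queue, payload: str) -> queue.Queue:
    """Post a message and return immediately with the reply queue."""
    reply_queue: queue.Queue = queue.Queue(maxsize=1)
    # In this sketch, a successful put() plays the role of the acknowledgement.
    requests.put((payload, reply_queue))
    return reply_queue


def demo() -> str:
    requests: queue.Queue = queue.Queue()
    worker = threading.Thread(target=receiver, args=(requests,))
    worker.start()

    reply_queue = send_async(requests, "hello")
    # The caller is free to do other work here before collecting the result.
    result = reply_queue.get(timeout=5)

    requests.put(None)  # ask the receiver to stop
    worker.join()
    return result


if __name__ == "__main__":
    print(demo())
```

The caller never blocks on the receiver while posting the message. It only waits when it finally needs the result.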

A networking library may handle much of this communication for us, but we should still be aware of the nitty-gritty of what happens under the hood. How are the request and response messages represented on the wire? What happens during a temporary network outage? How do we send messages securely over the wire?
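One common answer to the temporary outage question is to retry the call with an increasing delay between attempts. The sketch below is one possible approach, not the only one. `TemporaryNetworkError` is a hypothetical exception type standing in for whatever transient error your networking library raises, and the parameter defaults are arbitrary.

```python
import time


class TemporaryNetworkError(Exception):
    """Hypothetical error standing in for a transient network failure."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TemporaryNetworkError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller decide what to do
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Retrying is only safe when the operation can be repeated without side effects, so a real system would also need to think about idempotency.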

Coordination

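Coordination refers to how much coordination the workflow modeled by the communication requires. The two common generic patterns for microservices are orchestration, where a central coordinator directs each step of the workflow, and choreography, where services react to one another’s events without a central coordinator.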
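To make the two patterns concrete, here is a small in-process sketch. In orchestration, one coordinator calls each step in turn. In choreography, each step reacts to an event published by the previous one. The order workflow, the step functions and the `EventBus` class are all invented for this illustration. Real services would talk over the network.

```python
from collections import defaultdict


# Hypothetical workflow steps, shared by both styles.
def reserve_stock(order, log):
    log.append(f"stock reserved for {order}")


def charge_payment(order, log):
    log.append(f"payment charged for {order}")


def orchestrate(order):
    """Orchestration: a single coordinator calls every step in order."""
    log = []
    reserve_stock(order, log)
    charge_payment(order, log)
    return log


class EventBus:
    """Minimal in-memory publish/subscribe bus."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, order):
        for handler in self.handlers[event]:
            handler(order)


def choreograph(order):
    """Choreography: each step reacts to an event and emits the next one."""
    log = []
    bus = EventBus()

    def on_order_placed(placed_order):
        reserve_stock(placed_order, log)
        bus.publish("stock_reserved", placed_order)

    def on_stock_reserved(reserved_order):
        charge_payment(reserved_order, log)

    bus.subscribe("order_placed", on_order_placed)
    bus.subscribe("stock_reserved", on_stock_reserved)
    bus.publish("order_placed", order)
    return log
```

Both functions produce the same sequence of steps. The difference is where the knowledge of the workflow lives: in one place, or spread across the subscribers.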

Consistency

Consistency refers to how data is saved in the datastore and how services are expected to behave in a microservice architecture. Atomic transactions lie at one end of the spectrum, whereas different degrees of eventual consistency lie at the other.
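The atomic end of that spectrum can be illustrated with a single database. The sketch below assumes an in-memory SQLite database and a made-up `accounts` table. It shows that either both updates of a transfer are applied or neither is. Providing the same guarantee across several services is considerably harder.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically."""
    # The connection context manager commits on success and rolls back
    # if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
```

If the transfer fails halfway, the debit that was already applied is rolled back, so readers never see money disappear from one account without arriving in the other.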

Scalability

The performance of a distributed system represents how efficiently it handles load, and it’s generally measured in throughput and response time.

Throughput is the number of operations processed per second, and response time is the total time between a client’s request and its response.
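As a rough illustration, both metrics can be computed from request timestamps. The function names and the sample format below, pairs of send and receive times in seconds, are assumptions made for this sketch.

```python
def throughput(operations, seconds):
    """Operations processed per second over a measurement window."""
    return operations / seconds


def response_times(samples):
    """Response time of each request, given (sent_at, received_at) pairs."""
    return [received_at - sent_at for sent_at, received_at in samples]


def mean(values):
    return sum(values) / len(values)
```

In practice, percentiles of the response time are usually more informative than the mean, because a few slow requests can hide behind a good average.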

Load can be measured in different ways since it’s specific to the system’s use case. For example, the number of concurrent users, the number of communication links and the ratio of writes to reads are all different forms of load.

A quick and easy way to increase capacity is to buy more powerful hardware, which is referred to as scaling up. This approach will hit a ceiling sooner or later.

The other option is to add more nodes or servers that work in coordination, which is known as scaling out.

Resiliency

A resilient system continues to work even if some of its nodes malfunction or are completely down. Failures can come from many sources, such as a network outage or a node crash. No matter how small a failure may seem, we have to design the system with the assumption that it will eventually fail.

Being pessimistic about your system is the best way to work on it and evolve it.

Failures that are left unchecked can affect the system’s availability, which is defined as the amount of time the application can serve requests divided by the duration of the period measured.

Availability is often described with nines, a shorthand way of expressing percentages of availability. Three nines (99.9%) are typically considered acceptable, and anything above four nines (99.99%) is considered highly available.
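The arithmetic behind the nines is easy to work out. The short sketch below computes availability from the definition above and converts a number of nines into the downtime it allows. The function names and the 365-day year are choices made for this example.

```python
def availability(uptime_seconds, period_seconds):
    """Time the application can serve requests divided by the period measured."""
    return uptime_seconds / period_seconds


def allowed_downtime_minutes(nines, period_days=365):
    """Downtime budget, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # 3 nines -> 0.001, 4 nines -> 0.0001
    return period_days * 24 * 60 * unavailability
```

Over a 365-day year, three nines leave room for about 525.6 minutes of downtime (roughly 8.8 hours), while four nines leave only about 52.6 minutes.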

Operations

Distributed systems need to be tested, deployed and maintained. It used to be that one team developed an application and another was responsible for operating it. The rise of microservices and DevOps has changed that.

The same team that designs a system is also responsible for its live-site operation. New developments need to be rolled out continuously without affecting the system’s availability. The system needs to be observable so that it’s easy to understand what’s happening at any time. Alerts need to fire when its service level objectives are at risk of being breached.
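As a simple illustration of such an alert, the sketch below flags a service once its failures have used up most of what the objective allows. The function name, the 99.9% success target and the 80% warning threshold are assumptions chosen for this example, not a standard.

```python
def slo_at_risk(total_requests, failed_requests, slo_target=0.999, warn_fraction=0.8):
    """Return True once failures reach warn_fraction of what the SLO allows."""
    # Number of failed requests the objective tolerates in this window.
    allowed_failures = total_requests * (1 - slo_target)
    return failed_requests >= warn_fraction * allowed_failures
```

With 100,000 requests and a 99.9% target, about 100 failures are tolerated, so the alert fires somewhere around the 80th failure, before the objective is actually breached.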

Conclusion

Like it or not, systems that are monoliths today are likely to become distributed in the future. You can notice it in your own project and organization. It’s going to happen sooner or later, and you need a solid grasp of system design concepts to tackle that challenge. I have just scratched the surface of distributed system design in this article, and there is much more to explore.
