Partial failure
In distributed systems, services communicate over asynchronous networks, which means:
- A service must wait for a response from another service after sending a request
- Transmission time can vary due to network congestion, dynamic packet routing, and network connection failures
- Data loss is possible due to packet corruption in wireless networks or an unavailable receiving service
- Services are not synchronized because they do not have identical internal clocks
The fundamental issue with the situations described above is determining whether and when a response is received, which is referred to as handling partial failures.
Consider two services, A and B. If A cannot send a request to B, it is straightforward to handle. However, if A can send the request to B, several scenarios can occur:
- B receives the request but is unable to process it, resulting in no response being sent
- B takes an excessive amount of time to process the request and the request times out at A's side
- B returns a response, but A is unable to receive it
When a request times out on A's side, it may be resent, causing duplication of the request. However, in some cases, such as withdrawing $1000 from a bank account, duplicate requests must not be applied. To prevent this, a system property called idempotence ensures that duplicate requests are only executed once.
Implement idempotence
Read requests are naturally idempotent since they do not make any state changes, but write requests are not. Idempotence can be implemented for write requests as follows:
- Include a unique idempotency key, such as an UUID, in the request
- When the request is received by the server, it checks if the key is already stored in the database
- If the key is not present, the server processes the request and stores the key in the database, both of which are done in a transaction to ensure consistency
- If the key is already present, the server simply returns the response of the previous request, avoiding any duplicate processing
To implement the database for storing idempotency keys, we can create a separate table in the database for storing application data. The database could be a key-value store to support low-latency lookups. Note that there is no need to keep idempotency keys forever.
Types of delivery guarantees
The property of idempotency guarantees that each message is delivered exactly once, a.k.a exactly-once delivery, which is one of the three delivery guarantees shown in the table below. When considering reliability and performance, we must weigh the trade-offs among the delivery guarantees.
At-most-once | At-least-once | Exactly-once | |
---|---|---|---|
Definition | A message is sent <= 1 times | A message is sent >= 1 times | A message is sent 1 times |
Transmission ack | Fire and forget | Retransmission ack | Retransmission ack, no duplicates |
Reliability | Unreliable | Moderate | Reliable |
Performance | Fast | Moderate | Slow |
Example | UDP messages (network layer) | TCP messages (network layer) | Database transactions (application layer) |
Consensus
Consensus is a fundamental problem in distributed systems where a group of nodes need to reach an agreement on a single value or decision. The problem arises due to the possibility of node failures or network partitions, which can result in different nodes having different views of the system state. Consensus is essential for many distributed systems applications such as database replication, leader election, and atomic transactions.
There are several consensus algorithms designed to solve the consensus problem, including Paxos, Raft, and Zab. These algorithms differ in their design and complexity, but they share the same goal of ensuring agreement among nodes in the presence of failures. These algorithms typically involve a leader node that proposes a value, and the other nodes vote on whether to accept or reject the value.
Achieving consensus in a distributed system is not an easy task, and it comes with trade-offs. For instance, consensus algorithms often sacrifice system performance and availability for safety. In addition, consensus algorithms can be challenging to implement and verify, which can lead to bugs and other issues that can affect the system's correctness.
Time
To determine the order of events in distributed systems, we can compare their timestamps. However, clocks on different nodes can vary due to temperature or voltage changes. To address this, time services are available to provide accurate time sources. The most commonly used one is the Network Time Protocol (NTP).
One issue with NTP synchronization is the periodic resetting of clocks, which can affect the local node time. This can result in inaccurate measurements of event response times.
It is recommended to use NTP services to periodically resynchronize clocks, with intervals ranging from 1 hour to 1 day. For applications that require precise ordering of events across different nodes, it is better to use Chrony (opens in a new tab), which has higher accuracy and scalability compared to NTP. Another option for AWS customers is Amazon Time Sync Service (opens in a new tab). However, applications should not depend solely on timestamps from different nodes to determine the actual event order.
Keynotes
Partial failure
- Services in distributed systems communicate over asynchronous networks
- Handling partial failures is the core problem
- Idempotence ensures duplicated requests are only applied once
Consensus
- Consensus algorithms prioritize safety over performance and availability
- Implementation and verification of consensus algorithms can be challenging and may introduce bugs
Time
- Clocks on different nodes are different
- Applications should not rely on timestamps to calculate the actual order of events