This post is a summary of the classic distributed systems paper, Time, Clocks, and the Ordering of Events in a Distributed System. If you’re interested in distributed systems and why ordering is hard, I’d highly encourage reading the paper. It’s a foundational paper in the area of study and I’m glad I finally got around to reading it.
As a non-academic who is getting used to reading papers more often, I find it useful to write notes as I go - hopefully the summary is helpful for someone, too!
You can think of a process as a series of events and a distributed system as a collection of processes. Each process can be running on a different computer, with its own physical clock. When thought of in this way, it’s clear that determining the order of events in such a system could be impossible. This paper — authored by 2013 Turing Award recipient and general giant in Distributed Systems research, Leslie Lamport — explores this problem, and introduces logical clocks as well as a synchronization algorithm as a way of determining the order of events in a distributed system.
The paper defines a distributed system as one that “consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages.” It also assumes that “the message transmission delay is not negligible compared to the time between events in a single process.” It’s important to emphasize this. The transmission of a message in a distributed system is not an atomic action. A message being sent, and a message being received, are two separate events in a distributed system.
When thought of this way, it might become clear that making statements about the ordering of events in a distributed system can become complicated. For instance, imagine that we have two processes, Process A and Process B. Several events occur in Process A, labelled below as a1, ..., a4, and several events occur in Process B, labelled b1, ..., b3. Events a3 and b2 are both the sending of messages, and events a4 and b3 are the receiving of messages.
It’s easy to say that a1 happened before a2, and that a3 happened before a4, because they’re in the same process and have access to the same physical clock. It’s impossible, however, without additional information, to say whether a3 happened before b2, or whether b2 happened first. We can say that b2 happened before a4, and that a3 happened before b3, because they have a causal relationship. Event b3, the receiving of a message, can’t happen without event a3, the sending of that message, happening first.
It’s interesting to note that two events that are not in the same process and do not have a causal relationship are, by definition, concurrent. For example, in the above diagram, a1 and b1 are concurrent, even though the diagram implies that a1 happened slightly before b1. We cannot trust that the physical clocks on Process A and Process B are identical, and the relationship isn’t causal, so we can define the two events as concurrent. The way I read this is that if the events happened in parallel, nothing about the outcome would change.
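The happened-before relation described above can be sketched as reachability in a small event graph: an event x happened before y if there is a path from x to y along process order and message edges, and two events are concurrent when neither reaches the other. A minimal Python sketch (the edge set mirrors the example above; the helper names are my own):

```python
# Events from the example: a1..a4 in Process A, b1..b3 in Process B.
# Edges encode "happened directly before": process order, plus one edge
# per message from its send event to its receive event.
edges = {
    "a1": {"a2"}, "a2": {"a3"}, "a3": {"a4", "b3"},  # a3 sends to b3
    "b1": {"b2"}, "b2": {"b3", "a4"},                # b2 sends to a4
    "a4": set(), "b3": set(),
}

def happened_before(x, y):
    """True if x can causally reach y (depth-first search over edges)."""
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt == y:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def concurrent(x, y):
    # Concurrent: neither event causally reaches the other.
    return not happened_before(x, y) and not happened_before(y, x)
```

With this graph, `happened_before("a3", "b3")` holds, while `concurrent("a1", "b1")` and `concurrent("a3", "b2")` both hold, matching the discussion above.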
Introducing Logical Clocks
To solve this problem, the paper introduces the concept of logical clocks. A logical clock is essentially a function that assigns a time to an event, such that if event a comes before event b, then C(a) < C(b). One possibly confusing thing is that logical clocks have nothing to do with physical clocks. The only similarity is that in both logical and physical clocks, time has to go forward. A logical clock can assign a value of 1 to event a and a value of 5 to event b if event a happens before event b. This satisfies the condition that C(a) < C(b) when a happens before b. This can be visualized with dotted lines added to the above diagram:
It’s now apparent that a1 and b1 both happened at the “time” c1, that a3 happened at time c3, and so on.
So that’s all good, but how do we avoid running into the same problem that we have with physical clocks, where one process’s idea of the time might skew from another’s? It turns out that each process can maintain its own logical clock, as long as it adheres to two rules.
- Rule 1
- Each process Pi increments its clock Ci between any two successive events.
- Rule 2
- (a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a). (b) Upon receiving a message m, process Pj sets its clock Cj greater than or equal to its present value and greater than Tm.
Okay, that’s pretty dense, so let’s unpack it a bit. Each process maintains its own logical clock, incrementing the value every time an event occurs. When a process sends a message, it includes the current value of its logical clock as a timestamp. When a process receives a message, it looks at the included timestamp and sets its own clock to a value greater than both the timestamp and its current value.
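The two rules can be sketched in a few lines of Python (a minimal illustration; the class and method names are my own, not from the paper):

```python
class LamportClock:
    """A per-process logical clock following the paper's two rules."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Rule 1: increment the clock between any two successive events.
        self.time += 1
        return self.time

    def send(self):
        # Rule 2(a): a sent message carries the sender's current timestamp.
        return self.tick()

    def receive(self, message_timestamp):
        # Rule 2(b): on receipt, advance the clock past both its present
        # value and the timestamp carried by the message.
        self.time = max(self.time, message_timestamp) + 1
        return self.time


# Two processes exchanging one message:
a, b = LamportClock(), LamportClock()
a.tick()            # an internal event in A -> time 1
ts = a.send()       # A sends a message -> time 2, timestamp 2
b.tick()            # an internal event in B -> time 1
t = b.receive(ts)   # B receives the message -> jumps to time 3
```

Note how the receive in B jumps its clock to 3 even though only two events have occurred in B; this is what keeps causally related events correctly ordered across processes.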
Total Ordering and a Synchronization Algorithm
- We said before that we already have a partial ordering of events.
- We use logical clocks to obtain a total ordering of events.
- We break ties with an arbitrary total ordering < of the processes (so we decide ahead of time that Pi < Pj < Pk)
- This can be handy for a range of reasons. One example is the problem of mutual exclusion.
- We have two or more processes that want to obtain a resource with the following conditions: 1) A process which has been granted the resource must release it before it can be granted to another process. 2) Different requests for the resource must be granted in the order in which they are made. 3) If every process which is granted the resource eventually releases it, then every request is eventually granted.
- So. How do we do this considering that messages sent are not necessarily received in the same order? (we could violate #2 above).
- Assume that for two processes, messages are sent and received in the same order. We also assume that every message is eventually received.
- So we use a system with clocks to define a total ordering of events; this tells us what came first - we just have to be sure that each process learns about all the other processes’ operations.
- To request the resource, Pi sends the message Tm:Pi requests resource (Tm is the timestamp) to every other process, and puts the message on its own request queue.
- When another process, Pj, receives the message Tm:Pi requests resource, it places the message on its request queue and sends a timestamped acknowledgement message to Pi.
- To release the resource, Pi removes any Tm:Pi requests resource message from its request queue and sends a Pi releases resource message to every other process.
- When another process, Pj, receives a Pi releases resource message, it removes any Tm:Pi requests resource message from its queue.
- Process Pi is granted the resource if: 1) there is a Tm:Pi requests resource message in its request queue which is ordered before any other request in its queue by the relation =>, and 2) Pi has received a message from every other process timestamped later than Tm.
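The steps above can be sketched as a single-machine simulation in Python. This is an illustrative sketch, not the paper's presentation: message delivery is simulated with direct method calls (so delivery is instantaneous and in order, matching the stated assumption), and all names are my own. Ties in the total ordering are broken by process id, the arbitrary ordering of processes mentioned earlier.

```python
import heapq

class Process:
    """Sketch of the mutual-exclusion algorithm for one process."""

    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers        # all process ids, including our own
        self.clock = 0
        self.queue = []           # heap of (timestamp, pid) requests
        # Latest timestamp seen from each other process (acks/releases).
        self.latest = {p: 0 for p in peers if p != pid}

    def _tick(self, other=0):
        # Rules 1 and 2(b) combined: always move the clock forward.
        self.clock = max(self.clock, other) + 1
        return self.clock

    def request(self, network):
        ts = self._tick()
        heapq.heappush(self.queue, (ts, self.pid))   # queue our own request
        for p in self.peers:
            if p != self.pid:
                network[p].on_request(ts, self.pid, network)

    def on_request(self, ts, sender, network):
        self._tick(ts)
        heapq.heappush(self.queue, (ts, sender))
        network[sender].on_ack(self._tick(), self.pid)  # acknowledge

    def on_ack(self, ts, sender):
        self._tick(ts)
        self.latest[sender] = max(self.latest[sender], ts)

    def release(self, network):
        self.queue = [(t, p) for t, p in self.queue if p != self.pid]
        heapq.heapify(self.queue)
        ts = self._tick()
        for p in self.peers:
            if p != self.pid:
                network[p].on_release(ts, self.pid)

    def on_release(self, ts, sender):
        self._tick(ts)
        self.latest[sender] = max(self.latest[sender], ts)
        self.queue = [(t, p) for t, p in self.queue if p != sender]
        heapq.heapify(self.queue)

    def granted(self):
        # Condition 1: our request is first under the (timestamp, pid) order.
        # Condition 2: we've heard from every other process later than Tm.
        if not self.queue:
            return False
        ts, pid = self.queue[0]
        return pid == self.pid and all(t > ts for t in self.latest.values())


network = {p: Process(p, [0, 1, 2]) for p in (0, 1, 2)}
network[0].request(network)   # P0 asks for the resource; acks flow back
```

After the request, `network[0].granted()` is true (its request heads every queue and both acks carry later timestamps), while `network[1].granted()` is false.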
This algorithm is distributed! It does not depend on centralized consensus or storage. The synchronization is specified in terms of a state machine: C is the set of all possible commands, S is the set of all possible states, and execution is a function e: C × S → S. e(C, S) = S′ means that executing command C with the machine in state S changes the machine’s state to S′.
Each process independently simulates the execution of the state machine, using the commands issued by all the processes. All processes order the commands according to their timestamps, using the total ordering. A process can execute a command with timestamp T once it has learned of all commands issued by every other process with timestamps less than or equal to T. This approach is not robust to failures: a single process failing will halt the entire system.
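The state machine formulation e(C, S) = S′ can be illustrated with a toy machine (a hypothetical example; the command set and state representation are my own). Because every process replays the same totally ordered command log, every process computes the same sequence of states:

```python
# A toy state machine: the state is the list of processes waiting for a
# resource, and commands are ("request", pid) or ("release", pid).
def execute(command, state):
    """e(C, S) -> S': apply one command to a state, returning the new state."""
    kind, pid = command
    if kind == "request":
        return state + [pid]                      # join the wait list
    if kind == "release":
        return [p for p in state if p != pid]     # leave the wait list
    return state

# A command log in total (timestamp) order, identical at every process.
log = [("request", 1), ("request", 2), ("release", 1)]

state = []
for cmd in log:
    state = execute(cmd, state)
```

Replaying this log, the states are [] → [1] → [1, 2] → [2], so every process agrees that process 2 now holds the resource.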