
This post is a summary of the classic distributed systems paper, Time, Clocks, and the Ordering of Events in a Distributed System. If you’re interested in distributed systems and why ordering is hard, I’d highly encourage reading the paper. It’s a foundational paper in the area of study and I’m glad I finally got around to reading it.

As a non-academic who is getting used to reading papers more often, I find it useful to write notes as I go - hopefully the summary is helpful for someone, too!

You can think of a process as a series of events and a distributed system as a collection of processes. Each process can be running on a different computer, with its own physical clock. When thought of in this way, it’s clear that determining the order of events in such a system could be impossible. This paper — authored by 2013 Turing Award recipient and general giant in Distributed Systems research, Leslie Lamport — explores this problem, and introduces logical clocks as well as a synchronization algorithm as a way of determining the order of events in a distributed system.

The paper defines a distributed system as one that “consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages.” It also assumes that “the message transmission delay is not negligible compared to the time between events in a single process.” It’s important to emphasize this. The transmission of a message in a distributed system is not an atomic action. A message being sent, and a message being received, are two separate events in a distributed system.

When thought of this way, it becomes clear that making statements about the ordering of events in a distributed system can get complicated. For instance, imagine that we have two processes, Process A and Process B. Several events occur in Process A, labelled below as a1, ..., a4, and several events occur in Process B, labelled b1, ..., b3. Events a3 and b2 are both the sending of messages, and events a4 and b3 are the receiving of those messages. It’s easy to say that a1 happened before a2, and that a3 happened before a4, because they’re in the same process and have access to the same physical clock. It’s impossible, however, without additional information, to say whether a3 happened before b2, or whether a4 or b3 happened first. We can say that b2 happened before a4, and that a3 happened before b3, because they have a causal relationship. Event b3, the receiving of a message, can’t happen without a3, the sending of that message, happening first.

[Figure: process messaging]

It’s interesting to note that two events that are not in the same process and do not have a causal relationship are, by definition, concurrent. For example, in the above diagram, b1 and a1 are concurrent, even though the diagram implies that a1 happened slightly before b1. We cannot trust that the physical clocks on Process A and Process B are identical, and the relationship isn’t causal, so we define the two events as concurrent. The way I read this is that if the events happened in parallel, nothing about the outcome would change.

Side Note: This reminded me of an excellent talk Rob Pike gave, “Concurrency is not Parallelism”.

Introducing Logical Clocks

To solve this problem, the paper introduces the concept of logical clocks. A logical clock is basically a function that assigns a time to an event, so that if event a comes before event b, C(a) < C(b). One possibly confusing thing is that logical clocks have nothing to do with physical clocks. The only similarity is that in both logical and physical clocks, time has to go forward. A logical clock can assign a value of 1 to event a and a value of 5 to event b if event a happens before event b. This satisfies the condition that C(a) < C(b) if a happens before b. This can be visualized with dotted lines added to the above diagram:

[Figure: logical clocks]

It’s now apparent that a1 and b1 both happened at the “time” c1, a2 happened at time c3, and so on.

So that’s all good, but how do we avoid running into the same problem that we have with physical clocks, where one process’s idea of the time might skew from another’s? It turns out that each process can maintain its own logical clock, as long as it adheres to two rules.

Rule 1: Each process Pi increments Ci between any two successive events.

Rule 2:
(a) If event a is the sending of a message m by process Pi, then the message m contains a timestamp Tm = Ci(a).
(b) Upon receiving a message m, process Pj sets Cj greater than or equal to its present value and greater than Tm.

Okay, that’s pretty dense, so let’s unpack it a bit. Basically, each process can maintain its own logical clock, incrementing the value every time an event occurs. When a process sends a message, it includes the current value of its logical clock as a timestamp. When a process receives a message, it looks at the included timestamp and sets its own clock value to something greater.
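Here’s a minimal sketch, in Python, of what these two rules might look like in code. The LamportClock class and its method names are my own illustration, not something from the paper:

```python
# A minimal Lamport clock sketch (illustrative names, not from the paper).
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: increment the clock between any two successive local events."""
        self.time += 1
        return self.time

    def send(self):
        """Rule 2(a): a sent message carries the sender's current clock value as Tm."""
        return self.tick()

    def receive(self, tm):
        """Rule 2(b): on receipt, advance the clock past both its present value and Tm."""
        self.time = max(self.time, tm) + 1
        return self.time


# Example: Pi sends a message to Pj; Pj's clock jumps past the message timestamp.
pi, pj = LamportClock(), LamportClock()
pi.tick()              # a local event on Pi -> clock is 1
tm = pi.send()         # Pi sends a message stamped Tm = 2
print(pj.receive(tm))  # Pj's clock becomes 3, which is greater than Tm
```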

Total Ordering and a Synchronization Algorithm

  • We said before that we already have a partial ordering of events.
  • We use logical clocks to obtain a total ordering of events.
  • We break ties with an arbitrary total ordering < of the processes (so we decide ahead of time that Pi < Pj < Pk)
  • This can be handy for a range of reasons. One example is the problem of mutual exclusion.
  • We have two or more processes that want exclusive access to a shared resource, under the following conditions: 1) A process which has been granted the resource must release it before it can be granted to another process. 2) Different requests for the resource must be granted in the order in which they are made. 3) If every process which is granted the resource eventually releases it, then every request is eventually granted.
  • So how do we do this, considering that messages sent are not necessarily received in the same order? (We could violate condition 2 above.)
  • Assume that for any two processes, messages are received in the same order in which they are sent. We also assume that every message is eventually received.
  • So we use a system of clocks to define a total ordering of events, which tells us what came first. We just have to be sure that each process learns about every other process’s operations.
  • Algorithm (a code sketch follows this list):

    • To request the resource, Pi sends the message Tm: Pi requests resource to every other process, and puts that message on its own request queue. (Tm is the timestamp of the message.)
    • When another process Pj receives the message Tm: Pi requests resource, it places the message on its own request queue and sends a timestamped acknowledgement message to Pi.
    • To release the resource, Pi removes any Tm: Pi requests resource message from its request queue and sends a timestamped Pi releases resource message to every other process.
    • When another process Pj receives a Pi releases resource message, it removes any Tm: Pi requests resource message from its request queue.
    • Process Pi is granted the resource if 1) there is a Tm: Pi requests resource message in its request queue which is ordered before any other request in its queue by the relation => (the total ordering described above), and 2) Pi has received a message from every other process timestamped later than Tm.
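Here’s a rough sketch of that bookkeeping in Python. It assumes some message transport exists (passed in as a send callback), and the class and method names are mine, not the paper’s. Requests are kept as (timestamp, process id) pairs, so tuple comparison gives us the total ordering => described above, with ties broken by process id:

```python
# Sketch of Lamport's mutual exclusion bookkeeping for one process Pi.
# Assumes some transport delivers messages in order; names are illustrative.
class MutexProcess:
    def __init__(self, pid, peers):
        self.pid = pid
        self.peers = peers                       # ids of all other processes
        self.clock = 0
        self.queue = {}                          # pid -> (timestamp, pid) of a pending request
        self.last_seen = {p: 0 for p in peers}   # latest timestamp seen from each peer

    def _tick(self, tm=0):
        self.clock = max(self.clock, tm) + 1
        return self.clock

    def request(self, send):
        """Send 'Tm: Pi requests resource' to every peer and queue it locally."""
        tm = self._tick()
        self.queue[self.pid] = (tm, self.pid)
        for p in self.peers:
            send(p, ("request", tm, self.pid))

    def on_request(self, tm, sender, send):
        """Queue the incoming request and acknowledge it with a later timestamp."""
        self._tick(tm)
        self.last_seen[sender] = max(self.last_seen[sender], tm)
        self.queue[sender] = (tm, sender)
        send(sender, ("ack", self._tick(), self.pid))

    def on_ack(self, tm, sender):
        self._tick(tm)
        self.last_seen[sender] = max(self.last_seen[sender], tm)

    def release(self, send):
        """Remove our own request and tell every peer to do the same."""
        self.queue.pop(self.pid, None)
        for p in self.peers:
            send(p, ("release", self._tick(), self.pid))

    def on_release(self, tm, sender):
        self._tick(tm)
        self.last_seen[sender] = max(self.last_seen[sender], tm)
        self.queue.pop(sender, None)

    def holds_resource(self):
        """Granted when our request is first in (timestamp, pid) order and every
        peer has been heard from with a timestamp later than our request's."""
        own = self.queue.get(self.pid)
        if own is None:
            return False
        return own == min(self.queue.values()) and all(
            self.last_seen[p] > own[0] for p in self.peers
        )
```

The nice property is that every process makes the grant decision locally, from its own request queue, without asking a central coordinator.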

This algorithm is distributed! It is not dependent on centralized consensus or storage. The synchronization is specified in terms of a State Machine: C is the set of all possible commands, S is the set of all possible states, and the machine is defined by a function e: C x S -> S, where e(C, S) = S' means that executing command C with the machine in state S moves the machine to state S'.

Each process independently simulates the execution of the state machine, using the commands issued by all the processes. All processes order the commands according to their timestamps, using the total ordering. A process can execute a command with timestamp T once it has learned of all commands issued by all other processes with timestamps less than or equal to T. This scheme is not robust to failures: a single process failing will halt the entire system.
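As a sketch of that execution rule, again with illustrative names and a toy counter as the state machine: each process sorts the commands it knows about by (timestamp, pid) and only applies a command once it has heard from every process with a timestamp at least that large, so that (assuming in-order delivery, as above) no earlier command can still arrive:

```python
# Illustrative sketch of the replicated state machine execution rule.
# Every process applies the same commands in the same (timestamp, pid) order,
# so all copies of the state stay in agreement.
def execute(command, state):
    """The transition function e: C x S -> S (here, a toy counter machine)."""
    if command == "increment":
        return state + 1
    return state


def apply_ready_commands(state, pending, latest_from):
    """Apply every pending command that can no longer be preceded by a new one.

    pending     -- list of (timestamp, pid, command) tuples not yet executed
    latest_from -- pid -> highest timestamp seen from that (other) process
    """
    safe_until = min(latest_from.values())   # nothing earlier can still arrive
    ready = sorted(c for c in pending if c[0] <= safe_until)
    for _, _, command in ready:
        state = execute(command, state)
    remaining = [c for c in pending if c[0] > safe_until]
    return state, remaining
```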
