A sequential monad for concurrent programming

This projects brings forward a monad to mimick concurrent programming in a sequential setting. This monad is not meant to be used to actually run concurrent programs, but rather to study them and analyse them. The general approach we have is to explore every possible behaviour of the program exactly once, sequentially, in one particular order. Because of the design, it is enough indeed to explore one trace of the program to deduce everything about the causal and nondeterministic behaviour of the program.

Source Code · Documentation of the library

The general idea

The core idea is to mix a state and a continuation monad. Because we are running the whole program in one trace (including different ondeterministic execution), we need a context to tell us where we are at the moment. This context is split into three fields:

The core idea does not require any structure on these types, so they are simply parameter of the monad. We describe this core monad in the next section.

The continuation is there to represent the fact that a program might return several values at top-level, at different times. For instance, nondeterministic choice returns true or false, so will invoke the continuation twice. We will see that the usage of continuations is also crucial for representing shared data between concurrent threads.

The core monad: sequentiality, parallelism, nondeterminism

As mentioned in the previous section, the core idea simply assumes three types for local-level, execution-level and global-level state. This gives the context. The core monad is then simply defined as

type 'a t = context -> (context -> 'a -> context) -> context

So a computation of type 'a expects a context, a continuation and must return a new context. The continuation itself expects the updated context, and a return value and returns a new context. It is important that the context is returned by the continuation, and is key to knitting the executions together – remember we are exploring the whole system together.

It is straightforward to see that this forms a monad.

This monad supports rich forms of nondeterminism as we will see later. A first example is the boolean choice operator that returns true or false, nondeterministically.

let choice : bool t = fun context k ->
  let context' = k context true in
  let context'' = k { context with global = context'.global } false in
  context''

This code calls the continuation twice, in sequence with the two booleans. But only some of the context from the first call is propagated to the second one: only the global part of the context. Otherwise the two calls are called with the same history and the same execution state, which esnsures that they won’t see each other and will not be able to communicate.

The alternative, is fork which like the Unix primitive, returns several values /in parallel/. In this case, the branch seeing true and the the branch seeing false can commuicate through shared state.

let fork : bool t = fun context k ->
  let context' = k context true in
  let context'' = k { context with global = context'.global; execution = context'.execution } false in
  context''

In this case, we also forward the execution state from the first call to the second one. This will ensure that these two will communicate. In other words, we should have

fork >>= fun b ->
if b then write 1 else read ()

to possibly read one, while

choice >>= fun b ->
if b then write 1 else read ()

to never read one. But we have not discussed shared state yet. This core monad is agnostic about the types for histories, execution and global states, so does not support shared data out of the box. This comes as a layer on top which instanciates the following type to implement shared data.

Shared Data

We now explain how we represent access to a shared piece of data among concurrent threads. The idea is based off distributed system and starts by giving a data type signature. In its simplest form, a datatype signature gives two types: a type t describing the datatype and a type 'a op describing the potential operations on this datatype. The 'a describes the return type of the operation, that can be used for query operations such as read. Then, on these type, one must give two functions: apply which takes an operation, a value of type t and returns a pair of a new value, and the return result; and commute which indicates when two operations commute (ie. their action on the state commute).

For instance, a simple reference has t = int, 'a op contains two constructors: R of type int and W(k) of type unit (k has type int). In OCaml these types are captured by GADTs. The apply function is natural and the commute function specifies that only pairs of reads commute – everything else does not commute.

The model works as follows: when two concurrent threads want to issue an operation on the shared state, these operations have to be linearised unless these operations commute. Avoiding useless linearisations is key to save performance so we believe is a necessary optimisation. A signature is commutative when all operations are commutative – then no linearisations; or no “synchronisations” are needed.

We know explain how to add a function perform : 'a op -> 'a t that performs an operation on the shared data and returns its results.

Sequentiality and shared accesses

How do these sequential accesses work in our sequential setting. We demonstrate the basic idea on a simple program:

fork >>= fun b ->
if b then (perform R >>= fun n -> return (Printf.printf "Value read: %d\n" n))
else perform (W 1)

This program should return 0 or 1 (assuming the initial value is 0). We now describe how this program is executed under our semantics:

  1. We first call fork, which calls its continuation on true
  2. Then, we perform a Read, which reads 0 as they are no other choices. However, it adds in the execution state that a “Read” operation has been performed, and indicates the continuation of that Read operation (here: the printf). This bundle (operation + continuation + misc) is called a transaction.
  3. Then, that is all for the first continuation call of fork, which then calls its continuation on false, but with the execution environment enriched with the presence of the Read operation.
  4. Now we perform the write. However we notice that there is a concurrent Read, and Read/Write does not commute. So we create two nondeterministic executions
    1. One where the Write does occur after the read, and writes 1.
    2. One where the Write occurred before the read. Then, the transaction of the Read is invalidated, and we restart the read. We can do this by (1) applying the operation again and (2) invoking the continuation stored in the transaction on the new return value.

For this to work, we need some data in the context, to keep track of transactions performed and of their causal relationship. Indeed, the program

perform R >>= fun n -> perform (W n)

should not have to restart anything since the read is causally before the write.