Thread Equivalence Checking - Part 1

July 8, 2024

Concurrent programs are complicated to write and even more complicated to get right. To deal with this, my friend Aaryan and I wrote a tool for verifying the correctness of multithreaded programs.

I’ll present this project as a series of blog posts. In the first, I want to give a motivation and a high-level overview of the project.

Motivation

Languages like C define an abstract machine that is the target of the language. This abstract machine is sequential, and executes statements in their entirety before moving on to the next. Since sequential execution is by design, we can begin by saying that sequential execution is the “correct” way for a program to execute; it is what the programmer intended. However, real machines do all sorts of (nondeterministic) things like reorder instructions, start an instruction before the last one was finished, or execute multiple instructions at the exact same time. Additionally, a single “instruction” for the C abstract machine is usually multiple instructions for the underlying processor. Finally, even with all of the prior optimizations removed, multiple threads of execution are multiplexed to give the appearence of true multitasking.

Each of these optimizations were created to give the appearence of a single stream of execution while also improving performance year after year. The benefit is that most of the time programmers can ignore all of the nuance and write imperative code as if it is being executed one line after another. The downside is that when you can’t ignore the nuance, it is becomes very difficult to know for certain whether your code is correct.

Hardware optimizations are designed to give the appearence of sequential execution relative to a single stream of instructions (and even then they sometimes don’t), but the stream must be correct in the first place. This is on the program to uphold. One way this promise can be broken is with software multitasking, which is what our checker attempts to verify.

Equivalency

We want to show that all possible ways the operating system can run our threads will result in an observable state that is consistent with the state produced by a single-threaded execution of our program.

For example, consider the following program (compiled without heavy optimization):

int global = 0;

void a() {
    int local = global;
    local += 2;
    global = local;
}

void b() {
    int local = global;
    local *= 2;
    global = local;
}

There are only two ways of running the functions sequentially: running a then b or b then a. The former leaves 4 in global, and the latter leaves 2 in global. Thus, the two valid end states are global = 2 and global = 4.

include <stdio.h>

void main() {
    a(); b();
    printf("global = %d\n", global);
    global = 0;
    b(); a();
    printf("global = %d\n", global);
}

However, if we allow a and b to be run concurrently, the operating system can schedule a and b however it pleases. This introduces several more possibilities.

For example, a can be run up through loading global, then the OS switches to b, which loads up through loading global, then a is run to completion, then b is run to completion. The result is global = 0. This is not a possible state if we stick with the abstract machine.

The goal of the thread equivalence checker is to verify, with complete certainty (see appendix), that every way we can run a and b concurrently produce a resulting state that is identical to the final state of a sequential execution of a and b. That is to say, they are part of the same equivalence set and our program is sequentially equivalent.

One way to ensure concurrent execution of a and b end in sequential final states is by using synchronization primitives. Synchronization primitives maintain the illusion of an abstract machine with the performance benefits of multithreading.

For example, with a toy mutex implementation, we can fix our functions.

int global = 0;
mutex_t mut;

void a() {
    lock(mut);
    int local = global;
    local += 2;
    global = local;
    unlock(mut);
}

void b() {
    lock(mut);
    int local = global;
    local *= 2;
    global = local;
    unlock(mut);
}

Now if a is interrupted by b, b will not be able to modify or even read global since its call to lock(mut) would hang (and probably cause it to be interrupted by another thread). The above implementation of a and b is now equivalent to a sequential one!

Approach

Ok, so how do we go about proving (or disproving) sequential equivalency for functions?

More generally, we want to check if any number of functions are sequentially equivalent. To do so, we need to

Collect valid end states by running functions sequentially
Collect all possible end states by running functions concurrently
Compare the results from (1) and (2) to ensure they are equal

Sequential End States

To tackle 1, we can compute all permutations of the given functions (for the above it’s a,b and b,a), run each sequence, and collect the end states.

Concurrent End States

For 2, we need to enumerate all ways the functions can interrupt eachother. To do so, we need to define what it means to be interrupted.

For simplicity’s sake (and to make the problem at all tractable), we assume that the functions are all running on a single core (see Appendix). This means that when the thread running a is interrupted by the thread running b, the OS context switches from the first to the second. The execution state (address of next or last instruction, stack pointer, registers, etc.) of a is saved somewhere (usually on the stack) and the execution state of b is restored.

For our purposes, what matters is that for a single context switch from a to b, there are some number of instructions executed on a, then all of b is executed, then the remainder of a is executed. Let’s represent that as a list: [x,b] where we execute x instructions on a then switch to b and run it to completion, then switch to a and run it to completion. We could have started on b, so we should add a term at the start to state which thread we start on: [a,x,b].

More generally, the format goes:

[start thread, # instructions 0, thread 0, # instructions 1, ..., # instructions n]

where we start on the start thread, run for # instructions 0, switch to ctx switch 0, and so on. We call this an interleaving of threads with n context switches.

Now, let’s enumerate all interleavings. Let’s focus on enumerating the instruction counts first. First, run the interleaving. Now, add 1 to the final instruction count. Run again. You could keep doing this, but how do you know when to stop? Well, let’s return to the prior example, but let’s compile it (with optimization):

a():
        ldr     r2, .L3
        ldr     r3, [r2]
        add     r3, r3, #2
        str     r3, [r2]
        bx      lr
.L3:
        .word   .LANCHOR0
b():
        ldr     r2, .L6
        ldr     r3, [r2]
        lsl     r3, r3, #1
        str     r3, [r2]
        bx      lr
.L6:
        .word   .LANCHOR0

Suppose we only want one context switch. We could start with the following interleaving: [a, 1, b]. Then we increment to [a, 2, b], and so on. However, after [a, 5, b], we’ve run out of instructions in a to execute. Further interleavings with the same prefix (such as [a, 100, b]) are meaningless and redundant. At this point we’re done (aside from enumerating threads).

Furthermore, we can apply this to any number of context switches. If we have n context switches and we run an interleaving that context switches m times, we need to increment the (n - m)th instruction count from the end.

Ok, so what happens after enumerating all the instruction counts? We need to enumerate all destination threads (plus the starting thread). This is simple enough: form all permutations with replacement of n + 1 threads such that no two adjacent threads are the same. This reads like a LeetCode problem so the algorithm is left as an exercise to the reader.

Equivalency Check

After enumerating all possible interleavings, running each, and collecting their end states, we check if each resulted in an end state that is in the set of sequential end states. If so, the functions are correct. If not, there is a bug.

Stay Tuned

This post covered the idea behind our thread checker. However, it left the implementation blank: how do we run an interleaving? how do we collect and compare end states? In the next posts we’ll use Arm debugging features and the MMU to do all of the above!

Part 2

Appendix

Assumptions:

A uniprocessor system: all instructions execute atomically and cannot be interrupted.
Deterministic functions: given a specific interleaving, functions will always execute the same instructions