Potentially Broken Address Dependency via test_bit() When Compiling With Clang

From: Paul Heidekrüger
Date: Wed Oct 27 2021 - 06:29:05 EST


Hi all,

For my bachelor thesis, I have been working on the infamous problem of
potentially broken dependency orderings in the Linux kernel. I'm being
advised by Marco Elver, Charalampos Mainas, Pramod Bhatotia (Cc'd).

For context, see:
https://linuxplumbersconf.org/event/7/contributions/821/attachments/598/1075/LPC_2020_--_Dependency_ordering.pdf

Our approach consists of two LLVM compiler passes which annotate
dependencies in unoptimised intermediate representation (IR) and verify
the annotated dependencies in optimised IR. ATM, the passes only
recognise a subset of address dependencies - everything is still WIP ;-)

We have been cross-compiling with a slightly modified version of
allyesconfig for arm64, and the passes have now found a case that we
would like to share with LKML for feedback: an address dependency being
broken (?) through compiler optimisations in
fs/afs/addr_list.c::afs_iterate_addresses().

Address dependency in source code, lines 373 - 375 in fs/afs/addr_list.c:

> [...]
> index = READ_ONCE(ac->alist->preferred);
> if (test_bit(index, &set))
> goto selected;
> [...]

where test_bit() expands to the following in
include/asm-generic/bitops/non-atomic.h, lines 115 - 122:

> static __always_inline int
> arch_test_bit(unsigned int nr, const volatile unsigned long *addr)
> {
> return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
> }
> #define test_bit arch_test_bit

The address dependency gets preserved in unoptimised IR since the virtual register %33 transitively depends on %28:

> %28 = load volatile i8, i8* %preferred, align 2, !annotation !15
> store i8 %28, i8* %tmp21, align 1
> %29 = load i8, i8* %tmp21, align 1
> %conv23 = zext i8 %29 to i32
> store i32 %conv23, i32* %index, align 4
> %30 = load i32, i32* %index, align 4
> store i32 %30, i32* %nr.addr.i, align 4
> store i64* %set, i64** %addr.addr.i, align 8
> %31 = load i64*, i64** %addr.addr.i, align 8
> %32 = load i32, i32* %nr.addr.i, align 4
> %div.i = udiv i32 %32, 64
> %idxprom.i = zext i32 %div.i to i64
> %arrayidx.i = getelementptr i64, i64* %31, i64 %idxprom.i
> %33 = load volatile i64, i64* %arrayidx.i, align 8, !annotation !16

In optimised IR, there is no dependency between the two volatile loads
anymore:

> %11 = load volatile i8, i8* %preferred, align 2, !annotation !19
> %conv25 = zext i8 %11 to i32
> %set.0. = load volatile i64, i64* %set, align 8

Now, since @nr traces back to the READ_ONCE() to @index, does this make
the load from @addr in test_bit() address-dependent on that READ_ONCE()?
Should the load from @addr therefore be ordered against the READ_ONCE()?

Many thanks,
Paul