Re: [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency

From: Mathieu Desnoyers
Date: Wed Sep 29 2021 - 15:51:00 EST


----- On Sep 29, 2021, at 10:54 AM, Linus Torvalds torvalds@xxxxxxxxxxxxxxxxxxxx wrote:

> On Wed, Sep 29, 2021 at 7:47 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> And if there *is* need ("look, we have that same store in both the if-
>> and the else-statement" or whatever), then say so, and state that
>> thing.
>
> Side note: I'd also like the commit that introduces this to talk very
> explicitly about the particular case that it is used doe and that it
> fixes. No "this can happen". A "this happened, here's the _actual_
> wrong code generation, and look how this new ctrl_dep() macro fixes
> it".
>
> When it's this subtle, I don't want theoretical arguments. I want
> actual outstanding and real bugs.
>
> Because I get the feeling that there were very few actual realistic
> examples of this, only made-up theoretical cases that wouldn't ever
> really be found in real code.

There is one particular scenario which concerns me in refcount_dec_and_test().
It relies on smp_acquire__after_ctrl_dep() to promote the control
dependency to an acquire barrier on success. Because it is exposed
within a static inline function, it hides the fact that control dependencies
are being used under the hood.

I have not identified a specific instance of oddly generated code, but this is
an accident waiting to happen. If we take this simplified refcount code
into godbolt.org and compile it for RISC-V rv64gc clang 12.0.1:

#define RISCV_FENCE(p, s) \
__asm__ __volatile__ ("fence " #p "," #s : : : "memory")
#define __smp_rmb() RISCV_FENCE(r,r)

volatile int var1;
volatile int refcount;

static inline bool refcount_dec_and_test(void)
{
refcount--;
if (refcount == 0) {
__smp_rmb(); /* acquire after ctrl_dep */
return true;
}
return false;
}

void fct(void)
{
int x;

if (refcount_dec_and_test()) {
var1 = 0;
return;
}
__smp_rmb();
var1 = 1;
}

We end up with:

fct(): # @fct()
lui a0, %hi(refcount)
lw a1, %lo(refcount)(a0)
addiw a1, a1, -1
sw a1, %lo(refcount)(a0)
lw a0, %lo(refcount)(a0)
fence r, r
snez a0, a0
lui a1, %hi(var1)
sw a0, %lo(var1)(a1)
ret

Which lacks the conditional branch, and where the "fence r,r" instruction
does not properly order following stores after the refcount load.

Adding ctrl_dep() around the refcount == 0 check fixes this:

fct(): # @fct()
lui a0, %hi(refcount)
lw a1, %lo(refcount)(a0)
addiw a1, a1, -1
sw a1, %lo(refcount)(a0)
lw a0, %lo(refcount)(a0)
beqz a0, .LBB0_2

fence r, r
addi a0, zero, 1
j .LBB0_3
.LBB0_2:

fence r, r
mv a0, zero
.LBB0_3:
lui a1, %hi(var1)
sw a0, %lo(var1)(a1)
ret

I admit that this is still a "made up" example, although it is similar to the actual
implementation of refcount_dec_and_check(). But if we need to audit every user of this
API for wrongly generated code, we may be looking for a needle in a haystack.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com