Re: [PATCH] compiler.h: Fix barrier_data() on clang

From: Arvind Sankar
Date: Thu Oct 15 2020 - 10:45:20 EST


On Thu, Oct 15, 2020 at 08:50:05AM +0000, David Laight wrote:
> From: Arvind Sankar
> > Sent: 14 October 2020 22:27
> ...
> > +/*
> > + * This version is i.e. to prevent dead stores elimination on @ptr
> > + * where gcc and llvm may behave differently when otherwise using
> > + * normal barrier(): while gcc behavior gets along with a normal
> > + * barrier(), llvm needs an explicit input variable to be assumed
> > + * clobbered. The issue is as follows: while the inline asm might
> > + * access any memory it wants, the compiler could have fit all of
> > + * @ptr into memory registers instead, and since @ptr never escaped
> > + * from that, it proved that the inline asm wasn't touching any of
> > + * it. This version works well with both compilers, i.e. we're telling
> > + * the compiler that the inline asm absolutely may see the contents
> > + * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
> > + */
> > +# define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
>
> That comment doesn't actually match the asm statement.
> Although the asm statement probably has the desired effect.
>
> The "r"(ptr) constraint only passes the address of the buffer
> into the asm - it doesn't say anything at all about the associated
> memory.
>
> What the "r"(ptr) actually does is to force the address of the
> associated data to be taken.
> This means that on-stack space must actually be allocated.
> The "memory" clobber will then force the registers caching
> the variable be written out to stack.
>

I think the comment is unclear now that you bring it up, but the problem
it actually addresses is not that the data is held in registers: in the
sha256_transform() case mentioned in the commit message, for example,
the data is in fact in memory even before this change (it's a 256-byte
array), and that together with the memory clobber is enough for gcc to
assume that the asm might use it. But with clang, if the address of that
data has never escaped -- in this case the data is a local variable
whose address was never passed out of the function -- the compiler
assumes that the asm cannot possibly depend on that memory, because it
has no way of getting its address.

Passing ptr to the inline asm tells clang that the asm knows the
address, and since it also has a memory clobber, that it may use the
data. This is somewhat suboptimal, since if the data was some small
structure that the compiler was just holding in registers originally,
forcing it out to memory is a bad thing to do.

> If you only want to force stores on a single data structure
> you actually want:
> #define barrier_data(ptr) asm volatile("" :: "m"(*ptr))
> although it would be best then to add an explicit size
> and associated cast.
>

i.e. something like:
static inline void barrier_data(void *ptr, size_t size)
{
asm volatile("" : "+m"(*(char (*)[size])ptr));
}
plus some magic to disable the VLA warning, otherwise it causes a build
error.

But I think that might lead to even more subtle issues by dropping the
memory clobber. For example (and this is actually done in
sha256_transform() as well, though the zero'ing simply doesn't work with
any compiler, as it's missing the barrier_data()'s):

unsigned long x, y;
... do something secret with x/y ...
x = y = 0;
barrier_data(&x, sizeof(x));
barrier_data(&y, sizeof(y));
return;

Since x is not used after its barrier_data(), I think the compiler would
be within its rights to turn that into:

xorl %eax, %eax
leaq -16(%rbp), %rdx // &x == -16(%rbp)
movq %eax, (%rdx) // x = 0;
// inline asm for barrier_data(&x, sizeof(x));
movq %eax, (%rdx) // y = 0; (!)
// inline asm for barrier_data(&y, sizeof(y));

which saves one instruction by putting y at the same location as x, once
x is dead.

With a memory clobber, the compiler has to keep x and y at different
addresses, since the first barrier_data() might have saved the address
of x.