Potentially missing "memory" clobbers in bitops.h for x86

From: Alexander Potapenko
Date: Thu Mar 28 2019 - 10:14:27 EST


Hello,

arch/x86/include/asm/bitops.h defines clear_bit(nr, addr) for
non-constant |nr| values as follows:

void clear_bit(long nr, volatile unsigned long *addr) {
        asm volatile("lock; btr %1,%0"
                     : "+m"(*(volatile long *)addr)
                     : "Ir" (nr));
}
(https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/bitops.h#L111)

According to the comments in the file, |nr| may be arbitrarily large.
However, the asm constraints only tell the compiler that the first
unsigned long at |addr| is read and written.
When |nr| selects a bit in a later word, the compiler may therefore
assume that word is untouched and ignore the effect of the asm statement.

Consider the following example (https://godbolt.org/z/naTmjn):

#include <stdio.h>
void clear_bit(long nr, volatile unsigned long *addr) {
        asm volatile("lock; btr %1,%0"
                     : "+m"(*(volatile long *)addr)
                     : "Ir" (nr));
}

unsigned long foo() {
        unsigned long addr[2] = {1, 2};
        clear_bit(65, addr);
        return addr[0] + addr[1];
}

int main() {
        printf("foo: %lu\n", foo());
}

Depending on the optimization level, the program may print either 1
(for -O0 and -O1) or 3 (for -O2 and -O3).
This is because at the higher optimization levels GCC assumes that
addr[1] is unchanged by the asm statement and propagates its initial
value, 2, directly into the result.
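
For reference, here is a rough plain-C model of what the instruction does
with a register bit offset, to show why 1 is the value the hardware
produces. This is only a sketch: clear_bit_model() and foo_model() are
hypothetical names, the model is not atomic, it handles only non-negative
|nr|, and it assumes a 64-bit unsigned long as on x86-64 Linux.

```c
#include <limits.h>

/* Assumption: unsigned long is 64 bits wide, as on x86-64 Linux. */
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical, non-atomic model of "btr" with a register bit offset and
 * a memory operand: the offset first selects a word relative to addr,
 * then a bit inside that word.
 */
static void clear_bit_model(unsigned long nr, unsigned long *addr)
{
        addr[nr / MODEL_BITS_PER_LONG] &= ~(1UL << (nr % MODEL_BITS_PER_LONG));
}

/* Same data as foo(): bit 65 is bit 1 of addr[1], so addr[1] goes 2 -> 0. */
static unsigned long foo_model(void)
{
        unsigned long addr[2] = {1, 2};

        clear_bit_model(65, addr);
        return addr[0] + addr[1];
}
```

Under those assumptions foo_model() returns 1, which matches the -O0 and
-O1 output above; the 3 printed at -O2 and -O3 is the stale value.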

I suspect the definitions of clear_bit() and similar functions are
lacking the "memory" clobber.
However, the file as a whole is very selective about where this clobber
is applied, so if adding it causes a performance penalty we may need to
consider alternative approaches to fixing this code.
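
To make the suggestion concrete, here is a minimal sketch of the example
with the clobber added. It is not a proposed patch: clear_bit_clobber()
and foo_clobber() are hypothetical names, and the asm keyword is spelled
__asm__ __volatile__ only so that the snippet also builds in strict ISO C
modes. It assumes an x86-64 target.

```c
/* Hypothetical variant of clear_bit() with a "memory" clobber added. */
static inline void clear_bit_clobber(long nr, volatile unsigned long *addr)
{
        __asm__ __volatile__("lock; btr %1,%0"
                             : "+m" (*(volatile long *)addr)
                             : "Ir" (nr)
                             : "memory"); /* asm may touch any memory */
}

unsigned long foo_clobber(void)
{
        unsigned long addr[2] = {1, 2};

        /* Clears bit 1 of addr[1]; the clobber forces addr[] to be reloaded. */
        clear_bit_clobber(65, addr);
        return addr[0] + addr[1];
}
```

With the clobber in place the compiler can no longer assume that addr[1]
still holds 2 after the asm statement, so foo_clobber() is expected to
return 1 regardless of the optimization level. The cost is that every
value cached in registers across the call has to be reloaded from memory,
which is the performance concern mentioned above.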


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg