Re: [PATCH v4 07/11] READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses

From: Jann Horn
Date: Fri Apr 24 2020 - 12:32:07 EST


On Tue, Apr 21, 2020 at 5:15 PM Will Deacon <will@xxxxxxxxxx> wrote:
> {READ,WRITE}_ONCE() cannot guarantee atomicity for arbitrary data sizes.
> This can be surprising to callers that might incorrectly be expecting
> atomicity for accesses to aggregate structures, although there are other
> callers where tearing is actually permissable (e.g. if they are using
> something akin to sequence locking to protect the access).
[...]
> The slight snag is that we also have to support 64-bit accesses on 32-bit
> architectures, as these appear to be widespread and tend to work out ok
> if either the architecture supports atomic 64-bit accesses (x86, armv7)
> or if the variable being accesses represents a virtual address and
> therefore only requires 32-bit atomicity in practice.
>
> Take a step in that direction by introducing a variant of
> 'compiletime_assert_atomic_type()' and use it to check the pointer
> argument to {READ,WRITE}_ONCE(). Expose __{READ,WRITE}_ONCE() variants
> which are allowed to tear and convert the one broken caller over to the
> new macros.
[...]
> +/*
> + * Yes, this permits 64-bit accesses on 32-bit architectures. These will
> + * actually be atomic in many cases (namely x86), but for others we rely on

I don't think that's correct?


$ cat 32bit_volatile_qword_read_write.c
#include <pthread.h>
#include <err.h>
#include <stdio.h>
#include <stdint.h>

/* make sure it really has proper alignment */
__attribute__((aligned(8)))
volatile uint64_t shared_u64;

static void *thread_fn(void *dummy) {
while (1) {
uint64_t value = shared_u64;
if (value == 0xaaaaaaaaaaaaaaaaUL || value == 0xbbbbbbbbbbbbbbbbUL)
continue;
printf("got 0x%llx\n", (unsigned long long)value);
}
return NULL;
}

int main(void) {
printf("shared_u64 at %p\n", &shared_u64);
pthread_t thread;
if (pthread_create(&thread, NULL, thread_fn, NULL))
err(1, "pthread_create");

while (1) {
shared_u64 = 0xaaaaaaaaaaaaaaaaUL;
shared_u64 = 0xbbbbbbbbbbbbbbbbUL;
}
}
$ gcc -o 32bit_volatile_qword_read_write
32bit_volatile_qword_read_write.c -pthread -Wall
$ ./32bit_volatile_qword_read_write | head # as 64-bit binary
^C
$ gcc -m32 -o 32bit_volatile_qword_read_write
32bit_volatile_qword_read_write.c -pthread -Wall
$ ./32bit_volatile_qword_read_write | head # as 32-bit binary
shared_u64 at 0x56567030
got 0xaaaaaaaabbbbbbbb
got 0xbbbbbbbbaaaaaaaa
got 0xaaaaaaaabbbbbbbb
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
got 0xbbbbbbbbaaaaaaaa
$


AFAIK 32-bit X86 code that wants to atomically load 8 bytes of memory
has to use CMPXCHG8B; and gcc won't generate such code just based on a
volatile load/store.