Re: [PATCH v6] hugetlb: Add hugetlb.*.numa_stat file

From: Marco Elver
Date: Tue Nov 16 2021 - 07:12:46 EST


On Mon, Nov 15, 2021 at 11:59AM -0800, Shakeel Butt wrote:
> On Mon, Nov 15, 2021 at 10:55 AM Mina Almasry <almasrymina@xxxxxxxxxx> wrote:
[...]
> > Sorry I'm still a bit confused. READ_ONCE/WRITE_ONCE isn't documented
> > to provide atomicity to the write or read, just prevents the compiler
> > from re-ordering them. Is there something I'm missing, or is the
> > suggestion to add READ_ONCE/WRITE_ONCE simply to supress the KCSAN
> > warnings?

It's actually the opposite: READ_ONCE/WRITE_ONCE provide very little
ordering (modulo dependencies) guarantees, which includes ordering by
compiler, but are supposed to provide atomicity (when used with properly
aligned types up to word size [1]; see __READ_ONCE for non-atomic
variant).

Some more background...

The warnings that KCSAN tells you about are "data races", which occur
when you have conflicting concurrent accesses, one of which is "plain"
and at least one write. I think [2] provides a reasonable summary of
data races and why we should care.

For Linux, our own memory model (LKMM) documents this [3], and says that
as long as concurrent operations are marked (non-plain; e.g. *ONCE),
there won't be any data races.

There are multiple reasons why data races are undesirable, one of which
is to avoid bad compiler transformations [4], because compilers are
oblivious to concurrency otherwise.

Why do marked operations avoid data races and prevent miscompiles?
Among other things, because they should be executed atomically. If they
weren't a lot of code would be buggy (there had been cases where the old
READ_ONCE could be used on data larger than word size, which certainly
weren't atomic, but this is no longer possible).

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/asm-generic/rwonce.h#n35
[2] https://lwn.net/Articles/816850/#Why%20should%20we%20care%20about%20data%20races?
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/explanation.txt#n1920
[4] https://lwn.net/Articles/793253/

Some rules of thumb when to use which marking:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt

In an ideal world, we'd have all intentionally concurrent accesses
marked. As-is, KCSAN will find:

A. Data race, where failure due to current compilers is unlikely
(supposedly "benign"); merely marking the accesses appropriately is
sufficient. Finding a crash for these will require a miscompilation,
but otherwise look "benign" at the C-language level.

B. Race-condition bugs where the bug manifests as a data race, too --
simply marking things doesn't fix the problem. These are the types of
bugs where a data race would point out a more severe issue.

Right now we have way too much of type (A), which means looking for (B)
requires patience.

> +Paul & Marco
>
> Let's ask the experts.
>
> We have a "unsigned long usage" variable that is updated within a lock
> (hugetlb_lock) but is read without the lock.
>
> Q1) I think KCSAN will complain about it and READ_ONCE() in the
> unlocked read path should be good enough to silent KCSAN. So, the
> question is should we still use WRITE_ONCE() as well for usage within
> hugetlb_lock?

KCSAN's default config will forgive the lack of WRITE_ONCE().
Technically it's still a data race (which KCSAN can find with a config
change), but can be forgiven because compilers are less likely to cause
trouble for writes (background: https://lwn.net/Articles/816854/ bit
about "Unmarked writes (aligned and up to word size)...").

I would mark both if feasible, as it clearly documents the fact the
write can be read concurrently.

> Q2) Second question is more about 64 bit archs breaking a 64 bit write
> into two 32 bit writes. Is this a real issue? If yes, then the
> combination of READ_ONCE()/WRITE_ONCE() are good enough for the given
> use-case?

Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it,
and at least relieve you to not worry about it (and shift the burden to
WRITE_ONCE's implementation).

Thanks,
-- Marco