Re: KCSAN: data-race in __alloc_file / __alloc_file

From: Alan Stern
Date: Sat Nov 09 2019 - 18:08:35 EST


On Fri, 8 Nov 2019, Linus Torvalds wrote:

> On Fri, Nov 8, 2019 at 1:57 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Can we please agree to call these writes something other than
> > "idempotent"? After all, any write to normal memory is idempotent in
> > the sense that doing it twice has the same effect as doing it once
> > (ignoring possible races, of course).
>
> No!
>
> You're completely missing the point.
>
> Two writes to normal memory are *not* idempotent if they write
> different values. The ordering very much matters, and it's racy and a
> tool should complain.

What you have written strongly indicates that either you think the word
"idempotent" means something different from its actual meaning or else
you are misusing the word in a very confusing way.

Your text seems to say that two operations are idempotent if they have
the same effect. Compare that to the definition from Wikipedia (not
the best in the world, perhaps, but adequate for this discussion):

Idempotence is the property of certain operations in
mathematics and computer science whereby they can be applied
multiple times without changing the result beyond the initial
application.

For example, writing 5 to x is idempotent because performing that write
multiple times will have the same result as performing it only once.
Likewise for any write to normal memory.

Therefore talking about idempotent writes as somehow being distinct
from other writes does not make sense. Nor does it make sense to say
"Two writes to normal memory are *not* idempotent if they write
different values", because idempotence is a property of individual
operations, not of pairs of operations.

You should use a different word, because what you mean is different
from what "idempotent" actually means.

> But the point of WRITE_IDEMPOTENT() is that it *always* writes the
> same value, so it doesn't matter if two different writers race on it.
>
> So it really is about being idempotent.

Okay, I agree that I did completely miss the point of what you
originally meant. But what you meant wasn't "being idempotent".

> > A better name would be "write-if-different" or "write-if-changed"
>
> No.
>
> Again, you're totally missing the point.
>
> It's not about "write-if-different".
>
> It's about idempotent writes.

Assuming you really are talking about writes (presumably in different
threads) that store the same value to the same location: Yes, two such
writes do not in practice race with each other -- even though they may
satisfy the formal definition of a data race.

On the other hand, they may each race with other accesses.

> But if you do know that all the possible racing writes are idempotent,
> then AS A POSSIBLE CACHE OPTIMIZATION, you can then say "only do this
> write if somebody else didn't already do it".

In fact, you can _always_ perform that possible optimization. Whether
it will be worthwhile is a separate matter...

> But that's a side effect of being idempotent, not the basic rule! And
> it's not necessarily obviously an optimization at all, because the
> cacheline may already be dirty, and checking the old value and having
> a conditional on it may be much more expensive than just writing the
> new value./

Agreed.

> The point is that certain writes DO NOT CARE ABOUT ORDERING, because
> they may be setting a sticky flag (or stickily clearing a flag, as in
> this case). The ordering doesn't matter, because the operation is
> idempotent.

Ah, here you use the word correctly.

> That's what "idempotent" means. You can do it once, or a hundred
> times, and the end result is the same (but is different from not doing
> it at all).

Precisely the point I make above.

> And no, not all writes are idempotent. That's just crazy talk. Writes
> have values.

By "writes have values", do you mean that writes store values? Of
course they do. But it is clear from what you wrote just above that
all writes _are_ idempotent, because doing a write once or a hundred
times will yield the same end result.

On a more productive note, do you think we should change the LKMM so
that it won't consider two writes to race with each other if they store
the same value?

Alan Stern