Re: [PATCH] locking/memory-barriers.txt: Improve documentation for writel() usage

From: Arnd Bergmann
Date: Fri Sep 23 2022 - 14:40:47 EST


On Fri, Sep 23, 2022, at 5:55 PM, Parav Pandit wrote:
>> From: Arnd Bergmann <arnd@xxxxxxxx>
>> Sent: Friday, September 16, 2022 12:09 AM

>> > And I sort of see above pattern in two drivers, and it is not good.
>> > It ends up doing dsb(st) on arm64, while needed barrier is only
>> > dmb(oshst).
>> >
>> > So to fix those two drivers, it is better to first avoid wmb()
>> > documentation reference when referring to writel().
>>
>> Yes, this suggestion is correct. On x86 and a few others, I think it's even
>> worse when wmb() is an expensive barrier, while writel() is the same as
>> writel_relaxed() and the barrier is implied by the MMIO access.
>>
>> It might help to spell this out and say that writel() is always preferred over
>> wmb()+writel_relaxed().
>>
> True.
>
>> Site note: there are several other problems with wmb()+__raw_writel(),
>> which on many architectures does not guarantee any atomicity of the access
>> (a word store could get split into four byte stores), breaks endianess
>> assumptions and may still not provide the correct barrier semantics.
>>
> Hmm. So far didn't observe this on arm64, x86_64, ppc64 yet.
> May be because the address is aligned to 8 bytes, we don't see the byte stores?

It's complicated. On some architectures (but not the ones you list),
__raw_writel() is a pointer dereference, so the compiler is allowed
to use whichever instruction it wants. Depending on the CPU it
is building for, gcc can decide to split up those stores when it
sees a pointer that was assigned from pointer with lesser alignment
(which is undefined behavior in C). If the pointer is actually
unaligned but gcc does not see this, then powerpc and arm will
trigger an alignment exception for an MMIO location even on CPUs
that can work with unaligned data on normal RAM.

> mlx5_write64() variant to use writeX() and avoid wmb() post the
> documentation update is good start.

Ok, fair enough, as long as the loop function is not timing
critical itself.

Arnd