Re: Can you help us on memory barrier usage? (was Re: [PATCH v4 4/6] mm: swap: Allow storage of all mTHP orders)
From: Ryan Roberts
Date: Tue Mar 26 2024 - 13:10:33 EST
On 25/03/2024 03:16, Huang, Ying wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
>
>> On Sat, Mar 23, 2024 at 11:11:09AM +0900, Akira Yokosawa wrote:
>>> [Use Paul's reachable address in CC;
>>> trimmed CC list, keeping only those who have responded so far.]
>>>
>>> Hello Huang,
>>> Let me chime in.
>>>
>>> On Fri, 22 Mar 2024 06:19:52 -0700, Huang, Ying wrote:
>>>> Hi, Paul,
>>>>
>>>> Can you help us on WRITE_ONCE()/READ_ONCE()/barrier() usage as follows?
>>>> For some example kernel code as follows,
>>>>
>>>> "
>>>> unsigned char x[16];
>>>>
>>>> void writer(void)
>>>> {
>>>> 	memset(x, 1, sizeof(x));
>>>> 	/* To make memset() take effect ASAP */
>>>> 	barrier();
>>>> }
>>>>
>>>> unsigned char reader(int n)
>>>> {
>>>> 	return READ_ONCE(x[n]);
>>>> }
>>>> "
>>>>
>>>> where, writer() and reader() may be called on 2 CPUs without any lock.
>>>> It's acceptable for reader() to read the written value a little later.
>>
>> What are your consistency requirements? For but one example, if reader(3)
>> gives the new value, is it OK for a later call to reader(2) to give the
>> old value?
>
> writer() will be called with a lock held (sorry, my previous words
> aren't correct here). After the racy checking in reader(), we will
> acquire the lock and check "x[n]" again to confirm. And, there are no
> dependencies between different "n". All in all, we can accept almost
> all races between writer() and reader().
>
> My question is, if there are some operations between writer() and
> unlocking in its caller, whether does barrier() in writer() make any
> sense? Make write instructions appear a little earlier in compiled
> code? Mark the memory may be read racy? Or doesn't make sense at all?
A compiler barrier is necessary but not sufficient to guarantee that the
stores become visible to the reader; you would also need a memory barrier to
stop the HW from reordering, IIUC. So I really fail to see the value of adding
barrier() on its own.

As you state above, there is no correctness issue here. It's just a question of
whether the barrier() can make the stores appear earlier to the reader, as a
(micro!) performance optimization. You'll get both the compiler barrier and the
memory barrier from the slightly later spin_unlock(). The patch that added the
original WRITE_ONCE() was concerned with squashing kcsan warnings, not with
performance optimization. (And the addition of the WRITE_ONCE() wasn't actually
needed to achieve that aim.)
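
To illustrate the pattern being discussed, here is a rough userspace sketch
using C11 atomics and a pthread mutex in place of the kernel primitives. It is
only an analogy, not the code from the series: x_lock, NR_SLOTS, reader_racy()
and reader_confirmed() are hypothetical names, the relaxed atomic accesses
stand in for WRITE_ONCE()/READ_ONCE(), and pthread_mutex_unlock() stands in
for spin_unlock(). The assumption is that the unlock provides the release
ordering, so the writer carries no explicit barrier of its own.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 16	/* hypothetical, mirrors "unsigned char x[16]" */

static _Atomic unsigned char x[NR_SLOTS];
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer runs with the lock held, as in the clarified requirements. */
static void writer(void)
{
	pthread_mutex_lock(&x_lock);
	for (size_t i = 0; i < NR_SLOTS; i++)
		atomic_store_explicit(&x[i], 1, memory_order_relaxed);
	/* No explicit barrier here: the unlock below orders the stores. */
	pthread_mutex_unlock(&x_lock);
}

/* Lockless check; may observe a stale value for a short while. */
static unsigned char reader_racy(int n)
{
	return atomic_load_explicit(&x[n], memory_order_relaxed);
}

/* Racy check first, then confirm under the lock before trusting it. */
static bool reader_confirmed(int n)
{
	bool set;

	if (!reader_racy(n))
		return false;

	pthread_mutex_lock(&x_lock);
	set = atomic_load_explicit(&x[n], memory_order_relaxed) != 0;
	pthread_mutex_unlock(&x_lock);
	return set;
}
```

A reader that calls reader_racy() slightly too early just sees the old value
and behaves as if the write had not happened yet, which is the race already
described as acceptable above; anything acted upon goes through the locked
re-check.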
So I'm planning to repost my series (hopefully tomorrow) without the barrier()
present, unless you still want to try to convince me that it is useful.
Thanks,
Ryan
>
>> Until we know what your requirements are, it is hard to say whether the
>> above code meets those requirements. In the meantime, I can imagine
>> requirements that it meets and others that it does not.
>>
>> Also, Akira's points below are quite important.
>
> Replied for his email.
>
> --
> Best Regards,
> Huang, Ying