Re: [PATCH v1 1/2] perf auxtrace: Change to use SMP memory barriers

From: Adrian Hunter
Date: Thu May 27 2021 - 05:23:57 EST


On 27/05/21 11:25 am, Adrian Hunter wrote:
> On 27/05/21 11:11 am, Peter Zijlstra wrote:
>> On Thu, May 27, 2021 at 10:54:56AM +0300, Adrian Hunter wrote:
>>> On 19/05/21 5:03 pm, Leo Yan wrote:
>>>> The AUX ring buffer's head and tail can be accessed from multiple CPUs
>>>> on SMP system, so changes to use SMP memory barriers to replace the
>>>> uniprocessor barriers.
>>>
>>> I don't think user space should attempt to be SMP-aware.
>>
>> Uhh, what? It pretty much has to. Since userspace cannot assume UP, it
>> must assume SMP.
>
> Yeah that is what I meant, but consequently we generally shouldn't be
> using functions called smp_<anything>
>
>>
>>> For perf tools, on __x86_64__ it looks like smp_rmb() is only a compiler barrier, whereas
>>> rmb() is a "lfence" memory barrier instruction, so this patch does not
>>> seem to do what the commit message says at least for x86.
>>
>> The commit message is somewhat confused; *mb() are not UP barriers
>> (although they are available and useful on UP). They're device/dma
>> barriers.
>>
>>> With regard to the AUX area, we don't know in general how data gets there,
>>> so using memory barriers seems sensible.
>>
>> IIRC (but I ddn't check) the rule was that the kernel needs to ensure
>> the AUX area is complete before it updates the head pointer. So if
>> userspace can observe the head pointer, it must then also be able to
>> observe the data. This is not something userspace can fix up anyway.
>>
>> The ordering here is between the head pointer and the data, and from a
>> userspace perspective that's a regular smp ordering. Similar for the
>> tail update, that's between our reading the data and writing the tail,
>> regular cache coherent smp ordering.
>>
>> So ACK on the patch, it's sane and an optimization for both x86 and ARM.
>> Just the Changelog needs work.
>
> If all we want is a compiler barrier, then shouldn't that be what we use?
> i.e. barrier()

I guess you are saying we still need to stop potential re-ordering across
CPUs, so please ignore my comments.

>
>>
>>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>>> ---
>>>> tools/perf/util/auxtrace.h | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
>>>> index 472c0973b1f1..8bed284ccc82 100644
>>>> --- a/tools/perf/util/auxtrace.h
>>>> +++ b/tools/perf/util/auxtrace.h
>>>> @@ -452,7 +452,7 @@ static inline u64 auxtrace_mmap__read_snapshot_head(struct auxtrace_mmap *mm)
>>>> u64 head = READ_ONCE(pc->aux_head);
>>>>
>>>> /* Ensure all reads are done after we read the head */
>>>> - rmb();
>>>> + smp_rmb();
>>>> return head;
>>>> }
>>>>
>>>> @@ -466,7 +466,7 @@ static inline u64 auxtrace_mmap__read_head(struct auxtrace_mmap *mm)
>>>> #endif
>>>>
>>>> /* Ensure all reads are done after we read the head */
>>>> - rmb();
>>>> + smp_rmb();
>>>> return head;
>>>> }
>>>>
>>>> @@ -478,7 +478,7 @@ static inline void auxtrace_mmap__write_tail(struct auxtrace_mmap *mm, u64 tail)
>>>> #endif
>>>>
>>>> /* Ensure all reads are done before we write the tail out */
>>>> - mb();
>>>> + smp_mb();
>>>> #if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
>>>> pc->aux_tail = tail;
>>>> #else
>>>>
>>>
>