Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release

From: Peter Zijlstra
Date: Tue Jun 01 2021 - 02:48:51 EST


On Mon, May 31, 2021 at 10:03:33PM +0300, Adrian Hunter wrote:
> On 31/05/21 6:10 pm, Leo Yan wrote:
> > Hi Peter, Adrian,
> >
> > On Wed, May 19, 2021 at 10:03:19PM +0800, Leo Yan wrote:
> >> Load-acquire and store-release are one-way permeable barriers, which can
> >> be used to guarantee the memory ordering between accessing the buffer
> >> data and the buffer's head / tail.
> >>
> >> This patch optimizes the memory ordering with the load-acquire and
> >> store-release barriers.
> >
> > Is this patch okay for you?
> >
> > Besides this patch, I have an extra question. You could see for
> > accessing the AUX buffer's head and tail, it also support to use
> > compiler build-in functions for atomicity accessing:
> >
> > __sync_val_compare_and_swap()
> > __sync_bool_compare_and_swap()
> >
> > Since now we have READ_ONCE()/WRITE_ONCE(), do you think we still need
> > to support __sync_xxx_compare_and_swap() atomicity?
>
> I don't remember, but it seems to me atomicity is needed only
> for a 32-bit perf running with a 64-bit kernel.

But that:

do {
old_tail = __sync_val_compare_and_swap(&pc->aux_tail, 0, 0);
} while (!__sync_bool_compare_and_swap(&pc->aux_tail, old_tail, tail));

doesn't want to make sense to me. It appears to do a cmpxchg with 0
values to load old_tail, and then a further cmpxchg with old_tail to
write the new tail.

That's some really crazy code. That makes absolutely no sense what so
ever and needs to just be deleted.

On top of that, it uses atomic instrincs for a u64, which is not
something that'll actually work on a whole bunch of 32bit platforms.