Re: [PATCH v1 2/2] perf auxtrace: Optimize barriers with load-acquire and store-release

From: Peter Zijlstra
Date: Tue Jun 01 2021 - 02:59:00 EST

On Tue, Jun 01, 2021 at 02:33:42PM +0800, Leo Yan wrote:
> 32-bit perf wants to access 64-bit value atomically, I think it tries to
> avoid the issue caused by scenario:
>   CPU0 (64-bit kernel)        CPU1 (32-bit user)
>
>                               read head_lo
>   WRITE_ONCE(head)
>                               read head_hi

Right; so I think Mark and I once spent a bunch of time on this for the
regular ring buffer, but my memory is vague.

It was supposed to be that the high word would always be zero on 32bit,
but it turns out that that is not in fact the case and we get to have
this race that's basically unfixable :/

Or maybe that was only the compat case.. Ah yes, so see the kernel uses
unsigned long, so on 32bit the high word is empty and we always
read/write 0s, unless you're explicitly doing daft things.

But on compat, the high word can be non-zero and we get to have 'fun'.
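
To make the compat interleaving concrete, here is a minimal single-threaded
sketch of it. It is an illustration only, not the perf code: struct
split_head, store_head() and torn_read() are made-up names, and the
"other CPU" is modelled by simply calling the store between the two
32-bit loads.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit head field, seen as two 32-bit words. */
struct split_head {
	uint32_t lo;
	uint32_t hi;
};

/* Models the 64-bit kernel updating head with one WRITE_ONCE(). */
static void store_head(struct split_head *h, uint64_t v)
{
	h->lo = (uint32_t)v;
	h->hi = (uint32_t)(v >> 32);
}

/*
 * Models 32-bit user space loading head as two halves while the
 * kernel's store to new_head lands in between the two loads.
 */
static uint64_t torn_read(struct split_head *h, uint64_t new_head)
{
	uint32_t lo = h->lo;		/* read head_lo */

	store_head(h, new_head);	/* WRITE_ONCE(head) on the other CPU */

	uint32_t hi = h->hi;		/* read head_hi */

	return ((uint64_t)hi << 32) | lo;
}
```

Starting from head == 0xffffffff and storing 0x100000000, torn_read()
returns 0x1ffffffff, which is neither the old nor the new value. If both
values fit in 32 bits, the high word is 0 before and after the store, so
the same interleaving just returns the old head; that is the native
32-bit case above.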