Re: [PATCH 2/6] locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions

From: Mark Rutland
Date: Tue Apr 09 2024 - 12:34:49 EST


On Tue, Apr 09, 2024 at 02:50:19PM +0200, Uros Bizjak wrote:
> On Tue, Apr 9, 2024 at 2:03 PM Uros Bizjak <ubizjak@xxxxxxxxx> wrote:
> >
> > On Tue, Apr 9, 2024 at 1:13 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > > > static __always_inline void arch_atomic64_and(s64 i, atomic64_t *v)
> > > > {
> > > > - s64 old, c = 0;
> > > > + s64 val = __READ_ONCE(v->counter);
> > >
> > > I reckon it's worth placing this in a helper with a big comment, e.g.
> > >
> > > > static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> > > > {
> > > > 	/*
> > > > 	 * TODO: explain that this might be torn, but it occurs *once*, and can
> > > > 	 * safely be consumed by atomic64_try_cmpxchg().
> > > > 	 *
> > > > 	 * TODO: point to the existing commentary regarding why we use
> > > > 	 * __READ_ONCE() for KASAN reasons.
> > > > 	 */
> > > > 	return __READ_ONCE(v->counter);
> > > > }
> > >
> > > ... and then use that in each of the instances below.
> > >
> > > That way the subtlety is clearly documented, and it'd more clearly align with
> > > > the x86_64 versions.
> >
> > > This is an excellent idea. The separate definitions need to be placed
> > > in atomic64_32.h and atomic64_64.h (due to use of atomic64_t
> > > typedef), but it will allow the same unification of functions between
> > > x86_32 and x86_64 as the approach with __READ_ONCE().
>
> Something like this:
>
> --cut here--
> /*
>  * This function is intended to preload the value from atomic64_t
>  * location in a non-atomic way. The read might be torn, but can
>  * safely be consumed by the compare-and-swap loop.
>  */
> static __always_inline s64 arch_atomic64_read_tearable(atomic64_t *v)
> {
> 	/*
> 	 * See the comment in arch_atomic_read() on why we use
> 	 * __READ_ONCE() instead of READ_ONCE_NOCHECK() here.
> 	 */
> 	return __READ_ONCE(v->counter);
> }
> --cut here--
>
> Thanks,
> Uros.

Yeah, something of that shape.

Having thought for a bit longer, it's probably better to use '_torn' rather
than '_tearable' (i.e. name this arch_atomic64_read_torn()).

It'd be nice if we could specify the usage restrictions a bit more clearly,
since this can only be used for compare-and-swap loops that implement
unconditional atomics (e.g. arch_atomic64_and(), but not
arch_atomic64_add_unless()).
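
FWIW, the conditional case can be modelled in userspace. The below is only a
sketch using C11 <stdatomic.h> rather than kernel code: model_atomic64_t,
model_torn() and model_add_unless() are made-up names, and the torn value is
constructed by hand from two halves instead of coming from a real racing read.
It shows an add_unless()-style loop bailing out on a value the counter never
actually held.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t (not the kernel type). */
typedef struct { _Atomic int64_t counter; } model_atomic64_t;

/*
 * Build the value a torn 64-bit read could return: the high 32 bits come
 * from one store (hi_src) and the low 32 bits from another (lo_src).
 */
static int64_t model_torn(int64_t hi_src, int64_t lo_src)
{
	uint64_t hi = (uint64_t)hi_src & 0xffffffff00000000ull;
	uint64_t lo = (uint64_t)lo_src & 0x00000000ffffffffull;

	return (int64_t)(hi | lo);
}

/*
 * Model of a conditional loop: add 'a' to v unless v == u.
 * 'primed' stands in for the result of the initial, possibly torn, read.
 * Returns true if the add was performed.
 */
static bool model_add_unless(model_atomic64_t *v, int64_t a, int64_t u,
			     int64_t primed)
{
	int64_t val = primed;

	do {
		/*
		 * On the first iteration this test runs before any
		 * compare-exchange has validated 'val'.
		 */
		if (val == u)
			return false;
	} while (!atomic_compare_exchange_strong(&v->counter, &val, val + a));

	return true;
}
```

With the counter going from 0x00000000ffffffff to 0x0000000100000000, a reader
that sees the old high half and the new low half gets 0, so an
inc_not_zero()-style caller (u == 0) would give up even though the counter was
never zero.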

So I'd suggest:

/*
 * Read an atomic64_t non-atomically.
 *
 * This is intended to be used in cases where a subsequent atomic operation
 * will handle the torn value, and can be used to prime the first iteration of
 * unconditional try_cmpxchg() loops, e.g.
 *
 *	s64 val = arch_atomic64_read_torn(v);
 *	do { } while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
 *
 * This is NOT safe to use where the value is not always checked by a
 * subsequent atomic operation, such as in conditional try_cmpxchg() loops that
 * can break before the atomic, e.g.
 *
 *	s64 val = arch_atomic64_read_torn(v);
 *	do {
 *		if (condition(val))
 *			break;
 *	} while (!arch_atomic64_try_cmpxchg(v, &val, val OP i));
 */
static __always_inline s64 arch_atomic64_read_torn(atomic64_t *v)
{
	/* See comment in arch_atomic_read() */
	return __READ_ONCE(v->counter);
}
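
As a sanity check of the unconditional case, here is a rough userspace model,
again using C11 <stdatomic.h> and not the kernel primitives.
model_atomic64_t and model_fetch_and() are hypothetical names, and 'primed'
stands in for whatever the torn read returned. The point is that a bogus
starting value only costs one failed compare-exchange, which reloads the real
value before anything is stored.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for atomic64_t (not the kernel type). */
typedef struct { _Atomic int64_t counter; } model_atomic64_t;

/*
 * Model of an unconditional fetch_and() built on a compare-exchange loop.
 * 'primed' plays the role of the possibly torn initial read.
 * Returns the value v actually held immediately before the AND was applied.
 */
static int64_t model_fetch_and(model_atomic64_t *v, int64_t i, int64_t primed)
{
	int64_t val = primed;

	/*
	 * If 'val' does not match the counter, the compare-exchange fails,
	 * stores nothing, and overwrites 'val' with the current contents,
	 * so the next iteration works on a value that was really there.
	 */
	do { } while (!atomic_compare_exchange_strong(&v->counter, &val, val & i));

	return val;
}
```

Priming with 0 while the counter holds 0x1000000ff and masking with 0x0f still
returns 0x1000000ff and leaves 0x0f behind, since the first compare-exchange
fails and corrects val.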

Mark.