Re: [PATCH v2 02/11] this_cpu_cmpxchg: ARM64: switch this_cpu_cmpxchg to locked, add _local function

From: Marcelo Tosatti
Date: Thu Mar 02 2023 - 17:18:54 EST


On Thu, Mar 02, 2023 at 03:53:12PM -0500, Peter Xu wrote:
> On Thu, Feb 09, 2023 at 12:01:52PM -0300, Marcelo Tosatti wrote:
> > Goal is to have vmstat_shepherd to transfer from
> > per-CPU counters to global counters remotely. For this,
> > an atomic this_cpu_cmpxchg is necessary.
> >
> > Following the kernel convention for cmpxchg/cmpxchg_local,
> > change ARM's this_cpu_cmpxchg_ helpers to be atomic,
> > and add this_cpu_cmpxchg_local_ helpers which are not atomic.
>
> I can follow on the necessity of having the _local version, however two
> questions below.
>
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> >
> > Index: linux-vmstat-remote/arch/arm64/include/asm/percpu.h
> > ===================================================================
> > --- linux-vmstat-remote.orig/arch/arm64/include/asm/percpu.h
> > +++ linux-vmstat-remote/arch/arm64/include/asm/percpu.h
> > @@ -232,13 +232,23 @@ PERCPU_RET_OP(add, add, ldadd)
> > _pcp_protect_return(xchg_relaxed, pcp, val)
> >
> > #define this_cpu_cmpxchg_1(pcp, o, n) \
> > - _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > + _pcp_protect_return(cmpxchg, pcp, o, n)
> > #define this_cpu_cmpxchg_2(pcp, o, n) \
> > - _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > + _pcp_protect_return(cmpxchg, pcp, o, n)
> > #define this_cpu_cmpxchg_4(pcp, o, n) \
> > - _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > + _pcp_protect_return(cmpxchg, pcp, o, n)
> > #define this_cpu_cmpxchg_8(pcp, o, n) \
> > + _pcp_protect_return(cmpxchg, pcp, o, n)
>
> This makes this_cpu_cmpxchg_*() not only non-local, but also (especially
> for arm64) memory barrier implications since cmpxchg() has a strong memory
> barrier, while the old this_cpu_cmpxchg*() doesn't have, afaiu.
>
> Maybe it's not a big deal if the audience of this helper is still limited
> (e.g. we can add memory barriers if we don't want strict ordering
> implication), but just to check with you on whether it's intended, and if
> so whether it may worth some comments.

It happens that on ARM-64 cmpxchg_local == cmpxchg_relaxed.

See cf10b79a7d88edc689479af989b3a88e9adf07ff.

This patchset maintains the current behaviour
of this_cpu_cmpxch (for this_cpu_cmpxch_local), which was:

#define this_cpu_cmpxchg_1(pcp, o, n) \
- _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+ _pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_2(pcp, o, n) \
- _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+ _pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_4(pcp, o, n) \
- _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
+ _pcp_protect_return(cmpxchg, pcp, o, n)
#define this_cpu_cmpxchg_8(pcp, o, n) \
+ _pcp_protect_return(cmpxchg, pcp, o, n)

> > +
> > +#define this_cpu_cmpxchg_local_1(pcp, o, n) \
> > _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_2(pcp, o, n) \
> > + _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_4(pcp, o, n) \
> > + _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
> > +#define this_cpu_cmpxchg_local_8(pcp, o, n) \
> > + _pcp_protect_return(cmpxchg_relaxed, pcp, o, n)
>
> I think cmpxchg_relaxed()==cmpxchg_local() here for aarch64, however should
> we still use cmpxchg_local() to pair with this_cpu_cmpxchg_local_*()?

Since cmpxchg_local = cmpxchg_relaxed, seems like this is not necessary.

> Nothing about your patch along since it was the same before, but I'm
> wondering whether this is a good time to switchover.

I would say that another patch is more appropriate to change this,
if desired.

> The other thing is would it be good to copy arch-list for each arch patch?
> Maybe it'll help to extend the audience too.

Yes, should have done that (or CC each individual maintainer). Will do
on next version.

Thanks.