Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg

From: Paul E. McKenney
Date: Tue Apr 02 2024 - 16:52:18 EST


On Tue, Apr 02, 2024 at 10:06:14AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
> > On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
> > > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
> > > and two-byte cmpxchg() on arc.
> > >
> > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > I'm missing the context here, is it now mandatory to have 16-bit
> > cmpxchg() everywhere? I think we've historically tried hard to
> > keep this out of common code since it's expensive on architectures
> > that don't have native 16-bit load/store instructions (alpha, armv3)
> > and/or sub-word atomics (armv5, riscv, mips).
>
> I need 8-bit, and just added 16-bit because it was easy to do so.
> I would be OK dropping the 16-bit portions of this series, assuming
> that no-one needs it. And assuming that it is easier to drop it than
> to explain why it is not available. ;-)
>
> > Does the code that uses this rely on working concurrently with
> > non-atomic stores to part of the 32-bit word? If we want to
> > allow that, we need to merge my alpha ev4/45/5 removal series
> > first.
>
> For 8-bit cmpxchg(), yes. There are potentially concurrent
> smp_load_acquire() and smp_store_release() operations to this same byte.
>
> Or is your question specific to the 16-bit primitives? (Full disclosure:
> I have no objection to removing Alpha ev4/45/5, having several times
> suggested removing Alpha entirely. And having the scars to prove it.)
>
> > For the cmpxchg() interface, I would prefer to handle the
> > 8-bit and 16-bit versions the same way as cmpxchg64() and
> > provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
> > by architectures that operate on fixed-size integer values
> > but not compounds or pointers, and a generic cmpxchg() wrapper
> > in common code that can handle the abstraction for pointers,
> > long and (if absolutely necessary) compounds by multiplexing
> > between cmpxchg32() and cmpxchg64() where needed.
>
> So as to support _acquire(), _relaxed(), and _release()?
>
> If so, I don't have any use cases for other than full ordering.

Nor any use cases other than integers. (In case another thing you are
after here is good type-checking for non-integers combined with allowing
C-language implicit conversions for integers.)

Thanx, Paul

> > I did a prototype a few years ago and found that there is
> > probably under a dozen users of the sub-word atomics in
> > the tree, so this mostly requires changes to architecture
> > code and less to drivers and core code.
>
> Given this approach, the predominance of changes to architecture code
> seems quite likely to me.
>
> But do we really wish to invest that much work into architectures that
> might not be all that long for the world? (Quickly donning my old
> asbestos suit, the one with the tungsten pinstripes...)
>
> Thanx, Paul
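
For readers following the thread, here is a minimal user-space sketch of
the general technique under discussion: emulating a one-byte cmpxchg()
with a 32-bit compare-and-swap on the aligned word that contains the byte.
It is not the code from the patch. The name emu_cmpxchg_u8() is
hypothetical, the GCC/Clang __atomic builtins stand in for the kernel's
cmpxchg(), and the caller is assumed to pass a pointer into a properly
aligned 32-bit object.

```c
#include <assert.h>
#include <stdint.h>

/* View one aligned 32-bit word either as a whole or as four bytes. */
union u8_32 {
	uint8_t b[4];
	uint32_t w;
};

static_assert(sizeof(union u8_32) == 4, "expected a 4-byte word");

/*
 * Hypothetical user-space analogue of a one-byte cmpxchg() emulation.
 * Returns the byte value observed at *p.  The store took place if and
 * only if the returned value equals 'old'.
 */
static uint8_t emu_cmpxchg_u8(uint8_t *p, uint8_t old, uint8_t newval)
{
	/* Round the byte address down to its containing 32-bit word. */
	uint32_t *p32 = (uint32_t *)((uintptr_t)p & ~(uintptr_t)0x3);
	/* Index of the target byte within that word, in memory order. */
	unsigned int i = (uintptr_t)p & 0x3;
	union u8_32 old32, new32;

	old32.w = __atomic_load_n(p32, __ATOMIC_RELAXED);
	for (;;) {
		/* Target byte does not match: fail without storing. */
		if (old32.b[i] != old)
			return old32.b[i];
		/* Build the new word: only the target byte changes. */
		new32.w = old32.w;
		new32.b[i] = newval;
		/*
		 * On failure, the builtin refreshes old32.w with the
		 * current word, so a change to a neighboring byte simply
		 * causes another pass through the loop.
		 */
		if (__atomic_compare_exchange_n(p32, &old32.w, new32.w, 0,
						__ATOMIC_SEQ_CST,
						__ATOMIC_SEQ_CST))
			return old;
	}
}
```

Because the compare-and-swap covers the whole 32-bit word, a sketch like
this is only safe if concurrent plain stores to the other three bytes are
themselves done with byte-sized store instructions. A CPU that has to
implement a byte store as a read-modify-write of the containing word could
overwrite the emulated update, which appears to be the concern behind the
question above about concurrent non-atomic stores.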