Re: [PATCH RFC cmpxchg 3/8] ARC: Emulate one-byte and two-byte cmpxchg
From: Arnd Bergmann
Date: Thu Apr 04 2024 - 07:58:21 EST
On Tue, Apr 2, 2024, at 19:06, Paul E. McKenney wrote:
> On Tue, Apr 02, 2024 at 10:14:08AM +0200, Arnd Bergmann wrote:
>> On Mon, Apr 1, 2024, at 23:39, Paul E. McKenney wrote:
>> > Use the new cmpxchg_emu_u8() and cmpxchg_emu_u16() to emulate one-byte
>> > and two-byte cmpxchg() on arc.
>> >
>> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>>
>> I'm missing the context here, is it now mandatory to have 16-bit
>> cmpxchg() everywhere? I think we've historically tried hard to
>> keep this out of common code since it's expensive on architectures
>> that don't have native 16-bit load/store instructions (alpha, armv3)
>> and or sub-word atomics (armv5, riscv, mips).
>
> I need 8-bit, and just added 16-bit because it was easy to do so.
> I would be OK dropping the 16-bit portions of this series, assuming
> that no-one needs it. And assuming that it is easier to drop it than
> to explain why it is not available. ;-)
It certainly makes sense to handle both the same, whichever
way we do this.
>> Does the code that uses this rely on working concurrently with
>> non-atomic stores to part of the 32-bit word? If we want to
>> allow that, we need to merge my alpha ev4/45/5 removal series
>> first.
>
> For 8-but cmpxchg(), yes. There are potentially concurrent
> smp_load_acquire() and smp_store_release() operations to this same byte.
>
> Or is your question specific to the 16-bit primitives? (Full disclosure:
> I have no objection to removing Alpha ev4/45/5, having several times
> suggested removing Alpha entirely. And having the scars to prove it.)
For the native sub-word access, alpha ev5 cannot do either of
them, while armv3 could do byte access but not 16-bit words.
It sounds like the old alphas are already broken then, which
is a good reason to finally drop support. It would appear that
the arm riscpc is not affected by this though.
>> For the cmpxchg() interface, I would prefer to handle the
>> 8-bit and 16-bit versions the same way as cmpxchg64() and
>> provide separate cmpxchg8()/cmpxchg16()/cmpxchg32() functions
>> by architectures that operate on fixed-size integer values
>> but not compounds or pointers, and a generic cmpxchg() wrapper
>> in common code that can handle the abtraction for pointers,
>> long and (if absolutely necessary) compounds by multiplexing
>> between cmpxchg32() and cmpxchg64() where needed.
>
> So as to support _acquire(), _relaxed(), and _release()?
>
> If so, I don't have any use cases for other than full ordering.
My main goal here would be to simplify the definition of
the very commonly used cmpxchg() macro so it doesn't have
to deal with so many corner cases, and make it work the
same way across all architectures. Without the type
agnostic wrapper, there would also be the benefit of
additional type checking that we get by replacing the
macros with inline functions.
We'd still need all the combinations of cmpxchg() and xchg()
with the four sizes and ordering variants, but at least the
latter should easily collapse on most architectures. At the
moment, most architectures only provide the full-ordering
version.
Arnd