Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'

From: Ingo Molnar
Date: Tue Mar 15 2016 - 08:37:30 EST



* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

>
> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> > But IMHO this really highlights a fundamental weakness of all this macro magic,
> > it's all way too fragile.
> >
> > Why don't we introduce a boring family of APIs:
> >
> > cmpxchg_8()
> > cmpxchg_16()
> > cmpxchg_32()
> > cmpxchg_64()
> >
> > xchg_or_32()
> > xchg_or_64()
> > ...
> >
> > ... with none of this pesky auto-typing property and none of the
> > macro-inside-a-macro crap? We could do clean types and would write them all in
> > proper C, not fragile CPP.
> >
> > It's not like we migrate between the types all that frequently - and even if we
> > do, it's trivial.
> >
> > hm?
>
> So if we are still on the same page at this point, we'd have to add a pointer
> variant too I suspect:
>
> cmpxchg_ptr()
> xchg_ptr()
>
> ... whose bitness may differ between architectures (subarches), but it would still
> be a single variant per architecture, i.e. still with pretty clear type
> propagation and with a very clear notion of which architecture supports what.
>
> It looks like a lot of work, but it's all low-complexity work AFAICS that could be
> partly automated.

Btw., if we do all this, we could still add auto-type API variants, but now they
would be implemented at the highest level, with none of the auto-type complexity
pushed down to the architecture level. Architectures just provide their set of
APIs for a given list of types, and that's it.
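To make the shape of the proposal concrete, here is a minimal sketch of what two of the explicit-width functions could look like. The names (cmpxchg_32(), xchg_or_32()) follow this thread; the bodies use GCC/Clang __atomic builtins purely for illustration, whereas a real architecture would supply its own plain-C-plus-asm implementation:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/*
 * Sketch only: explicit-width atomics with no macro auto-typing.
 * Each width is a separate, ordinary C function with clean types.
 */
static inline uint32_t cmpxchg_32(volatile uint32_t *ptr,
				  uint32_t old, uint32_t new)
{
	/* On failure the builtin writes the observed value into 'old',
	 * so returning 'old' mirrors cmpxchg()'s "value found in *ptr". */
	__atomic_compare_exchange_n(ptr, &old, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline uint32_t xchg_or_32(volatile uint32_t *ptr, uint32_t mask)
{
	/* Atomically *ptr |= mask; return the previous value. */
	return __atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST);
}
```

Callers pick the width explicitly, so type propagation is visible at the call site and there is nothing for CPP to get wrong.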

I hate to see all the auto-typing complexity pushed down to the arch assembly
level:

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
* store NEW in MEM. Return the initial value in MEM. Success is
* indicated by comparing RETURN with OLD.
*/
#define __raw_cmpxchg(ptr, old, new, size, lock) \
({ \
	__typeof__(*(ptr)) __ret; \
	__typeof__(*(ptr)) __old = (old); \
	__typeof__(*(ptr)) __new = (new); \
	switch (size) { \
	case __X86_CASE_B: \
	{ \
		volatile u8 *__ptr = (volatile u8 *)(ptr); \
		asm volatile(lock "cmpxchgb %2,%1" \
			     : "=a" (__ret), "+m" (*__ptr) \
			     : "q" (__new), "0" (__old) \
			     : "memory"); \
		break; \
	} \
	case __X86_CASE_W: \
	{ \
		volatile u16 *__ptr = (volatile u16 *)(ptr); \
		asm volatile(lock "cmpxchgw %2,%1" \
			     : "=a" (__ret), "+m" (*__ptr) \
			     : "r" (__new), "0" (__old) \
			     : "memory"); \
		break; \
	} \
	case __X86_CASE_L: \
	{ \
		volatile u32 *__ptr = (volatile u32 *)(ptr); \
		asm volatile(lock "cmpxchgl %2,%1" \
			     : "=a" (__ret), "+m" (*__ptr) \
			     : "r" (__new), "0" (__old) \
			     : "memory"); \
		break; \
	} \
	case __X86_CASE_Q: \
	{ \
		volatile u64 *__ptr = (volatile u64 *)(ptr); \
		asm volatile(lock "cmpxchgq %2,%1" \
			     : "=a" (__ret), "+m" (*__ptr) \
			     : "r" (__new), "0" (__old) \
			     : "memory"); \
		break; \
	} \
	default: \
		__cmpxchg_wrong_size(); \
	} \
	__ret; \
})

it makes things harder to read, harder to debug, and harder to optimize.
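By contrast, if architectures exported only the explicit-width functions, an auto-typed convenience layer could live entirely at the top. A sketch of one way to do that, using C11 _Generic and compiler builtins purely for illustration (the cmpxchg_32()/cmpxchg_64() bodies below are stand-ins for per-arch implementations, not kernel code):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Stand-in explicit-width primitives (names from this thread). */
static inline uint32_t cmpxchg_32(uint32_t *p, uint32_t old, uint32_t new)
{
	__atomic_compare_exchange_n(p, &old, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

static inline uint64_t cmpxchg_64(uint64_t *p, uint64_t old, uint64_t new)
{
	__atomic_compare_exchange_n(p, &old, new, false,
				    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return old;
}

/*
 * Auto-typed variant built *on top* of the typed functions: the
 * dispatch is one flat table at the highest level, and no size-switch
 * macro ever reaches the architecture code.
 */
#define cmpxchg(ptr, old, new)			\
	_Generic(*(ptr),			\
		 uint32_t: cmpxchg_32,		\
		 uint64_t: cmpxchg_64)((ptr), (old), (new))
```

An unsupported type fails at compile time with an ordinary "no matching association" diagnostic instead of a link-time __cmpxchg_wrong_size() trick, and each typed function remains plain, debuggable C.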

Thanks,

Ingo