Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff

From: Vineet Gupta
Date: Mon Aug 03 2015 - 09:02:07 EST


On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> + unsigned int delay = 1, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> + " bz 4f \n" \
>> + " ; --- scond fail delay --- \n" \
>> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>> + " b 1b \n" /* start over */ \
>> + "4: ; --- success --- \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> + ,[delay] "+&r" (delay),[tmp] "=&r" (tmp) \
>> +
>> +#define ATOMIC_OP(op, c_op, asm_op) \
>> +static inline void atomic_##op(int i, atomic_t *v) \
>> +{ \
>> + unsigned int val, delay = 1, tmp; \
> Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

Right, not sure how I missed that!

>
>> + \
>> + __asm__ __volatile__( \
>> + "1: llock %[val], [%[ctr]] \n" \
>> + " " #asm_op " %[val], %[val], %[i] \n" \
>> + " scond %[val], [%[ctr]] \n" \
>> + " \n" \
>> + SCOND_FAIL_RETRY_ASM \
>> + \
>> + : [val] "=&r" (val) /* Early clobber to prevent reg reuse */ \
>> + SCOND_FAIL_RETRY_VARS \
>> + : [ctr] "r" (&v->counter), /* Not "m": llock only supports reg direct addr mode */ \
>> + [i] "ir" (i) \
>> + : "cc"); \
>> +} \
>> +
>> +#define ATOMIC_OP_RETURN(op, c_op, asm_op) \
>> +static inline int atomic_##op##_return(int i, atomic_t *v) \
>> +{ \
>> + unsigned int val, delay = 1, tmp; \
> Idem.

OK !

>> + \
>> + /* \
>> + * Explicit full memory barrier needed before/after as \
>> + * LLOCK/SCOND themselves don't provide any such semantics \
>> + */ \
>> + smp_mb(); \
>> + \
>> + __asm__ __volatile__( \
>> + "1: llock %[val], [%[ctr]] \n" \
>> + " " #asm_op " %[val], %[val], %[i] \n" \
>> + " scond %[val], [%[ctr]] \n" \
>> + " \n" \
>> + SCOND_FAIL_RETRY_ASM \
>> + \
>> + : [val] "=&r" (val) \
>> + SCOND_FAIL_RETRY_VARS \
>> + : [ctr] "r" (&v->counter), \
>> + [i] "ir" (i) \
>> + : "cc"); \
>> + \
>> + smp_mb(); \
>> + \
>> + return val; \
>> +}
>> +#define SCOND_FAIL_RETRY_VAR_DEF \
>> + unsigned int delay, tmp; \
>> +
>> +#define SCOND_FAIL_RETRY_ASM \
>> + " ; --- scond fail delay --- \n" \
>> + " mov %[tmp], %[delay] \n" /* tmp = delay */ \
>> + "2: brne.d %[tmp], 0, 2b \n" /* while (tmp != 0) */ \
>> + " sub %[tmp], %[tmp], 1 \n" /* tmp-- */ \
>> + " asl %[delay], %[delay], 1 \n" /* delay *= 2 */ \
>> + " b 1b \n" /* start over */ \
>> + " \n" \
>> + "4: ; --- done --- \n" \
>> +
>> +#define SCOND_FAIL_RETRY_VARS \
>> + ,[delay] "=&r" (delay), [tmp] "=&r" (tmp) \
> This is looking remarkably similar to the previous ones, why not a
> shared header?

I thought about it when duplicating the code. However, readability seemed
better with the code present in the same file, rather than having to look it up
in a different header with no context at all.

Plus there are some subtle differences between the two on a closer look.
Spinlocks need the delay reset to 1, which atomics don't, so the reset has to
live in the spinlock inline asm (and that needs a different inline asm
constraint). Also, for atomics the success branch (bz 4f) is folded into the
macro, which we can't do for the lock try routines, as that branch uses a delay
slot. Agreed that all of this is in the micro-optimization realm, but I suppose
it is worth it when you have a 10-stage pipeline.
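
For anyone following along without the ARCv2 ISA at hand, the two flavours can
be approximated in portable C. This is only a rough sketch, not the patch
itself: it assumes C11 <stdatomic.h>, uses a weak compare-exchange as a
stand-in for the LLOCK/SCOND pair, and all the sketch_* names and SKETCH_*
values are made up for illustration. The memory ordering is also simplified
compared to the explicit smp_mb() calls in the patch.

```c
#include <stdatomic.h>

#define SKETCH_UNLOCKED	0	/* hypothetical lock values, not the kernel's */
#define SKETCH_LOCKED	1

/* Hypothetical helper: burn roughly n loop iterations (the "2:" loop). */
static inline void sketch_backoff(unsigned int n)
{
	volatile unsigned int i;

	for (i = n; i != 0; i--)
		;
}

/* Atomics flavour: delay is initialized in C and only ever doubles. */
static inline void sketch_atomic_add(int i, atomic_int *v)
{
	unsigned int delay = 1;
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* A failed weak CAS stands in for a failed SCOND; old is refreshed. */
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + i,
						      memory_order_relaxed,
						      memory_order_relaxed)) {
		sketch_backoff(delay);
		delay <<= 1;	/* delay *= 2, uncapped in this sketch */
	}
}

/* Spinlock flavour: the routine sets delay to 1 itself before spinning. */
static inline void sketch_spin_lock(atomic_int *slock)
{
	unsigned int delay = 1;
	int expected;

	for (;;) {
		/* Spin while LOCKED; no backoff is applied to this case. */
		while (atomic_load_explicit(slock, memory_order_relaxed) ==
		       SKETCH_LOCKED)
			;

		expected = SKETCH_UNLOCKED;
		if (atomic_compare_exchange_weak_explicit(slock, &expected,
							  SKETCH_LOCKED,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;	/* store succeeded: lock acquired */

		/* Store failed: wait, double the delay, start over. */
		sketch_backoff(delay);
		delay <<= 1;
	}
}
```

The C version obviously can't show the delay slot or the constraint
difference; it is only meant to make the retry/backoff control flow easier to
see.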


>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> + unsigned int val;
>> + SCOND_FAIL_RETRY_VAR_DEF;
>> +
>> + smp_mb();
>> +
>> + __asm__ __volatile__(
>> + "0: mov %[delay], 1 \n"
>> + "1: llock %[val], [%[slock]] \n"
>> + " breq %[val], %[LOCKED], 1b \n" /* spin while LOCKED */
>> + " scond %[LOCKED], [%[slock]] \n" /* acquire */
>> + " bz 4f \n" /* done */
>> + " \n"
>> + SCOND_FAIL_RETRY_ASM
> But,... in the case that macro is empty, the label 4 does not actually
> exist. I see no real reason for this to be different from the previous
> incarnation either.

Per the current code, the macro is never empty. I initially wrote it to have one
version of the routines with different macro definitions, but that was getting
terribly difficult to follow, so I resorted to duplicating all the routines,
with macros to kind of compensate for the duplication by factoring out the
common code in the duplicated routines :-)

For locks, I can again fold the bz into the macro, but then we can't use the
delay slot in the try versions!