Re: [PATCH] locking/osq: Use more optimized spinning for arm64
From: Waiman Long
Date: Fri Jan 10 2020 - 09:13:39 EST
On 1/10/20 5:06 AM, Peter Zijlstra wrote:
> On Thu, Jan 09, 2020 at 10:38:31AM -0500, Waiman Long wrote:
>
>> --- a/kernel/locking/osq_lock.c
>> +++ b/kernel/locking/osq_lock.c
>> @@ -134,6 +134,27 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>> * cmpxchg in an attempt to undo our queueing.
>> */
>>
>> + /*
>> + * If vcpu_is_preempted is not defined, we can skip the check
>> + * and use smp_cond_load_relaxed() instead. For arm64, this
>> + * could lead to the use of the more optimized wfe instruction.
>> + * As need_resched() is set by interrupt handler, it will break
>> + * out and do the unqueue in a timely manner.
>> + *
>> + * TODO: We may need to add a static_key like vcpu_is_preemptible
>> + * as vcpu_is_preempted() will always return false with
>> + * bare metal even if it is defined.
>> + */
>> +#ifndef vcpu_is_preempted
>> + {
>> + int locked = smp_cond_load_relaxed(&node->locked,
>> + VAL || need_resched());
>> + if (!locked)
>> + goto unqueue;
>> + return true;
>> + }
>> +#endif
> Much yuck :-/
>
> With ARM64 being the only arch that currently makes use of this; another
> approach is doing something like:
>
> That is also rather yuck, and definitely needs a few comments sprinkled
> on it, but it should just work for everyone.
>
> It basically relies on an arch having a spinning *cond_load*()
> implementation if it has vcpu_is_preempted(), which is true today.
>
> ---
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 6ef600aa0f47..6e00d7c077ba 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -133,18 +133,10 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> * guaranteed their existence -- this allows us to apply
> * cmpxchg in an attempt to undo our queueing.
> */
> + if (!smp_cond_load_relaxed(&node->locked, VAL || need_resched() ||
> + vcpu_is_preempted(node_cpu(node->prev))))
> + goto unqueue;
>
> - while (!READ_ONCE(node->locked)) {
> - /*
> - * If we need to reschedule bail... so we can block.
> - * Use vcpu_is_preempted() to avoid waiting for a preempted
> - * lock holder:
> - */
> - if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
> - goto unqueue;
> -
> - cpu_relax();
> - }
> return true;
>
> unqueue:
>
Yes, that will work for now. We do need to document that assumption
where smp_cond_load_relaxed() is defined.

In the future, if vcpu_is_preempted() is defined for ARM64, this will
break, since a wfe-based wait may not recheck the vcpu_is_preempted()
condition in a timely manner. How about defining a variant like
smp_cond_load_vcpu_relaxed(p, cond, vcpu)? With that, we can make sure
that the code will be properly updated when vcpu_is_preempted() is
defined for ARM64. I know it is still kind of ugly, but it is a safer
approach.
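
To make the idea concrete, here is a rough user-space model of what a
generic, always-spinning fallback for such a variant might look like.
It is only a sketch, not a patch: the macro body, the stubbed
need_resched(), vcpu_is_preempted() and cpu_relax(), the two flag
variables and the osq_wait() helper are all assumptions made up for
illustration, and C11 atomics stand in for READ_ONCE(). It relies on
GNU C statement expressions, as the kernel does. An arch with a
waiting cond_load (e.g. wfe) would presumably have to supply its own
version.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for kernel state and helpers. */
static atomic_bool resched_flag;        /* models the need_resched flag */
static atomic_bool prev_vcpu_preempted; /* models the previous node's vCPU */

static bool need_resched(void)
{
	return atomic_load_explicit(&resched_flag, memory_order_relaxed);
}

static bool vcpu_is_preempted(int cpu)
{
	(void)cpu; /* this model tracks a single "previous" vCPU only */
	return atomic_load_explicit(&prev_vcpu_preempted,
				    memory_order_relaxed);
}

static void cpu_relax(void)
{
	/* placeholder for the arch-specific spin hint */
}

/*
 * Sketch of the proposed smp_cond_load_vcpu_relaxed(p, cond, vcpu):
 * reload *p into VAL on every iteration and stop as soon as either
 * cond holds or the given vCPU is reported as preempted. Because it
 * polls, the vcpu check is re-evaluated on every pass.
 */
#define smp_cond_load_vcpu_relaxed(p, cond, vcpu)                      \
({                                                                     \
	int VAL;                                                       \
	for (;;) {                                                     \
		VAL = atomic_load_explicit((p), memory_order_relaxed); \
		if ((cond) || vcpu_is_preempted(vcpu))                 \
			break;                                         \
		cpu_relax();                                           \
	}                                                              \
	VAL;                                                           \
})

/*
 * Models the wait in osq_lock(): returns true if the lock was handed
 * over, false if the caller should go to the unqueue path.
 */
static bool osq_wait(atomic_int *locked, int prev_cpu)
{
	return smp_cond_load_vcpu_relaxed(locked,
					  VAL || need_resched(),
					  prev_cpu) != 0;
}
```

With this shape, the caller in osq_lock() would reduce to a single
"if (!smp_cond_load_vcpu_relaxed(...)) goto unqueue;" and the vcpu
check could no longer be silently dropped by an arch-specific
cond_load implementation.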
Cheers,
Longman