Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor

From: Raghavendra K T
Date: Tue Jul 16 2013 - 14:39:24 EST

Next message: Willy Tarreau: "Re: [Ksummit-2013-discuss] When to push bug fixes to mainline"
Previous message: Srinivas Pandruvada: "Re: [PATCH] Thermal: Fix lockup of cpu_down()"
In reply to: Raghavendra K T: "Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor"
Next in thread: Gleb Natapov: "Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 07/16/2013 11:32 AM, Gleb Natapov wrote:

On Tue, Jul 16, 2013 at 09:07:53AM +0530, Raghavendra K T wrote:
On 07/15/2013 04:06 PM, Gleb Natapov wrote:
On Mon, Jul 15, 2013 at 03:20:06PM +0530, Raghavendra K T wrote:
On 07/14/2013 06:42 PM, Gleb Natapov wrote:
On Mon, Jun 24, 2013 at 06:13:42PM +0530, Raghavendra K T wrote:
kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri <vatsa@xxxxxxxxxxxxxxxxxx>

trimming
[...]

+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+ struct kvm_lock_waiting *w;
+ int cpu;
+ u64 start;
+ unsigned long flags;
+
+ w = &__get_cpu_var(lock_waiting);
+ cpu = smp_processor_id();
+ start = spin_time_start();
+
+ /*
+ * Make sure an interrupt handler can't upset things in a
+ * partially setup state.
+ */
+ local_irq_save(flags);
+
+ /*
+ * The ordering protocol on this is that the "lock" pointer
+ * may only be set non-NULL if the "want" ticket is correct.
+ * If we're updating "want", we must first clear "lock".
+ */
+ w->lock = NULL;
+ smp_wmb();
+ w->want = want;
+ smp_wmb();
+ w->lock = lock;
+
+ add_stats(TAKEN_SLOW, 1);
+
+ /*
+ * This uses set_bit, which is atomic but we should not rely on its
+ * reordering gurantees. So barrier is needed after this call.
+ */
+ cpumask_set_cpu(cpu, &waiting_cpus);
+
+ barrier();
+
+ /*
+ * Mark entry to slowpath before doing the pickup test to make
+ * sure we don't deadlock with an unlocker.
+ */
+ __ticket_enter_slowpath(lock);
+
+ /*
+ * check again make sure it didn't become free while
+ * we weren't looking.
+ */
+ if (ACCESS_ONCE(lock->tickets.head) == want) {
+ add_stats(TAKEN_SLOW_PICKUP, 1);
+ goto out;
+ }
+
+ /* Allow interrupts while blocked */
+ local_irq_restore(flags);
+

So what happens if an interrupt comes here and an interrupt handler
takes another spinlock that goes into the slow path? As far as I see
lock_waiting will become overwritten and cpu will be cleared from
waiting_cpus bitmap by nested kvm_lock_spinning(), so when halt is
called here after returning from the interrupt handler nobody is going
to wake this lock holder. Next random interrupt will "fix" it, but it
may be several milliseconds away, or never. We should probably check
if interrupt were enabled and call native_safe_halt() here.

Okay you mean something like below should be done.
if irq_enabled()
native_safe_halt()
else
halt()

It is been a complex stuff for analysis for me.

So in our discussion stack would looking like this.

spinlock()
kvm_lock_spinning()
<------ interrupt here
halt()

From the halt if we trace

It is to early to trace the halt since it was not executed yet. Guest
stack trace will look something like this:

spinlock(a)
kvm_lock_spinning(a)
lock_waiting = a
set bit in waiting_cpus
<------ interrupt here
spinlock(b)
kvm_lock_spinning(b)
lock_waiting = b
set bit in waiting_cpus
halt()
unset bit in waiting_cpus
lock_waiting = NULL
----------> ret from interrupt
halt()

Now at the time of the last halt above lock_waiting == NULL and
waiting_cpus is empty and not interrupt it pending, so who will unhalt
the waiter?

Yes. if an interrupt occurs between
local_irq_restore() and halt(), this is possible. and since this is
rarest of rare (possiility of irq entering slowpath and then no
random irq to do spurious wakeup), we had never hit this problem in
the past.

I do not think it is very rare to get interrupt between
local_irq_restore() and halt() under load since any interrupt that
occurs between local_irq_save() and local_irq_restore() will be delivered
immediately after local_irq_restore(). Of course the chance of no other
random interrupt waking lock waiter is very low, but waiter can sleep
for much longer then needed and this will be noticeable in performance.

Yes, I meant the entire thing. I did infact turned WARN on w->lock==null before halt() [ though we can potentially have irq right after that ], but did not hit so far.

BTW can NMI handler take spinlocks? If it can what happens if NMI is
delivered in a section protected by local_irq_save()/local_irq_restore()?

Had another idea if NMI, halts are causing problem until I saw PeterZ's reply similar to V2 of pvspinlock posted here:

https://lkml.org/lkml/2011/10/23/211

Instead of halt we started with a sleep hypercall in those
versions. Changed to halt() once Avi suggested to reuse existing sleep.

If we use older hypercall with few changes like below:

kvm_pv_wait_for_kick_op(flags, vcpu, w->lock )
{
// a0 reserved for flags
if (!w->lock)
return;
DEFINE_WAIT
...
end_wait
}

Only question is how to retry immediately with lock_spinning in
w->lock=null cases.

/me need to experiment that again perhaps to see if we get some benefit.

So I am,
1. trying to artificially reproduce this.

2. I replaced the halt with below code,
if (arch_irqs_disabled())
halt();

and ran benchmarks.
But this results in degradation because, it means we again go back
and spin in irq enabled case.

Yes, this is not what I proposed.

True.

3. Now I am analyzing the performance overhead of safe_halt in irq
enabled case.
if (arch_irqs_disabled())
halt();
else
safe_halt();

Use of arch_irqs_disabled() is incorrect here.

Oops! sill me.

If you are doing it before

local_irq_restore() it will always be false since you disabled interrupt
yourself,

This was not the case. but latter is the one I missed.

if you do it after then it is to late since interrupt can come

between local_irq_restore() and halt() so enabling interrupt and halt
are still not atomic. You should drop local_irq_restore() and do

if (arch_irqs_disabled_flags(flags))
halt();
else
safe_halt();

instead.

Yes, I tested with below as suggested:

//local_irq_restore(flags);

/* halt until it's our turn and kicked. */
if (arch_irqs_disabled_flags(flags))
halt();
else
safe_halt();

//local_irq_save(flags);
I am seeing only a slight overhead, but want to give a full run to
check the performance.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Willy Tarreau: "Re: [Ksummit-2013-discuss] When to push bug fixes to mainline"
Previous message: Srinivas Pandruvada: "Re: [PATCH] Thermal: Fix lockup of cpu_down()"
In reply to: Raghavendra K T: "Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor"
Next in thread: Gleb Natapov: "Re: [PATCH RFC V10 15/18] kvm : Paravirtual ticketlocks support forlinux guests running on KVM hypervisor"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]