Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

From: Purcareata Bogdan
Date: Fri Mar 27 2015 - 13:08:29 EST


On 27.02.2015 03:05, Scott Wood wrote:
> On Thu, 2015-02-26 at 14:31 +0100, Sebastian Andrzej Siewior wrote:
>> On 02/26/2015 02:02 PM, Paolo Bonzini wrote:
>>> On 24/02/2015 00:27, Scott Wood wrote:
>>>> This isn't a host PIC driver. It's guest PIC emulation, some of which
>>>> is indeed not suitable for a rawlock (in particular, openpic_update_irq
>>>> which loops on the number of vcpus, with a loop body that calls
>>>> IRQ_check() which loops over all pending IRQs).
>>>>
>>>> The question is what behavior is wanted of code that isn't quite
>>>> RT-ready. What is preferred, bugs or bad latency?
>>>
>>> If the answer is bad latency (which can be avoided simply by not running
>>> KVM on a RT kernel in production), patch 1 can be applied. If the
>>
>> can be applied *but* makes no difference if applied or not.
>>
>>> answer is bugs, patch 1 is not upstream material.
>>>
>>> I myself prefer to have bad latency; if something takes a spinlock in
>>> atomic context, that spinlock should be raw. If it hurts (latency),
>>> don't do it (use the affected code).
>>
>> The problem, that is fixed by this s/spin_lock/raw_spin_lock/, exists
>> only in -RT. There is no change upstream. In general we fix such things
>> in -RT first and forward the patches upstream if possible. This convert
>> thingy would be possible.
>> Bug fixing comes before latency no matter if RT or not. Converting
>> every lock into a rawlock is not always the answer.
>> Last thing I read from Scott is that he is not entirely sure if this is
>> the right approach or not and patch #1 was not acked-by him either.
>>
>> So for now I wait for Scott's feedback and maybe a backtrace :)
>
> Obviously leaving it in a buggy state is not what we want -- but I lean
> towards a short term "fix" of putting "depends on !PREEMPT_RT" on the
> in-kernel MPIC emulation (which is itself just an optimization -- you
> can still use KVM without it). This way people don't enable it with RT
> without being aware of the issue, and there's more of an incentive to
> fix it properly.
>
> I'll let Bogdan supply the backtrace.

So about the backtrace. I wasn't really sure how to "catch" this, so I started a 24-VCPU guest on a 24-CPU board and, in the guest, ran 24 netperf flows against an external board of the same kind connected back to back. I assumed this would generate enough VCPU and external interrupt activity to expose the alleged culprit.

As for measuring the latency, I thought of using ftrace, specifically the preemptirqsoff latency histogram. Unfortunately, I wasn't able to capture any major difference between running a guest with in-kernel MPIC emulation (with the openpic raw_spinlock conversion applied) and running it without in-kernel MPIC emulation. Function profiling (trace_stat) shows that in the second case far more time is spent in kvm_handle_exit (100x), but overall the maximum preemptirqsoff latencies don't look that different.

Here are the max numbers (preemptirqsoff) for the 24 CPUs, on the host RT Linux, sorted in descending order, expressed in microseconds:

In-kernel MPIC   QEMU MPIC
          3975        5105
          2079        3972
          1303        3557
          1106        1725
           447         907
           423         853
           362         723
           343         182
           260         121
           133         116
           131         116
           118         115
           116         114
           114         114
           114         114
           114          99
           113          99
           103          98
            98          98
            95          97
            87          96
            83          83
            83          82
            80          81
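
For anyone who wants to compare the two columns more systematically, here is a minimal Python sketch (not part of the patch set) that summarizes the numbers above. The values are copied from the table; the summarize() helper and the 1000 us default threshold are my own assumptions, chosen only for illustration. Note that each column is sorted independently, so a row does not pair up the same CPU.

```python
from statistics import median

# Per-CPU maximum preemptirqsoff latencies in microseconds, copied from the
# table above. Each column was sorted on its own, so rows are not per-CPU pairs.
IN_KERNEL_MPIC = [3975, 2079, 1303, 1106, 447, 423, 362, 343, 260, 133, 131,
                  118, 116, 114, 114, 114, 113, 103, 98, 95, 87, 83, 83, 80]
QEMU_MPIC = [5105, 3972, 3557, 1725, 907, 853, 723, 182, 121, 116, 116, 115,
             114, 114, 114, 99, 99, 98, 98, 97, 96, 83, 82, 81]


def summarize(samples, threshold_us=1000):
    """Return worst case, median, and number of CPUs above threshold_us.

    threshold_us is an arbitrary cut-off (assumption), not a value from the
    thread.
    """
    return {
        "max": max(samples),
        "median": median(samples),
        "over_threshold": sum(1 for s in samples if s > threshold_us),
    }


if __name__ == "__main__":
    for name, samples in (("In-kernel MPIC", IN_KERNEL_MPIC),
                          ("QEMU MPIC", QEMU_MPIC)):
        print(name, summarize(samples))
```

On these numbers the medians come out at 117 us (in-kernel) and 114.5 us (QEMU), while the maxima are 3975 us and 5105 us, so the two setups differ only on the handful of worst CPUs.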

I'm not sure if this captures openpic behavior or just scheduler behavior.

Anyway, I'm in favor of adding the openpic raw_spinlock conversion along with disabling the in-kernel MPIC emulation for upstream. I just wanted to follow up on this last request from a while ago first.

Do you think it would be better to just submit the new patch or should I do some further testing? Do you have any suggestions regarding what else I should look at / how to test?

Thank you,
Bogdan P.