Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\

From: Peter Zijlstra
Date: Mon Sep 25 2017 - 04:58:49 EST

Next message: Quentin Schulz: "Re: [RFC PATCH 4/7] power: supply: axp20x-battery: support AXP803"
Previous message: Chanho Min: "[PATCH] mmc: core: add driver strength selection when selecting hs400es"
In reply to: Marcelo Tosatti: "Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\"
Next in thread: Thomas Gleixner: "Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\"

On Sun, Sep 24, 2017 at 11:22:38PM -0300, Marcelo Tosatti wrote:
> On Fri, Sep 22, 2017 at 03:01:41PM +0200, Peter Zijlstra wrote:
> > On Fri, Sep 22, 2017 at 09:40:05AM -0300, Marcelo Tosatti wrote:
> >
> > > Are you arguing its invalid for the following application to execute on
> > > housekeeping vcpu of a realtime system:
> > >
> > > void main(void)
> > > {
> > >     submit_IO();
> > >     do {
> > >         computation();
> > >     } while (!interrupted());
> > > }
> > >
> > > Really?
> >
> > No. Nobody cares about random crap tasks.
>
> Nobody has control over all code that runs in userspace Peter. And not
> supporting a valid sequence of steps because its "crap" (whatever your
> definition of crap is) makes no sense.
>
> It might be that someone decides to do the above (i really can't see
> any actual reasoning i can follow and agree on your "its crap"
> argument), this truly seems valid to me.

We don't care what other tasks do. This isn't a hard thing to
understand. You're free to run whatever junk on your CPUs. This doesn't
(much) affect the correct functioning of RT tasks that you also run
there.

> So lets follow the reasoning steps:
>
> 1) "NACK, because you didnt understand the problem".
>
> OK thats an invalid NACK, you did understand the problem
> later and now your argument is the following.

It was a NACK because you wrote a shit changelog that didn't explain the
problem. But yes.

> 2) "NACK, because all VCPUs should be SCHED_FIFO all the time".

Very much. If you want an RT guest, all VCPUs should run at RT prio and
the interaction between the VCPUs and all supporting threads should be
designed for RT.

> But the existence of this code path from userspace:
>
> submit_IO();
> do {
> computation();
> } while (!interrupted());
>
> Its a supported code sequence, and works fine in a non-RT environment.

Who cares about that chunk of code? Have you forgotten to mention that
this is the form of the emulation thread?

> Therefore it should work on an -RT environment.

No, this is where you're wrong. That code works on -RT as long as you
don't expect it to be a valid RT program. -RT kernels will run !RT stuff
just fine.

But the moment you run a program as RT (FIFO/RR/DEADLINE) it had better
damn well be a valid RT program, and that excludes a lot of code.
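
To make the RT/!RT distinction above concrete, here is a minimal sketch (not part of the original thread) that reports whether the calling task runs under one of the RT policies named above. The helper names policy_is_rt() and current_task_is_rt() are hypothetical, and SCHED_DEADLINE is guarded because not every libc header exposes it.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: returns 1 if 'policy' is FIFO, RR or DEADLINE. */
int policy_is_rt(int policy)
{
#ifdef SCHED_RESET_ON_FORK
	/* sched_getscheduler() may OR this flag into the returned policy. */
	policy &= ~SCHED_RESET_ON_FORK;
#endif
	switch (policy) {
	case SCHED_FIFO:
	case SCHED_RR:
#ifdef SCHED_DEADLINE
	case SCHED_DEADLINE:
#endif
		return 1;
	default:
		return 0;
	}
}

/* Hypothetical helper: returns 1 if the calling task uses an RT policy. */
int current_task_is_rt(void)
{
	int policy = sched_getscheduler(0);	/* 0 means the calling task */

	return policy >= 0 && policy_is_rt(policy);
}
```

A task for which current_task_is_rt() returns 0 is one of the "!RT" tasks in the sense used above: it runs fine on an -RT kernel, but nothing about its timing is guaranteed.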

> So please give me some logical reasoning for the NACK (people can live with
> it, but it has to be good enough to justify the decreasing packing of
> guests in pCPUs):
>
> 1) "Voodoo programming" (its hard for me to parse what you mean with
> that... do you mean you foresee this style of priority boosting causing
> problems in the future? Can you give an example?).

Your 'solution' only works if you sacrifice a goat on a full moon,
because only that ensures the guest doesn't VM_EXIT and cause the
self-same problem while you've boosted it.

Because you've _not_ fixed the actual problem!

> Is there something fundamentally wrong about priority boosting in
> spinlock sections, or is this particular style of priority boosting wrong?

Yes, it's fundamentally crap, because it doesn't guarantee anything.

RT is about making guarantees. An RT program needs a provable forward
progress guarantee at the very least. Including a priority inversion
disqualifies it from being sane.

> 2) "Pollution of the kernel code path". That makes sense to me, if that's
> what you're concerned about.

Also..

> 3) "Reduction of spinlock performance". Its true, but for NFV workloads
> people don't care about.

I've no idea what an NFV is.

> 4) "All vcpus should be SCHED_FIFO all the time". OK, why is that?
> What dictates that to be true?

Solid engineering. Does the guest kernel function as a bunch of
independent CPUs or does it assume all CPUs are equal and have strong
inter-cpu connections? Linux is the latter, therefore if one VCPU is RT
they all should be.

Dammit, you even recognise this in the spin-owner preemption issue
you're hacking around, but then go arse-about-face 'solving' it.

> What the patch does is the following:
> It reduces the window where SCHED_FIFO is applied to vcpu0
> to those where a spinlock is shared between -RT vcpus and vcpu0
> (why: because otherwise, when the emulator thread is sharing a
> pCPU with vcpu0, it's unable to generate interrupts for vcpu0).
>
> And its being rejected because:

It's not fixing the actual problem. The real problem is the prio
inversion between the VCPU and the emulation thread. _That_ is what
needs fixing.

Rewrite that VCPU/emulator interaction to be a proper RT construct.

Then you can run the VCPU at RT prio as you should, and the guest can
issue all the VM_EXIT things it wants at any time and still function
correctly.
