Re: [RFC][PATCH 00/16] sched: Core scheduling
From: Peter Zijlstra
Date: Fri Feb 22 2019 - 09:20:45 EST

On Fri, Feb 22, 2019 at 01:17:01PM +0100, Paolo Bonzini wrote:
> On 18/02/19 21:40, Peter Zijlstra wrote:
> > On Mon, Feb 18, 2019 at 09:49:10AM -0800, Linus Torvalds wrote:
> >> On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >>>
> >>> However; whichever way around you turn this cookie; it is expensive and nasty.
> >>
> >> Do you (or anybody else) have numbers for real loads?
> >>
> >> Because performance is all that matters. If performance is bad, then
> >> it's pointless, since just turning off SMT is the answer.
> >
> > Not for these patches; they stopped crashing only yesterday and I
> > cleaned them up and sent them out.
> >
> > The previous version; which was more horrible; but L1TF complete, was
> > between OK-ish and horrible depending on the number of VMEXITs a
> > workload had.
> >
> > If there were close to no VMEXITs, it beat smt=off, if there were lots
> > of VMEXITs it was far far worse. Supposedly hosting people try their
> > very bestest to have no VMEXITs so it mostly works for them (with the
> > obvious exception of single VCPU guests).
>
> If you are giving access to dedicated cores to guests, you also let them
> do PAUSE/HLT/MWAIT without vmexits and the host just thinks it's a CPU
> bound workload.
>
> In any case, IIUC what you are looking for is:
>
> 1) take a benchmark that *is* helped by SMT, this will be something CPU
> bound.
>
> 2) compare two runs, one without SMT and without core scheduler, and one
> with SMT+core scheduler.
>
> 3) find out whether performance is helped by SMT despite the increased
> overhead of the core scheduler
>
> Do you want some other load in the host, so that the scheduler actually
> does do something? Or is the point just that you show that the
> performance isn't affected when the scheduler does not have anything to
> do (which should be obvious, but having numbers is always better)?

Well, what _I_ want is for all this to just go away :-)

Tim did much of the testing last time around; and I don't think he did
core-pinning of VMs much (although I'm sure he did some of that). I'm
still a complete virt noob; I can barely boot a VM to save my life.

(you should be glad to not have heard my cursing at qemu cmdline when
trying to reproduce some of Tim's results -- let's just say that I can
deal with gpg)

I'm sure he tried some oversubscribed scenarios without pinning. But
even there, when all the vCPU threads are runnable, they don't schedule
that much. Sure we take the preemption tick and thus schedule 100-1000
times a second, but that's manageable.

We spent quite some time tracing workloads and fixing funny behaviour --
none of that has been done for these patches yet.

The moment KVM needed user space assist for things (and thus VMEXITs
happened) things came apart real quick.

Anyway, Tim, can you tell these fine folks what you did and for what
scenarios the last incarnation did show promise?